This article describes the most powerful open-source models for video and image generation. All of them can be installed and run entirely locally, provided you have a suitable GPU. While 12 GB of VRAM is often considered the baseline for the larger models, quantized versions (typically distributed on Hugging Face) make it possible to run them with significantly less memory, starting from as little as 8 GB.
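As a concrete example of the quantization point, here is a minimal diffusers sketch that loads a GGUF-quantized FLUX.1 transformer. It assumes a recent diffusers release with GGUF support plus the `gguf` Python package; the Q4_K_S file is just one of several quantization levels in the city96 repository, so pick whichever fits your VRAM budget:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF-quantized transformer weights (choose the quantization level that fits your GPU)
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # text encoders and VAE come from the base repo
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps only the active module on the GPU

image = pipe("A lighthouse on a cliff at dawn").images[0]
image.save("flux_gguf.png")
```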
1. Video Generation Models
Wan 2.1 / 2.2
- Description: The current standard for open-source video generation, producing notably stable motion and anatomy.
- Use case: High-end cinematography, realistic human actions.
- GitHub (Model): Wan-Video/Wan2.1
- ComfyUI Node: kijai/ComfyUI-WanVideoWrapper
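Outside ComfyUI, Wan 2.1 also publishes diffusers-format weights. Below is a minimal text-to-video sketch, assuming a diffusers release recent enough to include the Wan pipeline classes; the 1.3B variant is shown because it is the lightest:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
# The VAE is loaded in float32 for decoding stability
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat walks on the grass, realistic style",
    negative_prompt="blurry, low quality",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan_output.mp4", fps=16)
```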
Hunyuan Video (v1.5)
- Description: Tencent's flagship model with 13 billion parameters, now in v1.5 with improved prompt adherence and a "distilled" version for speed.
- Use case: Complex textual instructions and long scenes.
- GitHub (Model): Tencent-Hunyuan/HunyuanVideo
- ComfyUI Node: kijai/ComfyUI-HunyuanVideoWrapper
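For a 13-billion-parameter model, the main practical concern is memory. The sketch below assumes the community diffusers conversion of the original HunyuanVideo release (the v1.5 weights may live under a different hub id) and combines VAE tiling with CPU offload to keep peak VRAM manageable:

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()         # decode the video in tiles to cut VRAM spikes
pipe.enable_model_cpu_offload()  # stream weights to the GPU only when needed

output = pipe(
    prompt="A cat walks on the grass, realistic style",
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "hunyuan_output.mp4", fps=15)
```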
LTX Video (LTXV)
- Description: A DiT-based model focused on real-time generation and efficiency on consumer GPUs.
- Use case: Fast previews and real-time video-to-video transformations.
- GitHub (Model): Lightricks/LTX-Video
- ComfyUI Node: Lightricks/ComfyUI-LTXVideo
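LTX Video is also supported natively in diffusers, which makes it convenient for quick script-based previews. A minimal sketch, assuming a recent diffusers release; note that LTXV expects num_frames of the form 8k + 1 (e.g. 161) and dimensions divisible by 32:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A woman walks along a beach at sunset, cinematic",
    negative_prompt="worst quality, inconsistent motion, blurry",
    width=704,
    height=480,
    num_frames=161,   # must be a multiple of 8 plus 1
    num_inference_steps=50,
).frames[0]
export_to_video(video, "ltx_output.mp4", fps=24)
```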
Kandinsky 5.0 Video (Lite & Pro)
- Description: A suite of models based on the Cross-Attention Diffusion Transformer (CrossDiT). The Lite (2B) version is lightning fast and runs on consumer GPUs, while the Pro (19B) version generates cinematic 10-second clips with complex camera movements.
- Use case: Versatile video creation (Text-to-Video and Image-to-Video) with strong support for various languages and artistic styles.
- GitHub (Model): kandinskylab/kandinsky-5
- ComfyUI Template: Kandinsky 5 Video Workflow (Official)
Specialized Video Fine-tunes
- SCAIL: Studio-grade character animation (pose-to-video) with 3D consistency.
- GitHub: zai-org/SCAIL | ComfyUI: ComfyUI-SCAIL-Pose (part of WanVideoWrapper).
- MoCha: The standard for seamlessly replacing characters in existing videos.
- GitHub: Orange-3DV-Team/MoCha | ComfyUI: Integrated into WanVideoWrapper.
- Nexus 1.3B: A specialized fine-tune of the Wan 1.3B architecture, developed within the Nexus project and trained on "Nexus data" for complex human movements. Best suited to dance, martial arts, and gym exercises, where anatomical correctness is crucial.
- GitHub: PKU-YuanGroup/OpenS2V-Nexus | ComfyUI: Supported via the WanVideoWrapper.
2. Image Generation Models
FLUX.2
- Description: The successor to FLUX.1, offering photorealism and text rendering that rival commercial models like Midjourney.
- Use case: Everything from marketing materials to complex digital art.
- GitHub (Model): black-forest-labs/flux2
- ComfyUI Node: city96/ComfyUI-GGUF (for GGUF quantizations).
Qwen Image 2512
- Description: An advanced model from Alibaba that excels at following image-editing instructions.
- Use case: Layer-based editing and complex compositions.
- GitHub (Model): QwenLM/Qwen-Image
- ComfyUI Node: Use the ComfyUI-Manager and search for "Qwen2-VL".
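Qwen-Image can also be loaded through the generic diffusers entry point. The sketch below assumes a recent diffusers release with Qwen-Image support and uses the base Qwen/Qwen-Image hub id; the 2512 update may be published as a separate checkpoint, so check the model card:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Qwen-Image is notably strong at rendering legible text inside the image.
image = pipe(
    prompt='A coffee shop chalkboard that reads "Fresh Brews Daily", warm lighting',
    num_inference_steps=50,
).images[0]
image.save("qwen_image.png")
```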
Z-Image (Turbo)
- Description: A 6B-parameter model from Alibaba that delivers top-tier results in just 8 sampling steps.
- Use case: Real-time generation and systems with limited hardware.
- GitHub (Model): Tongyi-MAI/Z-Image
- ComfyUI Node: Integrated via WanVideoWrapper or directly via diffusers.
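If you prefer scripting over ComfyUI, a generic diffusers load along these lines should work. The hub id below is inferred from the GitHub organization and the 8-step setting follows the Turbo variant's few-step design, so treat both as assumptions to verify against the model card:

```python
import torch
from diffusers import DiffusionPipeline

# Hub id assumed from the Tongyi-MAI organization; confirm on Hugging Face.
pipe = DiffusionPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Turbo is distilled for few-step sampling; the article cites 8 steps.
image = pipe(
    prompt="A neon-lit street market at night, rain reflections",
    num_inference_steps=8,
).images[0]
image.save("z_image.png")
```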
