This article describes the most powerful open-source models for video and image generation. All of them can be installed and run entirely locally, provided you have a suitable GPU. While 12 GB of VRAM is often considered the baseline for the larger models, quantized versions (typically distributed on Hugging Face) make it possible to run them with significantly less memory, starting from as little as 8 GB.
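As a concrete example of the quantization point, here is a minimal diffusers sketch that loads a GGUF-quantized FLUX.1 transformer. It assumes a recent diffusers release with GGUF support plus the `gguf` Python package; the Q4_K_S file is just one of several quantization levels in the city96 repository, so pick whichever fits your VRAM budget:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF-quantized transformer weights (choose the quantization level that fits your GPU)
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # text encoders and VAE come from the base repo
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps only the active module on the GPU

image = pipe("A lighthouse on a cliff at dawn").images[0]
image.save("flux_gguf.png")
```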
1. Video Generation Models
Wan 2.1 / 2.2
- Description: The current standard for open-source video generation, producing notably stable motion and anatomy.
- Use case: High-end cinematography, realistic human actions.
- GitHub (Model): Wan-Video/Wan2.1
- ComfyUI Node: kijai/ComfyUI-WanVideoWrapper
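Outside ComfyUI, Wan 2.1 also publishes diffusers-format weights. Below is a minimal text-to-video sketch, assuming a diffusers release recent enough to include the Wan pipeline classes; the 1.3B variant is shown because it is the lightest:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
# The VAE is loaded in float32 for decoding stability
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat walks on the grass, realistic style",
    negative_prompt="blurry, low quality",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan_output.mp4", fps=16)
```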
Hunyuan Video (v1.5)
- Description: Tencent's flagship model with 13 billion parameters, now in v1.5 with improved prompt adherence and a "distilled" version for speed.
- Use case: Complex textual instructions and long scenes.
- GitHub (Model): Tencent-Hunyuan/HunyuanVideo
- ComfyUI Node: kijai/ComfyUI-HunyuanVideoWrapper
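For a 13-billion-parameter model, the main practical concern is memory. The sketch below assumes the community diffusers conversion of the original HunyuanVideo release (the v1.5 weights may live under a different hub id) and combines VAE tiling with CPU offload to keep peak VRAM manageable:

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()         # decode the video in tiles to cut VRAM spikes
pipe.enable_model_cpu_offload()  # stream weights to the GPU only when needed

output = pipe(
    prompt="A cat walks on the grass, realistic style",
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "hunyuan_output.mp4", fps=15)
```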
LTX Video (LTXV)
- Description: A DiT-based model focused on real-time generation and efficiency on consumer GPUs.
- Use case: Fast previews and real-time video-to-video transformations.
- GitHub (Model): Lightricks/LTX-Video
- ComfyUI Node: Lightricks/ComfyUI-LTXVideo
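LTX Video is also supported natively in diffusers, which makes it convenient for quick script-based previews. A minimal sketch, assuming a recent diffusers release; note that LTXV expects num_frames of the form 8k + 1 (e.g. 161) and dimensions divisible by 32:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A woman walks along a beach at sunset, cinematic",
    negative_prompt="worst quality, inconsistent motion, blurry",
    width=704,
    height=480,
    num_frames=161,   # must be a multiple of 8 plus 1
    num_inference_steps=50,
).frames[0]
export_to_video(video, "ltx_output.mp4", fps=24)
```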
Kandinsky 5.0 Video (Lite & Pro)
- Description: A suite of models based on the Cross-Attention Diffusion Transformer (CrossDiT). The Lite (2B) version is lightning fast and runs on consumer GPUs, while the Pro (19B) version generates cinematic 10-second clips with complex camera movements.
- Use case: Versatile video creation (Text-to-Video and Image-to-Video) with strong support for various languages and artistic styles.
- GitHub (Model): kandinskylab/kandinsky-5
- ComfyUI Template: Kandinsky 5 Video Workflow (Official)
Specialized Video Fine-tunes
- SCAIL: Studio-grade character animation (pose-to-video) with 3D consistency.
- GitHub: zai-org/SCAIL | ComfyUI: ComfyUI-SCAIL-Pose (part of WanVideoWrapper).
- MoCha: The standard for seamlessly replacing characters in existing videos.
- GitHub: Orange-3DV-Team/MoCha | ComfyUI: Integrated into WanVideoWrapper.
- Nexus 1.3B: A specialized fine-tune of the Wan 1.3B architecture, developed within the Nexus project and trained on "Nexus data" for complex human movements. Best suited to dance, martial arts, and gym exercises, where anatomical correctness is crucial.
- GitHub: PKU-YuanGroup/OpenS2V-Nexus | ComfyUI: Supported via the WanVideoWrapper.
2. Image Generation Models
FLUX.2
- Description: The successor to FLUX.1, offering photorealism and text rendering that rival commercial models like Midjourney.
- Use case: Everything from marketing materials to complex digital art.
- GitHub (Model): black-forest-labs/flux2
- ComfyUI Node: city96/ComfyUI-GGUF (for GGUF quantizations).
Qwen Image 2512
- Description: An advanced model from Alibaba that excels at following image-editing instructions.
- Use case: Layer-based editing and complex compositions.
- GitHub (Model): QwenLM/Qwen-Image
- ComfyUI Node: Use the ComfyUI-Manager and search for "Qwen2-VL".
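Qwen-Image can also be loaded through the generic diffusers entry point. The sketch below assumes a recent diffusers release with Qwen-Image support and uses the base Qwen/Qwen-Image hub id; the 2512 update may be published as a separate checkpoint, so check the model card:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Qwen-Image is notably strong at rendering legible text inside the image.
image = pipe(
    prompt='A coffee shop chalkboard that reads "Fresh Brews Daily", warm lighting',
    num_inference_steps=50,
).images[0]
image.save("qwen_image.png")
```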
Z-Image (Turbo)
- Description: A 6B-parameter model from Alibaba that delivers top-tier results in just 8 sampling steps.
- Use case: Real-time generation and systems with limited hardware.
- GitHub (Model): Tongyi-MAI/Z-Image
- ComfyUI Node: Integrated via WanVideoWrapper or directly via diffusers.
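If you prefer scripting over ComfyUI, a generic diffusers load along these lines should work. The hub id below is inferred from the GitHub organization and the 8-step setting follows the Turbo variant's few-step design, so treat both as assumptions to verify against the model card:

```python
import torch
from diffusers import DiffusionPipeline

# Hub id assumed from the Tongyi-MAI organization; confirm on Hugging Face.
pipe = DiffusionPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Turbo is distilled for few-step sampling; the article cites 8 steps.
image = pipe(
    prompt="A neon-lit street market at night, rain reflections",
    num_inference_steps=8,
).images[0]
image.save("z_image.png")
```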
