This article covers the most powerful open-source models for video and image generation. All of them can be installed and run entirely locally, provided you have a suitable GPU. While 12 GB of VRAM is often considered the baseline for the larger models, quantized versions (most commonly distributed on Hugging Face) bring the requirement down significantly, in some cases to as little as 8 GB; a minimal example of the usual memory-saving switches follows.
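Before diving into the list, here is a hedged sketch of the standard VRAM-saving options in Hugging Face diffusers. The model id is a placeholder, but the offloading and VAE-tiling calls apply to most diffusers-format pipelines mentioned below.

```python
# Minimal sketch: common diffusers memory savers for GPUs in the 8-12 GB range.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "some-org/some-model",       # placeholder id; substitute any diffusers-format checkpoint
    torch_dtype=torch.bfloat16,  # half precision halves weight memory
)

# Keep only the currently active sub-model on the GPU.
pipe.enable_model_cpu_offload()

# Decode latents in tiles so the VAE never holds the full image in VRAM at once.
pipe.vae.enable_tiling()

image = pipe("a lighthouse at dusk", num_inference_steps=28).images[0]
image.save("out.png")
```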

1. Video Generation Models

Wan 2.1 / 2.2

  • Description: The current standard for open-source video generation, with highly stable motion and anatomy.
  • Use case: High-end cinematography, realistic human actions.
  • GitHub (Model): Wan-Video/Wan2.1
  • ComfyUI Node: kijai/ComfyUI-WanVideoWrapper
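
If you prefer scripting over ComfyUI, Wan 2.1 also ships in diffusers format. A hedged sketch, assuming the Wan-AI/Wan2.1-T2V-1.3B-Diffusers repo id (check Hugging Face for the larger 14B variants):

```python
# Hedged sketch: the small Wan 2.1 text-to-video checkpoint via diffusers.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # fits more easily on consumer GPUs

frames = pipe(
    prompt="a cat surfing a wave, cinematic lighting",
    num_frames=81,      # roughly 5 seconds at 16 fps
    height=480,         # the 1.3B model targets 480p
    width=832,
).frames[0]
export_to_video(frames, "wan_t2v.mp4", fps=16)
```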

Hunyuan Video (v1.5)

  • Description: Tencent's large DiT-based video model; v1.5 is a lighter, faster revision of the original 13B release.
  • Use case: Cinematic text-to-video and image-to-video.
  • GitHub (Model): Tencent/HunyuanVideo
  • ComfyUI Node: kijai/ComfyUI-HunyuanVideoWrapper

LTX Video (LTXV)

  • Description: A DiT-based model focused on real-time generation and efficiency on consumer GPUs.
  • Use case: Fast previews and real-time video-to-video transformations.
  • GitHub (Model): Lightricks/LTX-Video
  • ComfyUI Node: Lightricks/ComfyUI-LTXVideo
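
LTX-Video is likewise usable straight from diffusers. A minimal sketch, assuming the LTXPipeline class (diffusers 0.32+) and the Lightricks/LTX-Video repo id:

```python
# Hedged sketch: LTX-Video text-to-video through diffusers.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="drone shot over a foggy forest at sunrise",
    width=704, height=480,  # dimensions must be divisible by 32
    num_frames=161,         # frame count of the form 8n+1; ~6-7 s at 24 fps
).frames[0]
export_to_video(frames, "ltx_t2v.mp4", fps=24)
```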

Kandinsky 5.0 Video (Lite & Pro)

  • Description: A suite of models based on the Cross-Attention Diffusion Transformer (CrossDiT). The Lite (2B) version is lightning fast and runs on consumer GPUs, while the Pro (19B) version generates cinematic 10-second clips with complex camera movements.
  • Use case: Versatile video creation (Text-to-Video and Image-to-Video) with strong support for various languages and artistic styles.
  • GitHub (Model): kandinskylab/kandinsky-5
  • ComfyUI Template: Kandinsky 5 Video Workflow (Official)

Specialized Video Fine-tunes

  • SCAIL: Studio-grade character animation (pose-to-video) with 3D consistency.
  • MoCha: The standard for seamlessly replacing characters in existing videos.
  • Nexus 1.3B: A specialized fine-tune of the Wan 1.3B architecture, developed within the Nexus project and trained on "Nexus data" for complex human movements. Use case: dance, martial arts, and gym exercises where anatomical correctness is crucial. GitHub (Model): PKU-YuanGroup/OpenS2V-Nexus. ComfyUI Node: supported via the WanVideoWrapper.

2. Image Generation Models

FLUX.2

  • Description: The successor to FLUX.1, offering photorealism and text rendering that rival commercial models such as Midjourney.
  • Use case: Everything from marketing materials to complex digital art.
  • GitHub (Model): black-forest-labs/flux2
  • ComfyUI Node: city96/ComfyUI-GGUF (for GGUF quantizations).
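
To illustrate the GGUF route mentioned above, here is a hedged diffusers sketch. It loads a FLUX.1-dev quantization from city96's repository, since that is a known-good example; the same pattern should carry over to FLUX.2 GGUF files once you point it at the right checkpoint.

```python
# Hedged sketch: loading a GGUF-quantized FLUX transformer (diffusers >= 0.32).
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

ckpt = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    ckpt,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # base repo supplies the text encoders and VAE
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe("a neon sign reading 'OPEN SOURCE'", num_inference_steps=28).images[0]
image.save("flux_gguf.png")
```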

Qwen Image 2512

  • Description: An advanced model from Alibaba that excels in following instructions for image editing.
  • Use case: Layer-based editing and complex compositions.
  • GitHub (Model): QwenLM/Qwen-Image
  • ComfyUI Node: Use the ComfyUI-Manager and search for "Qwen2-VL".
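
A minimal scripting sketch, assuming the Qwen/Qwen-Image repository ships diffusers-format weights that the generic DiffusionPipeline loader can resolve:

```python
# Hedged sketch: text-to-image with Qwen-Image via the diffusers auto pipeline.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="a storefront poster with the headline 'GRAND OPENING', flat design",
    num_inference_steps=50,
).images[0]
image.save("qwen_image.png")
```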

Z-Image (Turbo)

  • Description: A 6B-parameter model from Alibaba that delivers top-tier results in just 8 steps.
  • Use case: Real-time generation and systems with limited hardware.
  • GitHub (Model): Tongyi-MAI/Z-Image
  • ComfyUI Node: Integrated via WanVideoWrapper or directly via diffusers.
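
A hedged sketch of the few-step workflow; the Tongyi-MAI/Z-Image-Turbo repo id and direct diffusers compatibility are assumptions here, so check the model card for the exact loading instructions:

```python
# Hedged sketch: few-step generation with Z-Image Turbo.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",  # assumed repo id; verify on Hugging Face
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Turbo-style distillation: a handful of steps instead of the usual 25-50.
image = pipe("macro photo of a dew-covered leaf", num_inference_steps=8).images[0]
image.save("zimage_turbo.png")
```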