WAN 2.6 API Guide: Alibaba's Latest Video Generation Model
What’s New
WAN 2.6 is Alibaba’s most capable open-source video generation model to date, delivering significant improvements over WAN 2.1 across motion quality, prompt adherence, and resolution support. The model achieves a VBench score of 85.22, outperforming comparable open-source competitors, and supports video generation at up to 1280×720 resolution with durations extending to 10 seconds per clip. Alibaba released WAN 2.6 under an open-weight license, making it accessible via both self-hosted deployments and third-party API providers.
Key Specifications
| Parameter | WAN 2.6 |
|---|---|
| Max Resolution | 1280 × 720 (720p) |
| Max Video Duration | 10 seconds |
| Frame Rate | 16 fps (standard), 24 fps (high quality) |
| Input Modes | Text-to-video, Image-to-video |
| Model Parameters | ~14 billion |
| Inference Latency (720p, 5s) | ~90–120 seconds (A100 GPU) |
| API Price (typical third-party) | ~$0.06–$0.10 per video generation |
| Open-Weight License | Yes (Alibaba WAN License) |
| Prompt Languages | Chinese and English (bilingual) |
Pricing note: Alibaba does not operate a direct consumer API for WAN 2.6 at the time of writing. The pricing figures above reflect third-party inference providers; always check your provider's current rate card.
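As a back-of-envelope check, per-clip pricing in that range translates into batch budgets like this (the $0.08 default below is an assumed midpoint, not a quoted rate):

```python
# Rough batch-cost estimate at an assumed ~$0.08 per clip (midpoint of the
# $0.06–$0.10 range quoted by typical third-party providers).
def estimate_batch_cost(num_clips: int, price_per_clip: float = 0.08) -> float:
    """Return the estimated USD cost of generating num_clips videos."""
    return round(num_clips * price_per_clip, 2)

print(estimate_batch_cost(500))   # 500 clips at the midpoint rate: ~$40
```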
Comparison with Previous Version
| Feature | WAN 2.1 | WAN 2.6 | Change |
|---|---|---|---|
| VBench Score | 83.4 | 85.22 | +2.2% |
| Max Resolution | 1280 × 720 | 1280 × 720 | Unchanged |
| Max Duration | 5 seconds | 10 seconds | +100% |
| Text-to-Video | ✅ | ✅ | Unchanged |
| Image-to-Video | ✅ | ✅ | Improved motion |
| Motion Smoothness | Good | Excellent | Improved |
| Bilingual Prompt | Partial | Full | Improved |
| Model Size | ~14B | ~14B | Unchanged |
| Estimated Inference Time (720p) | ~60–80s | ~90–120s | Higher (longer clips) |
| Open-Weight | Yes | Yes | Unchanged |
The most meaningful upgrade in WAN 2.6 is the doubling of maximum output duration to 10 seconds, which directly enables use cases like short-form social content and product demos without manual clip stitching. Motion coherence across the full clip length is noticeably more stable compared to WAN 2.1.
API Quick Start
WAN 2.6 follows a standard REST inference pattern compatible with most inference platforms. The examples below use the WAN 2.6 endpoint as exposed by a compatible provider (adjust `BASE_URL` and the model slug for your chosen platform).
Python — Text-to-Video
```python
import os
import time

import requests

# ── Configuration ──────────────────────────────────────────────────────────────
API_KEY = os.environ.get("WAN_API_KEY", "your-api-key-here")
BASE_URL = "https://api.your-provider.com/v1"  # replace with your provider's URL
MODEL_ID = "wan-2.6"                           # check your provider's model slug

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# ── Step 1: Submit generation job ──────────────────────────────────────────────
def submit_video_job(prompt: str, duration: int = 5, resolution: str = "1280x720") -> str:
    """
    Submit a text-to-video generation request to WAN 2.6.

    Args:
        prompt: English or Chinese text description.
        duration: Video length in seconds (1–10).
        resolution: Output resolution string.

    Returns:
        task_id: String ID for polling job status.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "parameters": {
            "duration": duration,          # seconds; max 10 for WAN 2.6
            "resolution": resolution,      # "1280x720" or "720x1280" (portrait)
            "fps": 16,                     # 16 or 24
            "guidance_scale": 7.5,         # classifier-free guidance strength
            "num_inference_steps": 50,     # higher = better quality, slower
        },
    }
    resp = requests.post(f"{BASE_URL}/video/generations", headers=HEADERS, json=payload)
    resp.raise_for_status()  # raises HTTPError for 4xx / 5xx responses
    task_id = resp.json()["task_id"]
    print(f"[+] Job submitted. Task ID: {task_id}")
    return task_id

# ── Step 2: Poll for completion ────────────────────────────────────────────────
def poll_job(task_id: str, poll_interval: int = 10, timeout: int = 300) -> str:
    """
    Poll the job status endpoint until the video is ready.

    Args:
        task_id: Task ID returned from submit_video_job().
        poll_interval: Seconds between status checks.
        timeout: Max total wait time in seconds.

    Returns:
        video_url: Direct URL to the generated video file.
    """
    elapsed = 0
    while elapsed < timeout:
        resp = requests.get(f"{BASE_URL}/video/generations/{task_id}", headers=HEADERS)
        resp.raise_for_status()
        data = resp.json()
        status = data.get("status")
        if status == "succeeded":
            video_url = data["output"]["video_url"]
            print(f"[✓] Video ready: {video_url}")
            return video_url
        elif status == "failed":
            raise RuntimeError(f"Generation failed: {data.get('error', 'unknown error')}")
        else:
            print(f"[…] Status: {status} — waiting {poll_interval}s ({elapsed}s elapsed)")
            time.sleep(poll_interval)
            elapsed += poll_interval
    raise TimeoutError(f"Job {task_id} did not complete within {timeout}s")

# ── Step 3: Download the result ────────────────────────────────────────────────
def download_video(video_url: str, output_path: str = "output.mp4") -> None:
    """Download the generated video to a local file."""
    resp = requests.get(video_url, stream=True)
    resp.raise_for_status()
    with open(output_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"[✓] Saved to {output_path}")

# ── Main execution ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
    PROMPT = (
        "A golden retriever runs across an autumn forest trail, "
        "sunlight filtering through the trees, cinematic slow motion, 4K"
    )
    try:
        task_id = submit_video_job(prompt=PROMPT, duration=5, resolution="1280x720")
        video_url = poll_job(task_id)
        download_video(video_url, output_path="wan26_output.mp4")
    except requests.HTTPError as e:
        print(f"[✗] HTTP error: {e.response.status_code} — {e.response.text}")
    except (RuntimeError, TimeoutError) as e:
        print(f"[✗] {e}")
```
Python — Image-to-Video
```python
import base64
import os

import requests

API_KEY = os.environ.get("WAN_API_KEY", "your-api-key-here")
BASE_URL = "https://api.your-provider.com/v1"
MODEL_ID = "wan-2.6"

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

def image_to_video(image_path: str, prompt: str, duration: int = 5) -> str:
    """
    Animate a still image using WAN 2.6 image-to-video mode.

    Args:
        image_path: Local path to source image (JPEG or PNG).
        prompt: Motion description to guide the animation.
        duration: Output length in seconds (1–10).

    Returns:
        task_id for downstream polling.
    """
    # Encode image as base64
    with open(image_path, "rb") as img_file:
        image_b64 = base64.b64encode(img_file.read()).decode("utf-8")

    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "image": f"data:image/jpeg;base64,{image_b64}",  # or image/png
        "parameters": {
            "duration": duration,
            "resolution": "1280x720",
            "fps": 16,
            "motion_strength": 0.7,        # 0.0 (subtle) – 1.0 (strong motion)
            "num_inference_steps": 50,
        },
    }
    resp = requests.post(f"{BASE_URL}/video/image-to-video", headers=HEADERS, json=payload)
    resp.raise_for_status()
    task_id = resp.json()["task_id"]
    print(f"[+] Image-to-video job submitted. Task ID: {task_id}")
    return task_id
```
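The payload above hard-codes an `image/jpeg` data-URI prefix. A small helper (hypothetical, but matching the payload format shown) can pick the MIME type from the file extension instead, so PNG sources are labeled correctly:

```python
import base64
import os

def image_to_data_uri(image_path: str) -> str:
    """Encode a local JPEG or PNG file as a data URI for the `image` field."""
    ext = os.path.splitext(image_path)[1].lower()
    mime = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}.get(ext)
    if mime is None:
        raise ValueError(f"Unsupported image type: {ext}")
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"
```

Swap the result into the payload's `image` field in place of the hard-coded prefix.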
cURL — Minimal Text-to-Video Request
```bash
# Submit a WAN 2.6 text-to-video job via cURL
curl -X POST "https://api.your-provider.com/v1/video/generations" \
  -H "Authorization: Bearer $WAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-2.6",
    "prompt": "Time-lapse of a city skyline transitioning from dusk to night, neon lights, cinematic",
    "parameters": {
      "duration": 5,
      "resolution": "1280x720",
      "fps": 16,
      "guidance_scale": 7.5,
      "num_inference_steps": 50
    }
  }'

# Poll job status (replace TASK_ID with the returned task_id)
curl "https://api.your-provider.com/v1/video/generations/TASK_ID" \
  -H "Authorization: Bearer $WAN_API_KEY"
```
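In shell pipelines, `jq` is the usual tool for pulling fields out of those JSON responses. The snippet below parses a sample completed-job payload; the payload itself is illustrative, using the same fields the Python example reads:

```shell
# Extract status and video URL from a (sample) completed-job response.
RESPONSE='{"task_id":"abc123","status":"succeeded","output":{"video_url":"https://cdn.example.com/wan26.mp4"}}'

STATUS=$(echo "$RESPONSE" | jq -r '.status')
VIDEO_URL=$(echo "$RESPONSE" | jq -r '.output.video_url')

echo "$STATUS"       # succeeded
echo "$VIDEO_URL"    # https://cdn.example.com/wan26.mp4
```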
Best Use Cases
- Short-form social content (TikTok / Reels / Shorts): WAN 2.6's 10-second output duration covers the minimum viable clip length for most short-video platforms without requiring stitching, cutting several steps from the production pipeline.
- E-commerce product animation: The image-to-video mode is well suited to animating static product photography — rotating a shoe, rippling fabric, or steaming a beverage — with the `motion_strength` parameter controlling how dramatic the effect is.
- Concept visualization for creative teams: Designers and directors can use text-to-video to rapidly prototype scene compositions at 720p before committing to full production, keeping iteration cost low at ~$0.06–$0.10 per clip.
- Bilingual content pipelines: WAN 2.6's native Chinese–English prompt understanding means teams working across both languages don't need to translate prompts before submission, preserving nuance in culturally specific descriptions.
- B-roll generation for video editors: Editors can generate filler footage — weather transitions, abstract motion backgrounds, landscape pans — on demand without stock licensing fees, particularly useful for documentary and explainer video workflows.
- Educational and training material production: Institutions can programmatically generate illustrative clips at scale (e.g., science visualizations, historical scene reconstructions) by looping API calls within a content management pipeline.
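For batch pipelines like the educational example above, the work mostly reduces to building one payload per prompt and feeding each through the submit-and-poll pattern from the quick start. A minimal payload builder, reusing the parameter names from the earlier examples (adjust for your provider):

```python
def build_payloads(prompts, duration=5, resolution="1280x720", fps=16):
    """Build one WAN 2.6 text-to-video payload per prompt."""
    return [
        {
            "model": "wan-2.6",
            "prompt": p,
            "parameters": {"duration": duration, "resolution": resolution, "fps": fps},
        }
        for p in prompts
    ]

topics = [
    "Time-lapse of a seed germinating in soil, macro lens",
    "Plate tectonics shifting over millions of years, aerial view",
]
payloads = build_payloads(topics, duration=8)
print(len(payloads))  # 2
```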
Access All AI APIs Through AtlasCloud
Managing API keys and integrations for multiple AI providers adds friction to your workflow. AtlasCloud provides unified API access to 300+ production-ready models — including all the models discussed in this article — through a single endpoint and one API key.
New users get a 25% bonus on first top-up (up to $100) at AtlasCloud.
```python
# Access any model through AtlasCloud's unified API
import requests

response = requests.post(
    "https://api.atlascloud.ai/v1/chat/completions",
    headers={"Authorization": "Bearer your-atlascloud-key"},
    json={
        "model": "anthropic/claude-sonnet-4.6",  # switch to any of 300+ models
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)
```
AtlasCloud bridges leading Chinese and international AI models — Kling, Seedance, WAN, Flux, Claude, GPT, Gemini and more — making it straightforward to compare and swap models without refactoring your integration.
Conclusion
WAN 2.6 represents a meaningful step forward for open-source video generation, with its doubled maximum clip duration, VBench score of 85.22, and robust bilingual prompt support making it one of the most versatile models available today. For developers building video pipelines, the async job submission pattern shown above handles the 90–120 second inference window cleanly without blocking your application thread.
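One way to keep that wait off the main thread is an asyncio polling loop. The sketch below separates polling logic from HTTP, so it works with any async status-check callable; the fake checker in the demo is purely illustrative:

```python
import asyncio

async def wait_for_video(check_status, poll_interval=10.0, timeout=300.0):
    """Poll an async status callable until the job succeeds, fails, or times out."""
    elapsed = 0.0
    while elapsed < timeout:
        data = await check_status()
        if data["status"] == "succeeded":
            return data["output"]["video_url"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error", "unknown error"))
        await asyncio.sleep(poll_interval)  # yields control; other tasks keep running
        elapsed += poll_interval
    raise TimeoutError("generation did not finish in time")

# Demo with a fake checker that reports success on the third poll.
async def demo():
    calls = {"n": 0}
    async def fake_status():
        calls["n"] += 1
        if calls["n"] < 3:
            return {"status": "processing"}
        return {"status": "succeeded", "output": {"video_url": "https://example.com/v.mp4"}}
    return await wait_for_video(fake_status, poll_interval=0.01)

print(asyncio.run(demo()))  # https://example.com/v.mp4
```

In production, `check_status` would wrap an async HTTP GET to the status endpoint (e.g., with aiohttp or httpx).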
If you need to compare WAN 2.6 against alternatives like Kling 2.0 or Seedance without managing separate API credentials for each, AtlasCloud’s unified endpoint is the fastest path to a provider-agnostic architecture.
References
- Alibaba WAN 2.1 Model Card & Technical Report — Hugging Face: https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
- VBench Leaderboard (official benchmark for video generation models): https://huggingface.co/spaces/Vchitect/VBench_Leaderboard
- Alibaba WAN GitHub Repository (architecture, license, training details): https://github.com/Wan-Video/Wan2.1
Frequently Asked Questions
What is WAN 2.6's VBench score and how does it compare to other open-source video models?
WAN 2.6 achieves a VBench score of 85.22, which outperforms comparable open-source video generation competitors. This score reflects improvements over its predecessor WAN 2.1 across three key dimensions: motion quality, prompt adherence, and resolution support. The model supports up to 1280×720 (720p) resolution and 10-second clips at 16–24 fps, making it currently one of the highest-performing open-source video models available.
How long does WAN 2.6 take to generate a video via API and what hardware is required?
WAN 2.6 inference latency for a 720p, 5-second clip runs approximately 90–120 seconds on an A100 GPU. This means developers should architect their applications with asynchronous job queuing rather than synchronous HTTP requests, as generation times will exceed typical API timeout thresholds. For shorter clips (under 5 seconds) or lower resolutions, latency is proportionally reduced. Self-hosted deployments typically require at least one A100 80GB GPU or equivalent.
What are the maximum resolution and video duration limits supported by the WAN 2.6 API?
WAN 2.6 supports a maximum resolution of 1280×720 (720p) and a maximum video duration of 10 seconds per clip. The model operates at 16 fps in standard mode and 24 fps in high-quality mode, meaning a 10-second clip at 24 fps produces 240 frames. Both text-to-video and image-to-video input modes are supported. Developers needing longer videos must implement client-side clip stitching, as single-inference output is capped at 10 seconds.
How many parameters does WAN 2.6 have and what does that mean for self-hosted deployment costs?
WAN 2.6 has approximately 14 billion parameters, which places significant VRAM demands on self-hosted deployments — typically requiring at least one A100 80GB GPU or equivalent. Inference latency on that hardware is 90–120 seconds per 720p/5s clip, translating to roughly 30–40 five-second clips (150–200 seconds of video) per GPU-hour. For teams comparing self-hosting against third-party API pricing, the GPU compute cost per clip is the figure to weigh against the ~$0.06–$0.10 third-party rate.