
WAN 2.6 API: Complete Guide to Alibaba's Latest Video Model

AI API Playbook · 9 min read


What’s New

WAN 2.6 is Alibaba’s most capable open-source video generation model to date, delivering significant improvements over WAN 2.1 across motion quality, prompt adherence, and resolution support. The model achieves a VBench score of 85.22, outperforming comparable open-source competitors, and supports video generation at up to 1280×720 resolution with durations extending to 10 seconds per clip. Alibaba released WAN 2.6 under an open-weight license, making it accessible via both self-hosted deployments and third-party API providers.


Key Specifications

| Parameter | WAN 2.6 |
| --- | --- |
| Max Resolution | 1280 × 720 (720p) |
| Max Video Duration | 10 seconds |
| Frame Rate | 16 fps (standard), 24 fps (high quality) |
| Input Modes | Text-to-video, Image-to-video |
| Model Parameters | ~14 billion |
| Inference Latency (720p, 5s) | ~90–120 seconds (A100 GPU) |
| API Price (typical third-party) | ~$0.06–$0.10 per video generation |
| Open-Weight License | Yes (Alibaba WAN License) |
| Languages Supported | Bilingual prompts (Chinese + English) |

Pricing note: Alibaba does not operate a direct consumer API for WAN 2.6 at time of writing. Pricing figures above reflect third-party inference providers. Always check your provider’s current rate card.
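As a rough planning aid, the price range above can be turned into a quick spend estimate. This is a sketch: the default per-clip rate below is an assumed midpoint of the $0.06–$0.10 range, not a quoted price.

```python
# Hedged cost estimate based on the typical third-party range above.
def estimate_spend(num_clips: int, price_per_clip: float = 0.08) -> float:
    """Estimated total USD for a batch of clips at an assumed midpoint rate."""
    return round(num_clips * price_per_clip, 2)

# e.g. 500 clips at the assumed midpoint rate is about $40 of inference spend
print(estimate_spend(500))
```

Re-run the estimate with your provider's actual rate card before committing to a batch.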


Comparison with Previous Version

| Feature | WAN 2.1 | WAN 2.6 | Change |
| --- | --- | --- | --- |
| VBench Score | 83.4 | 85.22 | +2.2% |
| Max Resolution | 1280 × 720 | 1280 × 720 | Unchanged |
| Max Duration | 5 seconds | 10 seconds | +100% |
| Text-to-Video | Yes | Yes | Unchanged |
| Image-to-Video | Yes | Yes | Improved motion |
| Motion Smoothness | Good | Excellent | Improved |
| Bilingual Prompt | Partial | Full | Improved |
| Model Size | ~14B | ~14B | Unchanged |
| Estimated Inference Time (720p) | ~60–80s | ~90–120s | Higher (longer clips) |
| Open-Weight | Yes | Yes | Unchanged |

The most meaningful upgrade in WAN 2.6 is the doubling of maximum output duration to 10 seconds, which directly enables use cases like short-form social content and product demos without manual clip stitching. Motion coherence across the full clip length is noticeably more stable compared to WAN 2.1.
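The duration and frame-rate limits above are worth validating client-side before submitting a job. The guard below is a hypothetical helper, not part of any WAN SDK; the 10-second cap and 16/24 fps options come from the spec table.

```python
MAX_DURATION_S = 10   # WAN 2.6 per-clip cap
VALID_FPS = (16, 24)  # standard / high-quality modes

def frame_count(duration_s: int, fps: int = 16) -> int:
    """Frames a single WAN 2.6 clip will contain, validating the model limits."""
    if not 1 <= duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration must be 1-{MAX_DURATION_S}s per clip")
    if fps not in VALID_FPS:
        raise ValueError(f"fps must be one of {VALID_FPS}")
    return duration_s * fps

# A full-length high-quality clip is 10 s x 24 fps = 240 frames
print(frame_count(10, fps=24))
```

Failing fast here avoids paying the 90–120 second inference wait only to receive a validation error from the provider.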


API Quick Start

WAN 2.6 follows a standard REST inference pattern compatible with most inference platforms. The examples below use the WAN 2.6 endpoint as exposed by a compatible provider (adjust BASE_URL and the MODEL_ID slug for your chosen platform).

Python — Text-to-Video

import requests
import time
import os

# ── Configuration ──────────────────────────────────────────────────────────────
API_KEY = os.environ.get("WAN_API_KEY", "your-api-key-here")
BASE_URL = "https://api.your-provider.com/v1"  # replace with your provider's URL
MODEL_ID = "wan-2.6"                            # check your provider's model slug

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# ── Step 1: Submit generation job ───────────────────────────────────────────────
def submit_video_job(prompt: str, duration: int = 5, resolution: str = "1280x720") -> str:
    """
    Submit a text-to-video generation request to WAN 2.6.

    Args:
        prompt:     English or Chinese text description.
        duration:   Video length in seconds (1–10).
        resolution: Output resolution string.

    Returns:
        task_id:    String ID for polling job status.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "parameters": {
            "duration": duration,        # seconds; max 10 for WAN 2.6
            "resolution": resolution,    # "1280x720" or "720x1280" (portrait)
            "fps": 16,                   # 16 or 24
            "guidance_scale": 7.5,       # classifier-free guidance strength
            "num_inference_steps": 50,   # higher = better quality, slower
        }
    }

    resp = requests.post(f"{BASE_URL}/video/generations", headers=HEADERS, json=payload)
    resp.raise_for_status()  # raises HTTPError for 4xx / 5xx responses

    task_id = resp.json()["task_id"]
    print(f"[+] Job submitted. Task ID: {task_id}")
    return task_id


# ── Step 2: Poll for completion ─────────────────────────────────────────────────
def poll_job(task_id: str, poll_interval: int = 10, timeout: int = 300) -> str:
    """
    Poll the job status endpoint until the video is ready.

    Args:
        task_id:       Task ID returned from submit_video_job().
        poll_interval: Seconds between status checks.
        timeout:       Max total wait time in seconds.

    Returns:
        video_url: Direct URL to the generated video file.
    """
    elapsed = 0
    while elapsed < timeout:
        resp = requests.get(f"{BASE_URL}/video/generations/{task_id}", headers=HEADERS)
        resp.raise_for_status()

        data = resp.json()
        status = data.get("status")

        if status == "succeeded":
            video_url = data["output"]["video_url"]
            print(f"[✓] Video ready: {video_url}")
            return video_url
        elif status == "failed":
            raise RuntimeError(f"Generation failed: {data.get('error', 'unknown error')}")
        else:
            print(f"[…] Status: {status} — waiting {poll_interval}s ({elapsed}s elapsed)")
            time.sleep(poll_interval)
            elapsed += poll_interval

    raise TimeoutError(f"Job {task_id} did not complete within {timeout}s")


# ── Step 3: Download the result ─────────────────────────────────────────────────
def download_video(video_url: str, output_path: str = "output.mp4") -> None:
    """Download the generated video to a local file."""
    resp = requests.get(video_url, stream=True)
    resp.raise_for_status()

    with open(output_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)

    print(f"[✓] Saved to {output_path}")


# ── Main execution ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
    PROMPT = (
        "A golden retriever runs across an autumn forest trail, "
        "sunlight filtering through the trees, cinematic slow motion, 4K"
    )

    try:
        task_id = submit_video_job(prompt=PROMPT, duration=5, resolution="1280x720")
        video_url = poll_job(task_id)
        download_video(video_url, output_path="wan26_output.mp4")
    except requests.HTTPError as e:
        print(f"[✗] HTTP {e.response.status_code}: {e.response.text}")
    except (RuntimeError, TimeoutError) as e:
        print(f"[✗] {e}")

Image-to-Video (Python)

import requests
import base64
import os

API_KEY = os.environ.get("WAN_API_KEY", "your-api-key-here")
BASE_URL = "https://api.your-provider.com/v1"
MODEL_ID = "wan-2.6"

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

def image_to_video(image_path: str, prompt: str, duration: int = 5) -> str:
    """
    Animate a still image using WAN 2.6 image-to-video mode.

    Args:
        image_path: Local path to source image (JPEG or PNG).
        prompt:     Motion description to guide the animation.
        duration:   Output length in seconds (1–10).

    Returns:
        task_id for downstream polling.
    """
    # Encode image as base64
    with open(image_path, "rb") as img_file:
        image_b64 = base64.b64encode(img_file.read()).decode("utf-8")

    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "image": f"data:image/jpeg;base64,{image_b64}",  # or image/png
        "parameters": {
            "duration": duration,
            "resolution": "1280x720",
            "fps": 16,
            "motion_strength": 0.7,  # 0.0 (subtle) – 1.0 (strong motion)
            "num_inference_steps": 50,
        }
    }

    resp = requests.post(f"{BASE_URL}/video/image-to-video", headers=HEADERS, json=payload)
    resp.raise_for_status()

    task_id = resp.json()["task_id"]
    print(f"[+] Image-to-video job submitted. Task ID: {task_id}")
    return task_id

cURL — Minimal Text-to-Video Request

# Submit a WAN 2.6 text-to-video job via cURL
curl -X POST "https://api.your-provider.com/v1/video/generations" \
  -H "Authorization: Bearer $WAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-2.6",
    "prompt": "Time-lapse of a city skyline transitioning from dusk to night, neon lights, cinematic",
    "parameters": {
      "duration": 5,
      "resolution": "1280x720",
      "fps": 16,
      "guidance_scale": 7.5,
      "num_inference_steps": 50
    }
  }'

# Poll job status (replace TASK_ID with the returned task_id)
curl "https://api.your-provider.com/v1/video/generations/TASK_ID" \
  -H "Authorization: Bearer $WAN_API_KEY"

Best Use Cases

  1. Short-form social content (TikTok / Reels / Shorts): WAN 2.6’s 10-second output duration covers the minimum viable clip length for most short-video platforms without requiring stitching, cutting production pipeline steps significantly.

  2. E-commerce product animation: The image-to-video mode is well-suited for animating static product photography — rotating a shoe, rippling fabric, or steaming a beverage — with the motion_strength parameter controlling how dramatic the effect is.

  3. Concept visualization for creative teams: Designers and directors can use text-to-video to rapidly prototype scene compositions at 720p before committing to full production, reducing iteration cost by using the API’s ~$0.06–$0.10 per-clip pricing.

  4. Bilingual content pipelines: WAN 2.6’s native Chinese–English bilingual prompt understanding means teams working across both languages don’t need to translate prompts before submission, preserving nuance in culturally specific descriptions.

  5. B-roll generation for video editors: Editors can generate filler footage — weather transitions, abstract motion backgrounds, landscape pans — on demand without stock licensing fees, particularly useful for documentary and explainer video workflows.

  6. Educational and training material production: Institutions can programmatically generate illustrative clips at scale (e.g., science visualizations, historical scene reconstructions) by looping API calls within a content management pipeline.
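For pipeline use cases like the batch generation described in item 6, the quick-start functions (submit_video_job, poll_job, download_video) can be looped over a prompt list. The concurrency limit of 4 below is an assumption for illustration; check your provider's actual rate limits.

```python
from typing import List

def chunk_prompts(prompts: List[str], max_concurrent: int = 4) -> List[List[str]]:
    """Split prompts into batches no larger than an assumed concurrency limit."""
    return [prompts[i:i + max_concurrent] for i in range(0, len(prompts), max_concurrent)]

def run_batch(prompts: List[str]) -> None:
    """Sketch: submit, poll, and download each batch with the quick-start helpers."""
    for batch in chunk_prompts(prompts):
        # submit_video_job / poll_job / download_video are defined in the
        # quick start above; submit the whole batch, then wait on each job
        task_ids = [submit_video_job(p) for p in batch]
        for tid in task_ids:
            url = poll_job(tid)
            download_video(url, output_path=f"{tid}.mp4")
```

Submitting a batch before polling keeps the GPU queue full while your client waits, rather than serializing the 90–120 second inference windows.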


Access All AI APIs Through AtlasCloud

Managing API keys and integrations for multiple AI providers adds friction to your workflow. AtlasCloud provides unified API access to 300+ production-ready models — including all the models discussed in this article — through a single endpoint and one API key.

New users get a 25% bonus on first top-up (up to $100) at AtlasCloud.

# Access any model through AtlasCloud's unified API
import requests

response = requests.post(
    "https://api.atlascloud.ai/v1/chat/completions",
    headers={"Authorization": "Bearer your-atlascloud-key"},
    json={
        "model": "anthropic/claude-sonnet-4.6",  # switch to any of 300+ models
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)

AtlasCloud bridges leading Chinese and international AI models — Kling, Seedance, WAN, Flux, Claude, GPT, Gemini and more — making it straightforward to compare and swap models without refactoring your integration.


Conclusion

WAN 2.6 represents a meaningful step forward for open-source video generation, with its doubled maximum clip duration, VBench score of 85.22, and robust bilingual prompt support making it one of the most versatile models available today. For developers building video pipelines, the async job submission pattern shown above handles the 90–120 second inference window cleanly without blocking your application thread.

If you need to compare WAN 2.6 against alternatives like Kling 2.0 or Seedance without managing separate API credentials for each, AtlasCloud’s unified endpoint is the fastest path to a provider-agnostic architecture.


References

  1. Alibaba WAN 2.1 Model Card & Technical Report — Hugging Face: https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
  2. VBench Leaderboard (official benchmark for video generation models): https://huggingface.co/spaces/Vchitect/VBench_Leaderboard
  3. Alibaba WAN GitHub Repository (architecture, license, training details): https://github.com/Wan-Video/Wan2.1


Frequently Asked Questions

What is WAN 2.6's VBench score and how does it compare to other open-source video models?

WAN 2.6 achieves a VBench score of 85.22, which outperforms comparable open-source video generation competitors. This score reflects improvements over its predecessor WAN 2.1 across three key dimensions: motion quality, prompt adherence, and resolution support. The model supports up to 1280×720 (720p) resolution and 10-second clips at 16–24 fps, making it one of the highest-performing open-source video models currently available.

How long does WAN 2.6 take to generate a video via API and what hardware is required?

WAN 2.6 inference latency for a 720p, 5-second clip runs approximately 90–120 seconds on an A100 GPU. This means developers should architect their applications with asynchronous job queuing rather than synchronous HTTP requests, as generation times will exceed typical API timeout thresholds. For shorter clips (under 5 seconds) or lower resolutions, latency will be proportionally reduced. Self-hosted deployments need A100-class hardware to match these figures.

What are the maximum resolution and video duration limits supported by the WAN 2.6 API?

WAN 2.6 supports a maximum resolution of 1280×720 (720p) and a maximum video duration of 10 seconds per clip. The model operates at 16 fps in standard mode and 24 fps in high-quality mode, meaning a 10-second clip at 24 fps produces 240 frames. Both text-to-video and image-to-video input modes are supported. Developers needing longer videos must implement client-side clip stitching, as single-inference output is capped at 10 seconds.

How many parameters does WAN 2.6 have and what does that mean for self-hosted deployment costs?

WAN 2.6 has approximately 14 billion parameters, which places significant VRAM demands on self-hosted deployments — typically requiring at least one A100 80GB GPU or equivalent. Inference latency on that hardware is 90–120 seconds per 720p/5s clip, translating to roughly 30–40 clips (150–200 video-seconds of output) per GPU-hour. For teams comparing self-hosting with third-party API pricing, that GPU compute cost per clip is the figure to weigh against typical per-generation rates of $0.06–$0.10.
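The GPU-hour arithmetic above can be checked in a couple of lines. The $2/hr GPU rate below is an illustrative assumption, not a quoted cloud price.

```python
def clips_per_gpu_hour(latency_s: float) -> float:
    """How many clips one GPU finishes per hour at a given per-clip latency."""
    return 3600 / latency_s

def cost_per_clip(gpu_usd_per_hour: float, latency_s: float) -> float:
    """Self-hosted compute cost per clip under an assumed hourly GPU rate."""
    return round(gpu_usd_per_hour / clips_per_gpu_hour(latency_s), 3)

# At 120 s per 5 s clip, one GPU yields 30 clips (150 video-seconds) per hour;
# at an assumed $2/hr, that is about $0.067 of compute per clip.
print(clips_per_gpu_hour(120), cost_per_clip(2.0, 120))
```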

Tags

WAN · Alibaba · Video Generation · API · 2026
