WAN 2.6 API Guide: Alibaba's Latest Video Generation Model
What’s New
WAN 2.6 is Alibaba’s most capable open-source video generation model to date, delivering significant improvements over WAN 2.1 across motion quality, prompt adherence, and resolution support. The model achieves a VBench score of 85.22, outperforming comparable open-source competitors, and supports video generation at up to 1280×720 resolution with durations extending to 10 seconds per clip. Alibaba released WAN 2.6 under an open-weight license, making it accessible via both self-hosted deployments and third-party API providers.
Key Specifications
| Parameter | WAN 2.6 |
|---|---|
| Max Resolution | 1280 × 720 (720p) |
| Max Video Duration | 10 seconds |
| Frame Rate | 16 fps (standard), 24 fps (high quality) |
| Input Modes | Text-to-video, Image-to-video |
| Model Parameters | ~14 billion |
| Inference Latency (720p, 5s) | ~90–120 seconds (A100 GPU) |
| API Price (typical third-party) | ~$0.06–$0.10 per video generation |
| Open-Weight License | Yes (Alibaba WAN License) |
| Prompt Languages | Chinese and English (bilingual) |
Pricing note: Alibaba does not operate a direct consumer API for WAN 2.6 at the time of writing. The pricing figures above reflect third-party inference providers; always check your provider's current rate card.
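As a back-of-envelope check, per-clip pricing in that range translates into batch budgets like this (the $0.08 default below is an assumed midpoint, not a quoted rate):

```python
# Rough batch-cost estimate at an assumed ~$0.08 per clip (midpoint of the
# $0.06–$0.10 range quoted by typical third-party providers).
def estimate_batch_cost(num_clips: int, price_per_clip: float = 0.08) -> float:
    """Return the estimated USD cost of generating num_clips videos."""
    return round(num_clips * price_per_clip, 2)

print(estimate_batch_cost(500))   # 500 clips at the midpoint rate: ~$40
```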
Comparison with Previous Version
| Feature | WAN 2.1 | WAN 2.6 | Change |
|---|---|---|---|
| VBench Score | 83.4 | 85.22 | +2.2% |
| Max Resolution | 1280 × 720 | 1280 × 720 | Unchanged |
| Max Duration | 5 seconds | 10 seconds | +100% |
| Text-to-Video | ✅ | ✅ | Unchanged |
| Image-to-Video | ✅ | ✅ | Improved motion |
| Motion Smoothness | Good | Excellent | Improved |
| Bilingual Prompt | Partial | Full | Improved |
| Model Size | ~14B | ~14B | Unchanged |
| Estimated Inference Time (720p) | ~60–80s | ~90–120s | Higher (longer clips) |
| Open-Weight | Yes | Yes | Unchanged |
The most meaningful upgrade in WAN 2.6 is the doubling of maximum output duration to 10 seconds, which directly enables use cases like short-form social content and product demos without manual clip stitching. Motion coherence across the full clip length is noticeably more stable compared to WAN 2.1.
API Quick Start
WAN 2.6 follows a standard REST inference pattern compatible with most inference platforms. The examples below use the WAN 2.6 endpoint as exposed by a compatible provider (adjust `BASE_URL` and the model slug for your chosen platform).
Python — Text-to-Video
```python
import os
import time

import requests

# ── Configuration ──────────────────────────────────────────────────────────────
API_KEY = os.environ.get("WAN_API_KEY", "your-api-key-here")
BASE_URL = "https://api.your-provider.com/v1"  # replace with your provider's URL
MODEL_ID = "wan-2.6"                           # check your provider's model slug

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# ── Step 1: Submit generation job ──────────────────────────────────────────────
def submit_video_job(prompt: str, duration: int = 5, resolution: str = "1280x720") -> str:
    """
    Submit a text-to-video generation request to WAN 2.6.

    Args:
        prompt: English or Chinese text description.
        duration: Video length in seconds (1–10).
        resolution: Output resolution string.

    Returns:
        task_id: String ID for polling job status.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "parameters": {
            "duration": duration,          # seconds; max 10 for WAN 2.6
            "resolution": resolution,      # "1280x720" or "720x1280" (portrait)
            "fps": 16,                     # 16 or 24
            "guidance_scale": 7.5,         # classifier-free guidance strength
            "num_inference_steps": 50,     # higher = better quality, slower
        },
    }
    resp = requests.post(f"{BASE_URL}/video/generations", headers=HEADERS, json=payload)
    resp.raise_for_status()  # raises HTTPError for 4xx / 5xx responses
    task_id = resp.json()["task_id"]
    print(f"[+] Job submitted. Task ID: {task_id}")
    return task_id

# ── Step 2: Poll for completion ────────────────────────────────────────────────
def poll_job(task_id: str, poll_interval: int = 10, timeout: int = 300) -> str:
    """
    Poll the job status endpoint until the video is ready.

    Args:
        task_id: Task ID returned from submit_video_job().
        poll_interval: Seconds between status checks.
        timeout: Max total wait time in seconds.

    Returns:
        video_url: Direct URL to the generated video file.
    """
    elapsed = 0
    while elapsed < timeout:
        resp = requests.get(f"{BASE_URL}/video/generations/{task_id}", headers=HEADERS)
        resp.raise_for_status()
        data = resp.json()
        status = data.get("status")
        if status == "succeeded":
            video_url = data["output"]["video_url"]
            print(f"[✓] Video ready: {video_url}")
            return video_url
        elif status == "failed":
            raise RuntimeError(f"Generation failed: {data.get('error', 'unknown error')}")
        else:
            print(f"[…] Status: {status} — waiting {poll_interval}s ({elapsed}s elapsed)")
            time.sleep(poll_interval)
            elapsed += poll_interval
    raise TimeoutError(f"Job {task_id} did not complete within {timeout}s")

# ── Step 3: Download the result ────────────────────────────────────────────────
def download_video(video_url: str, output_path: str = "output.mp4") -> None:
    """Download the generated video to a local file."""
    resp = requests.get(video_url, stream=True)
    resp.raise_for_status()
    with open(output_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"[✓] Saved to {output_path}")

# ── Main execution ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
    PROMPT = (
        "A golden retriever runs across an autumn forest trail, "
        "sunlight filtering through the trees, cinematic slow motion, 4K"
    )
    try:
        task_id = submit_video_job(prompt=PROMPT, duration=5, resolution="1280x720")
        video_url = poll_job(task_id)
        download_video(video_url, output_path="wan26_output.mp4")
    except requests.HTTPError as e:
        print(f"[✗] HTTP error: {e.response.status_code} — {e.response.text}")
    except (RuntimeError, TimeoutError) as e:
        print(f"[✗] {e}")
```
Python — Image-to-Video
```python
import base64
import os

import requests

API_KEY = os.environ.get("WAN_API_KEY", "your-api-key-here")
BASE_URL = "https://api.your-provider.com/v1"
MODEL_ID = "wan-2.6"

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

def image_to_video(image_path: str, prompt: str, duration: int = 5) -> str:
    """
    Animate a still image using WAN 2.6 image-to-video mode.

    Args:
        image_path: Local path to source image (JPEG or PNG).
        prompt: Motion description to guide the animation.
        duration: Output length in seconds (1–10).

    Returns:
        task_id for downstream polling.
    """
    # Encode image as base64
    with open(image_path, "rb") as img_file:
        image_b64 = base64.b64encode(img_file.read()).decode("utf-8")

    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "image": f"data:image/jpeg;base64,{image_b64}",  # or image/png
        "parameters": {
            "duration": duration,
            "resolution": "1280x720",
            "fps": 16,
            "motion_strength": 0.7,        # 0.0 (subtle) – 1.0 (strong motion)
            "num_inference_steps": 50,
        },
    }
    resp = requests.post(f"{BASE_URL}/video/image-to-video", headers=HEADERS, json=payload)
    resp.raise_for_status()
    task_id = resp.json()["task_id"]
    print(f"[+] Image-to-video job submitted. Task ID: {task_id}")
    return task_id
```
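The payload above hard-codes an `image/jpeg` data-URI prefix. A small helper (hypothetical, but matching the payload format shown) can pick the MIME type from the file extension instead, so PNG sources are labeled correctly:

```python
import base64
import os

def image_to_data_uri(image_path: str) -> str:
    """Encode a local JPEG or PNG file as a data URI for the `image` field."""
    ext = os.path.splitext(image_path)[1].lower()
    mime = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}.get(ext)
    if mime is None:
        raise ValueError(f"Unsupported image type: {ext}")
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"
```

Swap the result into the payload's `image` field in place of the hard-coded prefix.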
cURL — Minimal Text-to-Video Request
```bash
# Submit a WAN 2.6 text-to-video job via cURL
curl -X POST "https://api.your-provider.com/v1/video/generations" \
  -H "Authorization: Bearer $WAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-2.6",
    "prompt": "Time-lapse of a city skyline transitioning from dusk to night, neon lights, cinematic",
    "parameters": {
      "duration": 5,
      "resolution": "1280x720",
      "fps": 16,
      "guidance_scale": 7.5,
      "num_inference_steps": 50
    }
  }'

# Poll job status (replace TASK_ID with the returned task_id)
curl "https://api.your-provider.com/v1/video/generations/TASK_ID" \
  -H "Authorization: Bearer $WAN_API_KEY"
```
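In shell pipelines, `jq` is the usual tool for pulling fields out of those JSON responses. The snippet below parses a sample completed-job payload; the payload itself is illustrative, using the same fields the Python example reads:

```shell
# Extract status and video URL from a (sample) completed-job response.
RESPONSE='{"task_id":"abc123","status":"succeeded","output":{"video_url":"https://cdn.example.com/wan26.mp4"}}'

STATUS=$(echo "$RESPONSE" | jq -r '.status')
VIDEO_URL=$(echo "$RESPONSE" | jq -r '.output.video_url')

echo "$STATUS"       # succeeded
echo "$VIDEO_URL"    # https://cdn.example.com/wan26.mp4
```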
Best Use Cases
- Short-form social content (TikTok / Reels / Shorts): WAN 2.6's 10-second output duration covers the minimum viable clip length for most short-video platforms without requiring stitching, cutting several steps from the production pipeline.
- E-commerce product animation: The image-to-video mode is well suited to animating static product photography — rotating a shoe, rippling fabric, or steaming a beverage — with the `motion_strength` parameter controlling how dramatic the effect is.
- Concept visualization for creative teams: Designers and directors can use text-to-video to rapidly prototype scene compositions at 720p before committing to full production, keeping iteration cost low at ~$0.06–$0.10 per clip.
- Bilingual content pipelines: WAN 2.6's native Chinese–English prompt understanding means teams working across both languages don't need to translate prompts before submission, preserving nuance in culturally specific descriptions.
- B-roll generation for video editors: Editors can generate filler footage — weather transitions, abstract motion backgrounds, landscape pans — on demand without stock licensing fees, particularly useful for documentary and explainer video workflows.
- Educational and training material production: Institutions can programmatically generate illustrative clips at scale (e.g., science visualizations, historical scene reconstructions) by looping API calls within a content management pipeline.
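For batch pipelines like the educational example above, the work mostly reduces to building one payload per prompt and feeding each through the submit-and-poll pattern from the quick start. A minimal payload builder, reusing the parameter names from the earlier examples (adjust for your provider):

```python
def build_payloads(prompts, duration=5, resolution="1280x720", fps=16):
    """Build one WAN 2.6 text-to-video payload per prompt."""
    return [
        {
            "model": "wan-2.6",
            "prompt": p,
            "parameters": {"duration": duration, "resolution": resolution, "fps": fps},
        }
        for p in prompts
    ]

topics = [
    "Time-lapse of a seed germinating in soil, macro lens",
    "Plate tectonics shifting over millions of years, aerial view",
]
payloads = build_payloads(topics, duration=8)
print(len(payloads))  # 2
```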
Access All AI APIs Through AtlasCloud
Managing API keys and integrations for multiple AI providers adds friction to your workflow. AtlasCloud provides unified API access to 300+ production-ready models — including all the models discussed in this article — through a single endpoint and one API key.
New users get a 25% bonus on first top-up (up to $100) at AtlasCloud.
```python
# Access any model through AtlasCloud's unified API
import requests

response = requests.post(
    "https://api.atlascloud.ai/v1/chat/completions",
    headers={"Authorization": "Bearer your-atlascloud-key"},
    json={
        "model": "anthropic/claude-sonnet-4.6",  # switch to any of 300+ models
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)
```
AtlasCloud bridges leading Chinese and international AI models — Kling, Seedance, WAN, Flux, Claude, GPT, Gemini and more — making it straightforward to compare and swap models without refactoring your integration.
Conclusion
WAN 2.6 represents a meaningful step forward for open-source video generation, with its doubled maximum clip duration, VBench score of 85.22, and robust bilingual prompt support making it one of the most versatile models available today. For developers building video pipelines, the async job submission pattern shown above handles the 90–120 second inference window cleanly without blocking your application thread.
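One way to keep that wait off the main thread is an asyncio polling loop. The sketch below separates polling logic from HTTP, so it works with any async status-check callable; the fake checker in the demo is purely illustrative:

```python
import asyncio

async def wait_for_video(check_status, poll_interval=10.0, timeout=300.0):
    """Poll an async status callable until the job succeeds, fails, or times out."""
    elapsed = 0.0
    while elapsed < timeout:
        data = await check_status()
        if data["status"] == "succeeded":
            return data["output"]["video_url"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error", "unknown error"))
        await asyncio.sleep(poll_interval)  # yields control; other tasks keep running
        elapsed += poll_interval
    raise TimeoutError("generation did not finish in time")

# Demo with a fake checker that reports success on the third poll.
async def demo():
    calls = {"n": 0}
    async def fake_status():
        calls["n"] += 1
        if calls["n"] < 3:
            return {"status": "processing"}
        return {"status": "succeeded", "output": {"video_url": "https://example.com/v.mp4"}}
    return await wait_for_video(fake_status, poll_interval=0.01)

print(asyncio.run(demo()))  # https://example.com/v.mp4
```

In production, `check_status` would wrap an async HTTP GET to the status endpoint (e.g., with aiohttp or httpx).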
If you need to compare WAN 2.6 against alternatives like Kling 2.0 or Seedance without managing separate API credentials for each, AtlasCloud’s unified endpoint is the fastest path to a provider-agnostic architecture.
References
- Alibaba WAN 2.1 Model Card & Technical Report — Hugging Face: https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
- VBench Leaderboard (official benchmark for video generation models): https://huggingface.co/spaces/Vchitect/VBench_Leaderboard
- Alibaba WAN GitHub Repository (architecture, license, training details): https://github.com/Wan-Video/Wan2.1
Frequently Asked Questions
What is WAN 2.6's VBench score and how does it compare to other open-source video models?
WAN 2.6 achieves a VBench score of 85.22, which outperforms comparable open-source video generation competitors. This score reflects improvements over its predecessor WAN 2.1 across three key dimensions: motion quality, prompt adherence, and resolution support. The model supports up to 1280×720 (720p) resolution and 10-second clips at 16–24 fps, making it currently one of the highest-performing open-source video models available.
How long does WAN 2.6 take to generate a video via API and what hardware is required?
WAN 2.6 inference latency for a 720p, 5-second clip runs approximately 90–120 seconds on an A100 GPU. This means developers should architect their applications with asynchronous job queuing rather than synchronous HTTP requests, as generation times will exceed typical API timeout thresholds. For shorter clips (under 5 seconds) or lower resolutions, latency is proportionally reduced. Self-hosted deployments typically require at least one A100 80GB GPU or equivalent.
What are the maximum resolution and video duration limits supported by the WAN 2.6 API?
WAN 2.6 supports a maximum resolution of 1280×720 (720p) and a maximum video duration of 10 seconds per clip. The model operates at 16 fps in standard mode and 24 fps in high-quality mode, meaning a 10-second clip at 24 fps produces 240 frames. Both text-to-video and image-to-video input modes are supported. Developers needing longer videos must implement client-side clip stitching, as single-inference output is capped at 10 seconds.
How many parameters does WAN 2.6 have and what does that mean for self-hosted deployment costs?
WAN 2.6 has approximately 14 billion parameters, which places significant VRAM demands on self-hosted deployments — typically requiring at least one A100 80GB GPU or equivalent. Inference latency on that hardware is 90–120 seconds per 720p/5s clip, translating to roughly 30–40 five-second clips (150–200 seconds of video) per GPU-hour. For teams comparing self-hosting against third-party API pricing, the GPU compute cost per clip is the figure to weigh against the ~$0.06–$0.10 third-party rate.