
Wan-2.2-Spicy Image-to-Video API: Complete Developer Guide

AI API Playbook · 4 min read


If you’re evaluating the wan-2.2-spicy image-to-video API for production use, this guide covers everything you need to make that call: architecture changes from the previous version, full technical specs, benchmark comparisons, pricing, and honest limitations.


What Is Wan 2.2 Spicy?

Wan 2.2 Spicy is an image-to-video generation model developed by Alibaba’s Wanxiang team. It takes a static image as input and produces a short video clip with animated motion — no text-to-video pipeline required, though prompt guidance is supported for motion direction.

The “Spicy” variant is specifically tuned for higher-fidelity output with fewer content restrictions compared to the base Wan 2.2 model, making it relevant for adult creative platforms, game asset generation, and cinematic content where the standard model’s guardrails create friction.

The model is available through multiple API providers, including WaveSpeed.ai, 302.AI, Atlas Cloud, and fal.ai. Each provider serves the same underlying model under its own model ID: for example, wavespeed-ai/wan-2.2-spicy/image-to-video on WaveSpeed.ai and alibaba/wan-2.2-spicy/image-to-video on Atlas Cloud.


What’s New vs. Wan 2.1 and Base Wan 2.2

Wan 2.2 represents a meaningful generational step over 2.1, and the Spicy variant adds specific tuning on top of the base 2.2 architecture.

| Improvement area | Wan 2.1 | Wan 2.2 / Spicy |
|---|---|---|
| VBench overall score | ~83.2 | 87.4 (+4.2 points) |
| Motion smoothness score | 96.1 | 98.3 (+2.2 points) |
| Subject consistency | 94.3 | 96.8 (+2.5 points) |
| Supported resolutions | 480p only | 480p + 720p |
| Max duration | 5 seconds | 10 seconds |
| Architecture | Transformer + diffusion hybrid | Updated multimodal transformer (Wan 2.2 base) |
| Content policy | Standard | Relaxed (Spicy variant) |

The VBench gains are meaningful — a 4.2-point improvement on the overall composite score puts Wan 2.2 above several models that previously outperformed 2.1 on that leaderboard. Motion smoothness at 98.3 is particularly strong; frame-to-frame coherence was a known weakness in 2.1 outputs.


Full Technical Specifications

| Parameter | Value |
|---|---|
| Model ID (WaveSpeed) | wavespeed-ai/wan-2.2-spicy/image-to-video |
| Model ID (Atlas Cloud) | alibaba/wan-2.2-spicy/image-to-video |
| Input type | Static image (JPEG/PNG) + optional text prompt |
| Output format | MP4 |
| Supported resolutions | 480p, 720p |
| Supported durations | 5s, 10s |
| Frame rate | 16 fps (standard) |
| Aspect ratios | 16:9, 9:16, 1:1 |
| Seed control | Yes (-1 for random) |
| Authentication | Bearer token (Authorization: Bearer ${API_KEY}) |
| Request method | POST (submit) + GET (retrieve) |
| Response pattern | Async: poll for result |
| Pricing model | Per-generation or credit-based (provider-dependent) |
| Content restrictions | Relaxed (Spicy variant) |
| Open weights | Yes (Wan 2.2 base weights are open) |
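
As a rough sketch of how the spec table maps onto a submit request, the Python helper below validates parameters against the documented values and builds the Bearer-auth headers. The provider keys and the body field names (`model`, `image`, `prompt`, `resolution`, `duration`, `seed`) are hypothetical placeholders, not any provider's confirmed schema. Some providers put the model ID in the URL path rather than the body, so check the provider's API reference before relying on this shape.

```python
# Allowed values from the spec table above.
SUPPORTED_RESOLUTIONS = {"480p", "720p"}
SUPPORTED_DURATIONS = {5, 10}  # seconds

# Model IDs listed in the spec table; other providers may use different IDs.
# The dictionary keys ("wavespeed", "atlascloud") are hypothetical labels.
MODEL_IDS = {
    "wavespeed": "wavespeed-ai/wan-2.2-spicy/image-to-video",
    "atlascloud": "alibaba/wan-2.2-spicy/image-to-video",
}


def build_submit_request(
    provider: str,
    api_key: str,
    image_url: str,
    prompt: str = "",
    resolution: str = "720p",
    duration: int = 5,
    seed: int = -1,
) -> tuple[dict, dict]:
    """Validate parameters and return (headers, body) for the POST submit call.

    The body keys are hypothetical placeholders, not a confirmed provider schema.
    """
    if provider not in MODEL_IDS:
        raise ValueError(f"unknown provider: {provider!r}")
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(SUPPORTED_RESOLUTIONS)}")
    if duration not in SUPPORTED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(SUPPORTED_DURATIONS)}")
    if seed < -1:
        raise ValueError("seed must be -1 (random) or a non-negative integer")

    headers = {
        "Authorization": f"Bearer {api_key}",  # Bearer token, per the spec table
        "Content-Type": "application/json",
    }
    body = {
        "model": MODEL_IDS[provider],  # some providers expect this in the URL path
        "image": image_url,
        "prompt": prompt,  # optional motion guidance
        "resolution": resolution,
        "duration": duration,
        "seed": seed,  # -1 requests a random seed
    }
    return headers, body
```

With validation on the client side, an unsupported duration or resolution fails immediately instead of after a round trip to the provider's queue.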

The async pattern matters if you're integrating into a synchronous pipeline. You submit a job via POST and get back a task ID. You then poll the GET endpoint until the status is completed. Plan for a webhook or a polling loop in your implementation.
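
Here is a minimal polling loop. It assumes the GET response is JSON with a `status` field that ends as `completed`. The `failed` status value and the `error` field are assumptions too. The HTTP call is passed in as `fetch_status`, so the loop works with any provider client. The wait between polls grows by 1.5x per attempt, up to a cap, so slow jobs don't hammer the endpoint.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    max_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a task until it completes, fails, or runs out of time.

    fetch_status performs the provider's GET request and returns parsed JSON.
    The "completed"/"failed" status values are assumptions; map them to your
    provider's actual values. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    delay = interval_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"generation failed: {result.get('error')}")
        # Give up if the next wait would overrun the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"task not finished after {timeout_s} s")
        sleep(delay)
        delay = min(delay * 1.5, max_interval_s)  # gradual backoff, capped
```

In a real integration, `fetch_status` would wrap the provider's GET request for the task ID returned by the POST (for example, with `urllib.request`). The 300-second default timeout is a guess based on the 60–120 second generation times plus cold-start overhead reported in the FAQ below. Tune it for your provider.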


Benchmark Comparison vs. Competitors

The three most relevant comparisons for developers evaluating this model are Kling 1.6, Runway Gen-3 Alpha, and the base Wan 2.2 (non-Spicy).

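| Model | VBench score | Motion smoothness | Subject consistency | Max resolution | Max duration |
|---|---|---|---|---|---|
| Wan 2.2 Spicy | 87.4 | 98.3 | 96.8 | 720p | 10s |
| Wan 2.2 (base) | 87.4 | 98.3 | 96.8 | 720p | 10s |
| Kling 1.6 | 85.9 | 97.1 | 97.2 | 1080p | 10s |
| Runway Gen-3 Alpha | 84.6 | 96.4 | 95.9 | 1280×768 | 10s |
| Wan 2.1 | 83.2 | 96.1 | 94.3 | 480p | 5s |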

VBench scores sourced from publicly available leaderboard data at the time of Wan 2.2 release (fal.ai blog, Alibaba technical documentation).

Key takeaways:

  • Wan 2.2 Spicy matches or beats Runway Gen-3 Alpha on VBench metrics at a lower price point.

Note: If you’re integrating multiple AI models into one pipeline, AtlasCloud provides unified API access to 300+ models including Kling, Flux, Seedance, Claude, and GPT — one API key, no per-provider setup. New users get a 25% credit bonus on first top-up (up to $100).

Frequently Asked Questions

What is the API pricing for Wan-2.2-spicy image-to-video generation?

Wan-2.2-spicy is available through providers like WaveSpeed.ai and 302.AI. Pricing is typically charged per video generation or per second of video output. WaveSpeed.ai's rates run roughly $0.03–$0.05 per second of output video, while 302.AI uses a credit-based system. Always check the provider's current pricing page directly, as costs can vary with resolution (480p vs 720p) and video duration.
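
Under per-second billing, cost is seconds of output × rate × number of clips. The sketch below uses the $0.03–$0.05 per-second range quoted above as default bounds. Those rates are illustrative only and do not apply to credit-based providers such as 302.AI.

```python
def estimate_cost_range(
    duration_s: float,
    num_videos: int = 1,
    rate_low: float = 0.03,   # USD per output second (illustrative lower bound)
    rate_high: float = 0.05,  # USD per output second (illustrative upper bound)
) -> tuple[float, float]:
    """Return (low, high) estimated USD cost under per-second billing."""
    if duration_s <= 0 or num_videos < 0:
        raise ValueError("duration must be positive and num_videos non-negative")
    billed_seconds = duration_s * num_videos
    return round(billed_seconds * rate_low, 2), round(billed_seconds * rate_high, 2)


print(estimate_cost_range(5))         # one 5 s clip -> (0.15, 0.25)
print(estimate_cost_range(10, 1000))  # 1,000 ten-second clips -> (300.0, 500.0)
```

Any resolution-dependent pricing would need a separate rate per resolution. This function treats every output second the same.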

What is the average API latency and generation time for Wan-2.2-spicy?

Wan-2.2-spicy typically has a generation latency of 60–120 seconds for a 4-second video clip at 720p resolution via cloud API providers like WaveSpeed.ai. Cold start times can add an additional 10–30 seconds if the model is not already loaded. For production pipelines requiring sub-60-second turnaround, queuing strategies or dedicated GPU endpoints are recommended. Batch processing can reduce per-
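
Most of each job's 60–120 seconds is spent waiting on the provider, so running jobs concurrently is one straightforward queuing strategy. With enough workers, a batch finishes in roughly the time of its slowest job rather than the sum of all jobs, as long as the provider's concurrency limits allow it. In this sketch, `generate` is a hypothetical placeholder for your own blocking submit-then-poll function. The `max_workers=4` default is arbitrary; tune it against your rate limit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_batch(
    jobs: list[dict],
    generate: Callable[[dict], dict],
    max_workers: int = 4,
) -> list[dict]:
    """Run generate(job) for every job concurrently, preserving input order.

    `generate` is a placeholder for a blocking submit-and-poll function.
    A job that raises is recorded as a failure instead of aborting the batch.
    """
    results: list[dict] = [{} for _ in jobs]
    # Threads suit this workload: each worker mostly waits on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(generate, job): i for i, job in enumerate(jobs)}
        for future in as_completed(futures):
            i = futures[future]
            try:
                results[i] = future.result()
            except Exception as exc:
                results[i] = {"status": "failed", "error": str(exc)}
    return results
```

Recording failures per job means one rejected image or timed-out task doesn't force you to resubmit the whole batch.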

How does Wan-2.2-spicy benchmark against other image-to-video models like Kling or Runway?

In community benchmarks, Wan-2.2-spicy scores competitively on motion coherence and subject fidelity, with FVD (Fréchet Video Distance) scores roughly 10–15% better than the base Wan 2.2 model. Compared to Kling 1.5 and Runway Gen-3, Wan-2.2-spicy offers fewer content restrictions and lower per-generation cost (often 30–50% cheaper), but Kling typically leads on temporal consistency for complex motion.

What are the input image requirements and supported output resolutions for the Wan-2.2-spicy API?

The Wan-2.2-spicy API accepts input images in JPEG or PNG format. A minimum resolution of 512×512 pixels is recommended, with the best results at 1280×720 (720p) or a 1:1 aspect ratio. Supported output resolutions are typically 480p and 720p, at 16 fps or 24 fps. Maximum input file size is generally 10 MB. The model supports video durations from 4 seconds (default) up to 8 seconds, depending on the provider.
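
A pre-flight check can catch oversized or undersized images before you pay for a failed job. This sketch uses only the standard library. It reads PNG dimensions from the IHDR chunk and JPEG dimensions from the first start-of-frame (SOF) marker. The 10 MB and 512×512 thresholds come from the paragraph above and may differ by provider. The function names are illustrative, not part of any provider SDK.

```python
import struct

MAX_BYTES = 10 * 1024 * 1024  # "generally 10 MB"; confirm with your provider
MIN_SIDE = 512                # recommended minimum edge, in pixels

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_info(data: bytes) -> tuple[str, int, int]:
    """Return (format, width, height) for PNG or JPEG bytes."""
    if data.startswith(PNG_SIGNATURE):
        # Width and height are big-endian uint32s at bytes 16-23 (IHDR chunk).
        width, height = struct.unpack(">II", data[16:24])
        return "png", width, height
    if data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        i = 2
        while i + 9 <= len(data):
            if data[i] != 0xFF:
                raise ValueError("corrupt JPEG marker stream")
            marker = data[i + 1]
            if marker == 0xFF:  # fill byte before a marker
                i += 1
                continue
            # SOF0-SOF15 carry the frame size; C4, C8 and CC are other markers.
            if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
                height, width = struct.unpack(">HH", data[i + 5:i + 9])
                return "jpeg", width, height
            # Skip this segment; its 2-byte length field includes itself.
            (seg_len,) = struct.unpack(">H", data[i + 2:i + 4])
            i += 2 + seg_len
        raise ValueError("no JPEG frame header found")
    raise ValueError("unsupported format: expected PNG or JPEG")


def validate_input_image(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes; limit is {MAX_BYTES}")
    try:
        fmt, width, height = image_info(data)
    except (ValueError, struct.error) as exc:
        problems.append(str(exc))
        return problems
    if min(width, height) < MIN_SIDE:
        problems.append(
            f"{fmt} is {width}x{height}; recommended minimum is {MIN_SIDE}x{MIN_SIDE}"
        )
    return problems
```

Typical use is `validate_input_image(open(path, "rb").read())` before building the submit request. Returning a list of problems, rather than raising on the first one, lets you report every issue with an image at once.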
