Wan-2.2-Spicy Image-to-Video LoRA API: Complete Developer Guide
Wan-2.2-Spicy is Alibaba Wanxiang’s image-to-video model with native LoRA support, targeting developers who need animated output from static images — including unrestricted creative content. This guide covers the full API surface, benchmarks, pricing, and honest trade-offs so you can decide whether it fits your production pipeline.
What’s New vs. Wan 2.1
Wan 2.2 isn’t a cosmetic version bump. The architecture shifted to a Mixture-of-Experts (MoE) backbone, which changes how compute is distributed across the model at inference time. Here’s what that translates to in practice:
| Metric | Wan 2.1 | Wan 2.2 | Change |
|---|---|---|---|
| VBench overall score | 83.1 | 85.7 | +2.6 pts |
| Motion smoothness | 97.2 | 98.1 | +0.9 pts |
| Temporal consistency | 94.3 | 96.5 | +2.2 pts |
| Avg. generation latency (720p, 81 frames) | ~95s | ~68s | −28% |
| LoRA loading support | External only | Native API param | Yes |
| Max resolution | 720p | 1280×720 stable | Same ceiling, improved stability |
The MoE routing reduces redundant computation on simpler frames, which is the primary driver of the latency improvement. Temporal consistency gains are the most developer-relevant change: fewer flickering artifacts on long clips means less post-processing work.
LoRA support is now a first-class API parameter rather than a workaround requiring local inference. You pass a lora_url pointing to a .safetensors file, and the model loads it per-request — no persistent model state, no custom container.
Full Technical Specifications
| Parameter | Value |
|---|---|
| Model ID (AtlasCloud) | alibaba/wan-2.2-spicy/image-to-video-lora |
| Model ID (WaveSpeed) | wavespeed-ai/wan-2.2-spicy-image-to-video-lora |
| Architecture | Diffusion Transformer + MoE |
| Input | Single image (JPEG/PNG/WebP) + text prompt |
| Output format | MP4 (H.264) |
| Supported resolutions | 480p (832×480), 720p (1280×720) |
| Frame count options | 33, 49, 81 frames |
| Frame rate | 16 fps (default) |
| LoRA support | Yes — URL-based, per-request |
| LoRA weight format | .safetensors |
| LoRA strength param | lora_scale (0.0–1.0, default 0.8) |
| Guidance scale range | 1.0–10.0 |
| Inference steps range | 10–50 (default 30) |
| Max prompt length | 512 tokens |
| Content policy | Unrestricted (NSFW-capable) |
| API authentication | Bearer token |
| Response type | Async (polling) or webhook |
Benchmark Comparison
These scores use VBench (the standard video generation evaluation suite) and FVD (Fréchet Video Distance, lower is better). Competitor data sourced from their respective technical reports and third-party evaluations.
| Model | VBench Score | Motion Smoothness | Temporal Consistency | FVD (UCF-101) | Native LoRA | NSFW-capable API |
|---|---|---|---|---|---|---|
| Wan 2.2-Spicy | 85.7 | 98.1 | 96.5 | ~290 | ✅ | ✅ |
| Wan 2.2 (base) | 85.7 | 98.1 | 96.5 | ~290 | ✅ | ❌ |
| Kling 1.6 | 84.9 | 97.8 | 95.9 | ~310 | ❌ | ❌ |
| Runway Gen-3 Alpha | 84.2 | 97.1 | 95.3 | ~340 | ❌ | ❌ |
What the numbers mean for you:
- Wan 2.2-Spicy and the base Wan 2.2 are architecturally identical in quality metrics. The “Spicy” variant’s differentiation is the removed content policy, not a quality uplift.
- Kling 1.6 is the closest quality competitor. The 0.8-point VBench gap is small but measurable in motion coherence on fast-moving subjects.
- Runway Gen-3 Alpha trails on temporal consistency, which shows up as subtle frame-to-frame drift on complex scenes.
- FVD scores for Wan 2.2 are approximate, derived from fal.ai’s published evaluation data. Take FVD cross-comparisons with some caution since test set composition varies.
Pricing Comparison
Pricing is per-second of generated video or per-generation depending on provider. The table below normalizes to cost per 5-second clip at 720p (the most common production unit).
| Provider | Model | Pricing model | Est. cost per 5s clip (720p) | Free tier |
|---|---|---|---|---|
| WaveSpeed.ai | wan-2.2-spicy-i2v-lora | Per generation | ~$0.028 | Yes (credits) |
| AtlasCloud.ai | alibaba/wan-2.2-spicy/i2v-lora | Per second of video | ~$0.035 | Trial credits |
| Runway (Gen-3 Alpha) | — | Per credit (~$0.05/credit) | ~$0.25–0.50 | Limited |
| Kling (via API) | Kling 1.6 | Per generation | ~$0.14–0.20 | No |
| fal.ai | wan/v2.2/image-to-video | Per generation | ~$0.030 | Yes |
Wan 2.2-Spicy via WaveSpeed or fal.ai is roughly 6–8× cheaper than Runway for equivalent output length. The gap narrows if you need Runway-specific features (inpainting, director mode), but for straightforward image-to-video animation it’s a meaningful cost difference at scale.
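To make the per-clip figures concrete at pipeline scale, here is a back-of-envelope estimator using the table's numbers (midpoints where the table gives a range). The figures are estimates from this guide, not provider-published rates:

```python
# Per-clip cost estimates from the pricing table above (USD, 5s clip @ 720p).
# Midpoints are used where the table gives a range.
COST_PER_CLIP = {
    "WaveSpeed (Wan 2.2-Spicy)": 0.028,
    "fal.ai (Wan 2.2)": 0.030,
    "AtlasCloud (Wan 2.2-Spicy)": 0.035,
    "Kling 1.6 (API)": 0.17,     # midpoint of $0.14-0.20
    "Runway Gen-3 Alpha": 0.375,  # midpoint of $0.25-0.50
}

def monthly_cost(clips: int) -> dict:
    """Estimated monthly spend per provider for a given clip volume."""
    return {provider: round(rate * clips, 2)
            for provider, rate in COST_PER_CLIP.items()}

for provider, cost in monthly_cost(10_000).items():
    print(f"{provider:28s} ${cost:>9,.2f}")
```

At 10,000 clips/month the spread is roughly $280 (WaveSpeed) versus $3,750 (Runway), which is where the "6–8× cheaper" claim starts to matter.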
Best Use Cases
1. Adult content platforms
The primary reason to use this variant over base Wan 2.2. If your platform requires animated content without safety filter restrictions, this is currently one of the few production-grade APIs that explicitly supports it. Example: animating static images uploaded by creators on a subscription platform.
2. LoRA-driven style transfer at scale
Character-consistent animations where you’ve trained a subject LoRA. Pass the same lora_url across batches to maintain visual identity across hundreds of clips. Example: animating product photography in a consistent stylized look for an e-commerce catalog.
3. Rapid prototyping of video concepts
At ~$0.028 per clip and 30–50s generation time for 33-frame outputs, iteration cost is low enough to generate 20+ variants of a concept. Example: motion graphics agencies testing different animation directions before committing to manual production.
4. Automated social content pipelines
Webhook-based async responses make it composable with queue systems. Example: trigger generation when a new product image is uploaded, receive the MP4 via webhook, push to CDN.
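For the webhook-driven pipeline in use case 4, the receiver boils down to parsing the callback body and extracting the output URL. The payload shape below (`data` → `status` / `outputs`) is an assumption that mirrors the polling response in the minimal code example later in this guide; confirm it against your provider's webhook documentation:

```python
import json

def handle_callback(raw_body: bytes):
    """Extract the MP4 URL from a completion webhook, or None if not done.

    ASSUMPTION: the webhook body mirrors the polling response shape
    (data -> status / outputs). Verify against provider docs.
    """
    data = json.loads(raw_body).get("data", {})
    if data.get("status") != "completed":
        return None  # failed run or intermediate status ping
    outputs = data.get("outputs", [])
    return outputs[0] if outputs else None
```

Wire this into whatever HTTP framework fronts your queue; on a non-`None` return, push the URL to your CDN ingest step.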
Limitations and When NOT to Use This Model
To be specific about what Wan 2.2-Spicy won't do well:
1. Video longer than ~5 seconds
At 81 frames / 16fps you get approximately 5 seconds. There's no native concatenation or temporal extension in the API. Long-form video (30s+) requires external stitching logic and introduces consistency problems at cut points.
2. Precise camera control
You can prompt for camera movement (e.g., "slow zoom out") but there's no programmatic camera path input. If you need deterministic pan/tilt/zoom defined by coordinates, use a model with explicit camera conditioning like CogVideoX-5B or wait for a future Wan release.
3. High-motion action sequences
VBench motion quality scores are strong, but fast-moving objects (sports, fight scenes) still produce artifacts at the edges of moving subjects. This is a general diffusion video limitation, not specific to Wan 2.2.
4. Text legibility in video
Readable text within the generated video is unreliable. Don't use this for animated infographics or anything requiring stable on-screen text.
5. GDPR/regulated environments
You're sending images to a third-party inference API. For pipelines involving personal data, biometric images, or regulated health content, evaluate whether WaveSpeed/AtlasCloud's data processing terms satisfy your compliance requirements. Both providers currently host in US/EU regions but audit this yourself.
6. When you need SLA guarantees
WaveSpeed and AtlasCloud are both relatively young platforms. If you need 99.9% uptime SLAs with contractual backing, neither currently offers enterprise-grade SLAs comparable to AWS or Azure.
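If you do need clips longer than ~5 seconds, the external stitching mentioned in limitation 1 typically means ffmpeg's concat demuxer. A minimal sketch (assumes the `ffmpeg` binary is on PATH; `-c copy` avoids re-encoding, so all clips must share codec, resolution, and frame rate — and it does nothing about visual drift at cut points):

```python
import os
import subprocess
import tempfile

def build_concat_list(clip_paths) -> str:
    """Write an ffmpeg concat-demuxer list file and return its path."""
    fd, list_path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for p in clip_paths:
            f.write(f"file '{p}'\n")
    return list_path

def stitch(clip_paths, out_path: str) -> None:
    """Concatenate clips without re-encoding (same codec/fps/size required)."""
    list_path = build_concat_list(clip_paths)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_path, "-c", "copy", out_path],
        check=True,
    )
```

Generating each segment from the last frame of the previous clip reduces, but does not eliminate, the cut-point consistency problem.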
Minimal Working Code Example
```python
import time

import requests

API_URL = "https://api.wavespeed.ai/api/v3/wavespeed-ai/wan-2.2-spicy-image-to-video-lora"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

payload = {
    "image": "https://your-cdn.com/input-image.jpg",
    "prompt": "gentle breeze, hair flowing, soft bokeh background",
    "lora_url": "https://your-storage.com/your-lora.safetensors",
    "lora_scale": 0.8,
    "num_frames": 49,
    "resolution": "720p",
    "num_inference_steps": 30,
    "guidance_scale": 5.0,
}

# Submit the job; the API responds immediately with a request ID.
r = requests.post(API_URL, json=payload, headers=HEADERS).json()
request_id = r["data"]["id"]

# Poll until the job finishes (or fails).
while True:
    status = requests.get(
        f"https://api.wavespeed.ai/api/v3/predictions/{request_id}",
        headers=HEADERS,
    ).json()
    state = status["data"]["status"]
    if state == "completed":
        print(status["data"]["outputs"][0])  # URL of the generated MP4
        break
    if state == "failed":
        raise RuntimeError(status["data"].get("error", "generation failed"))
    time.sleep(5)
```
This polls until completion. In production, replace the polling loop with a callback_url parameter to receive a webhook instead of blocking a thread.
Key Parameters Reference
| Parameter | Type | Default | Notes |
|---|---|---|---|
| image | string (URL) | required | JPEG/PNG/WebP, max 10MB |
| prompt | string | required | Max 512 tokens |
| negative_prompt | string | "" | Standard diffusion negative prompt |
| lora_url | string (URL) | null | .safetensors only |
| lora_scale | float | 0.8 | 0.0 = no LoRA effect, 1.0 = full |
| num_frames | int | 49 | 33 / 49 / 81 |
| resolution | string | "480p" | "480p" or "720p" |
| num_inference_steps | int | 30 | More steps = slower + marginal quality gain past 40 |
| guidance_scale | float | 5.0 | Higher = more prompt-adherent, less natural motion |
| seed | int | random | Set for reproducibility |
| callback_url | string | null | POST webhook on completion |
Integration Notes
Async is the right pattern. Generation at 720p/49 frames takes 40–70 seconds. Blocking HTTP requests will hit timeout limits in most API gateway configurations. Use the callback URL or implement a polling queue with exponential backoff.
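If you must poll (no webhook endpoint available), the backoff pattern looks like this. The endpoint and response shape follow the minimal example above; the delay schedule (2s base, 30s cap, 10-minute budget) is an illustrative choice, not a provider recommendation:

```python
import time

import requests

def backoff_delays(base: float = 2.0, cap: float = 30.0):
    """Yield exponentially growing delays: 2, 4, 8, ..., capped at `cap`."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

def poll_with_backoff(request_id: str, headers: dict,
                      max_wait: float = 600.0) -> str:
    """Poll the prediction endpoint until completion, failure, or timeout."""
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}"
    waited = 0.0
    for delay in backoff_delays():
        data = requests.get(url, headers=headers, timeout=30).json()["data"]
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error", "generation failed"))
        if waited >= max_wait:
            raise TimeoutError(f"no result after {max_wait}s")
        time.sleep(delay)
        waited += delay
```

The cap matters: without it, a stuck job pushes you toward multi-minute gaps between checks and you notice failures late.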
LoRA hosting. Your .safetensors file needs to be publicly accessible via HTTPS at request time. WaveSpeed fetches it on each request. If latency matters, host on a CDN close to WaveSpeed’s inference region (currently US-West). A 1–2GB LoRA file fetched from a slow origin can add 10–20 seconds to your first request.
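Since a bad LoRA URL wastes a paid generation, a cheap pre-flight check before submitting is worth the extra request. This sketch only validates scheme, extension, reachability, and reported size (the 2GB ceiling is an assumption based on the file sizes mentioned above, not a documented API limit):

```python
import requests

def check_lora_url(url: str, max_bytes: int = 2 * 1024**3) -> bool:
    """Pre-flight: is the LoRA publicly reachable over HTTPS and sane in size?

    ASSUMPTION: max_bytes reflects typical 1-2GB LoRA files, not a hard
    provider limit -- check your provider's docs for the real ceiling.
    """
    if not (url.startswith("https://") and url.endswith(".safetensors")):
        return False
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
    except requests.RequestException:
        return False
    if resp.status_code != 200:
        return False
    size = int(resp.headers.get("Content-Length", "0"))
    return 0 < size <= max_bytes
```

Run it once at upload time, not per generation request, so you are not doubling your request volume against the origin.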
Seed handling. Set seed explicitly in any A/B testing or quality evaluation workflow. Without a fixed seed, comparing parameter changes across generations is unreliable.
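A minimal fixed-seed sweep helper for that kind of evaluation: vary exactly one parameter per run so output differences are attributable to the parameter, not sampling noise. The payload fields mirror the minimal example above; the seed value is arbitrary:

```python
def ab_payloads(base: dict, param: str, values: list, seed: int = 42) -> list:
    """One request payload per value of `param`, all pinned to the same seed."""
    return [{**base, "seed": seed, param: v} for v in values]

base = {
    "image": "https://your-cdn.com/input-image.jpg",
    "prompt": "gentle breeze, hair flowing",
    "num_frames": 49,
    "resolution": "720p",
}
# Three runs differing only in guidance_scale, otherwise identical.
variants = ab_payloads(base, "guidance_scale", [3.0, 5.0, 7.0])
```

Submit each payload as in the minimal example and compare the three outputs frame by frame; any difference is now attributable to guidance_scale alone.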
Conclusion
Wan 2.2-Spicy fills a specific gap: production-quality image-to-video generation with native LoRA support and no content restrictions, at a price point ($0.028–0.035/clip) that makes iterative pipelines economically viable. If your use case requires unrestricted content or LoRA-based style consistency, it’s currently the strongest available API option; if you don’t need either of those features, the base Wan 2.2 or fal.ai’s hosted version is an equally capable and slightly simpler integration path.
Note: If you’re integrating multiple AI models into one pipeline, AtlasCloud provides unified API access to 300+ models including Kling, Flux, Seedance, Claude, and GPT — one API key, no per-provider setup. New users get a 25% credit bonus on first top-up (up to $100).
Frequently Asked Questions
How much does the Wan-2.2-Spicy Image-to-Video LoRA API cost per generation?
Pricing varies by provider hosting the Wan-2.2-Spicy model. On WaveSpeed, fal.ai, and AtlasCloud, a 5-second 720p clip runs roughly $0.028–0.035 per generation (see the pricing table above); providers billing per second of output land in a similar range. Native LoRA loading via the API parameter does not add extra cost on most platforms, but cold-start LoRA loading can add 10–20 seconds to your first request if the file is fetched from a slow origin.
What is the average API latency for Wan-2.2-Spicy at 720p resolution with 81 frames?
Wan-2.2-Spicy averages approximately 68 seconds for a 720p, 81-frame generation, a 28% improvement over Wan 2.1's ~95 seconds for the same configuration. This improvement is attributed to the Mixture-of-Experts (MoE) backbone, which distributes compute more efficiently at inference time. On GPU-backed serverless APIs (e.g., A100 80GB), cold-start latency can add an additional 8–20 seconds if the model is not already warm.
How do I pass a LoRA adapter to the Wan-2.2-Spicy API, and what formats are supported?
Wan-2.2-Spicy introduced native LoRA support as a direct API parameter (unlike Wan 2.1, which required external pipeline wrappers). You pass the LoRA via a dedicated parameter such as `lora_url` or `lora_path` in the request payload, pointing to a hosted `.safetensors` file. Most providers support LoRA weights trained at ranks 4, 8, 16, and 32, with rank 16 a common balance between quality and adapter size.
What are the benchmark scores for Wan-2.2-Spicy compared to competing image-to-video models?
On the VBench evaluation suite, Wan-2.2-Spicy scores 85.7 overall, up from 83.1 on Wan 2.1 — a gain of 2.6 points. Specific sub-metrics include motion smoothness at 98.1 (vs. 97.2 on Wan 2.1) and temporal consistency at 96.5 (vs. 94.3 on Wan 2.1). Against competitors, Wan 2.2 leads Kling 1.6 (VBench 84.9) and Runway Gen-3 Alpha (84.2), though cross-model VBench comparisons should be read with some caution since evaluation conditions vary.