Vidu Q3-Turbo Start-End-to-Video API: Developer Guide
The Vidu Q3-Turbo start-end-to-video API takes two frames — a start image and an end image — and generates a video clip that transitions between them. If you’re evaluating whether to integrate this into a production pipeline, this guide covers the full spec, benchmarks, pricing, and honest trade-offs.
What’s New vs. Previous Versions
Vidu’s model lineage runs Q1 → Q2 Turbo/Pro → Q3 Turbo/Pro. The table below compares Q2 Turbo and Q3 Turbo on the points most relevant to the start-end-to-video task:
| Improvement Area | Q2 Turbo | Q3 Turbo | Change |
|---|---|---|---|
| Max output resolution | 720p | 1080p | +50% per axis (2.25× pixel count) |
| Max clip duration | 4 seconds | 8 seconds (Q3 series) | +100% |
| Frame rate | 16 fps | 24 fps | +50% |
| Motion coherence (VBench composite) | ~78.2 | ~82.6 (Q3 series) | +4.4 points (~5.6%) |
| Subject consistency | Moderate | High | Qualitative; Q3 holds reference identity across longer spans |
| Turnaround speed | Not directly compared | ~30–40% faster than Q3 Pro | Turbo vs. Pro tradeoff within Q3 |
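The resolution row is worth checking with arithmetic: 1080p is 1.5× taller and 1.5× wider than 720p, so the total pixel count grows by 1.5² = 2.25×. Combined with the longer duration and higher frame rate, a maximum-length clip also carries three times as many frames. A quick check, assuming the standard 16:9 frame sizes of 1280×720 and 1920×1080:

```python
# Standard 16:9 frame sizes (width, height) for 720p and 1080p.
RES_720P = (1280, 720)
RES_1080P = (1920, 1080)

# Gain along one axis vs. gain in total pixels per frame.
linear_gain = RES_1080P[1] / RES_720P[1]  # 1.5x along each axis
pixel_gain = (RES_1080P[0] * RES_1080P[1]) / (RES_720P[0] * RES_720P[1])  # 2.25x pixels

# Frames in a maximum-length clip: duration (s) x frame rate (fps).
q2_frames = 4 * 16  # Q2 Turbo: 4 s at 16 fps
q3_frames = 8 * 24  # Q3 Turbo: 8 s at 24 fps

print(f"linear {linear_gain:.2f}x, pixels {pixel_gain:.2f}x, frames {q2_frames} -> {q3_frames}")
```

This prints `linear 1.50x, pixels 2.25x, frames 64 -> 192`, which is why "+50%" describes the per-axis change rather than pixel density.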
Key architecture change: Q3 introduces what Shengshu Technology describes as improved temporal attention across the full clip — meaning the model doesn’t just interpolate; it attempts physically plausible motion paths between your start and end frames. Q2 Turbo would sometimes produce drift artifacts mid-clip; Q3 Turbo reduces these noticeably on controlled test inputs.
Q3 Turbo vs. Q3 Pro: Within the Q3 family, Turbo trades some fidelity (lower per-frame detail on complex scenes) for faster generation. For use cases where you’re iterating rapidly — concept animation, storyboard previsualization — Turbo is the right pick. For final delivery, Q3 Pro is worth the wait.
Full Technical Specifications
| Parameter | Value |
|---|---|
| API endpoint | POST /vidu/q3-turbo/start-end2video (Vtrix) or platform-specific |
| Authentication | Bearer Token (Authorization header) |
| Input — start frame | Image URL or base64; JPEG/PNG |
| Input — end frame | Image URL or base64; JPEG/PNG |
| Recommended input resolution | 1280×720 minimum; 1920×1080 ideal |
| Output resolution | Up to 1080p |
| Output format | MP4 (H.264) |
| Frame rate | 24 fps |
| Clip duration | 4 or 8 seconds (configurable) |
| Aspect ratios supported | 16:9, 9:16, 1:1 |
| Generation mode | Async (poll by task ID) |
| Polling mechanism | GET request with task_id |
| Typical generation time | 30–90 seconds depending on load and clip length |
| Rate limits | Varies by provider tier; check your API Key Management page |
| Prompt support | Optional text prompt to guide motion |
| Reference image support | Via start/end frame definition; no separate reference image param in this endpoint |
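Because each frame can be sent either as a URL or as base64, it can help to normalize inputs and validate options before submitting. The sketch below checks the JPEG/PNG file signatures, wraps raw bytes as a base64 data URI, and rejects durations and aspect ratios outside the values in the table. `build_payload` is a hypothetical helper, and putting a data URI in the same `start_image_url`/`end_image_url` fields used by the code example later in this guide is an assumption; some providers may expect base64 in a separate field, so check your provider's docs.

```python
from __future__ import annotations

import base64

# Allowed values, taken from the specification table above.
ALLOWED_DURATIONS = {4, 8}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}

# File signatures ("magic bytes") of the two accepted input formats.
MAGIC_BYTES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_mime(data: bytes) -> str:
    """Return the MIME type of JPEG/PNG bytes; raise ValueError for anything else."""
    for magic, mime in MAGIC_BYTES.items():
        if data.startswith(magic):
            return mime
    raise ValueError("input frame must be JPEG or PNG")


def to_image_ref(frame: str | bytes) -> str:
    """Pass URLs through unchanged; wrap raw image bytes as a base64 data URI."""
    if isinstance(frame, str):
        return frame
    encoded = base64.b64encode(frame).decode("ascii")
    return f"data:{sniff_mime(frame)};base64,{encoded}"


def build_payload(start: str | bytes, end: str | bytes, duration: int = 4,
                  aspect_ratio: str = "16:9", prompt: str | None = None) -> dict:
    """Hypothetical helper: validate options against the spec and build a request body."""
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    payload = {
        "model": "vidu-q3-turbo",
        # Assumption: the provider accepts base64 data URIs in these URL fields.
        "start_image_url": to_image_ref(start),
        "end_image_url": to_image_ref(end),
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    if prompt:  # the text prompt is optional
        payload["prompt"] = prompt
    return payload
```

For example, `build_payload(open("start.png", "rb").read(), "https://your-cdn.com/frame_end.jpg", duration=8)` mixes a local file with a hosted URL. Checking the 1280×720 minimum resolution would need an image library such as Pillow; it is left out to keep the sketch dependency-free.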
The API is asynchronous by design. You POST the job, receive a task_id, and poll a status endpoint until the job returns completed with a video URL. Plan your integration accordingly — don’t block threads waiting.
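One way to avoid blocking threads is to poll from `asyncio` coroutines, so a single event loop can watch many jobs at once. This is a sketch under assumptions: `fetch_status` is a hypothetical coroutine you would implement with your HTTP client of choice, the status payload is assumed to carry `status` and `video_url` keys (matching the example later in this guide), and `failed` as the terminal error status is a guess to confirm against your provider's docs.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: takes a task_id, returns the parsed status JSON as a dict.
FetchStatus = Callable[[str], Awaitable[dict]]


async def wait_for_video(task_id: str, fetch_status: FetchStatus,
                         interval: float = 5.0, timeout: float = 150.0) -> str:
    """Poll one task until it completes, then return its video URL."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await fetch_status(task_id)
        status = result.get("status")
        if status == "completed":
            return result["video_url"]
        if status == "failed":  # assumed name for the terminal error status
            raise RuntimeError(f"task {task_id} failed: {result}")
        if loop.time() + interval > deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        # asyncio.sleep yields to other coroutines instead of blocking a thread.
        await asyncio.sleep(interval)


async def wait_for_all(task_ids: list[str], fetch_status: FetchStatus,
                       interval: float = 5.0) -> list[str]:
    """Poll several tasks concurrently on a single event loop."""
    return list(await asyncio.gather(
        *(wait_for_video(t, fetch_status, interval=interval) for t in task_ids)))
```

`asyncio.gather` returns results in the order the task IDs were given, so the video URLs line up with the jobs you submitted.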
Benchmark Comparison
No single universal benchmark covers start-end-to-video specifically, but VBench and qualitative human evaluation scores exist for the underlying models. The comparison below uses VBench composite scores and motion smoothness sub-scores where available.
| Model | VBench Composite | Motion Smoothness | Max Resolution | Max Duration | Start+End Input |
|---|---|---|---|---|---|
| Vidu Q3 Turbo | ~82.6 | ~96.1 | 1080p | 8 sec | ✅ |
| Kling v2.6 Pro | ~83.1 | ~95.8 | 1080p | 10 sec | ✅ |
| Kling v3.0 Pro | ~84.2 | ~96.4 | 1080p | 10 sec | ✅ |
| Vidu Q2 Turbo | ~78.2 | ~93.4 | 720p | 4 sec | ✅ |
Reading these numbers honestly: VBench scores in the low-to-mid 80s are competitive for current-generation models. Kling v3.0 Pro scores slightly higher in composite, but it’s also positioned as a premium Pro-tier model with correspondingly higher latency and cost. Vidu Q3 Turbo’s value proposition is 24fps output at 1080p with faster turnaround than either Kling Pro variant — not a top raw-quality score.
The ~96.1 motion smoothness score suggests that transitions between your start and end frames are generally free of visible artifacts at normal playback speed. At 0.5x slow motion, expect interpolation seams on high-frequency edge content such as text and fine fabric.
Pricing vs. Alternatives
AI video APIs are typically billed either per second of output or per generation. Here’s how Q3 Turbo stacks up:
| Provider / Model | Billing Unit | Approx. Cost per 4-sec Clip | Approx. Cost per 8-sec Clip |
|---|---|---|---|
| Vidu Q3 Turbo (via Vtrix/platform) | Per generation | ~$0.20–$0.35 | ~$0.35–$0.60 |
| Vidu Q3 Pro | Per generation | ~$0.45–$0.70 | ~$0.70–$1.10 |
| Kling v2.6 Pro (Novita AI) | Per second | ~$0.28 | ~$0.56 |
| Kling v3.0 Pro (Novita AI) | Per second | ~$0.40 | ~$0.80 |
| Vidu Q2 Turbo | Per generation | ~$0.10–$0.18 | N/A (4s max) |
Prices are estimates based on publicly listed API rates as of mid-2025. Always check your provider’s current pricing page — these change frequently.
Cost-per-quality assessment: Q3 Turbo sits at a reasonable midpoint. If you’re generating hundreds of clips per day, the gap between Q3 Turbo and Q3 Pro adds up quickly, since Pro costs roughly 2x as much per clip. The jump from Q2 Turbo to Q3 Turbo is mostly justified by the resolution and duration increase; 720p at 4 seconds is a meaningful limitation for production use cases.
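To see how the Turbo/Pro gap scales with volume, here is a small estimator built on the approximate per-clip ranges from the pricing table. The figures are the same rough estimates, not quotes, and `monthly_cost` is a hypothetical helper; swap in your provider's current rates before relying on the output.

```python
# Approximate per-clip price ranges in USD (low, high), copied from the table above.
PRICE_RANGES = {
    ("vidu-q3-turbo", 4): (0.20, 0.35),
    ("vidu-q3-turbo", 8): (0.35, 0.60),
    ("vidu-q3-pro", 4): (0.45, 0.70),
    ("vidu-q3-pro", 8): (0.70, 1.10),
}


def monthly_cost(model: str, duration: int, clips_per_day: int, days: int = 30):
    """Return a (low, high) spend estimate in USD for a given daily clip volume."""
    low, high = PRICE_RANGES[(model, duration)]
    clips = clips_per_day * days
    return round(low * clips, 2), round(high * clips, 2)


for model in ("vidu-q3-turbo", "vidu-q3-pro"):
    low, high = monthly_cost(model, duration=4, clips_per_day=500)
    print(f"{model}: ${low:,.0f}-${high:,.0f} per month")
```

At 500 four-second clips per day, the estimate works out to about $3,000-$5,250 per month for Turbo versus $6,750-$10,500 for Pro.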
Best Use Cases
1. E-commerce product animation: Take a “flat product on white” shot as the start frame and a “product in lifestyle context” shot as the end frame, then generate a 4-second transition clip. The model handles camera-stable subjects well. Realistic for apparel, footwear, and packaged goods.
2. Storyboard-to-animatic conversion: Use consecutive storyboard panels as start/end pairs. With a 30–90 second generation time per clip, you can run a 12-panel board (11 transitions) in parallel batches and have a rough animatic in under 5 minutes, significantly faster than traditional 2D animation workflows.
3. Real estate and architectural walkthroughs: Start frame: exterior shot. End frame: interior shot. The model generates a plausible zoom/transition. Quality is sufficient for client pitch decks, but not for final broadcast.
4. Social media loops: Because you control both the start and end frame, you can feed the same image as both start and end, combined with a motion prompt, to create seamless loop content. This works reliably at the 1:1 aspect ratio for platform-optimized clips.
5. Concept visualization for games/film pre-production: Scene transition blocking, e.g., a character enters a door (start frame) and arrives at the destination (end frame). Fast enough for director review cycles.
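For the storyboard workflow, each consecutive pair of panels becomes one start/end job, so a 12-panel board yields 11 clips. The sketch below builds those pairs and submits them concurrently with a thread pool. `submit_job` is a hypothetical callable standing in for the POST request in the code example later in this guide (it should return a `task_id`), and `max_workers` should stay within your provider's rate limits.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def panel_pairs(panels: list[str]) -> list[tuple[str, str]]:
    """Turn ordered storyboard panels into consecutive (start, end) pairs."""
    return list(zip(panels, panels[1:]))


def submit_storyboard(panels: list[str], submit_job: Callable[[str, str], str],
                      max_workers: int = 4) -> list[str]:
    """Submit one start/end job per panel transition; return task IDs in order."""
    pairs = panel_pairs(panels)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so task IDs line up with transitions.
        return list(pool.map(lambda pair: submit_job(*pair), pairs))
```

The returned task IDs can then be handed to whatever polling loop you use; because order is preserved, the finished clips can be concatenated in panel order.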
Limitations and Cases Where You Should NOT Use This Model
Don’t use Q3 Turbo when:
- You need precise motion control. You cannot specify intermediate keyframes or velocity curves. If a character needs to wave their hand in a specific way, the model will invent the motion path. Kling v3.0 with motion brush, or a dedicated controllable video model, is a better fit.
- Your content contains text or complex UI. Text legibility degrades noticeably in motion. Any clip where readable moving text matters (tutorial screencasts, UI demos, typography animation) will show unacceptable artifacts.
- You need clips longer than 8 seconds. Maximum clip duration is 8 seconds. For longer outputs, you’d need to stitch multiple clips, which introduces cut artifacts unless you plan your start/end frames carefully at each boundary.
- Your pipeline requires a synchronous response. The async polling model is non-negotiable. If your infrastructure can’t accommodate webhooks or polling loops (e.g., a serverless function with a hard 30-second timeout), you need to architect a queue system or choose a different API.
- You need 60fps output. Q3 Turbo outputs at 24fps. Broadcast or gaming contexts requiring 60fps will need post-processing frame interpolation, which typically costs some quality.
- Your subject involves heavy deformation or non-rigid motion. The model handles cloth simulation, water, and fire worse than character animation. Expect smearing on high-motion fluid content.
- Privacy-sensitive biometric content. As with other hosted video generation APIs, the model’s outputs and potentially your input frames may pass through third-party infrastructure. Read the data handling terms of your chosen provider (Vtrix, Novita AI, or platform.vidu.com directly) before sending personally identifiable imagery.
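If you do need more than 8 seconds, one approach is to chain clips so that each segment’s end frame is the next segment’s start frame, which keeps every cut point on a frame you control. The sketch below picks 8-second segments plus at most one 4-second segment to cover a target length (rounding up to the next 4-second boundary) and reports how many keyframes you would need to prepare. `plan_segments` and `keyframes_needed` are hypothetical helpers, not part of any provider API.

```python
from __future__ import annotations

import math


def plan_segments(target_seconds: float) -> list[int]:
    """Cover target_seconds with 8 s clips plus at most one 4 s clip, rounding up."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    blocks = math.ceil(target_seconds / 4)  # number of 4-second units needed
    eights, remainder = divmod(blocks, 2)   # pair units into 8 s clips
    return [8] * eights + [4] * remainder


def keyframes_needed(segments: list[int]) -> int:
    """Chained clips share boundary frames, so N clips need N + 1 keyframes."""
    return len(segments) + 1 if segments else 0
```

For a 20-second sequence this plans `[8, 8, 4]` and needs 4 keyframes. Preferring 8-second segments also keeps the number of generations, and therefore per-generation charges, as low as possible.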
Minimal Working Code Example
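```python
import time

import requests

API_KEY = "your_bearer_token"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
BASE = "https://api.vtrix.ai"  # replace with your provider base URL

payload = {
    "model": "vidu-q3-turbo",
    "start_image_url": "https://your-cdn.com/frame_start.jpg",
    "end_image_url": "https://your-cdn.com/frame_end.jpg",
    "duration": 4,           # 4 or 8 seconds
    "aspect_ratio": "16:9",  # 16:9, 9:16, or 1:1
    "prompt": "smooth camera pull-back",  # optional motion guidance
}

# Submit the job; the API is async and returns a task_id immediately.
resp = requests.post(f"{BASE}/vidu/q3-turbo/start-end2video", json=payload, headers=HEADERS, timeout=30)
resp.raise_for_status()
task_id = resp.json()["task_id"]

# Poll every 5 seconds, up to 30 attempts (~150 seconds total).
for _ in range(30):
    status_resp = requests.get(f"{BASE}/tasks/{task_id}", headers=HEADERS, timeout=30)
    status_resp.raise_for_status()
    result = status_resp.json()
    if result["status"] == "completed":
        print(result["video_url"])
        break
    if result["status"] == "failed":  # assumed failure status name; check your provider's docs
        raise RuntimeError(f"Task {task_id} failed: {result}")
    time.sleep(5)
else:
    raise TimeoutError(f"Task {task_id} did not complete within ~150 seconds")
```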
Replace the base URL with your provider’s endpoint. The loop polls every 5 seconds for up to 30 attempts (about 150 seconds total); adjust both values for your SLA requirements. Retry logic and fuller error handling are omitted for brevity; add them before shipping to production.
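As a starting point for that retry logic, here is a generic exponential-backoff wrapper you could place around the HTTP calls in the example above. Which errors deserve a retry depends on your provider, so the retryable exception types are a parameter; `with_retries` is a hypothetical helper, and the default delays (1 s, 2 s, 4 s with up to 10% jitter) are arbitrary starting values.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 retry_on: tuple[type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on a retryable error, back off exponentially (with jitter) and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... by default
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

For example, `with_retries(lambda: requests.get(f"{BASE}/tasks/{task_id}", headers=HEADERS, timeout=30), retry_on=(requests.ConnectionError, requests.Timeout))` retries only network-level failures of a status check. Be more careful with the initial POST: if a request timed out after the server had already accepted it, retrying may submit a second job.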
Provider Access Options
The Vidu Q3-Turbo start-end-to-video API is accessible through multiple routes:
- platform.vidu.com — Vidu’s own platform; direct access, supports the full endpoint spec including start-end-to-video, text-to-video, and template APIs.
- Vtrix API — Third-party wrapper with documented Bearer Token auth; useful if you’re already on their infrastructure.
- Novita AI — Lists Vidu Q2 series with documentation patterns directly applicable to Q3; check their model catalog for Q3 Turbo availability as it rolls out.
Using an intermediary API provider adds a layer between you and Shengshu Technology’s infrastructure. This can mean lower prices or bundled credits, but also means you’re subject to that provider’s uptime, rate limits, and data handling policies — not just Vidu’s. For production workloads, direct platform access typically gives you better SLA visibility.
Conclusion
Vidu Q3-Turbo’s start-end-to-video API delivers a meaningful upgrade over Q2 Turbo — 1080p output, 24fps, and up to 8-second clips at a competitive price point — making it a credible option for product animation, pre-visualization, and social content pipelines. It doesn’t beat Kling v3.0 Pro on raw VBench scores, and the async-only architecture plus absence of keyframe control are real constraints you need to design around before committing to an integration.
Note: If you’re integrating multiple AI models into one pipeline, AtlasCloud provides unified API access to 300+ models including Kling, Flux, Seedance, Claude, and GPT — one API key, no per-provider setup. New users get a 25% credit bonus on first top-up (up to $100).
Frequently Asked Questions
What is the pricing for Vidu Q3-Turbo start-end-to-video API per generation?
Vidu Q3-Turbo is a credit-based API. Based on the developer guide, pricing scales with output duration and resolution: generating an 8-second clip at 1080p consumes more credits than a shorter 4-second clip at 720p. The estimates in the pricing table put it at roughly $0.20–$0.35 per 4-second clip and $0.35–$0.60 per 8-second clip. Developers should check the official Vidu pricing page for exact per-credit costs, but the Q3-Turbo tier is positioned as a cost-optimized option compared to Q3-Pro, trading some quality for faster generation.
What is the API latency or generation time for Vidu Q3-Turbo start-end-to-video requests?
Vidu Q3-Turbo is designed for faster turnaround than Q3-Pro. Exact P50/P95 latency depends on server load, but the spec table lists a typical generation time of 30–90 seconds depending on load and clip length, and the Turbo variant targets production pipeline use cases where speed matters. Because Q3-Turbo outputs 1080p at 24fps for up to 8 seconds, generation times are likely longer than for the previous Q2 Turbo (which maxed out at 720p/16fps/4s). Developers should plan for asynchronous job polling.
How does Vidu Q3-Turbo score on VBench compared to competitors for video generation quality?
According to the developer guide, Vidu Q3-Turbo (Q3 series) achieves a VBench motion coherence score of approximately 82.6, compared to 78.2 for Q2 Turbo, an improvement of 4.4 points (roughly 5.6%). This places the Q3 series competitively in the mid-to-high tier of video generation models. Subject consistency is rated qualitatively as 'High' vs 'Moderate' for Q2 Turbo, meaning Q3-Turbo better preserves reference identity across longer spans.
What are the maximum resolution, frame rate, and clip duration limits for the Vidu Q3-Turbo API?
Vidu Q3-Turbo supports a maximum output resolution of 1080p (up from 720p in Q2 Turbo: 1.5× along each axis, or 2.25× the pixel count), a frame rate of 24fps (up from 16fps in Q2 Turbo, +50%), and a maximum clip duration of 8 seconds (up from 4 seconds in Q2 Turbo, +100%). These three combined improvements make Q3-Turbo substantially more capable for production use cases than Q2. For the start-end-