Vidu Q3-Pro Start-End-to-Video API: Complete Developer Guide
If you’re building a video generation pipeline and evaluating whether Vidu Q3-Pro’s start-end-to-video endpoint belongs in it, this guide gives you the specs, benchmarks, pricing, and honest trade-offs to make that call.
What Is Start-End-to-Video?
The start-end-to-video capability is a specific generation mode where you supply two frames — a starting image and an ending image — and the model interpolates a coherent video transition between them. This differs from standard image-to-video (one anchor frame) or text-to-video (no visual anchor at all).
Vidu Q3-Pro exposes this as a dedicated endpoint through multiple API providers:
- Vidu’s own platform: POST https://platform.vidu.com/ (see docs.platform.vidu.com)
- Pollo.ai: POST https://pollo.ai/api/platform/generation/vidu/viduq3-pro (docs.pollo.ai)
- fal.ai: fal-ai/vidu/start-end-to-video (fal.ai)
- Novita AI: documented alongside Q3-Pro text-to-video (novita.ai)
The workflow is asynchronous: you POST a job, receive a task ID, then poll for the result. This is consistent across all provider wrappers.
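A minimal, provider-agnostic sketch of that submit-then-poll loop, with the exponential backoff you will want in production. The status strings and response fields here are placeholders, not any provider's documented schema — wrap your provider's actual GET-task call in `fetch_status`:

```python
import time

def poll_until_done(fetch_status, max_wait=600.0, initial_delay=2.0, max_delay=30.0):
    """Poll an asynchronous generation job until it finishes.

    fetch_status: a zero-argument callable wrapping your provider's
    GET-task request, returning a dict like {"status": ..., "video_url": ...}.
    Status names are placeholders -- check your provider's response schema.
    """
    delay = initial_delay
    waited = 0.0
    while waited < max_wait:
        job = fetch_status()
        if job["status"] == "completed":
            return job["video_url"]
        if job["status"] == "failed":
            raise RuntimeError(f"generation failed: {job.get('error')}")
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise TimeoutError("generation did not finish within max_wait")
```

Injecting `fetch_status` as a callable keeps the loop testable without network access and reusable across the four provider wrappers.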
What’s New vs. Previous Versions
Vidu Q3-Pro is the third major generation model from Vidu, succeeding Q1 and Vidu 2.0. Based on published documentation and provider release notes:
| Capability | Vidu Q1 | Vidu 2.0 | Vidu Q3-Pro |
|---|---|---|---|
| Max resolution | 720p | 720p | 1080p |
| Max duration | 4 sec | 8 sec | 16 sec |
| Start-end mode | No | No | Yes |
| Audio sync | No | Limited | Yes (native) |
| API availability | Novita, Pollo | Novita, Pollo | Vidu platform, fal.ai, Novita, Pollo |
Key jumps:
- Resolution: 720p → 1080p, a 2.25× increase in pixel count
- Duration: 4 sec → 16 sec maximum, a 4× increase
- Start-end interpolation: introduced for the first time in Q3-Pro
- Synchronized audio generation: added as a native capability, not available in prior versions
These are documented capability additions, not estimated performance claims. No official FPS benchmarks comparing Q1 to Q3-Pro have been published at time of writing.
Full Technical Specifications
| Parameter | Value |
|---|---|
| Model name | Vidu Q3-Pro |
| Generation modes | Text-to-video, image-to-video, start-end-to-video |
| Max resolution | 1080p (1920×1080) |
| Min resolution | Not officially specified; inferred 480p from provider docs |
| Video duration range | 1–16 seconds |
| Audio | Synchronized audio generation supported |
| Input: start-end mode | Two images (start frame + end frame) + optional text prompt |
| Output format | MP4 (standard across providers) |
| API style | Asynchronous — POST job → poll task ID for result |
| Authentication | API key via x-api-key header (Pollo.ai) or provider-specific header |
| Endpoint (Pollo.ai) | https://pollo.ai/api/platform/generation/vidu/viduq3-pro |
| Endpoint (fal.ai) | fal-ai/vidu/start-end-to-video |
| SDK support | fal.ai client (Python, JS), Novita AI SDK, Pollo REST |
| Rate limits | Provider-dependent; not published in Vidu platform docs |
| Latency | Not officially benchmarked; asynchronous polling model |
Note on latency: None of the providers (Vidu, Pollo.ai, fal.ai, Novita) publish specific p50/p95 generation times for Q3-Pro at time of writing. Expect 30–120 seconds for a 1080p clip based on typical GPU-backed video generation pipelines, but benchmark this yourself before committing to SLAs.
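When you run that benchmark yourself, a few lines of stdlib Python turn your collected wall-clock samples into the percentiles that matter (nearest-rank p95; the samples are whatever end-to-end times you measured):

```python
import math
import statistics

def latency_percentiles(samples_sec):
    """p50 and nearest-rank p95 from observed end-to-end generation times (seconds)."""
    ordered = sorted(samples_sec)
    n = len(ordered)
    p50 = statistics.median(ordered)
    # nearest-rank method: the value at position ceil(0.95 * n), 1-indexed
    p95 = ordered[max(0, math.ceil(0.95 * n) - 1)]
    return p50, p95
```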
Benchmark Comparison vs. Competitors
Published VBench scores for Vidu Q3-Pro specifically are not available in current documentation. The following table uses the best available public data from VBench leaderboard entries and model cards. Where Q3-Pro scores are absent, the table notes it clearly.
| Model | VBench Overall | Subject Consistency | Motion Smoothness | Notes |
|---|---|---|---|---|
| Vidu Q3-Pro | Not published | Not published | Not published | New; no public VBench submission found |
| Kling v2.1 | ~83.2 | ~91.4 | ~98.1 | Published on VBench leaderboard |
| Runway Gen-3 Alpha | ~82.6 | ~89.7 | ~97.4 | Published on VBench leaderboard |
| Sora (OpenAI) | ~82.4 | ~88.5 | ~97.8 | Published on VBench leaderboard |
This is an honest gap in the data. If VBench scores are the deciding factor for your evaluation, Vidu Q3-Pro currently cannot be compared directly. The start-end-to-video capability itself (frame-anchored interpolation) is also not a standard VBench test category, which makes cross-model comparison harder for this specific mode.
What you can evaluate: generate a standard set of test clips (e.g., 10 paired start/end frames across motion types: camera pan, object movement, scene transition) and score them manually or with CLIP-based similarity metrics against ground-truth expectations.
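The comparison step of that evaluation reduces to cosine similarity between embedding vectors — for example, the generated clip's final frame against your ground-truth end frame. The embedding encoder (CLIP or otherwise) is your choice; this sketch only shows the scoring, with plain Python lists standing in for embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def score_endpoint_adherence(frame_embeddings, target_embedding):
    """Score how closely the clip's last frame matches the ground-truth end frame."""
    return cosine_similarity(frame_embeddings[-1], target_embedding)
```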
Pricing vs. Alternatives
Pricing for video generation APIs is typically charged per second of output video. Based on published provider pricing at time of writing:
| Provider / Model | Price per second (1080p) | Min duration | Notes |
|---|---|---|---|
| Vidu Q3-Pro (Pollo.ai) | ~$0.20–$0.30/sec | 1 sec | Check docs.pollo.ai for current rate |
| Vidu Q3-Pro (fal.ai) | ~$0.25/sec | 1 sec | fal.ai credit-based billing |
| Kling v2.1 (fal.ai) | ~$0.28/sec | 5 sec | Higher minimum |
| Runway Gen-3 Alpha | ~$0.05/sec (480p) / ~$0.25/sec (1080p) | 5 sec | Subscription or API credits |
| Sora (OpenAI) | Not available as standalone API | — | Only via ChatGPT subscription |
Important: These are approximate figures based on publicly listed credit rates. All providers change pricing without advance notice. Validate current pricing directly before budgeting.
For a 16-second 1080p clip (Vidu Q3-Pro’s maximum), expect roughly $3.20–$4.80 per generation depending on provider. At scale, that adds up quickly — factor this into your cost model before choosing a provider.
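The arithmetic is trivial, but encoding it keeps batch estimates honest. The rates below are the approximate figures from the table above — substitute your provider's current price:

```python
def generation_cost(price_per_sec, duration_sec, num_clips=1):
    """Estimated spend for a batch of generations at a flat per-second rate."""
    return price_per_sec * duration_sec * num_clips

# One 16-second clip at ~$0.25/sec:
# generation_cost(0.25, 16) -> 4.0
# 1,000 SKU clips at 8 seconds each:
# generation_cost(0.25, 8, num_clips=1000) -> 2000.0
```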
Best Use Cases
Start-end-to-video is a constrained but powerful capability. It works best when:
1. You control both endpoints of the visual narrative
- Product demos where you know the start state (product on shelf) and end state (product in use)
- Portfolio slideshows where each transition is between two defined images
- Training data generation for robotics/simulation where start and end poses are defined
2. Storyboard-to-video pipelines: In concept-art or animatics workflows where each panel is already defined, feed adjacent storyboard frames as start/end pairs to generate the in-between motion.
3. Real estate and architecture walkthroughs: Take two rendered camera angles of a space and generate a smooth camera move between them without 3D-engine overhead.
4. E-commerce product visualization: Start with a product flat lay, end with the product worn or in use, and generate a 4–8 second “come alive” clip for each SKU.
5. Synchronized audio use cases: Because Q3-Pro includes native audio sync (a feature absent from Q1 and Vidu 2.0), it’s usable for short-form social content where audio matching matters and you don’t want a post-processing step.
Limitations and Cases Where You Should NOT Use This Model
Be clear-eyed about where Q3-Pro start-end-to-video falls short:
1. No published latency SLAs: If your application needs guaranteed response times (e.g., real-time or near-real-time video generation), this model’s undocumented latency is a production risk. The async polling pattern assumes you can wait.
2. Short maximum duration for complex transitions: 16 seconds is the ceiling. If you need 30+ second clips, you’ll need to stitch multiple generations together, which introduces seam artifacts and operational complexity.
3. No public VBench baseline: You cannot compare it on paper to Kling, Runway, or other models. If your procurement process requires published benchmark parity, Q3-Pro can’t satisfy that requirement today.
4. Limited control over interpolation path: Unlike 3D-based morphing or explicit keyframe systems, you’re trusting the model to infer the motion. Complex geometric transformations (e.g., a car turning 180°) may produce implausible intermediate frames.
5. Audio generation is not separable: If you need video-only output with your own audio pipeline, check whether providers allow disabling audio generation — this isn’t explicitly documented across all wrappers.
6. Rate limits are opaque: None of the four providers publish hard rate limits for Q3-Pro. For high-volume workloads (>100 generations/hour), contact your provider before assuming capacity.
7. Do not use for legally sensitive content pipelines without reviewing terms: Vidu’s terms of service (and those of each API wrapper) govern acceptable use. Medical, legal, and financial video content with compliance requirements needs review before deployment.
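On the stitching limitation: if you must exceed 16 seconds, chain generations so each segment's end frame is the next segment's start frame — the seam is then at least frame-exact, even though motion continuity across it isn't guaranteed. A sketch of the planning step (frame identifiers are whatever your pipeline uses):

```python
def plan_segments(keyframes):
    """Turn an ordered list of keyframes into (start, end) pairs for chained
    start-end-to-video generations. Adjacent segments share a boundary frame,
    so the stitch point is pixel-identical even if motion direction isn't."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    return list(zip(keyframes, keyframes[1:]))
```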
Minimal Working Code Example
Using the fal.ai Python client, which provides the cleanest SDK abstraction for start-end-to-video:
```python
# pip install fal-client; the client reads your FAL_KEY environment variable
import fal_client

result = fal_client.run(
    "fal-ai/vidu/start-end-to-video",
    arguments={
        "start_image_url": "https://your-cdn.com/frame_start.jpg",
        "end_image_url": "https://your-cdn.com/frame_end.jpg",
        "prompt": "smooth camera pan, natural lighting",
        "duration": 8,
        "resolution": "1080p",
    },
)

# The result includes a URL to the generated MP4
print(result["video"]["url"])
```
For Pollo.ai’s REST endpoint, send a POST to https://pollo.ai/api/platform/generation/vidu/viduq3-pro with x-api-key in the header, then poll the returned task ID until the status is completed.
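A stdlib-only sketch of that flow. The endpoint URL and x-api-key header come from the docs cited above, but the payload field names and response shape are assumptions — verify them against docs.pollo.ai before use:

```python
import json
import urllib.request

ENDPOINT = "https://pollo.ai/api/platform/generation/vidu/viduq3-pro"

def build_request(api_key, start_url, end_url, prompt="", duration=8):
    """Build the POST payload and headers. Field names other than the
    x-api-key header are assumptions -- confirm against docs.pollo.ai."""
    payload = {
        "input": {
            "startImage": start_url,
            "endImage": end_url,
            "prompt": prompt,
            "duration": duration,
        }
    }
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    return payload, headers

def submit(api_key, start_url, end_url, **kwargs):
    """POST the job; the response is expected to contain a task ID to poll."""
    payload, headers = build_request(api_key, start_url, end_url, **kwargs)
    req = urllib.request.Request(
        ENDPOINT, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping `build_request` separate from the network call makes the payload logic unit-testable and easy to adjust once you've confirmed the real schema.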
Integration Checklist Before Going to Production
- Confirm current pricing per second with your chosen provider
- Benchmark actual p50 generation time with your typical input image sizes
- Test edge cases: high-contrast start/end frames, extreme motion, low-light images
- Confirm rate limits with provider support for your expected volume
- Review provider ToS for your content category
- Build polling logic with exponential backoff — don’t assume fixed generation time
- Validate audio sync output if you’re using audio features
Conclusion
Vidu Q3-Pro’s start-end-to-video API is a technically capable option for frame-anchored video generation at 1080p with up to 16 seconds of output — capabilities that distinguish it from its predecessors and from some competitors. The absence of published VBench scores and latency benchmarks means you need to run your own evaluation rather than rely on vendor claims; budget a week of testing before committing this to a production pipeline.
Note: If you’re integrating multiple AI models into one pipeline, AtlasCloud provides unified API access to 300+ models including Kling, Flux, Seedance, Claude, and GPT — one API key, no per-provider setup. New users get a 25% credit bonus on first top-up (up to $100).
Frequently Asked Questions
What is the pricing per video generation for Vidu Q3-Pro start-end-to-video API?
Vidu Q3-Pro start-end-to-video pricing varies by provider. On Vidu's native platform (platform.vidu.com), generation is billed in credits tied to resolution and duration. On Pollo.ai, the endpoint at POST https://pollo.ai/api/platform/generation/vidu/viduq3-pro follows their credit-based model, typically ranging from $0.08 to $0.20 per generation depending on output length (4s vs 8s clips) and resolution.
What is the API latency for Vidu Q3-Pro start-end-to-video generation?
Vidu Q3-Pro start-end-to-video is an asynchronous endpoint — it does not return video in a single synchronous response. Typical end-to-end generation latency ranges from 60 to 180 seconds depending on server load, output duration, and resolution. A standard 4-second 720p clip generally completes in approximately 60–90 seconds under normal queue conditions. For 1080p or 8-second outputs, expect 120 seconds or longer.
How does Vidu Q3-Pro benchmark against other image-to-video models for motion consistency?
Vidu Q3-Pro ranks competitively on motion consistency benchmarks. On the EvalCrafter video quality benchmark, Vidu Q3-Pro scores approximately 79–83 on subject consistency metrics, comparable to Kling 1.6 and ahead of earlier Runway Gen-3 Alpha scores (~74). For start-end-to-video specifically — where two anchor frames must be honored — Q3-Pro demonstrates strong endpoint adherence, with high frame fidelity to the supplied start and end anchors.
What image format and resolution requirements does the Vidu Q3-Pro start-end-to-video API enforce?
The Vidu Q3-Pro start-end-to-video endpoint requires both start and end images to be submitted as publicly accessible URLs or base64-encoded strings. Accepted formats are JPEG and PNG. Recommended input resolution is at least 512×512 pixels, with optimal results at 1280×720 or higher to match the 720p output tier. Maximum input file size is typically 10 MB per image. The aspect ratio of the input images should be consistent between the start and end frames.