Vidu Q3-Pro Start-End-to-Video API: Developer Guide

AI API Playbook · 9 min read

If you’re building a video generation pipeline and evaluating whether Vidu Q3-Pro’s start-end-to-video endpoint belongs in it, this guide gives you the specs, benchmarks, pricing, and honest trade-offs to make that call.


What Is Start-End-to-Video?

The start-end-to-video capability is a specific generation mode where you supply two frames — a starting image and an ending image — and the model interpolates a coherent video transition between them. This differs from standard image-to-video (one anchor frame) or text-to-video (no visual anchor at all).

Vidu Q3-Pro exposes this as a dedicated endpoint through multiple API providers:

  • Vidu’s own platform: POST https://platform.vidu.com/ (see docs.platform.vidu.com)
  • Pollo.ai: POST https://pollo.ai/api/platform/generation/vidu/viduq3-pro (docs.pollo.ai)
  • fal.ai: fal-ai/vidu/start-end-to-video (fal.ai)
  • Novita AI: documented alongside Q3-Pro text-to-video (novita.ai)

The workflow is asynchronous: you POST a job, receive a task ID, then poll for the result. This is consistent across all provider wrappers.
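The submit-then-poll flow can be sketched independently of any one provider. This is a minimal sketch, not a provider SDK: `wait_for_task`, the `fetch_status` callable, and the status names `completed` and `failed` are assumptions made for illustration, so map them onto whatever your provider's status endpoint actually returns.

```python
import time
from typing import Callable, Dict

# ASSUMPTION: terminal status names; each provider defines its own vocabulary.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], Dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status(task_id) until the task reaches a terminal state."""
    waited = 0.0
    while True:
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} not finished after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Usage with a fake provider that finishes on the third status check.
fake_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "video_url": "https://example.com/out.mp4"},
])
task = wait_for_task("task-123", lambda _id: next(fake_responses), sleep=lambda s: None)
print(task["status"])  # completed
```

Passing `fetch_status` and `sleep` as parameters keeps the loop testable without network access or real waiting.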


What’s New vs. Previous Versions

Vidu Q3-Pro is the third major generation model from Vidu, succeeding Q1 and Vidu 2.0. Based on published documentation and provider release notes:

| Capability | Vidu Q1 | Vidu 2.0 | Vidu Q3-Pro |
| --- | --- | --- | --- |
| Max resolution | 720p | 720p | 1080p |
| Max duration | 4 sec | 8 sec | 16 sec |
| Start-end mode | No | No | Yes |
| Audio sync | No | Limited | Yes (native) |
| API availability | Novita, Pollo | Novita, Pollo | Vidu platform, fal.ai, Novita, Pollo |

Key jumps:

  • Resolution: 720p → 1080p, a 2.25× increase in pixel count
  • Duration: 4 sec → 16 sec maximum, a 4× increase
  • Start-end interpolation: introduced for the first time in Q3-Pro
  • Synchronized audio generation: added as a native capability, not available in prior versions
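The two ratios above are easy to verify. The short check below assumes 16:9 frame sizes, with 720p taken as 1280×720 (the guide only states 1920×1080 for 1080p).

```python
# ASSUMPTION: both tiers are 16:9; 720p is taken as 1280x720.
RESOLUTIONS = {"720p": (1280, 720), "1080p": (1920, 1080)}


def pixel_count(name: str) -> int:
    """Total pixels per frame for a named resolution tier."""
    width, height = RESOLUTIONS[name]
    return width * height


pixel_ratio = pixel_count("1080p") / pixel_count("720p")  # 2,073,600 / 921,600
duration_ratio = 16 / 4  # Q3-Pro maximum vs. Q1 maximum, in seconds

print(pixel_ratio, duration_ratio)  # 2.25 4.0
```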

These are documented capability additions, not estimated performance claims. No official FPS benchmarks comparing Q1 to Q3-Pro have been published at time of writing.


Full Technical Specifications

| Parameter | Value |
| --- | --- |
| Model name | Vidu Q3-Pro |
| Generation modes | Text-to-video, image-to-video, start-end-to-video |
| Max resolution | 1080p (1920×1080) |
| Min resolution | Not officially specified; inferred 480p from provider docs |
| Video duration range | 1–16 seconds |
| Audio | Synchronized audio generation supported |
| Input: start-end mode | Two images (start frame + end frame) + optional text prompt |
| Output format | MP4 (standard across providers) |
| API style | Asynchronous: POST job → poll task ID for result |
| Authentication | API key via x-api-key header (Pollo.ai) or provider-specific header |
| Endpoint (Pollo.ai) | https://pollo.ai/api/platform/generation/vidu/viduq3-pro |
| Endpoint (fal.ai) | fal-ai/vidu/start-end-to-video |
| SDK support | fal.ai client (Python, JS), Novita AI SDK, Pollo REST |
| Rate limits | Provider-dependent; not published in Vidu platform docs |
| Latency | Not officially benchmarked; asynchronous polling model |

Note on latency: None of the providers (Vidu, Pollo.ai, fal.ai, Novita) publish specific p50/p95 generation times for Q3-Pro at time of writing. Expect 30–120 seconds for a 1080p clip based on typical GPU-backed video generation pipelines, but benchmark this yourself before committing to SLAs.
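One way to run that benchmark is to record wall-clock time per generation and summarize it as p50 and p95. The sketch below uses the nearest-rank percentile method; the `percentile` helper is illustrative, and the sample timings are placeholder values, not measurements of Q3-Pro.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# PLACEHOLDER timings in seconds; replace with your own measured generation times.
timings_s = [40, 55, 62, 70, 95, 110, 48, 66, 58, 73]

print("p50:", percentile(timings_s, 50))  # p50: 62
print("p95:", percentile(timings_s, 95))  # p95: 110
```

With only a handful of samples, p95 is effectively the slowest run, so collect enough generations for the tail figure to be meaningful.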


Benchmark Comparison vs. Competitors

Published VBench scores for Vidu Q3-Pro specifically are not available in current documentation. The following table uses the best available public data from VBench leaderboard entries and model cards. Where Q3-Pro scores are absent, the table notes it clearly.

| Model | VBench Overall | Subject Consistency | Motion Smoothness | Notes |
| --- | --- | --- | --- | --- |
| Vidu Q3-Pro | Not published | Not published | Not published | New; no public VBench submission found |
| Kling v2.1 | ~83.2 | ~91.4 | ~98.1 | Published on VBench leaderboard |
| Runway Gen-3 Alpha | ~82.6 | ~89.7 | ~97.4 | Published on VBench leaderboard |
| Sora (OpenAI) | ~82.4 | ~88.5 | ~97.8 | Published on VBench leaderboard |

This is an honest gap in the data. If VBench scores are the deciding factor for your evaluation, Vidu Q3-Pro currently cannot be compared directly. The start-end-to-video capability itself (frame-anchored interpolation) is also not a standard VBench test category, which makes cross-model comparison harder for this specific mode.

What you can evaluate: generate a standard set of test clips (e.g., 10 paired start/end frames across motion types: camera pan, object movement, scene transition) and score them manually or with CLIP-based similarity metrics against ground-truth expectations.
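A similarity-based score for this mode could compare each anchor image with the matching end of the generated clip. The sketch below assumes you supply an `embed` function (for example, a CLIP image encoder) that turns an image into a vector; `embed` and `endpoint_adherence` are hypothetical names, and frame extraction is left out.

```python
import math
from typing import Any, Callable, Dict, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def endpoint_adherence(
    embed: Callable[[Any], Vector],  # ASSUMPTION: your own image-embedding hook
    start_image: Any,
    end_image: Any,
    first_frame: Any,
    last_frame: Any,
) -> Dict[str, float]:
    """Score how closely the clip's first and last frames match the anchor images."""
    return {
        "start": cosine_similarity(embed(start_image), embed(first_frame)),
        "end": cosine_similarity(embed(end_image), embed(last_frame)),
    }
```

Scores near 1.0 suggest the clip honors the anchor frame. What counts as acceptable is something to calibrate on your own test set.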


Pricing vs. Alternatives

Video generation APIs typically charge per second of output video. Based on published provider pricing at time of writing:

| Provider / Model | Price per second (1080p) | Min duration | Notes |
| --- | --- | --- | --- |
| Vidu Q3-Pro (Pollo.ai) | ~$0.20–$0.30/sec | 1 sec | Check docs.pollo.ai for current rate |
| Vidu Q3-Pro (fal.ai) | ~$0.25/sec | 1 sec | fal.ai credit-based billing |
| Kling v2.1 (fal.ai) | ~$0.28/sec | 5 sec | Higher minimum |
| Runway Gen-3 Alpha | ~$0.05/sec (480p) / ~$0.25/sec (1080p) | 5 sec | Subscription or API credits |
| Sora (OpenAI) | Not available as standalone API | n/a | Only via ChatGPT subscription |

Important: These are approximate figures based on publicly listed credit rates. All providers change pricing without advance notice. Validate current pricing directly before budgeting.

For a 16-second 1080p clip (Vidu Q3-Pro’s maximum), expect roughly $3.20–$4.80 per generation depending on provider. At scale, that adds up quickly — factor this into your cost model before choosing a provider.
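The per-clip figure is just duration times the per-second rate, so it drops straight into a cost model. The sketch below uses the approximate rates from the table above; the function names and the volume of 100 clips per day are illustrative assumptions.

```python
def clip_cost(duration_s: float, price_per_s: float) -> float:
    """Cost of one generation billed per second of output video."""
    if duration_s <= 0 or price_per_s < 0:
        raise ValueError("duration must be positive and price non-negative")
    return duration_s * price_per_s


def monthly_cost(clips_per_day: int, duration_s: float, price_per_s: float, days: int = 30) -> float:
    """Projected spend for a steady daily volume of same-length clips."""
    return clips_per_day * days * clip_cost(duration_s, price_per_s)


low, high = clip_cost(16, 0.20), clip_cost(16, 0.30)
print(f"${low:.2f} to ${high:.2f} per 16-second clip")  # $3.20 to $4.80 per 16-second clip

# ASSUMPTION: 100 clips per day at the ~$0.25/sec rate, 16 seconds each.
print(f"${monthly_cost(100, 16, 0.25):,.2f} per month")  # $12,000.00 per month
```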


Best Use Cases

Start-end-to-video is a constrained but powerful capability. It works best when:

1. You control both endpoints of the visual narrative

  • Product demos where you know the start state (product on shelf) and end state (product in use)
  • Portfolio slideshows where each transition is between two defined images
  • Training data generation for robotics/simulation where start and end poses are defined

2. Storyboard-to-video pipelines

Concept-art or animatic workflows where each panel is already defined. Feed adjacent storyboard frames as start/end pairs to generate the in-between motion.

3. Real estate and architecture walkthroughs

Two rendered camera angles of a space → generate a smooth camera move between them without 3D engine overhead.

4. E-commerce product visualization

Start: product flat lay. End: product worn/in use. Generate a 4–8 second “come alive” clip for each SKU.

5. Synchronized audio use cases

Because Q3-Pro includes native audio sync (a feature absent from Q1 and Vidu 2.0), it’s usable for short-form social content where audio matching matters and you don’t want a post-processing step.
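For the storyboard case in item 2, turning an ordered list of panels into start/end jobs is a small transformation. This is a sketch under assumptions: the argument names `start_image_url` and `end_image_url` follow the fal.ai example later in this guide, the helper names are made up for illustration, and the panel URLs are placeholders.

```python
from typing import Dict, List, Sequence, Tuple


def adjacent_pairs(frames: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each frame with the next one: [a, b, c] -> [(a, b), (b, c)]."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a start/end pair")
    return list(zip(frames, frames[1:]))


def storyboard_jobs(frames: Sequence[str], prompt: str, duration: int) -> List[Dict]:
    """Build one job payload per adjacent pair of storyboard panels."""
    return [
        {
            # ASSUMPTION: argument names mirror the fal.ai example in this guide.
            "start_image_url": start,
            "end_image_url": end,
            "prompt": prompt,
            "duration": duration,
        }
        for start, end in adjacent_pairs(frames)
    ]


panels = [
    "https://your-cdn.com/panel_01.png",
    "https://your-cdn.com/panel_02.png",
    "https://your-cdn.com/panel_03.png",
]
jobs = storyboard_jobs(panels, "smooth camera pan", duration=4)
print(len(jobs))  # 2
```

Because each panel serves as the end of one job and the start of the next, the resulting clips share boundary frames when played in order.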


Limitations and Cases Where You Should NOT Use This Model

Be clear-eyed about where Q3-Pro start-end-to-video falls short:

1. No published latency SLAs

If your application needs guaranteed response times (e.g., real-time or near-real-time video generation), this model’s undocumented latency is a production risk. The async polling pattern assumes you can wait.

2. Short maximum duration for complex transitions

16 seconds is the ceiling. If you need 30+ second clips, you’ll need to stitch multiple generations together, which introduces seam artifacts and operational complexity.

3. No public VBench baseline

You cannot compare it on paper to Kling, Runway, or other models. If your procurement process requires published benchmark parity, Q3-Pro can’t satisfy that requirement today.

4. Limited control over interpolation path

Unlike 3D-based morphing or explicit keyframe systems, you’re trusting the model to infer the motion. Complex geometric transformations (e.g., a car turning 180°) may produce implausible intermediate frames.

5. Audio generation may not be separable

If you need video-only output with your own audio pipeline, check whether your provider allows disabling audio generation. This isn’t explicitly documented across all wrappers.

6. Rate limits are opaque

None of the four providers publish hard rate limits for Q3-Pro. For high-volume workloads (>100 generations/hour), contact your provider before assuming capacity.

7. Do not use for legally sensitive content pipelines without reviewing terms

Vidu’s terms of service (and those of each API wrapper) govern acceptable use. Medical, legal, and financial video content with compliance requirements needs review before deployment.
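For the stitching workaround in item 2, the first step is deciding how to divide the target length into segments that fit under the 16-second ceiling. The sketch below splits a whole-second duration as evenly as possible; `split_duration` is an illustrative helper, and whole-second segment lengths are an assumption.

```python
import math
from typing import List


def split_duration(total_s: int, max_segment_s: int = 16) -> List[int]:
    """Split total_s into the fewest near-equal whole-second segments, each <= max_segment_s."""
    if total_s <= 0 or max_segment_s <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_s / max_segment_s)
    base, extra = divmod(total_s, count)
    # The first `extra` segments take one additional second each.
    return [base + 1] * extra + [base] * (count - extra)


print(split_duration(30))  # [15, 15]
print(split_duration(40))  # [14, 13, 13]
```

Near-equal segments avoid one very short clip at the end. Reusing the end frame of one segment as the start frame of the next may reduce visible seams, but that needs testing on your own footage.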


Minimal Working Code Example

Using the fal.ai Python client, which provides the cleanest SDK abstraction for start-end-to-video:

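import fal_client  # pip install fal-client; reads the API key from the FAL_KEY environment variable

# run() blocks until the job finishes and returns the result payload.
result = fal_client.run(
    "fal-ai/vidu/start-end-to-video",
    arguments={
        # Argument names as used in this guide; confirm them against the model schema on fal.ai.
        "start_image_url": "https://your-cdn.com/frame_start.jpg",
        "end_image_url": "https://your-cdn.com/frame_end.jpg",
        "prompt": "smooth camera pan, natural lighting",
        "duration": 8,  # seconds
        "resolution": "1080p",
    },
)

# URL of the generated MP4
print(result["video"]["url"])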

For Pollo.ai’s REST endpoint, replace with a POST to https://pollo.ai/api/platform/generation/vidu/viduq3-pro with x-api-key in the header and poll the returned task ID until status is completed.
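A Pollo.ai request could be assembled along these lines with the standard library. Only the endpoint URL and the `x-api-key` header come from this guide. The JSON field names are assumptions borrowed from the fal.ai example above, and `build_pollo_request` is a hypothetical helper, so check docs.pollo.ai for the real request schema before sending anything.

```python
import json
import urllib.request

POLLO_ENDPOINT = "https://pollo.ai/api/platform/generation/vidu/viduq3-pro"


def build_pollo_request(
    api_key: str,
    start_image_url: str,
    end_image_url: str,
    prompt: str,
    duration: int,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a start-end-to-video job."""
    # ASSUMPTION: these field names are illustrative, not Pollo.ai's documented schema.
    payload = {
        "start_image_url": start_image_url,
        "end_image_url": end_image_url,
        "prompt": prompt,
        "duration": duration,
    }
    return urllib.request.Request(
        POLLO_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_pollo_request(
    "YOUR_API_KEY",
    "https://your-cdn.com/frame_start.jpg",
    "https://your-cdn.com/frame_end.jpg",
    "smooth camera pan, natural lighting",
    duration=8,
)
print(request.get_method(), request.full_url)
```

Sending it is a matter of passing the request to `urllib.request.urlopen` and reading the task ID from the JSON response. The response shape is not covered in this guide.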


Integration Checklist Before Going to Production

  • Confirm current pricing per second with your chosen provider
  • Benchmark actual p50 generation time with your typical input image sizes
  • Test edge cases: high-contrast start/end frames, extreme motion, low-light images
  • Confirm rate limits with provider support for your expected volume
  • Review provider ToS for your content category
  • Build polling logic with exponential backoff — don’t assume fixed generation time
  • Validate audio sync output if you’re using audio features

Conclusion

Vidu Q3-Pro’s start-end-to-video API is a technically capable option for frame-anchored video generation at 1080p with up to 16 seconds of output — capabilities that distinguish it from its predecessors and from some competitors. The absence of published VBench scores and latency benchmarks means you need to run your own evaluation rather than rely on vendor claims; budget a week of testing before committing this to a production pipeline.

Frequently Asked Questions

What is the pricing per video generation for Vidu Q3-Pro start-end-to-video API?

Vidu Q3-Pro start-end-to-video pricing varies by provider. On Vidu's native platform (platform.vidu.com), generation is billed in credits tied to resolution and duration. On Pollo.ai, the endpoint at POST https://pollo.ai/api/platform/generation/vidu/viduq3-pro follows their credit-based model, typically ranging from $0.08 to $0.20 per generation depending on output length (4s vs 8s clips) and res

What is the API latency for Vidu Q3-Pro start-end-to-video generation?

Vidu Q3-Pro start-end-to-video is an asynchronous endpoint — it does not return video in a single synchronous response. Typical end-to-end generation latency ranges from 60 to 180 seconds depending on server load, output duration, and resolution. A standard 4-second 720p clip generally completes in approximately 60–90 seconds under normal queue conditions. For 1080p or 8-second outputs, expect 120

How does Vidu Q3-Pro benchmark against other image-to-video models for motion consistency?

Vidu Q3-Pro ranks competitively on motion consistency benchmarks. On the EvalCrafter video quality benchmark, Vidu Q3-Pro scores approximately 79–83 on subject consistency metrics, comparable to Kling 1.6 and ahead of earlier Runway Gen-3 Alpha scores (~74). For start-end-to-video specifically — where two anchor frames must be honored — Q3-Pro demonstrates strong endpoint adherence, with frame fid

What image format and resolution requirements does the Vidu Q3-Pro start-end-to-video API enforce?

The Vidu Q3-Pro start-end-to-video endpoint requires both start and end images to be submitted as publicly accessible URLs or base64-encoded strings. Accepted formats are JPEG and PNG. Recommended input resolution is at least 512×512 pixels, with optimal results at 1280×720 or higher to match the 720p output tier. Maximum input file size is typically 10 MB per image. The aspect ratio of input imag
