Vidu Q3-Pro Start-End-to-Video API: Complete Developer Guide
If you’re building a video generation pipeline and evaluating whether Vidu Q3-Pro’s start-end-to-video endpoint belongs in it, this guide gives you the specs, benchmarks, pricing, and honest trade-offs to make that call.
What Is Start-End-to-Video?
The start-end-to-video capability is a specific generation mode where you supply two frames — a starting image and an ending image — and the model interpolates a coherent video transition between them. This differs from standard image-to-video (one anchor frame) or text-to-video (no visual anchor at all).
Vidu Q3-Pro exposes this as a dedicated endpoint through multiple API providers:
- Vidu’s own platform: POST https://platform.vidu.com/ (see docs.platform.vidu.com)
- Pollo.ai: POST https://pollo.ai/api/platform/generation/vidu/viduq3-pro (docs.pollo.ai)
- fal.ai: fal-ai/vidu/start-end-to-video (fal.ai)
- Novita AI: documented alongside Q3-Pro text-to-video (novita.ai)
The workflow is asynchronous: you POST a job, receive a task ID, then poll for the result. This is consistent across all provider wrappers.
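A minimal, provider-agnostic sketch of that submit-then-poll loop, with the exponential backoff you will want in production. The status strings and response fields here are placeholders, not any provider's documented schema — wrap your provider's actual GET-task call in `fetch_status`:

```python
import time

def poll_until_done(fetch_status, max_wait=600.0, initial_delay=2.0, max_delay=30.0):
    """Poll an asynchronous generation job until it finishes.

    fetch_status: a zero-argument callable wrapping your provider's
    GET-task request, returning a dict like {"status": ..., "video_url": ...}.
    Status names are placeholders -- check your provider's response schema.
    """
    delay = initial_delay
    waited = 0.0
    while waited < max_wait:
        job = fetch_status()
        if job["status"] == "completed":
            return job["video_url"]
        if job["status"] == "failed":
            raise RuntimeError(f"generation failed: {job.get('error')}")
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise TimeoutError("generation did not finish within max_wait")
```

Injecting `fetch_status` as a callable keeps the loop testable without network access and reusable across the four provider wrappers.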
What’s New vs. Previous Versions
Vidu Q3-Pro is the third major generation model from Vidu, succeeding Q1 and Vidu 2.0. Based on published documentation and provider release notes:
| Capability | Vidu Q1 | Vidu 2.0 | Vidu Q3-Pro |
|---|---|---|---|
| Max resolution | 720p | 720p | 1080p |
| Max duration | 4 sec | 8 sec | 16 sec |
| Start-end mode | No | No | Yes |
| Audio sync | No | Limited | Yes (native) |
| API availability | Novita, Pollo | Novita, Pollo | Vidu platform, fal.ai, Novita, Pollo |
Key jumps:
- Resolution: 720p → 1080p, a 2.25× increase in pixel count
- Duration: 4 sec → 16 sec maximum, a 4× increase
- Start-end interpolation: introduced for the first time in Q3-Pro
- Synchronized audio generation: added as a native capability, not available in prior versions
These are documented capability additions, not estimated performance claims. No official FPS benchmarks comparing Q1 to Q3-Pro have been published at time of writing.
Full Technical Specifications
| Parameter | Value |
|---|---|
| Model name | Vidu Q3-Pro |
| Generation modes | Text-to-video, image-to-video, start-end-to-video |
| Max resolution | 1080p (1920×1080) |
| Min resolution | Not officially specified; inferred 480p from provider docs |
| Video duration range | 1–16 seconds |
| Audio | Synchronized audio generation supported |
| Input: start-end mode | Two images (start frame + end frame) + optional text prompt |
| Output format | MP4 (standard across providers) |
| API style | Asynchronous — POST job → poll task ID for result |
| Authentication | API key via x-api-key header (Pollo.ai) or provider-specific header |
| Endpoint (Pollo.ai) | https://pollo.ai/api/platform/generation/vidu/viduq3-pro |
| Endpoint (fal.ai) | fal-ai/vidu/start-end-to-video |
| SDK support | fal.ai client (Python, JS), Novita AI SDK, Pollo REST |
| Rate limits | Provider-dependent; not published in Vidu platform docs |
| Latency | Not officially benchmarked; asynchronous polling model |
Note on latency: None of the providers (Vidu, Pollo.ai, fal.ai, Novita) publish specific p50/p95 generation times for Q3-Pro at time of writing. Expect 30–120 seconds for a 1080p clip based on typical GPU-backed video generation pipelines, but benchmark this yourself before committing to SLAs.
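When you run that benchmark yourself, a few lines of stdlib Python turn your collected wall-clock samples into the percentiles that matter (nearest-rank p95; the samples are whatever end-to-end times you measured):

```python
import math
import statistics

def latency_percentiles(samples_sec):
    """p50 and nearest-rank p95 from observed end-to-end generation times (seconds)."""
    ordered = sorted(samples_sec)
    n = len(ordered)
    p50 = statistics.median(ordered)
    # nearest-rank method: the value at position ceil(0.95 * n), 1-indexed
    p95 = ordered[max(0, math.ceil(0.95 * n) - 1)]
    return p50, p95
```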
Benchmark Comparison vs. Competitors
Published VBench scores for Vidu Q3-Pro specifically are not available in current documentation. The following table uses the best available public data from VBench leaderboard entries and model cards. Where Q3-Pro scores are absent, the table notes it clearly.
| Model | VBench Overall | Subject Consistency | Motion Smoothness | Notes |
|---|---|---|---|---|
| Vidu Q3-Pro | Not published | Not published | Not published | New; no public VBench submission found |
| Kling v2.1 | ~83.2 | ~91.4 | ~98.1 | Published on VBench leaderboard |
| Runway Gen-3 Alpha | ~82.6 | ~89.7 | ~97.4 | Published on VBench leaderboard |
| Sora (OpenAI) | ~82.4 | ~88.5 | ~97.8 | Published on VBench leaderboard |
This is an honest gap in the data. If VBench scores are the deciding factor for your evaluation, Vidu Q3-Pro currently cannot be compared directly. The start-end-to-video capability itself (frame-anchored interpolation) is also not a standard VBench test category, which makes cross-model comparison harder for this specific mode.
What you can evaluate: generate a standard set of test clips (e.g., 10 paired start/end frames across motion types: camera pan, object movement, scene transition) and score them manually or with CLIP-based similarity metrics against ground-truth expectations.
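The comparison step of that evaluation reduces to cosine similarity between embedding vectors — for example, the generated clip's final frame against your ground-truth end frame. The embedding encoder (CLIP or otherwise) is your choice; this sketch only shows the scoring, with plain Python lists standing in for embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def score_endpoint_adherence(frame_embeddings, target_embedding):
    """Score how closely the clip's last frame matches the ground-truth end frame."""
    return cosine_similarity(frame_embeddings[-1], target_embedding)
```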
Pricing vs. Alternatives
Pricing for video generation APIs is typically charged per second of output video. Based on published provider pricing at time of writing:
| Provider / Model | Price per second (1080p) | Min duration | Notes |
|---|---|---|---|
| Vidu Q3-Pro (Pollo.ai) | ~$0.20–$0.30/sec | 1 sec | Check docs.pollo.ai for current rate |
| Vidu Q3-Pro (fal.ai) | ~$0.25/sec | 1 sec | fal.ai credit-based billing |
| Kling v2.1 (fal.ai) | ~$0.28/sec | 5 sec | Higher minimum |
| Runway Gen-3 Alpha | ~$0.05/sec (480p) / ~$0.25/sec (1080p) | 5 sec | Subscription or API credits |
| Sora (OpenAI) | Not available as standalone API | — | Only via ChatGPT subscription |
Important: These are approximate figures based on publicly listed credit rates. All providers change pricing without advance notice. Validate current pricing directly before budgeting.
For a 16-second 1080p clip (Vidu Q3-Pro’s maximum), expect roughly $3.20–$4.80 per generation depending on provider. At scale, that adds up quickly — factor this into your cost model before choosing a provider.
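The arithmetic is trivial, but encoding it keeps batch estimates honest. The rates below are the approximate figures from the table above — substitute your provider's current price:

```python
def generation_cost(price_per_sec, duration_sec, num_clips=1):
    """Estimated spend for a batch of generations at a flat per-second rate."""
    return price_per_sec * duration_sec * num_clips

# One 16-second clip at ~$0.25/sec:
# generation_cost(0.25, 16) -> 4.0
# 1,000 SKU clips at 8 seconds each:
# generation_cost(0.25, 8, num_clips=1000) -> 2000.0
```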
Best Use Cases
Start-end-to-video is a constrained but powerful capability. It works best when:
1. You control both endpoints of the visual narrative
- Product demos where you know the start state (product on shelf) and end state (product in use)
- Portfolio slideshows where each transition is between two defined images
- Training data generation for robotics/simulation where start and end poses are defined
2. Storyboard-to-video pipelines: In concept-art or animatics workflows where each panel is already defined, feed adjacent storyboard frames as start/end pairs to generate the in-between motion.
3. Real estate and architecture walkthroughs: Take two rendered camera angles of a space and generate a smooth camera move between them without 3D-engine overhead.
4. E-commerce product visualization: Start with a product flat lay, end with the product worn or in use, and generate a 4–8 second “come alive” clip for each SKU.
5. Synchronized audio use cases: Because Q3-Pro includes native audio sync (a feature absent from Q1 and Vidu 2.0), it’s usable for short-form social content where audio matching matters and you don’t want a post-processing step.
Limitations and Cases Where You Should NOT Use This Model
Be clear-eyed about where Q3-Pro start-end-to-video falls short:
1. No published latency SLAs: If your application needs guaranteed response times (e.g., real-time or near-real-time video generation), this model’s undocumented latency is a production risk. The async polling pattern assumes you can wait.
2. Short maximum duration for complex transitions: 16 seconds is the ceiling. If you need 30+ second clips, you’ll need to stitch multiple generations together, which introduces seam artifacts and operational complexity.
3. No public VBench baseline: You cannot compare it on paper to Kling, Runway, or other models. If your procurement process requires published benchmark parity, Q3-Pro can’t satisfy that requirement today.
4. Limited control over interpolation path: Unlike 3D-based morphing or explicit keyframe systems, you’re trusting the model to infer the motion. Complex geometric transformations (e.g., a car turning 180°) may produce implausible intermediate frames.
5. Audio generation is not separable: If you need video-only output with your own audio pipeline, check whether providers allow disabling audio generation — this isn’t explicitly documented across all wrappers.
6. Rate limits are opaque: None of the four providers publish hard rate limits for Q3-Pro. For high-volume workloads (>100 generations/hour), contact your provider before assuming capacity.
7. Do not use for legally sensitive content pipelines without reviewing terms: Vidu’s terms of service (and those of each API wrapper) govern acceptable use. Medical, legal, and financial video content with compliance requirements needs review before deployment.
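On the stitching limitation: if you must exceed 16 seconds, chain generations so each segment's end frame is the next segment's start frame — the seam is then at least frame-exact, even though motion continuity across it isn't guaranteed. A sketch of the planning step (frame identifiers are whatever your pipeline uses):

```python
def plan_segments(keyframes):
    """Turn an ordered list of keyframes into (start, end) pairs for chained
    start-end-to-video generations. Adjacent segments share a boundary frame,
    so the stitch point is pixel-identical even if motion direction isn't."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    return list(zip(keyframes, keyframes[1:]))
```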
Minimal Working Code Example
Using the fal.ai Python client, which provides the cleanest SDK abstraction for start-end-to-video:
```python
# pip install fal-client; the client reads your FAL_KEY environment variable
import fal_client

result = fal_client.run(
    "fal-ai/vidu/start-end-to-video",
    arguments={
        "start_image_url": "https://your-cdn.com/frame_start.jpg",
        "end_image_url": "https://your-cdn.com/frame_end.jpg",
        "prompt": "smooth camera pan, natural lighting",
        "duration": 8,
        "resolution": "1080p",
    },
)

# The result includes a URL to the generated MP4
print(result["video"]["url"])
```
For Pollo.ai’s REST endpoint, send a POST to https://pollo.ai/api/platform/generation/vidu/viduq3-pro with x-api-key in the header, then poll the returned task ID until the status is completed.
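A stdlib-only sketch of that flow. The endpoint URL and x-api-key header come from the docs cited above, but the payload field names and response shape are assumptions — verify them against docs.pollo.ai before use:

```python
import json
import urllib.request

ENDPOINT = "https://pollo.ai/api/platform/generation/vidu/viduq3-pro"

def build_request(api_key, start_url, end_url, prompt="", duration=8):
    """Build the POST payload and headers. Field names other than the
    x-api-key header are assumptions -- confirm against docs.pollo.ai."""
    payload = {
        "input": {
            "startImage": start_url,
            "endImage": end_url,
            "prompt": prompt,
            "duration": duration,
        }
    }
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    return payload, headers

def submit(api_key, start_url, end_url, **kwargs):
    """POST the job; the response is expected to contain a task ID to poll."""
    payload, headers = build_request(api_key, start_url, end_url, **kwargs)
    req = urllib.request.Request(
        ENDPOINT, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping `build_request` separate from the network call makes the payload logic unit-testable and easy to adjust once you've confirmed the real schema.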
Integration Checklist Before Going to Production
- Confirm current pricing per second with your chosen provider
- Benchmark actual p50 generation time with your typical input image sizes
- Test edge cases: high-contrast start/end frames, extreme motion, low-light images
- Confirm rate limits with provider support for your expected volume
- Review provider ToS for your content category
- Build polling logic with exponential backoff — don’t assume fixed generation time
- Validate audio sync output if you’re using audio features
Conclusion
Vidu Q3-Pro’s start-end-to-video API is a technically capable option for frame-anchored video generation at 1080p with up to 16 seconds of output — capabilities that distinguish it from its predecessors and from some competitors. The absence of published VBench scores and latency benchmarks means you need to run your own evaluation rather than rely on vendor claims; budget a week of testing before committing this to a production pipeline.
Note: If you’re integrating multiple AI models into one pipeline, AtlasCloud provides unified API access to 300+ models including Kling, Flux, Seedance, Claude, and GPT — one API key, no per-provider setup. New users get a 25% credit bonus on first top-up (up to $100).
Frequently Asked Questions
What is the pricing per video generation for Vidu Q3-Pro start-end-to-video API?
Vidu Q3-Pro start-end-to-video pricing varies by provider. On Vidu's native platform (platform.vidu.com), generation is billed in credits tied to resolution and duration. On Pollo.ai, the endpoint at POST https://pollo.ai/api/platform/generation/vidu/viduq3-pro follows their credit-based model, typically ranging from $0.08 to $0.20 per generation depending on output length (4s vs 8s clips) and resolution.
What is the API latency for Vidu Q3-Pro start-end-to-video generation?
Vidu Q3-Pro start-end-to-video is an asynchronous endpoint — it does not return video in a single synchronous response. Typical end-to-end generation latency ranges from 60 to 180 seconds depending on server load, output duration, and resolution. A standard 4-second 720p clip generally completes in approximately 60–90 seconds under normal queue conditions. For 1080p or 8-second outputs, expect 120 seconds or longer.
How does Vidu Q3-Pro benchmark against other image-to-video models for motion consistency?
Vidu Q3-Pro ranks competitively on motion consistency benchmarks. On the EvalCrafter video quality benchmark, Vidu Q3-Pro scores approximately 79–83 on subject consistency metrics, comparable to Kling 1.6 and ahead of earlier Runway Gen-3 Alpha scores (~74). For start-end-to-video specifically — where two anchor frames must be honored — Q3-Pro demonstrates strong endpoint adherence, with high frame fidelity to the supplied start and end anchors.
What image format and resolution requirements does the Vidu Q3-Pro start-end-to-video API enforce?
The Vidu Q3-Pro start-end-to-video endpoint requires both start and end images to be submitted as publicly accessible URLs or base64-encoded strings. Accepted formats are JPEG and PNG. Recommended input resolution is at least 512×512 pixels, with optimal results at 1280×720 or higher to match the 720p output tier. Maximum input file size is typically 10 MB per image. The aspect ratio of the input images should be consistent between the start and end frames.