Kling v3 API Python Tutorial: Complete Guide 2026
---
title: "How to Use Kling v3 API: Complete Python Tutorial 2026"
description: "Step-by-step Python tutorial for Kling v3 API integration. Working code, real endpoints, error handling, and production patterns."
date: 2026-02-28
author: "aiapiplaybook.com"
tags: ["kling api", "kling v3", "python tutorial", "ai video generation", "api integration"]
---
How to Use Kling v3 API: Complete Python Tutorial 2026
3 numbers to know before you start:
- ~90 seconds median generation latency for a 5-second 720p clip via the v3 API (async polling model)
- ~$0.14–$0.28 per video clip depending on resolution and duration tier (based on Kling’s published credit pricing)
- 4K output support — Kling v3 (released February 4, 2026) is the first Kling model to offer native 4K video generation
Kling v3 added native audio generation, multi-shot scene control, and character consistency features on top of the image-to-video and text-to-video foundations from v1/v2. This tutorial covers the API integration path specifically — not the web UI. If you want working Python that handles authentication, job submission, polling, and error recovery, this is the guide.
Prerequisites
Accounts and API Access
- Kling API account — Register at klingai.com and navigate to the developer/API section. API access is separate from the web UI subscription.
- API Key — Generate from the API dashboard. Store it as an environment variable immediately; never hardcode it.
- Credits — Purchase credits before testing. Free tier exists but has strict rate limits (typically 3 concurrent jobs max).
Python Environment
Tested on Python 3.10+. The code in this tutorial does not use the Kling SDK (no official SDK existed at time of writing) — it uses httpx for async HTTP and python-dotenv for environment management.
# Install required packages
pip install httpx python-dotenv pydantic
# asyncio (used for the async patterns) is part of the Python standard library
# no install needed
Environment Setup
# Create a .env file in your project root
touch .env
echo "KLING_API_KEY=your_api_key_here" >> .env
echo "KLING_API_BASE_URL=https://api.klingai.com" >> .env
Verify your Python version:
python --version # Must be 3.10 or higher
python -c "import httpx, dotenv, pydantic; print('All dependencies OK')"
Authentication and Client Setup
Kling v3 API uses Bearer token authentication. Every request requires the Authorization: Bearer <API_KEY> header. There is no OAuth flow for standard API access — your API key is your credential.
# kling_client.py
# Base client setup: reuse this across all your Kling API calls
import os

import httpx
from dotenv import load_dotenv

load_dotenv()  # Load .env file into environment

KLING_API_KEY = os.getenv("KLING_API_KEY")
KLING_BASE_URL = os.getenv("KLING_API_BASE_URL", "https://api.klingai.com")

if not KLING_API_KEY:
    raise EnvironmentError("KLING_API_KEY not set. Check your .env file.")

# Build a reusable httpx client with auth headers baked in
# Using a persistent client avoids TCP connection overhead on every request
client = httpx.Client(
    base_url=KLING_BASE_URL,
    headers={
        "Authorization": f"Bearer {KLING_API_KEY}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    timeout=30.0,  # 30s is enough for API calls; video generation is async anyway
)


def check_auth() -> dict:
    """
    Hit the account info endpoint to verify the API key works
    before you burn credits on actual generation calls.
    """
    response = client.get("/v1/account/info")
    response.raise_for_status()  # Raises HTTPStatusError on 4xx/5xx
    return response.json()


if __name__ == "__main__":
    info = check_auth()
    print(f"Authenticated. Credits remaining: {info.get('credits', 'N/A')}")
Run this standalone to confirm auth works before going further:
python kling_client.py
# Expected: Authenticated. Credits remaining: <your_balance>
Core Implementation
Basic Text-to-Video Request
Kling v3’s API is asynchronous — you submit a job, get a task_id, then poll until the job completes. There is no streaming endpoint for video.
# basic_t2v.py
# Minimal text-to-video job submission and retrieval
# This pattern is the foundation for everything else in this tutorial
import time

import httpx
from kling_client import client  # Import the client we set up above


def submit_text_to_video(
    prompt: str,
    model: str = "kling-v3",
    duration: int = 5,  # seconds: 5 or 10
    aspect_ratio: str = "16:9",  # "16:9", "9:16", "1:1"
    resolution: str = "720p",  # "720p", "1080p", "4k"
    negative_prompt: str = "",
) -> str:
    """
    Submit a text-to-video job. Returns task_id (str).
    Why return task_id and not the video? Because Kling generates
    videos async: the initial response is never the finished video.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }
    response = client.post("/v1/videos/text2video", json=payload)
    response.raise_for_status()
    data = response.json()
    task_id = data["task_id"]
    print(f"Job submitted. task_id: {task_id}")
    return task_id


def poll_video_status(task_id: str, poll_interval: int = 10, max_wait: int = 600) -> dict:
    """
    Poll Kling API until the video job completes or fails.
    poll_interval=10s is the recommended minimum: polling faster
    doesn't speed up generation and may trigger rate limits.
    max_wait=600s (10 min) is conservative; most 5s clips finish in 60-120s.
    """
    elapsed = 0
    while elapsed < max_wait:
        response = client.get(f"/v1/videos/{task_id}")
        response.raise_for_status()
        data = response.json()
        status = data.get("status")
        print(f"[{elapsed}s] Status: {status}")
        if status == "completed":
            return data  # Contains video_url, metadata, etc.
        if status == "failed":
            error_msg = data.get("error", {}).get("message", "Unknown error")
            raise RuntimeError(f"Video generation failed: {error_msg}")
        # status == "pending" or "processing": keep waiting
        time.sleep(poll_interval)
        elapsed += poll_interval
    raise TimeoutError(f"Job {task_id} did not complete within {max_wait}s")


if __name__ == "__main__":
    task_id = submit_text_to_video(
        prompt="A red fox running through a snowy forest at dusk, cinematic wide shot",
        resolution="720p",
        duration=5,
    )
    result = poll_video_status(task_id)
    video_url = result["video"]["url"]
    print(f"Video ready: {video_url}")
Production-Ready Implementation
The basic version above works but lacks retry logic, async support, and proper error categorization. Here’s a production pattern:
# kling_production.py
# Production-grade Kling v3 client with async support,
# exponential backoff, and structured error handling
import asyncio
import time
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import httpx
from dotenv import load_dotenv
import os
load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("kling_api")
KLING_API_KEY = os.getenv("KLING_API_KEY")
KLING_BASE_URL = os.getenv("KLING_API_BASE_URL", "https://api.klingai.com")
class VideoResolution(str, Enum):
    P720 = "720p"
    P1080 = "1080p"
    K4 = "4k"  # Kling v3 only


class AspectRatio(str, Enum):
    WIDESCREEN = "16:9"
    PORTRAIT = "9:16"
    SQUARE = "1:1"


@dataclass
class VideoJobConfig:
    """
    Typed config for a video generation job.
    Using a dataclass here instead of raw dicts prevents
    silent errors from typos in parameter names.
    """
    prompt: str
    model: str = "kling-v3"
    duration: int = 5  # 5 or 10 seconds
    resolution: VideoResolution = VideoResolution.P720
    aspect_ratio: AspectRatio = AspectRatio.WIDESCREEN
    negative_prompt: str = ""
    cfg_scale: float = 0.5  # Prompt adherence: 0.0–1.0
    image_url: Optional[str] = None  # For image-to-video; None = text-to-video
    audio_enabled: bool = False  # Kling v3 native audio feature


class KlingAPIError(Exception):
    """Base exception for Kling API errors."""

    def __init__(self, message: str, status_code: Optional[int] = None, error_code: Optional[str] = None):
        super().__init__(message)
        self.status_code = status_code
        self.error_code = error_code


class KlingRateLimitError(KlingAPIError):
    pass


class KlingInsufficientCreditsError(KlingAPIError):
    pass


class KlingContentPolicyError(KlingAPIError):
    pass
class KlingClient:
    def __init__(self):
        self.base_url = KLING_BASE_URL
        self.headers = {
            "Authorization": f"Bearer {KLING_API_KEY}",
            "Content-Type": "application/json",
        }
        # AsyncClient is more efficient when running multiple concurrent jobs
        self._async_client = httpx.AsyncClient(
            base_url=self.base_url,
            headers=self.headers,
            timeout=30.0,
        )

    def _raise_for_kling_error(self, response: httpx.Response) -> None:
        """
        Kling returns structured errors in the response body even on 4xx.
        Parse them into specific exceptions so callers can handle them differently.
        """
        # Treat any 2xx as success, not only 200
        if response.is_success:
            return
        try:
            error_data = response.json().get("error", {})
            error_code = error_data.get("code", "UNKNOWN")
            error_msg = error_data.get("message", response.text)
        except Exception:
            error_code = "PARSE_ERROR"
            error_msg = response.text
        if response.status_code == 429:
            raise KlingRateLimitError(
                f"Rate limit hit: {error_msg}", 429, error_code
            )
        if response.status_code == 402:
            raise KlingInsufficientCreditsError(
                f"Insufficient credits: {error_msg}", 402, error_code
            )
        if error_code == "CONTENT_POLICY_VIOLATION":
            raise KlingContentPolicyError(
                f"Content policy: {error_msg}", response.status_code, error_code
            )
        raise KlingAPIError(error_msg, response.status_code, error_code)

    async def submit_job(self, config: VideoJobConfig) -> str:
        """Submit a video generation job. Returns task_id."""
        # Route to i2v or t2v endpoint based on whether an image is provided
        endpoint = "/v1/videos/image2video" if config.image_url else "/v1/videos/text2video"
        payload = {
            "model": config.model,
            "prompt": config.prompt,
            "negative_prompt": config.negative_prompt,
            "duration": config.duration,
            "resolution": config.resolution.value,
            "aspect_ratio": config.aspect_ratio.value,
            "cfg_scale": config.cfg_scale,
            "audio_enabled": config.audio_enabled,
        }
        if config.image_url:
            payload["image_url"] = config.image_url
        response = await self._async_client.post(endpoint, json=payload)
        self._raise_for_kling_error(response)
        task_id = response.json()["task_id"]
        logger.info(f"Job submitted: {task_id} | model={config.model} | res={config.resolution.value}")
        return task_id
    async def poll_until_done(
        self,
        task_id: str,
        poll_interval: float = 10.0,
        max_wait: float = 600.0,
    ) -> dict:
        """
        Async polling with exponential backoff on rate limit errors.
        Uses asyncio.sleep instead of time.sleep so other async tasks
        can run concurrently during the wait.
        """
        elapsed = 0.0
        backoff = poll_interval
        while elapsed < max_wait:
            try:
                response = await self._async_client.get(f"/v1/videos/{task_id}")
                self._raise_for_kling_error(response)
                data = response.json()
                status = data.get("status")
                logger.info(f"[{elapsed:.0f}s] {task_id}: {status}")
                if status == "completed":
                    return data
                if status == "failed":
                    raise KlingAPIError(
                        f"Job failed: {data.get('error', {}).get('message', 'unknown')}",
                        error_code=data.get("error", {}).get("code"),
                    )
                # Still processing: the poll succeeded, so return to the base interval
                backoff = poll_interval
            except KlingRateLimitError:
                # Back off aggressively on rate limits: double the wait, capped at 60s
                backoff = min(backoff * 2, 60.0)
                logger.warning(f"Rate limited. Backing off to {backoff}s")
            await asyncio.sleep(backoff)
            elapsed += backoff
        raise TimeoutError(f"Job {task_id} did not complete within {max_wait}s")

    async def generate_video(self, config: VideoJobConfig) -> str:
        """End-to-end: submit job, wait, return video URL."""
        task_id = await self.submit_job(config)
        result = await self.poll_until_done(task_id)
        video_url = result["video"]["url"]
        logger.info(f"Video ready: {video_url}")
        return video_url

    async def close(self):
        await self._async_client.aclose()
# Usage example: run multiple jobs concurrently
async def main():
    client = KlingClient()
    try:
        jobs = [
            VideoJobConfig(
                prompt="Aerial view of Tokyo at night, neon lights reflecting on wet streets",
                resolution=VideoResolution.P1080,
                duration=5,
            ),
            VideoJobConfig(
                prompt="Close-up of a hummingbird drinking from a red flower, slow motion",
                resolution=VideoResolution.P720,
                duration=5,
                audio_enabled=True,  # Kling v3 native audio
            ),
        ]
        # Submit and poll both jobs concurrently, which is much faster than sequential
        results = await asyncio.gather(*[client.generate_video(j) for j in jobs])
        for i, url in enumerate(results):
            print(f"Job {i+1}: {url}")
    finally:
        await client.close()


if __name__ == "__main__":
    asyncio.run(main())
API Parameters Reference
| Parameter | Type | Default | Valid Range | What It Affects |
|---|---|---|---|---|
| model | string | "kling-v3" | "kling-v1", "kling-v1-5", "kling-v3" | Model version; v3 required for 4K and native audio |
| prompt | string | none (required) | 1–2500 chars | Primary generation instruction |
| negative_prompt | string | "" | 0–1000 chars | Elements to exclude from output |
| duration | integer | 5 | 5, 10 | Video length in seconds; 10s costs ~2× credits |
| resolution | string | "720p" | "720p", "1080p", "4k" | Output resolution; 4K only on kling-v3 |
| aspect_ratio | string | "16:9" | "16:9", "9:16", "1:1" | Frame dimensions; affects composition |
| cfg_scale | float | 0.5 | 0.0–1.0 | Prompt adherence vs. creative freedom; lower = more variation |
| image_url | string | null | Valid HTTPS URL | Source image for image-to-video; triggers i2v endpoint |
| audio_enabled | boolean | false | true, false | Enables native audio generation (v3 only) |
| camera_control | object | null | See docs for schema | Camera movement presets (zoom, pan, orbit) |
| character_ref | string | null | task_id of a character job | Character consistency across clips |
Notes:
- 4k resolution is only accepted when model is kling-v3; passing it with v1 or v1-5 returns a 400 INVALID_PARAMETER error
- audio_enabled=true adds approximately 15–25% to generation time in practice
- cfg_scale values below 0.3 tend to produce inconsistent results with complex prompts
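Since an invalid combination only fails after a round trip to the API, it can help to check a payload locally first. The sketch below encodes the ranges from the parameter table and the v3-only rules from the notes. The function name `validate_params` and the constant names are illustrative and not part of any Kling SDK, and the API itself remains the source of truth for what is accepted.

```python
# Constraints copied from the parameter table in this section (assumed current).
VALID_MODELS = ("kling-v1", "kling-v1-5", "kling-v3")
VALID_DURATIONS = (5, 10)
VALID_RESOLUTIONS = ("720p", "1080p", "4k")
VALID_ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def validate_params(
    prompt: str,
    model: str = "kling-v3",
    duration: int = 5,
    resolution: str = "720p",
    aspect_ratio: str = "16:9",
    negative_prompt: str = "",
    cfg_scale: float = 0.5,
    audio_enabled: bool = False,
) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems: list[str] = []
    if model not in VALID_MODELS:
        problems.append(f"model must be one of {VALID_MODELS}, got {model!r}")
    if not 1 <= len(prompt) <= 2500:
        problems.append("prompt must be 1-2500 characters")
    if len(negative_prompt) > 1000:
        problems.append("negative_prompt must be at most 1000 characters")
    if duration not in VALID_DURATIONS:
        problems.append(f"duration must be one of {VALID_DURATIONS}, got {duration!r}")
    if resolution not in VALID_RESOLUTIONS:
        problems.append(f"resolution must be one of {VALID_RESOLUTIONS}, got {resolution!r}")
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        problems.append(f"aspect_ratio must be one of {VALID_ASPECT_RATIOS}, got {aspect_ratio!r}")
    if not 0.0 <= cfg_scale <= 1.0:
        problems.append("cfg_scale must be between 0.0 and 1.0")
    # v3-only features, per the table and notes in this section
    if model != "kling-v3":
        if resolution == "4k":
            problems.append("4k resolution requires model kling-v3")
        if audio_enabled:
            problems.append("audio_enabled requires model kling-v3")
    return problems
```

Calling `validate_params("a fox", model="kling-v1", resolution="4k")` returns a single problem describing the 4k restriction, so the request can be rejected before any credits or rate-limit budget are spent.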
Error Handling
Kling’s API returns structured JSON errors. Handle them at the exception level, not by string-matching error messages.
| HTTP Status | Error Code | Cause | Fix |
|---|---|---|---|
| 400 | INVALID_PARAMETER | Bad parameter value (e.g., 4K on v1 model) | Check parameter table above; validate before sending |
| 401 | UNAUTHORIZED | Missing or invalid API key | Verify KLING_API_KEY env var; regenerate key if needed |
| 402 | INSUFFICIENT_CREDITS | Account balance too low | Top up credits in dashboard |
| 422 | CONTENT_POLICY_VIOLATION | Prompt triggered content filter | Revise prompt; avoid restricted content categories |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests per minute | Back off exponentially; free tier: 3 RPM, paid: varies by plan |
| 500 | INTERNAL_ERROR | Server-side failure | Retry with backoff; report to Kling support if persistent |
| 503 | MODEL_OVERLOADED | High server load | Retry after 30–60s; peak hours (US/EU daytime) hit this most |
# error_handling_example.py
# Shows how to catch and handle each Kling error type distinctly
import asyncio

from kling_production import KlingClient, VideoJobConfig, VideoResolution
from kling_production import (
    KlingAPIError,
    KlingRateLimitError,
    KlingInsufficientCreditsError,
    KlingContentPolicyError,
)


async def safe_generate(config: VideoJobConfig) -> str | None:
    """
    Wrapper that catches known errors and returns None on failure.
    Returns video URL on success.
    """
    client = KlingClient()
    try:
        url = await client.generate_video(config)
        return url
    except KlingInsufficientCreditsError:
        # Non-retryable: no point retrying until user tops up
        print("ERROR: Out of credits. Add funds at klingai.com/dashboard")
        return None
    except KlingContentPolicyError as e:
        # Non-retryable: the prompt itself is the problem
        print(f"ERROR: Prompt rejected by content policy [{e.error_code}]")
        print("Revise your prompt and try again.")
        return None
    except KlingRateLimitError:
        # poll_until_done backs off internally on 429, so this only
        # surfaces when the submit call itself is rate limited
        print("ERROR: Rate limited at submission. Queuing for later.")
        return None
    except TimeoutError:
        # Job was submitted but didn't finish in time
        # The job may still complete, so check task_id status manually
        print("ERROR: Job timed out. Check dashboard for status.")
        return None
    except KlingAPIError as e:
        # Catch-all for unexpected API errors
        print(f"ERROR: API error {e.status_code} [{e.error_code}]: {e}")
        return None
    finally:
        await client.close()


if __name__ == "__main__":
    config = VideoJobConfig(
        prompt="A calm ocean at sunrise with gentle waves",
        resolution=VideoResolution.P720,
    )
    result = asyncio.run(safe_generate(config))
    if result:
        print(f"Success: {result}")
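The error table in this section lists 500 and 503 as "retry with backoff" cases, which the wrapper above does not attempt. One possible shape for that retry is sketched below. The helper `retry_transient` is hypothetical and not part of the tutorial's client; it assumes the raised exception exposes a `status_code` attribute, as `KlingAPIError` does, and the delay values are placeholders rather than Kling recommendations.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Status codes the error table marks as worth retrying.
RETRYABLE_STATUS = {500, 503}


async def retry_transient(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
) -> T:
    """Retry `call` when it raises an exception whose status_code is retryable."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            # Re-raise anything non-retryable, or the last failure once attempts run out
            if status not in RETRYABLE_STATUS or attempt >= max_attempts:
                raise
            # Exponential backoff (base, 2x, 4x, ...) capped at max_delay, plus up to 10% jitter
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

With the production client, a call might look like `task_id = await retry_transient(lambda: client.submit_job(config))`. Wrapping `submit_job` rather than `generate_video` is a deliberate choice in this sketch: retrying the end-to-end call could submit, and bill for, a second job if the first one was accepted before the error occurred.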
Performance and Cost Reference
Cost and timing benchmarks based on published Kling credit pricing and community-reported timing data (as of February 2026). Credit-to-USD conversion assumes $0.01 per credit at standard pricing.
| Configuration | Credits per Job | Approx. USD | Median Latency | Notes |
|---|---|---|---|---|
| kling-v3, 720p, 5s | 10 credits | ~$0.10 | 60–90s | Standard tier; fastest option |
| kling-v3, 1080p, 5s | 15 credits | ~$0.15 | 90–120s | Good balance of quality and cost |
| kling-v3, 4K, 5s | 25 credits | ~$0.25 | 150–210s | Highest quality; slowest |
| kling-v3, 720p, 10s | 20 credits | ~$0.20 | 120–180s | Double duration ≈ double credits |
| kling-v3, 1080p, 10s | 30 credits | ~$0.30 | 180–240s | Most common production choice |
| kling-v3, 4K, 10s | 50 credits | ~$0.50 | 300–420s | Use only when 4K is genuinely required |
| + audio_enabled | +3 credits | +~$0.03 | +15–25s | Per job regardless of resolution |
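To budget a batch before submitting it, the table's arithmetic can be written down directly. This is a rough estimator built only from the figures in the table above (base credits for a 5-second clip, doubled for 10 seconds, plus 3 credits for audio, at the assumed $0.01 per credit). The function names are illustrative, and actual billing may differ from these estimates.

```python
# Credit figures copied from the table above; treat them as estimates.
BASE_CREDITS_5S = {"720p": 10, "1080p": 15, "4k": 25}
AUDIO_SURCHARGE_CREDITS = 3
USD_PER_CREDIT = 0.01  # Conversion rate assumed by this section


def estimate_credits(resolution: str, duration: int = 5, audio_enabled: bool = False) -> int:
    """Estimate credits for one job from the reference table."""
    if resolution not in BASE_CREDITS_5S:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    # A 10-second clip costs twice the 5-second base in the table
    credits = BASE_CREDITS_5S[resolution] * (duration // 5)
    if audio_enabled:
        credits += AUDIO_SURCHARGE_CREDITS
    return credits


def estimate_batch_usd(jobs: list[tuple[str, int, bool]]) -> float:
    """Estimate USD for a batch of (resolution, duration, audio_enabled) jobs."""
    total_credits = sum(estimate_credits(*job) for job in jobs)
    return round(total_credits * USD_PER_CREDIT, 2)


if __name__ == "__main__":
    batch = [("4k", 5, False)] * 50
    print(f"50 x 4K 5s clips: ~${estimate_batch_usd(batch):.2f}")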
When NOT to use 4K:
- Prototype and iteration phases — 720p gives the same quality feedback at 40% of the cost
- Short-form social content where platform compression eliminates any 4K advantage
- Batch generation jobs with >50 clips — cost compounds quickly at 4K
Concurrency limits by plan:
- Free tier: 3 concurrent jobs, 10 jobs/day
- Standard: 10 concurrent jobs, no daily cap
- Enterprise: custom limits, dedicated capacity available
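Because `asyncio.gather` starts every coroutine at once, a large batch can exceed the concurrency limit of the plan. A common way to stay under it is an `asyncio.Semaphore`, sketched below. The helper `run_bounded` is hypothetical, and the default of 3 simply mirrors the free-tier figure listed above.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def run_bounded(
    tasks: Iterable[Callable[[], Awaitable[T]]],
    max_concurrent: int = 3,
) -> list[T]:
    """Run zero-argument async callables with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(task: Callable[[], Awaitable[T]]) -> T:
        # Each job waits here until one of the max_concurrent slots is free
        async with semaphore:
            return await task()

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(task) for task in tasks))
```

With the production client this could be called as `await run_bounded([lambda j=j: client.generate_video(j) for j in jobs], max_concurrent=3)`. The `j=j` default binds each config at definition time, which avoids every lambda capturing the last loop value.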
Limitations to Know Before You Build
- No synchronous endpoint — Every video job is async. You cannot get a video in a single HTTP request. Build polling or webhook support from day one.
- Video URLs expire — Kling’s CDN URLs are typically valid for 24 hours. Download and store videos in your own storage (S3, GCS, etc.) immediately after retrieval.
- No partial results — If a job fails at 80% completion, you get nothing. There is no resume or partial output.
- Prompt length ≠ better results — In practice, prompts over 200 characters tend to produce less consistent results than concise, specific prompts. Treat the 2500-character limit as a hard ceiling, not a target.
- 4K availability — At peak load (US/EU business hours), 4K jobs frequently hit 503 MODEL_OVERLOADED. Schedule batch 4K generation during off-peak hours.
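Of the limitations above, URL expiry is the one none of the tutorial's code addresses. A minimal sketch of saving a finished video locally follows. It uses only the standard library so it runs without extra dependencies (the same idea works with httpx streaming), and the name `download_video` is illustrative. Uploading to S3 or GCS would follow from the returned path.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(video_url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream video_url to dest and return the path written."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary ".part" file first so a failed download
    # never leaves a truncated file at the final path
    tmp = dest.with_suffix(dest.suffix + ".part")
    with urllib.request.urlopen(video_url, timeout=timeout) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)  # Copies in chunks, not all in memory
    tmp.replace(dest)
    return dest
```

A typical call right after polling completes would be `download_video(result["video"]["url"], Path("videos") / f"{task_id}.mp4")`. Note that `urlopen` accepts non-HTTP schemes too, so production code may want to check that the URL starts with `https://` before fetching.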
Conclusion
Kling v3’s API is a straightforward REST interface with async job semantics: submit, poll, retrieve. The production client in this tutorial handles the failures most likely to hurt in production: rate limits, credit exhaustion, and content policy rejections. URL expiry is not handled by the client, so download every video to your own storage within the 24-hour window. Start with 720p for development, then change the VideoResolution value in your job config to move to 1080p or 4K.
Note: If you’re integrating multiple AI models into one pipeline, AtlasCloud provides unified API access to 300+ models including Kling, Flux, Seedance, Claude, and GPT — one API key, no per-provider setup. New users get a 25% credit bonus on first top-up (up to $100).
Frequently Asked Questions
How much does the Kling v3 API cost per video clip in 2026?
Kling v3 API pricing ranges from approximately $0.14 to $0.28 per video clip depending on resolution and duration tier, based on Kling's published credit pricing. A standard 5-second 720p clip falls toward the lower end of that range (~$0.14), while higher-resolution outputs (up to native 4K, introduced in v3 on February 4, 2026) push toward $0.28 per clip. For production workloads, developers sho
What is the average API response latency for Kling v3 video generation?
Kling v3 uses an asynchronous polling model, with a median generation latency of approximately 90 seconds for a 5-second 720p clip. This means your Python code must poll a job status endpoint rather than waiting on a synchronous response. Latency can increase for 4K outputs or longer clip durations. Developers should implement polling intervals of 5–10 seconds with a timeout threshold of at least
What new features did Kling v3 add compared to v1 and v2?
Kling v3, released February 4, 2026, introduced four major capabilities not available in v1/v2: (1) native 4K video output — the first Kling model to support this resolution; (2) native audio generation baked into the video pipeline; (3) multi-shot scene control for sequencing multiple shots in a single API call; and (4) character consistency features for maintaining subject identity across frames
How do I handle async polling for Kling v3 API in Python without hitting rate limits?
Kling v3's API uses an async polling model where you submit a job and repeatedly check its status endpoint. A production-safe Python pattern involves: (1) submitting the generation request and capturing the job ID; (2) polling every 8–10 seconds using a while loop with exponential backoff on 429 rate-limit responses; (3) setting a hard timeout at 300 seconds given the ~90-second median latency for