
Kling v3 API Python Tutorial: Complete Guide 2026

AI API Playbook · 14 min read
---
title: "How to Use Kling v3 API: Complete Python Tutorial 2026"
description: "Step-by-step Python tutorial for Kling v3 API integration. Working code, real endpoints, error handling, and production patterns."
date: 2026-02-28
author: "aiapiplaybook.com"
tags: ["kling api", "kling v3", "python tutorial", "ai video generation", "api integration"]
---

How to Use Kling v3 API: Complete Python Tutorial 2026

3 numbers to know before you start:

  • ~90 seconds median generation latency for a 5-second 720p clip via the v3 API (async polling model)
  • ~$0.14–$0.28 per video clip depending on resolution and duration tier (based on Kling’s published credit pricing)
  • 4K output support — Kling v3 (released February 4, 2026) is the first Kling model to offer native 4K video generation

Kling v3 added native audio generation, multi-shot scene control, and character consistency features on top of the image-to-video and text-to-video foundations from v1/v2. This tutorial covers the API integration path specifically — not the web UI. If you want working Python that handles authentication, job submission, polling, and error recovery, this is the guide.


Prerequisites

Accounts and API Access

  1. Kling API account — Register at klingai.com and navigate to the developer/API section. API access is separate from the web UI subscription.
  2. API Key — Generate from the API dashboard. Store it as an environment variable immediately; never hardcode it.
  3. Credits — Purchase credits before testing. A free tier exists but has strict rate limits (typically 3 concurrent jobs max).

Python Environment

Tested on Python 3.10+. The code in this tutorial does not use the Kling SDK (no official SDK existed at time of writing) — it uses httpx for async HTTP and python-dotenv for environment management.

# Install required packages
pip install httpx python-dotenv pydantic

# asyncio ships with the Python standard library; no install needed

Environment Setup

# Create a .env file in your project root
touch .env
echo "KLING_API_KEY=your_api_key_here" >> .env
echo "KLING_API_BASE_URL=https://api.klingai.com" >> .env

Verify your Python version:

python --version  # Must be 3.10 or higher
python -c "import httpx, dotenv, pydantic; print('All dependencies OK')"

Authentication and Client Setup

Kling v3 API uses Bearer token authentication. Every request requires the Authorization: Bearer <API_KEY> header. There is no OAuth flow for standard API access — your API key is your credential.

# kling_client.py
# Base client setup — reuse this across all your Kling API calls

import os
import httpx
from dotenv import load_dotenv

load_dotenv()  # Load .env file into environment

KLING_API_KEY = os.getenv("KLING_API_KEY")
KLING_BASE_URL = os.getenv("KLING_API_BASE_URL", "https://api.klingai.com")

if not KLING_API_KEY:
    raise EnvironmentError("KLING_API_KEY not set. Check your .env file.")

# Build a reusable httpx client with auth headers baked in
# Using a persistent client avoids TCP connection overhead on every request
client = httpx.Client(
    base_url=KLING_BASE_URL,
    headers={
        "Authorization": f"Bearer {KLING_API_KEY}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    timeout=30.0,  # 30s is enough for API calls; video generation is async anyway
)

def check_auth() -> dict:
    """
    Hit the account info endpoint to verify the API key works
    before you burn credits on actual generation calls.
    """
    response = client.get("/v1/account/info")
    response.raise_for_status()  # Raises HTTPStatusError on 4xx/5xx
    return response.json()

if __name__ == "__main__":
    info = check_auth()
    print(f"Authenticated. Credits remaining: {info.get('credits', 'N/A')}")

Run this standalone to confirm auth works before going further:

python kling_client.py
# Expected: Authenticated. Credits remaining: <your_balance>

Core Implementation

Basic Text-to-Video Request

Kling v3’s API is asynchronous — you submit a job, get a task_id, then poll until the job completes. There is no streaming endpoint for video.

# basic_t2v.py
# Minimal text-to-video job submission and retrieval
# This pattern is the foundation for everything else in this tutorial

import time
import httpx
from kling_client import client  # Import the client we set up above

def submit_text_to_video(
    prompt: str,
    model: str = "kling-v3",
    duration: int = 5,          # seconds: 5 or 10
    aspect_ratio: str = "16:9", # "16:9", "9:16", "1:1"
    resolution: str = "720p",   # "720p", "1080p", "4k"
    negative_prompt: str = "",
) -> str:
    """
    Submit a text-to-video job. Returns task_id (str).
    
    Why return task_id and not the video? Because Kling generates
    videos async — the initial response is never the finished video.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }
    
    response = client.post("/v1/videos/text2video", json=payload)
    response.raise_for_status()
    
    data = response.json()
    task_id = data["task_id"]
    print(f"Job submitted. task_id: {task_id}")
    return task_id


def poll_video_status(task_id: str, poll_interval: int = 10, max_wait: int = 600) -> dict:
    """
    Poll Kling API until the video job completes or fails.
    
    poll_interval=10s is the recommended minimum — polling faster
    doesn't speed up generation and may trigger rate limits.
    max_wait=600s (10 min) is conservative; most 5s clips finish in 60-120s.
    """
    elapsed = 0
    
    while elapsed < max_wait:
        response = client.get(f"/v1/videos/{task_id}")
        response.raise_for_status()
        
        data = response.json()
        status = data.get("status")
        
        print(f"[{elapsed}s] Status: {status}")
        
        if status == "completed":
            return data  # Contains video_url, metadata, etc.
        
        if status == "failed":
            error_msg = data.get("error", {}).get("message", "Unknown error")
            raise RuntimeError(f"Video generation failed: {error_msg}")
        
        # status == "pending" or "processing" — keep waiting
        time.sleep(poll_interval)
        elapsed += poll_interval
    
    raise TimeoutError(f"Job {task_id} did not complete within {max_wait}s")


if __name__ == "__main__":
    task_id = submit_text_to_video(
        prompt="A red fox running through a snowy forest at dusk, cinematic wide shot",
        resolution="720p",
        duration=5,
    )
    
    result = poll_video_status(task_id)
    video_url = result["video"]["url"]
    print(f"Video ready: {video_url}")

Production-Ready Implementation

The basic version above works but lacks retry logic, async support, and proper error categorization. Here’s a production pattern:

# kling_production.py
# Production-grade Kling v3 client with async support,
# exponential backoff, and structured error handling

import asyncio
import logging
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional

import httpx
from dotenv import load_dotenv

load_dotenv()

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("kling_api")

KLING_API_KEY = os.getenv("KLING_API_KEY")
KLING_BASE_URL = os.getenv("KLING_API_BASE_URL", "https://api.klingai.com")


class VideoResolution(str, Enum):
    P720 = "720p"
    P1080 = "1080p"
    K4 = "4k"   # Kling v3 only


class AspectRatio(str, Enum):
    WIDESCREEN = "16:9"
    PORTRAIT = "9:16"
    SQUARE = "1:1"


@dataclass
class VideoJobConfig:
    """
    Typed config for a video generation job.
    Using a dataclass here instead of raw dicts prevents
    silent errors from typos in parameter names.
    """
    prompt: str
    model: str = "kling-v3"
    duration: int = 5                          # 5 or 10 seconds
    resolution: VideoResolution = VideoResolution.P720
    aspect_ratio: AspectRatio = AspectRatio.WIDESCREEN
    negative_prompt: str = ""
    cfg_scale: float = 0.5                     # Prompt adherence: 0.0–1.0
    image_url: Optional[str] = None            # For image-to-video; None = text-to-video
    audio_enabled: bool = False                # Kling v3 native audio feature


class KlingAPIError(Exception):
    """Base exception for Kling API errors."""
    def __init__(self, message: str, status_code: Optional[int] = None, error_code: Optional[str] = None):
        super().__init__(message)
        self.status_code = status_code
        self.error_code = error_code


class KlingRateLimitError(KlingAPIError):
    pass


class KlingInsufficientCreditsError(KlingAPIError):
    pass


class KlingContentPolicyError(KlingAPIError):
    pass


class KlingClient:
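    def __init__(self):
        # Fail fast instead of sending "Bearer None" to the API
        if not KLING_API_KEY:
            raise EnvironmentError("KLING_API_KEY not set. Check your .env file.")
        self.base_url = KLING_BASE_URL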
        self.headers = {
            "Authorization": f"Bearer {KLING_API_KEY}",
            "Content-Type": "application/json",
        }
        # AsyncClient is more efficient when running multiple concurrent jobs
        self._async_client = httpx.AsyncClient(
            base_url=self.base_url,
            headers=self.headers,
            timeout=30.0,
        )

    def _raise_for_kling_error(self, response: httpx.Response) -> None:
        """
        Kling returns structured errors in the response body even on 4xx.
        Parse them into specific exceptions so callers can handle them differently.
        """
        if response.is_success:  # Any 2xx status counts as success, not only 200
            return
        
        try:
            error_data = response.json().get("error", {})
            error_code = error_data.get("code", "UNKNOWN")
            error_msg = error_data.get("message", response.text)
        except Exception:
            error_code = "PARSE_ERROR"
            error_msg = response.text

        if response.status_code == 429:
            raise KlingRateLimitError(
                f"Rate limit hit: {error_msg}", 429, error_code
            )
        if response.status_code == 402:
            raise KlingInsufficientCreditsError(
                f"Insufficient credits: {error_msg}", 402, error_code
            )
        if error_code == "CONTENT_POLICY_VIOLATION":
            raise KlingContentPolicyError(
                f"Content policy: {error_msg}", response.status_code, error_code
            )
        
        raise KlingAPIError(error_msg, response.status_code, error_code)

    async def submit_job(self, config: VideoJobConfig) -> str:
        """Submit a video generation job. Returns task_id."""
        
        # Route to i2v or t2v endpoint based on whether an image is provided
        endpoint = "/v1/videos/image2video" if config.image_url else "/v1/videos/text2video"
        
        payload = {
            "model": config.model,
            "prompt": config.prompt,
            "negative_prompt": config.negative_prompt,
            "duration": config.duration,
            "resolution": config.resolution.value,
            "aspect_ratio": config.aspect_ratio.value,
            "cfg_scale": config.cfg_scale,
            "audio_enabled": config.audio_enabled,
        }
        
        if config.image_url:
            payload["image_url"] = config.image_url

        response = await self._async_client.post(endpoint, json=payload)
        self._raise_for_kling_error(response)
        
        task_id = response.json()["task_id"]
        logger.info(f"Job submitted: {task_id} | model={config.model} | res={config.resolution.value}")
        return task_id

    async def poll_until_done(
        self,
        task_id: str,
        poll_interval: float = 10.0,
        max_wait: float = 600.0,
    ) -> dict:
        """
        Async polling with exponential backoff on rate limit errors.
        Uses asyncio.sleep instead of time.sleep so other async tasks
        can run concurrently during the wait.
        """
        elapsed = 0.0
        backoff = poll_interval

        while elapsed < max_wait:
            try:
                response = await self._async_client.get(f"/v1/videos/{task_id}")
                self._raise_for_kling_error(response)
                
                data = response.json()
                status = data.get("status")
                logger.info(f"[{elapsed:.0f}s] {task_id}: {status}")
                
                if status == "completed":
                    return data
                if status == "failed":
                    raise KlingAPIError(
                        f"Job failed: {data.get('error', {}).get('message', 'unknown')}",
                        error_code=data.get("error", {}).get("code"),
                    )
                
                # Still processing: wait one normal interval, then reset the
                # backoff so a past rate limit doesn't slow every later poll
                await asyncio.sleep(poll_interval)
                elapsed += poll_interval
                backoff = poll_interval

            except KlingRateLimitError:
                # Back off aggressively on rate limits — don't just retry immediately
                backoff = min(backoff * 2, 60.0)
                logger.warning(f"Rate limited. Backing off to {backoff}s")
                await asyncio.sleep(backoff)
                elapsed += backoff

        raise TimeoutError(f"Job {task_id} did not complete within {max_wait}s")

    async def generate_video(self, config: VideoJobConfig) -> str:
        """End-to-end: submit job, wait, return video URL."""
        task_id = await self.submit_job(config)
        result = await self.poll_until_done(task_id)
        video_url = result["video"]["url"]
        logger.info(f"Video ready: {video_url}")
        return video_url

    async def close(self):
        await self._async_client.aclose()


# Usage example — run multiple jobs concurrently
async def main():
    client = KlingClient()
    
    try:
        jobs = [
            VideoJobConfig(
                prompt="Aerial view of Tokyo at night, neon lights reflecting on wet streets",
                resolution=VideoResolution.P1080,
                duration=5,
            ),
            VideoJobConfig(
                prompt="Close-up of a hummingbird drinking from a red flower, slow motion",
                resolution=VideoResolution.P720,
                duration=5,
                audio_enabled=True,  # Kling v3 native audio
            ),
        ]
        
        # Submit and poll both jobs concurrently — much faster than sequential
        results = await asyncio.gather(*[client.generate_video(j) for j in jobs])
        
        for i, url in enumerate(results):
            print(f"Job {i+1}: {url}")
    
    finally:
        await client.close()


if __name__ == "__main__":
    asyncio.run(main())

API Parameters Reference

| Parameter | Type | Default | Valid Range | What It Affects |
|---|---|---|---|---|
| model | string | "kling-v3" | "kling-v1", "kling-v1-5", "kling-v3" | Model version; v3 required for 4K and native audio |
| prompt | string | (required) | 1–2500 chars | Primary generation instruction |
| negative_prompt | string | "" | 0–1000 chars | Elements to exclude from output |
| duration | integer | 5 | 5, 10 | Video length in seconds; 10s costs ~2× credits |
| resolution | string | "720p" | "720p", "1080p", "4k" | Output resolution; 4K only on kling-v3 |
| aspect_ratio | string | "16:9" | "16:9", "9:16", "1:1" | Frame dimensions; affects composition |
| cfg_scale | float | 0.5 | 0.0–1.0 | Prompt adherence vs. creative freedom; lower = more variation |
| image_url | string | null | Valid HTTPS URL | Source image for image-to-video; triggers i2v endpoint |
| audio_enabled | boolean | false | true, false | Enables native audio generation (v3 only) |
| camera_control | object | null | See docs for schema | Camera movement presets (zoom, pan, orbit) |
| character_ref | string | null | task_id of a character job | Character consistency across clips |

Notes:

  • 4k resolution is only accepted when model is kling-v3; passing it with v1 or v1-5 returns a 400 INVALID_PARAMETER error
  • audio_enabled=true adds approximately 15–25% to generation time in practice
  • cfg_scale values below 0.3 tend to produce inconsistent results with complex prompts
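
The constraints in the table and notes above can be checked locally before a request spends credits. The following is a minimal sketch, assuming the limits listed in this tutorial are accurate. `validate_job_params` is a hypothetical helper written for illustration, not part of any Kling SDK, and the server remains the final authority on what it accepts.

```python
# Hypothetical pre-flight validator. All limits are copied from the
# parameter table in this tutorial and may not match the live API.
VALID_RESOLUTIONS = {"720p", "1080p", "4k"}
VALID_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
VALID_DURATIONS = {5, 10}


def validate_job_params(
    prompt: str,
    model: str = "kling-v3",
    duration: int = 5,
    resolution: str = "720p",
    aspect_ratio: str = "16:9",
    negative_prompt: str = "",
    cfg_scale: float = 0.5,
    audio_enabled: bool = False,
) -> list[str]:
    """Return a list of problems; an empty list means the job looks valid."""
    problems = []
    if not 1 <= len(prompt) <= 2500:
        problems.append("prompt must be 1-2500 characters")
    if len(negative_prompt) > 1000:
        problems.append("negative_prompt must be at most 1000 characters")
    if duration not in VALID_DURATIONS:
        problems.append("duration must be 5 or 10")
    if resolution not in VALID_RESOLUTIONS:
        problems.append("resolution must be one of 720p, 1080p, 4k")
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        problems.append("aspect_ratio must be one of 16:9, 9:16, 1:1")
    if not 0.0 <= cfg_scale <= 1.0:
        problems.append("cfg_scale must be between 0.0 and 1.0")
    # Cross-field rules from the notes: 4K and audio need kling-v3
    if resolution == "4k" and model != "kling-v3":
        problems.append("4k requires model kling-v3")
    if audio_enabled and model != "kling-v3":
        problems.append("audio_enabled requires model kling-v3")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass, which is convenient when validating a whole batch of prompts.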

Error Handling

Kling’s API returns structured JSON errors. Handle them at the exception level, not by string-matching error messages.

| HTTP Status | Error Code | Cause | Fix |
|---|---|---|---|
| 400 | INVALID_PARAMETER | Bad parameter value (e.g., 4K on v1 model) | Check parameter table above; validate before sending |
| 401 | UNAUTHORIZED | Missing or invalid API key | Verify KLING_API_KEY env var; regenerate key if needed |
| 402 | INSUFFICIENT_CREDITS | Account balance too low | Top up credits in dashboard |
| 422 | CONTENT_POLICY_VIOLATION | Prompt triggered content filter | Revise prompt; avoid restricted content categories |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests per minute | Back off exponentially; free tier: 3 RPM, paid: varies by plan |
| 500 | INTERNAL_ERROR | Server-side failure | Retry with backoff; report to Kling support if persistent |
| 503 | MODEL_OVERLOADED | High server load | Retry after 30–60s; peak hours (US/EU daytime) hit this most |

# error_handling_example.py
# Shows how to catch and handle each Kling error type distinctly

import asyncio
from kling_production import KlingClient, VideoJobConfig, VideoResolution
from kling_production import (
    KlingAPIError,
    KlingRateLimitError,
    KlingInsufficientCreditsError,
    KlingContentPolicyError,
)

async def safe_generate(config: VideoJobConfig) -> str | None:
    """
    Wrapper that catches known errors and returns None on non-retryable failures.
    Returns video URL on success.
    """
    client = KlingClient()
    
    try:
        url = await client.generate_video(config)
        return url
    
    except KlingInsufficientCreditsError:
        # Non-retryable — no point retrying until user tops up
        print("ERROR: Out of credits. Add funds at klingai.com/dashboard")
        return None
    
    except KlingContentPolicyError as e:
        # Non-retryable — the prompt itself is the problem
        print(f"ERROR: Prompt rejected by content policy [{e.error_code}]")
        print("Revise your prompt and try again.")
        return None
    
    except KlingRateLimitError:
        # poll_until_done() absorbs 429s while polling, so this only
        # surfaces when the submit request itself is rate limited
        print("ERROR: Rate limited on submission. Queue the job for later.")
        return None
    
    except TimeoutError:
        # Job was submitted but didn't finish in time
        # The job may still complete — check task_id status manually
        print("ERROR: Job timed out. Check dashboard for status.")
        return None
    
    except KlingAPIError as e:
        # Catch-all for unexpected API errors
        print(f"ERROR: API error {e.status_code} [{e.error_code}]: {e}")
        return None
    
    finally:
        await client.close()


if __name__ == "__main__":
    config = VideoJobConfig(
        prompt="A calm ocean at sunrise with gentle waves",
        resolution=VideoResolution.P720,
    )
    result = asyncio.run(safe_generate(config))
    if result:
        print(f"Success: {result}")
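
The error table splits into statuses worth retrying (429, 500, 503) and statuses where a retry cannot help (400, 401, 402, 422). The sketch below shows one way to encode that split in a generic retry wrapper. The helper names, the base delay, the 60-second cap, and the use of full jitter are all assumptions chosen for illustration. The wrapper relies only on the exception carrying a `status_code` attribute, as `KlingAPIError` does in this tutorial.

```python
import asyncio
import random

# Classification taken from the error table in this tutorial
RETRYABLE_STATUS = {429, 500, 503}
NON_RETRYABLE_STATUS = {400, 401, 402, 422}


def is_retryable(status_code: int) -> bool:
    """True when the table recommends retrying with backoff."""
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


async def with_retries(call, max_attempts: int = 4, sleep=asyncio.sleep):
    """Await call(); retry only when the raised exception has a
    retryable status_code attribute. Re-raises everything else,
    and re-raises the last error once attempts run out."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            out_of_attempts = attempt == max_attempts - 1
            if status is None or not is_retryable(status) or out_of_attempts:
                raise
            await sleep(backoff_delay(attempt))
```

With the production client, a call would look like `await with_retries(lambda: client.submit_job(config))`. The `sleep` parameter exists so tests can swap in a no-op and run instantly.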

Performance and Cost Reference

Cost and timing benchmarks based on published Kling credit pricing and community-reported timing data (as of February 2026). Credit-to-USD conversion assumes $0.01 per credit at standard pricing.

| Configuration | Credits per Job | Approx. USD | Median Latency | Notes |
|---|---|---|---|---|
| kling-v3, 720p, 5s | 10 credits | ~$0.10 | 60–90s | Standard tier; fastest option |
| kling-v3, 1080p, 5s | 15 credits | ~$0.15 | 90–120s | Good balance of quality and cost |
| kling-v3, 4K, 5s | 25 credits | ~$0.25 | 150–210s | Highest quality; slowest |
| kling-v3, 720p, 10s | 20 credits | ~$0.20 | 120–180s | Double duration ≈ double credits |
| kling-v3, 1080p, 10s | 30 credits | ~$0.30 | 180–240s | Most common production choice |
| kling-v3, 4K, 10s | 50 credits | ~$0.50 | 300–420s | Use only when 4K is genuinely required |
| + audio_enabled | +3 credits | +~$0.03 | +15–25s | Per job regardless of resolution |
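
The table reduces to a small formula: a base credit cost per resolution for 5 seconds, doubled for 10 seconds, plus a flat 3 credits when audio is enabled. Here is a rough estimator built on those figures, assuming the table values and the $0.01-per-credit conversion stated above hold. The function names are hypothetical, and the output is a budgeting aid rather than billing data.

```python
# Figures copied from the cost table in this tutorial; treat as estimates.
BASE_CREDITS_5S = {"720p": 10, "1080p": 15, "4k": 25}
AUDIO_SURCHARGE_CREDITS = 3
USD_PER_CREDIT = 0.01  # conversion rate assumed by this tutorial


def estimate_credits(resolution: str, duration: int = 5, audio_enabled: bool = False) -> int:
    """Estimated credits for one kling-v3 job."""
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    if resolution not in BASE_CREDITS_5S:
        raise ValueError(f"unknown resolution: {resolution}")
    credits = BASE_CREDITS_5S[resolution] * (duration // 5)  # 10s costs 2x
    if audio_enabled:
        credits += AUDIO_SURCHARGE_CREDITS  # flat fee per job
    return credits


def estimate_batch_usd(jobs: list[tuple[str, int, bool]]) -> float:
    """Estimated USD for a batch of (resolution, duration, audio_enabled) jobs."""
    total_credits = sum(estimate_credits(*job) for job in jobs)
    return round(total_credits * USD_PER_CREDIT, 2)
```

For example, `estimate_batch_usd([("4k", 5, False)] * 50)` comes to 12.5, while the same 50 clips at 720p come to 5.0.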

When NOT to use 4K:

  • Prototype and iteration phases — 720p gives the same quality feedback at 40% of the cost
  • Short-form social content where platform compression eliminates any 4K advantage
  • Batch generation jobs with >50 clips — cost compounds quickly at 4K

Concurrency limits by plan:

  • Free tier: 3 concurrent jobs, 10 jobs/day
  • Standard: 10 concurrent jobs, no daily cap
  • Enterprise: custom limits, dedicated capacity available
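
A plain `asyncio.gather` over a large batch would start every job at once and exceed these caps. One common way to stay under a concurrency limit is an `asyncio.Semaphore`. The sketch below assumes a limit of 3, matching the free tier figure above. `run_with_limit` is a hypothetical helper, and `fake_job` stands in for a real `generate_video` call so the example runs without network access.

```python
import asyncio


async def run_with_limit(coros, max_concurrent: int = 3):
    """Run awaitables with at most max_concurrent in flight.
    Results come back in the same order as the input."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(coro):
        # The slot is held for the whole job (submit + poll),
        # which is what a "concurrent jobs" limit counts.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))


async def fake_job(index: int, tracker: dict) -> str:
    """Stand-in for a video job; records peak concurrency in tracker."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulate generation time
    tracker["active"] -= 1
    return f"video-{index}"


if __name__ == "__main__":
    tracker = {"active": 0, "peak": 0}
    urls = asyncio.run(
        run_with_limit([fake_job(i, tracker) for i in range(8)], max_concurrent=3)
    )
    print(urls, "peak concurrency:", tracker["peak"])
```

In the production example, the equivalent call would be `await run_with_limit([client.generate_video(j) for j in jobs], max_concurrent=3)`.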

Limitations to Know Before You Build

  1. No synchronous endpoint — Every video job is async. You cannot get a video in a single HTTP request. Build polling or webhook support from day one.
  2. Video URLs expire — Kling’s CDN URLs are typically valid for 24 hours. Download and store videos in your own storage (S3, GCS, etc.) immediately after retrieval.
  3. No partial results — If a job fails at 80% completion, you get nothing. There is no resume or partial output.
  4. Prompt length ≠ better results — In practice, prompts over 200 characters tend to produce less consistent results than concise, specific prompts. Treat the 2500-character limit as a hard ceiling, not a target.
  5. 4K availability — At peak load (US/EU business hours), 4K jobs frequently hit 503 MODEL_OVERLOADED. Schedule batch 4K generation during off-peak hours.
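
Because of the expiry described in item 2, it helps to copy each finished video to local disk as soon as the job completes. The following is a minimal sketch using only the standard library (`urllib.request` instead of httpx, to keep it free of dependencies). `download_video` is a hypothetical helper, and uploading the saved file to S3, GCS, or similar is left to the caller.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(video_url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream video_url to dest.

    Data is written to a temporary ".part" file first and renamed on
    success, so an interrupted download never leaves a truncated file
    at the final path.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    try:
        with urllib.request.urlopen(video_url, timeout=timeout) as response, \
                open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)  # copies in chunks, not all in memory
        tmp.replace(dest)
    finally:
        tmp.unlink(missing_ok=True)  # clean up the partial file on failure
    return dest
```

A typical call right after polling would be `download_video(result["video"]["url"], Path("videos") / f"{task_id}.mp4")`, using the response shape assumed elsewhere in this tutorial.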

Conclusion

Kling v3’s API is a straightforward REST interface with async job semantics: submit, poll, retrieve. The production client in this tutorial handles the failures most likely to bite in production: rate limits, credit exhaustion, and content policy rejections. It does not handle URL expiry, so copy every finished video to your own storage within the 24-hour window. Start with 720p for development, then change the VideoResolution value to move to 1080p or 4K.



Frequently Asked Questions

How much does the Kling v3 API cost per video clip in 2026?

Kling v3 API pricing ranges from approximately $0.14 to $0.28 per video clip depending on resolution and duration tier, based on Kling's published credit pricing. A standard 5-second 720p clip falls toward the lower end of that range (~$0.14), while higher-resolution outputs (up to native 4K, introduced in v3 on February 4, 2026) push toward $0.28 per clip. For production workloads, developers sho

What is the average API response latency for Kling v3 video generation?

Kling v3 uses an asynchronous polling model, with a median generation latency of approximately 90 seconds for a 5-second 720p clip. This means your Python code must poll a job status endpoint rather than waiting on a synchronous response. Latency can increase for 4K outputs or longer clip durations. Developers should implement polling intervals of 5–10 seconds with a timeout threshold of at least

What new features did Kling v3 add compared to v1 and v2?

Kling v3, released February 4, 2026, introduced four major capabilities not available in v1/v2: (1) native 4K video output — the first Kling model to support this resolution; (2) native audio generation baked into the video pipeline; (3) multi-shot scene control for sequencing multiple shots in a single API call; and (4) character consistency features for maintaining subject identity across frames

How do I handle async polling for Kling v3 API in Python without hitting rate limits?

Kling v3's API uses an async polling model where you submit a job and repeatedly check its status endpoint. A production-safe Python pattern involves: (1) submitting the generation request and capturing the job ID; (2) polling every 8–10 seconds using a while loop with exponential backoff on 429 rate-limit responses; (3) setting a hard timeout at 300 seconds given the ~90-second median latency for
