Text-to-Video model

Kling LipSync (Text‑to‑Video)

Generate lip‑synced video directly from a script.

Kling’s native‑audio generation creates text‑to‑video clips with synchronized voice and lip sync, including multi‑person dialogue.

Best for: Script‑only workflows

Inputs: Text

Outputs: Video

What this model is best at

Short answer: script‑only generation of talking clips, with voice and lip sync produced together and support for multi‑person dialogue.

Use this workspace to preview the model, compare example output, and start creating with the recommended workflow for this model.

Highlight 1

Voice narration with natural emotion.

Highlight 2

Multi‑person dialogue with lip sync.

Highlight 3

Singing/rap and ambient audio support.


Kling LipSync (Text‑to‑Video) workspace

Start from the built-in workflow below, then tune the model inside the standard LipsyncX creation surface.


1. Choose a face

Choose a template, or upload a photo or video (drag and drop, or click to upload).

2. Choose Model

3. Add Script

Instant script templates

One-click copy for greetings, celebrations, and announcements.

Billing unit: 10 credits / 5s

Uses real audio duration when available.
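The billing rule above (10 credits per 5 s billing unit) can be turned into a rough cost estimate. The sketch below is illustrative only: the function name is hypothetical, and rounding a partial 5 s unit up to a full unit is an assumption, since the page does not say how partial units are charged.

```python
import math

CREDITS_PER_UNIT = 10  # from the page: 10 credits per billing unit
SECONDS_PER_UNIT = 5   # from the page: one billing unit covers 5 s


def estimate_credits(duration_s: float) -> int:
    """Estimate the credits for a clip of the given duration in seconds.

    Assumption: a partial 5 s unit is billed as a full unit (rounded up).
    """
    if duration_s <= 0:
        return 0
    units = math.ceil(duration_s / SECONDS_PER_UNIT)
    return units * CREDITS_PER_UNIT


if __name__ == "__main__":
    for seconds in (5, 12, 30):
        print(f"{seconds}s -> {estimate_credits(seconds)} credits (estimate)")
```

Because the workspace uses the real audio duration when it is available, an estimate based on a guessed script length may differ from the final charge.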

Voice

Speech speed (0.90x)


Avg render time: 7 min

Languages supported: 50+

Creators onboarded: 3,200+

Trusted by teams

StudioBlend, AudioNova, CourseWave, Mintly, VisionSpark

Script‑to‑video

Type a script and generate a talking clip.


Popular use cases

Use case 1

Rapid prototyping

No media required.

Use case 2

Concept testing

Validate scripts quickly.

Use case 3

Internal drafts

Fast review loops.

Quick specs

Primary use: Text‑to‑video with lip sync

Inputs: Script / prompt

Output: Video with generated audio

Best strength: Script‑only workflow

Best practices

Write clear dialogue with speaker changes labeled.

Keep prompts concise and visually specific.

Use short segments to test tone before longer runs.
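One way to apply these practices before pasting a script into the workspace is to label every speaker change and break the dialogue into short test segments. The sketch below is a minimal illustration: the "Speaker: line" label format, the helper names, and the 200-character segment size are all assumptions, not a format the page documents.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, line); a hypothetical structure for this sketch


def format_script(turns: List[Turn]) -> str:
    """Render turns as 'Speaker: line', one per line, so each speaker change is labeled."""
    lines = []
    for speaker, text in turns:
        text = " ".join(text.split())  # collapse stray whitespace
        if text:
            lines.append(f"{speaker.strip()}: {text}")
    return "\n".join(lines)


def split_segments(turns: List[Turn], max_chars: int = 200) -> List[str]:
    """Group consecutive turns into short scripts of at most max_chars each.

    A single turn longer than max_chars is kept whole in its own segment.
    """
    segments: List[str] = []
    current: List[Turn] = []
    for turn in turns:
        if not turn[1].strip():
            continue  # skip empty lines of dialogue
        candidate = format_script(current + [turn])
        if current and len(candidate) > max_chars:
            segments.append(format_script(current))
            current = [turn]
        else:
            current.append(turn)
    if current:
        segments.append(format_script(current))
    return segments


if __name__ == "__main__":
    dialogue = [("Anna", "Hi  there."), ("Ben", "Hello!")]
    for segment in split_segments(dialogue, max_chars=20):
        print(segment)
        print("---")
```

Testing one short segment first makes it cheaper to check tone and voice before committing credits to a longer run.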

FAQ

Is audio generated with the video?

Yes. Native audio is generated alongside the video output.

Can it handle multi‑person dialogue?

Yes. Kling supports multi‑person dialogue with lip sync.

Which languages are supported?

Voice output is supported in Chinese and English.

Ready to try Kling LipSync (Text‑to‑Video)?

Use the built-in workspace to test prompts, compare outputs, and see how this model fits your content workflow.