What this model is best at
Short answer: PixVerse Speech (LipSync) aligns mouth movement to audio for expressive, emotion‑driven performance, using either a PixVerse video_id or an uploaded video.
Use this workspace to preview the model, compare example output, and start creating with the recommended workflow for this model.
Highlight 1
Analyzes both audio and mouth motion for tight sync.
Highlight 2
Accepts PixVerse video_id or uploaded MP4/MOV.
Highlight 3
Audio via file upload or built‑in TTS script.
Video-to-Video
PixVerse LipSync workspace
Start from the built-in workflow below, then tune the model inside the standard LipsyncX creation surface.
1. Upload video
2. Choose Model
3. Add Script
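The three steps above can be sketched as a request payload. This is a hypothetical illustration only: the function, endpoint fields, and key names (`video_id`, `video_file`, `audio_file`, `tts_script`) are assumptions for clarity, not the actual PixVerse API.

```python
# Hypothetical sketch of the LipSync workflow as a request payload.
# All field names here are illustrative assumptions, not the real PixVerse API.

def build_lipsync_request(video_id=None, video_path=None,
                          audio_path=None, tts_script=None):
    """Assemble a lip-sync request from either a PixVerse video_id or an
    uploaded video, plus either an audio file or a built-in TTS script."""
    if bool(video_id) == bool(video_path):
        raise ValueError("Provide exactly one of video_id or video_path")
    if bool(audio_path) == bool(tts_script):
        raise ValueError("Provide exactly one of audio_path or tts_script")
    payload = {"model": "lipsync"}
    payload["video_id" if video_id else "video_file"] = video_id or video_path
    if audio_path:
        payload["audio_file"] = audio_path
    else:
        payload["tts_script"] = tts_script
    return payload
```

For example, `build_lipsync_request(video_id="abc123", tts_script="Hi there!")` pairs an existing PixVerse clip with a generated voiceover, matching the "script instead of audio" path described in the FAQ.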
Instant script templates
One-click copy for greetings, celebrations, and announcements.
Trusted by teams
Social clip refresh
Swap narration for a faster hook.
Popular use cases
Short‑form
Quick hook iterations.
Social ads
Fast creative refresh.
Creator posts
Lightweight updates.
FAQ
What are the video limits?
Up to 30 seconds, 1920px resolution, and 50MB per video.
What audio formats are supported?
MP3 or WAV audio, up to 30 seconds and 50MB.
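The limits in the two answers above can be checked client-side before uploading. The numbers come from this FAQ; the helper itself, its name, and its structure are my own illustrative sketch.

```python
# Minimal pre-upload check of the FAQ limits (30s, 1920px, 50MB, MP3/WAV).
# The helper is an illustrative sketch, not part of any PixVerse SDK.
MAX_SECONDS = 30
MAX_BYTES = 50 * 1024 * 1024   # 50MB
MAX_PIXELS = 1920              # longest video side
AUDIO_EXTS = {".mp3", ".wav"}

def check_asset(kind, seconds, size_bytes, ext=None, longest_side_px=None):
    """Return a list of limit violations for a 'video' or 'audio' asset."""
    problems = []
    if seconds > MAX_SECONDS:
        problems.append(f"duration {seconds}s exceeds {MAX_SECONDS}s")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 50MB")
    if kind == "video" and longest_side_px and longest_side_px > MAX_PIXELS:
        problems.append(f"resolution {longest_side_px}px exceeds {MAX_PIXELS}px")
    if kind == "audio" and ext and ext.lower() not in AUDIO_EXTS:
        problems.append(f"unsupported audio format {ext}")
    return problems
```

An empty list means the asset fits all the stated limits; anything returned names the specific limit that was exceeded.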
Can I use a script instead of audio?
Yes. Provide a TTS script to generate the audio automatically.
