What this model is best at
Short answer: an audio-driven multi-character avatar model built for realistic group conversations with synchronized lip sync and natural turn-taking.
Use this workspace to preview the model, compare example output, and start creating with the recommended workflow for this model.
Highlights
- Multi-character conversations with synchronized lip sync.
- Multi-stream audio support for multi-speaker dialogue.
- Natural group dynamics and turn-taking.
Audio-to-Video
LongCat Multi‑Avatar workspace
Start from the built-in workflow below, then tune the model inside the standard LipsyncX creation surface.
1. Upload photo
2. Choose model
3. Add script
Instant script templates
One-click copy for greetings, celebrations, and announcements.
Two‑speaker panel
Drive multiple avatars from one audio track.
Popular use cases
Podcast panels
Multi‑guest episodes.
Roundtables
Two‑speaker summaries.
Debates
Split‑speaker scripts.
FAQ
Can it handle multiple speakers?
Yes. It is designed for multi‑character lip‑sync conversations.
What inputs are required?
Provide portraits plus a separate audio stream for each speaker.
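The input pairing above can be sketched as a simple validation step. This is illustrative only: the field names and structure below are hypothetical, not the product's actual API, and only show the rule that every speaker needs both a portrait and their own audio stream.

```python
# Hypothetical input structure: one portrait plus one audio stream per speaker.
# These field names are illustrative, not the product's real API.
speakers = [
    {"portrait": "host.png", "audio": "host.wav"},
    {"portrait": "guest.png", "audio": "guest.wav"},
]

def validate_inputs(speakers):
    """Check that every speaker entry supplies both a portrait and an audio stream."""
    for i, s in enumerate(speakers):
        for key in ("portrait", "audio"):
            if not s.get(key):
                raise ValueError(f"speaker {i} is missing '{key}'")
    return len(speakers)

print(validate_inputs(speakers))  # number of speakers passing validation
```

The point is simply that audio is per speaker, not a single shared file, so each avatar can be lip-synced independently.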
Is it suitable for longer scenes?
Yes. It is designed for long-form stability, keeping each character's identity consistent across the scene.
