I Made AI Personas Debate in Real Time. Here's What Nearly Broke It
The Problem With AI Conversations
If you've ever used an AI presentation or discussion tool, you know the feeling. The AI finishes speaking. Then silence. A second passes. Two seconds. The next response loads. The immersion dies.
That gap is the enemy. It's the thing that reminds you this is software, not a real conversation. I wanted to eliminate it completely.
The Vision — Discussion Dodo
I wanted to build something that felt like a real panel debate. Multiple personas with different viewpoints, arguing a topic in your language, with no awkward pauses between turns. Up to 4 panelists plus a host, each with their own voice, personality, and stance.
The topic: anything. The language: any of 10+. The feel: a real talk show.
The Pipeline — Gapless By Design
The key insight was intelligent prefetching. While the current persona is speaking, the next persona's response is already being generated and buffered in the background. By the time the first voice finishes, the second one is ready to fire instantly.
This is the same principle as video streaming: the player doesn't wait for the whole video to load, it buffers ahead of what you're watching. I applied the same logic to audio turns.
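A minimal TypeScript sketch of the prefetch loop, to make the overlap concrete. The names `Turn`, `Generate`, `Play`, and `runGapless` are hypothetical and not taken from Discussion Dodo's code; `generate` stands in for the LLM plus TTS call and `play` for audio playback that resolves when the clip ends.

```typescript
// Hypothetical types for illustration only.
interface Turn { speaker: string; index: number }

// Stand-in for "LLM + TTS": returns the finished clip for a turn.
type Generate = (turn: Turn, history: string[]) => Promise<string>;
// Stand-in for playback: resolves when the clip has finished playing.
type Play = (clip: string) => Promise<void>;

async function runGapless(
  turns: Turn[],
  generate: Generate,
  play: Play,
): Promise<string[]> {
  const history: string[] = [];
  if (turns.length === 0) return history;

  // Start generating the first turn before anything plays.
  let pending = generate(turns[0], [...history]);

  for (let i = 0; i < turns.length; i++) {
    // After the first turn this is usually resolved already,
    // because it was started while the previous clip was playing.
    const clip = await pending;
    history.push(clip);

    // Kick off the next turn BEFORE playback, so generation
    // overlaps with the current speaker's audio.
    if (i + 1 < turns.length) {
      pending = generate(turns[i + 1], [...history]);
    }

    await play(clip);
  }
  return history;
}
```

The next turn still sees the current speaker's reply in `history`, because that reply is known as soon as it is generated, before it is played. Error handling is left out; a real version would need to deal with a prefetch that fails while the previous clip is still playing.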
The turn system supports 4 to 50 configurable turns, extendable on the fly. Each turn knows who speaks next, what context came before, and what stance that persona holds. The result is a conversation that flows rather than stutters.
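One way to model an extendable turn plan is sketched below. The 4 and 50 limits come from the description above; the round-robin speaker order and the names `TurnPlan`, `PlannedTurn`, and `clampTurns` are assumptions made for the example.

```typescript
const MIN_TURNS = 4;  // lower limit from the article
const MAX_TURNS = 50; // upper limit from the article

interface PlannedTurn { index: number; speaker: string }

// Keep the turn count inside the supported range.
function clampTurns(n: number): number {
  return Math.min(MAX_TURNS, Math.max(MIN_TURNS, Math.floor(n)));
}

class TurnPlan {
  private speakers: string[];
  private total: number;

  constructor(speakers: string[], requestedTurns: number) {
    if (speakers.length === 0) throw new Error("need at least one speaker");
    this.speakers = speakers;
    this.total = clampTurns(requestedTurns);
  }

  get length(): number {
    return this.total;
  }

  // Extend the debate while it is running; returns the new total.
  extend(extraTurns: number): number {
    this.total = clampTurns(this.total + extraTurns);
    return this.total;
  }

  // Assumption: speakers simply rotate in order (round-robin).
  turnAt(index: number): PlannedTurn | undefined {
    if (index < 0 || index >= this.total) return undefined;
    return { index, speaker: this.speakers[index % this.speakers.length] };
  }
}
```

Because `turnAt` is computed from the index instead of a prebuilt array, extending the plan mid-debate only changes one number, and a prefetcher can ask for the next turn at any time.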

The Persona System
Each persona has a name, a voice (from 6 TTS options), a language, and a defined stance on the topic. They stay in character across every turn. The AI doesn't just generate random dialogue — it generates context-aware responses that build on what the previous persona said.
For the "Is AI stealing jobs?" debate in Hindi, I used three personas — Job Slayer, Opportunity Seeker, and Reality Check. Each one pushed back on the others authentically.
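As an illustration of how a persona and the running transcript could be turned into a prompt, here is a small sketch. The `Persona` fields mirror the ones listed above; `Utterance`, `buildPrompt`, the prompt wording, and the `contextTurns` window are assumptions, not the app's actual prompt.

```typescript
interface Persona {
  name: string;
  voice: string;    // one of the six TTS voices; ids used with this are placeholders
  language: string;
  stance: string;
}

interface Utterance { speaker: string; text: string }

// Build a context-aware prompt: who the persona is, what they believe,
// and what was said recently, so the reply builds on the previous turn.
function buildPrompt(
  persona: Persona,
  topic: string,
  transcript: Utterance[],
  contextTurns = 6, // assumed window size
): string {
  const recent = transcript
    .slice(-contextTurns)
    .map((u) => `${u.speaker}: ${u.text}`)
    .join("\n");
  const last = transcript[transcript.length - 1];

  return [
    `You are ${persona.name}, a panelist debating "${topic}".`,
    `Your stance: ${persona.stance}`,
    `Reply in ${persona.language} and stay in character.`,
    recent ? `Conversation so far:\n${recent}` : "You are opening the debate.",
    last ? `Respond directly to ${last.speaker}'s last point.` : "",
  ]
    .filter((part) => part !== "")
    .join("\n\n");
}
```

For the Hindi debate, a persona might look like `{ name: "Reality Check", voice: "voice-3", language: "Hindi", stance: "AI changes jobs more than it removes them" }`, where the voice id and the stance text are made up for the example.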

Adding Drama — Background Music
A debate without emotional tension is just a conversation. So I added background music as a layer underneath the voices. The same argument hits completely differently with a tense score underneath it. The music runs through the full Web Audio API pipeline — volume, panning, and effects all apply without interfering with the voice tracks.
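A rough sketch of how music and voices can share one Web Audio graph without affecting each other. It uses the real `createGain`, `createStereoPanner`, and `connect` calls, but the bus layout, the default music volume, and the names `buildMixBuses` and `MixBuses` are assumptions. The small `...Like` interfaces are also assumptions: they describe only the parts of the API the sketch touches, so it can be type-checked outside a browser, and a browser `AudioContext` is expected to fit that shape.

```typescript
// Minimal structural types (assumptions) covering only what is used here.
interface NodeLike { connect(destination: NodeLike): unknown }
interface GainLike extends NodeLike { gain: { value: number } }
interface PanLike extends NodeLike { pan: { value: number } }
interface AudioCtxLike {
  destination: NodeLike;
  createGain(): GainLike;
  createStereoPanner(): PanLike;
}

interface MixBuses {
  voiceIn: GainLike;  // connect persona speech sources here
  musicIn: GainLike;  // connect the background music source here
  musicPan: PanLike;  // panning that applies to music only
  master: GainLike;   // final output level for everything
}

function buildMixBuses(ctx: AudioCtxLike, musicVolume = 0.25): MixBuses {
  const master = ctx.createGain();
  master.connect(ctx.destination);

  // Voice bus: goes straight to master, untouched by music settings.
  const voiceIn = ctx.createGain();
  voiceIn.connect(master);

  // Music bus: its own gain and panner in series, then into master.
  const musicIn = ctx.createGain();
  musicIn.gain.value = musicVolume;
  const musicPan = ctx.createStereoPanner();
  musicIn.connect(musicPan);
  musicPan.connect(master);

  return { voiceIn, musicIn, musicPan, master };
}
```

With this layout, changing `musicIn.gain.value` or `musicPan.pan.value` only affects the score, since the voice path never passes through those nodes.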

You Can Interrupt It
Hold M to jump in at any point. Dodo pauses, records your interjection, transcribes it with OpenAI Whisper, and responds in context before resuming the discussion. This is the feature that makes it feel truly interactive rather than just a playback system.
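The interrupt flow can be pictured as a small state machine. The sketch below is an assumption about how such a flow could be modeled, not the app's code: the state and event names are invented, and resuming straight away on an empty transcript is a design choice made for the example.

```typescript
type DebateState = "speaking" | "listening" | "transcribing" | "responding";

type DebateEvent =
  | { type: "keydown"; key: string }
  | { type: "keyup"; key: string }
  | { type: "transcribed"; text: string } // speech-to-text result is ready
  | { type: "responseDone" };             // in-context reply finished playing

function nextState(state: DebateState, event: DebateEvent): DebateState {
  const isM =
    (event.type === "keydown" || event.type === "keyup") &&
    event.key.toLowerCase() === "m";

  switch (state) {
    case "speaking":
      // Holding M pauses the debate and starts recording.
      return event.type === "keydown" && isM ? "listening" : state;
    case "listening":
      // Releasing M ends the recording and sends it for transcription.
      return event.type === "keyup" && isM ? "transcribing" : state;
    case "transcribing":
      if (event.type !== "transcribed") return state;
      // Assumption: an empty transcript just resumes the debate.
      return event.text.trim() !== "" ? "responding" : "speaking";
    case "responding":
      // Once the reply has been delivered, the panel carries on.
      return event.type === "responseDone" ? "speaking" : state;
  }
}
```

Keeping the transitions in one pure function makes it easy to ignore stray events, such as other keys or a late transcript, without scattering checks across keyboard and audio handlers.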

Everything Gets Recorded
All of this (the personas, the music, your voice, your camera) gets captured in CamPrompter's recording pipeline, with the audio sources kept on separate tracks. You walk away with a ready-to-publish video.
Watch It in Action
"Is AI stealing jobs?" debate in Hindi
Try It Yourself
Built with
- OpenAI — TTS & Whisper STT
- Anthropic — Claude LLM support
- Google Gemini — LLM support
- Together AI — LLM support