Inside VoiGu's AI conversation mode
A long-form look at how Conversation works — how Gu adapts to your level, how pronunciation is scored, and why your voice never leaves the device.
The first version of VoiGu's conversation feature shipped in March 2026. We'd been quietly building toward it for two years. This post is the story of why it exists, how it works, and what we got wrong along the way.
If you're a learner who hasn't tried it yet, the short version: open VoiGu, tap the microphone tab at the bottom, and start talking to Gu about your day. Gu will reply in your target language, adapted to your current level — and quietly drill the words you keep dodging.
Why a conversation mode at all
Most language apps have a dirty secret: they teach you to recognise a language, not to speak one. You can hit a 200-day streak and still freeze when a barista asks if you want oat milk.
This isn't anyone's fault — speaking practice is hard to fake at scale. Real speaking partners are expensive. Scripted dialogue gets boring. Drill-and-kill works for vocabulary but not for the messy improvisation that conversation actually is.
Large language models changed the math. For the first time, we can have a tutor in your pocket who will say something different every time, react to what you actually said, and gently pull the lesson in the direction of words you don't yet know.
How the level adaptation works
The first thing Conversation does when you start a session is work out your level. There's no placement test, because your VoiGu progress already tells us where you are. The model gets a tight summary of which structures you've mastered, which you've seen, and which you've never encountered.
Then it does something surprisingly subtle: it stays inside your level for the first few exchanges, then pushes one step beyond it, then drops back. Like a good human tutor, it scaffolds. You get the satisfaction of understanding most of what's said, plus the productive discomfort of one new structure to chew on.
We tuned this with a few hundred internal beta testers. Push too hard and learners disengage; coddle too much and they don't grow. The sweet spot turned out to be roughly 85% familiar material and 15% just out of reach, which is close to what the language acquisition literature suggests.
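To make the scaffolding pattern concrete, here's a minimal Python sketch. It's an illustration under stated assumptions, not VoiGu's actual code: the names `LearnerSummary`, `stretch_turn`, and `plan_structures` are hypothetical, the warm-up length and cycle are guesses at what "the first few exchanges" could mean, and applying the 85/15 split to a budget of target structures is just one possible reading of that ratio.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerSummary:
    """Hypothetical compact profile handed to the conversation model."""
    mastered: list = field(default_factory=list)
    seen: list = field(default_factory=list)
    unseen: list = field(default_factory=list)


def stretch_turn(exchange_index: int, warmup: int = 3, cycle: int = 3) -> bool:
    """Return True when this exchange should push one step beyond the learner's level.

    The first `warmup` exchanges stay in-level. After that, one exchange in
    every `cycle` stretches and the others drop back. Both values are assumed.
    """
    if exchange_index < warmup:
        return False
    return (exchange_index - warmup) % cycle == 0


def plan_structures(summary: LearnerSummary, total: int = 20,
                    stretch_share: float = 0.15) -> dict:
    """Split a budget of target structures into familiar and stretch items."""
    n_stretch = round(total * stretch_share)
    n_familiar = total - n_stretch
    familiar_pool = summary.mastered + summary.seen
    return {
        "familiar": familiar_pool[:n_familiar],
        "stretch": summary.unseen[:n_stretch],
    }
```

With the default settings, exchanges 0 to 2 stay in-level, exchange 3 stretches, exchanges 4 and 5 drop back, and the pattern repeats.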
Pronunciation scoring, in real time
Every time you speak, we score your pronunciation against a native baseline and surface specific feedback. Not "you said that wrong" — that's discouraging and unhelpful. More like "the 'ñ' was a touch short — try lingering on it", with a button to hear it modelled.
Under the hood, this is a phoneme-level alignment between your audio and a reference. We use a small specialised model that runs on the device — not the conversation model itself, which would be overkill for this. The result is a score (0–100) and a few specific call-outs per phrase.
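For a rough feel of what phoneme-level alignment means, here's a simplified Python sketch. It is a stand-in, not the on-device model: it assumes the audio has already been turned into a list of phoneme symbols, aligns that list against a reference with a plain edit-distance table, and derives a 0 to 100 score from the number of mismatches. The function names and the scoring formula are assumptions for illustration, and timing feedback (such as a sound being held too briefly) is out of scope here.

```python
def align(reference, heard):
    """Align two phoneme sequences with an edit-distance table and traceback.

    Returns a list of (expected, actual) pairs; None marks a phoneme that
    was dropped (actual is None) or inserted (expected is None).
    """
    n, m = len(reference), len(heard)
    # cost[i][j] = edit distance between reference[:i] and heard[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (reference[i - 1] != heard[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (reference[i - 1] != heard[j - 1])):
            pairs.append((reference[i - 1], heard[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((reference[i - 1], None))
            i -= 1
        else:
            pairs.append((None, heard[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def score_phrase(reference, heard):
    """Return (score from 0 to 100, list of mismatched pairs to call out)."""
    if not reference:
        raise ValueError("reference must contain at least one phoneme")
    callouts = [(exp, act) for exp, act in align(reference, heard) if exp != act]
    score = max(0, round(100 * (1 - len(callouts) / len(reference))))
    return score, callouts


# Example: "mañana" spoken with a plain "n" in place of "ñ" (IPA ɲ).
reference = ["m", "a", "ɲ", "a", "n", "a"]
heard = ["m", "a", "n", "a", "n", "a"]
print(score_phrase(reference, heard))
```

The call-outs are what would drive feedback aimed at a specific sound, while the score stays informational rather than a pass mark.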
We deliberately don't punish low scores. There's no "minimum to pass." Speaking is a muscle, and the only way to build it is to keep using it badly until you use it well.
The privacy part — which mattered to us a lot
Your voice never leaves your phone for transcription. The speech-to-text step runs entirely on-device, using Apple's and Google's native speech APIs. What gets sent to the conversation model is plain text — the same as if you'd typed it.
This was a deliberate trade-off. Cloud-based speech recognition is marginally more accurate, but it would mean your voice samples sitting on someone else's server. We weren't comfortable with that, and we don't think you should be either.
The text that does get sent is processed by an LLM with no long-term retention. Your conversation isn't used to train future models, isn't sold to anyone, and is deleted within 24 hours. The product detail page in Settings has the receipts.
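As a sketch of what "text only, gone within 24 hours" could look like in code, here's a small Python example. Everything in it is hypothetical: `TurnRequest`, its fields, and the `purge` helper are illustrative names, not VoiGu's real schema or retention job. The point is the shape: the payload type has no audio field at all, and anything older than the retention window is dropped.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# From the post: conversation text is deleted within 24 hours.
RETENTION = timedelta(hours=24)


@dataclass(frozen=True)
class TurnRequest:
    """Hypothetical payload sent to the conversation model: text only."""
    transcript: str        # output of on-device speech-to-text
    target_language: str
    created_at: datetime


def is_expired(request: TurnRequest, now: datetime) -> bool:
    """True once a stored request has reached the retention limit."""
    return now - request.created_at >= RETENTION


def purge(stored: list, now: datetime) -> list:
    """Return only the requests that are still inside the retention window."""
    return [r for r in stored if not is_expired(r, now)]


now = datetime(2026, 3, 10, 12, 0, tzinfo=timezone.utc)
stored = [
    TurnRequest("Hoy fui al mercado", "es", now - timedelta(hours=25)),
    TurnRequest("Quiero un café", "es", now - timedelta(hours=1)),
]
print([r.transcript for r in purge(stored, now)])
```

Keeping audio out of the payload type entirely means there is nothing to strip before sending: the request can only ever carry what you could have typed.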
What we got wrong in v1
The first build of Conversation talked too much. Learners would say one sentence and Gu would reply with three. We thought we were being helpful; turns out it just gave the model more space to make mistakes and didn't leave the learner room to think.
Version 1.2 cut Gu's average response length by 40%. Engagement went up. Lesson completion went up. The takeaway was clear: in a tutoring loop, restraint is a feature.
We also originally let the model pick the topic. Bad call — learners kept hitting topics they had no vocabulary for and bouncing. Now you choose: a topic from a curated list, a recent lesson you want to practise, or a free-form prompt if you're brave.
How to try it
Conversation mode is available now in the latest version of VoiGu, on iOS and Android, for all VoiGu Plus subscribers. Free users get one Conversation session per day — enough to see if it clicks. Tap the microphone tab at the bottom of the app to start.
If you have feedback (good, bad, or weird), email us at hello@voigu.com. We read everything, and the next version of Conversation is being shaped right now by what learners tell us.
