How It Works

This page gives a plain-English overview of the technology behind VoiceOtaku. We won't go into implementation details, but you'll get a clear picture of the pipeline that turns your voice into a personalised anime recommendation.

The Pipeline

When you speak during a call, your voice goes through a four-step process:

Your voice  →  Transcription  →  AI thinking  →  Voice synthesis  →  Your ears

1. Voice Capture

Your browser records audio from your microphone in short chunks while you're speaking. The recording runs locally in your browser — audio is sent to the server only when you finish speaking a segment.

2. Speech-to-Text (Transcription)

The recorded audio is transcribed into text by a speech recognition model. This is the text you see appearing on screen as you speak.

3. AI Recommendation

The transcribed text is passed to a large language model (LLM) that acts as a knowledgeable anime advisor. It reads your full conversation history for context and generates a thoughtful, relevant recommendation.

The AI is designed to:

Draw on a wide knowledge of anime titles, genres, studios, and themes
Understand nuanced preferences ("something melancholy but hopeful")
Ask clarifying questions when your request is ambiguous
Remember what you said earlier in the same call

4. Text-to-Speech (Voice Synthesis)

The AI's text response is converted back into a natural-sounding voice. This audio is streamed back to your browser and played automatically. You'll hear the recommendation as you read it on screen.

The Queue

VoiceOtaku processes one call at a time. All queuing happens server-side using a fast in-memory store. Your place in the queue is tracked by a temporary session token that lives only for the duration of your visit. See The Queue System for user-facing details.

Privacy

No accounts required. VoiceOtaku does not require you to log in or register.
No persistent storage of conversations. Your transcript is not saved after the call ends.
Temporary session tokens. Any tokens used to manage your queue position expire automatically.
Microphone access is limited. Your browser only grants microphone access while you are actively on a call.

How It Works ​

The Pipeline ​

1. Voice Capture ​

2. Speech-to-Text (Transcription) ​

3. AI Recommendation ​

4. Text-to-Speech (Voice Synthesis) ​

The Queue ​

Privacy ​