Skip to content

How It Works

This page gives a plain-English overview of the technology behind VoiceOtaku. We won't go into implementation details, but you'll get a clear picture of the pipeline that turns your voice into a personalised anime recommendation.

The Pipeline

When you speak during a call, your voice goes through a four-step process:

Your voice  →  Transcription  →  AI thinking  →  Voice synthesis  →  Your ears

1. Voice Capture

Your browser records audio from your microphone in short chunks while you're speaking. The recording runs locally in your browser — audio is sent to the server only when you finish speaking a segment.

2. Speech-to-Text (Transcription)

The recorded audio is transcribed into text by a speech recognition model. This is the text you see appearing on screen as you speak.

3. AI Recommendation

The transcribed text is passed to a large language model (LLM) that acts as a knowledgeable anime advisor. It reads your full conversation history for context and generates a thoughtful, relevant recommendation.

The AI is designed to:

  • Draw on a wide knowledge of anime titles, genres, studios, and themes
  • Understand nuanced preferences ("something melancholy but hopeful")
  • Ask clarifying questions when your request is ambiguous
  • Remember what you said earlier in the same call

4. Text-to-Speech (Voice Synthesis)

The AI's text response is converted back into a natural-sounding voice. This audio is streamed back to your browser and played automatically. You'll hear the recommendation as you read it on screen.


The Queue

VoiceOtaku processes one call at a time. All queuing happens server-side using a fast in-memory store. Your place in the queue is tracked by a temporary session token that lives only for the duration of your visit. See The Queue System for user-facing details.


Privacy

  • No accounts required. VoiceOtaku does not require you to log in or register.
  • No persistent storage of conversations. Your transcript is not saved after the call ends.
  • Temporary session tokens. Any tokens used to manage your queue position expire automatically.
  • Microphone access is limited. Your browser only grants microphone access while you are actively on a call.

Made with ❤️by Aldrick Bonaobra