Voice AI

🏗️ System Architecture

End-to-End Dataflow

The diagram below illustrates the journey of an audio sample through every subsystem. Each numbered step is elaborated in subsequent sections.


 Browser (Next.js)          API Route              Streamlit Backend                 HuggingFace
 ─────────────────────────  ─────────────────────  ───────────────────────────────   ───────────
 1. Mic Recording  ───────▶  2. POST /process-audio ─▶  3. Save temp WAV
                                            │           4. Feature Extraction (MFCC, spectral)
                                            │           5. wav2vec2 Embedding
                                            │           6. Emotion Classifier
                                            │           7. SHAP Explainer
                                            ▼
                               8. JSON Results ◀─────────────────────────────────────
 9. UI Visualisation ◀────────
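
Steps 3 and 8 of the diagram can be sketched with the Python standard library alone. This is a minimal sketch, assuming the upload arrives as raw 16-bit mono PCM bytes. The function names, the 16 kHz sample rate, and the JSON field names are illustrative assumptions and are not taken from the repository.

```python
import json
import os
import tempfile
import wave

SAMPLE_RATE = 16_000  # assumed; wav2vec2 models are typically fed 16 kHz audio


def save_temp_wav(pcm_bytes: bytes, sample_rate: int = SAMPLE_RATE) -> str:
    """Step 3: write raw 16-bit mono PCM to a temporary WAV file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave.open() reopens the file by path
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return path


def wav_duration_seconds(path: str) -> float:
    """Length of the recording, derived from the WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_result(emotion: str, confidence: float, duration_s: float) -> str:
    """Step 8: serialise the analysis as JSON (hypothetical field names)."""
    return json.dumps({
        "emotion": emotion,
        "confidence": round(confidence, 3),
        "duration_s": round(duration_s, 3),
    })


if __name__ == "__main__":
    path = save_temp_wav(b"\x00\x00" * SAMPLE_RATE)  # one second of silence
    try:
        print(build_result("neutral", 0.5, wav_duration_seconds(path)))
    finally:
        os.remove(path)  # temp files are the caller's responsibility
```

Steps 4 to 7 would sit between the two calls. They are omitted here because they depend on the model and feature code in the repository.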

Frontend

  • Next.js 14 App Router for file-system routing and API endpoints.
  • React 18 functional components with hooks.
  • Tailwind CSS JIT classes for styling.
  • Framer-Motion for the animated recorder visualisation.
  • Web Audio API to capture and stream microphone input.

Backend

  • Streamlit 1.x app running on port 8501.
  • HuggingFace wav2vec2-lg-xlsr-en-speech-emotion-recognition model (≈330 M parameters).
  • Fallback heuristic classifier to guarantee predictions offline.
  • Feature extraction with librosa, numpy, and custom DSP utilities.
  • SHAP explainability for per-feature attributions.
  • Optional RL fine-tuning pipeline using PPO in human_voice_ai.rl.
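
The fallback behaviour listed above can be illustrated with a small rule-based classifier that needs no model download. This is a sketch under stated assumptions, not the project's actual heuristic: the two features (RMS energy and zero-crossing rate), the thresholds, the label set, and the names heuristic_emotion and classify are all hypothetical. Samples are assumed to be floats in the range [-1.0, 1.0].

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def rms_energy(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of the signal."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def heuristic_emotion(samples: Sequence[float]) -> str:
    """Hypothetical rule-based fallback; thresholds are illustrative, not tuned."""
    energy = rms_energy(samples)
    zcr = zero_crossing_rate(samples)
    if energy < 0.02:
        return "neutral"
    if energy > 0.3:
        return "angry" if zcr > 0.1 else "happy"
    return "sad" if zcr < 0.05 else "calm"


def classify(
    samples: Sequence[float],
    model: Optional[Callable[[Sequence[float]], str]] = None,
) -> Tuple[str, str]:
    """Try the model first; use the heuristic if it is missing or raises."""
    if model is not None:
        try:
            return model(samples), "model"
        except Exception:
            pass  # e.g. weights could not be downloaded while offline
    return heuristic_emotion(samples), "heuristic"
```

Returning the source of the prediction alongside the label lets the UI indicate when a result came from the heuristic rather than the wav2vec2 model.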

DevOps & Observability

  • Shell scripts deploy-local.sh and deploy-stable.sh orchestrate the services, install dependencies, and ensure port availability.
  • Docker support via Dockerfile & docker-compose.yml for containerised deployment.
  • GitHub Actions (planned) for CI, running unit & integration tests.
  • Colored terminal output for quick status inspection during local runs.
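
The port-availability check that the deploy scripts perform can be sketched in Python as follows. This only illustrates the idea. The actual scripts are shell, their exact check is not shown here, and the name port_is_free is hypothetical.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success, an errno value otherwise
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # 8501 is the Streamlit backend port named in the Backend section
    print("port 8501 free:", port_is_free(8501))
```

A connection probe of this kind only detects listeners that are already accepting connections, so a launcher should still handle a bind failure when the service starts.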

Explore additional design documents in the repository.