🏗️ System Architecture

End-to-End Dataflow

The diagram below traces an audio sample through every subsystem. Each numbered step is described in the sections that follow.


 Browser (Next.js)          API Route              Streamlit Backend                 HuggingFace
 ─────────────────────────  ─────────────────────  ───────────────────────────────   ───────────
 1. Mic Recording  ───────▶  2. POST /process-audio ─▶  3. Save temp WAV          
                                           │        4. Feature Extraction (MFCC, spectral)
                                           │        5. wav2vec2 Embedding
                                           │        6. Emotion Classifier
                                           │        7. SHAP Explainer
                                           ▼
                              8. JSON Results ◀─────────────────────────────────────
 9. UI Visualisation ◀────────
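
Steps 3 to 8 of the diagram can be sketched as a single function that chains the stages together. This is a minimal sketch, not the project's actual code: `save_temp_wav`, `process_audio`, the four stage callables, the 16 kHz mono 16-bit format, and the JSON field names are all assumptions made for illustration.

```python
import json
import os
import tempfile
import wave
from typing import Any, Callable, Dict


def save_temp_wav(pcm_bytes: bytes, sample_rate: int = 16000) -> str:
    """Step 3: write 16-bit mono PCM to a temporary WAV file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # reopen through the wave module below
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return path


def process_audio(
    pcm_bytes: bytes,
    extract_features: Callable[[str], Dict[str, float]],
    embed: Callable[[str], Any],
    classify: Callable[[Dict[str, float], Any], str],
    explain: Callable[[Dict[str, float]], Dict[str, float]],
) -> str:
    """Run steps 3-8 and return the JSON string handed back to the API route."""
    path = save_temp_wav(pcm_bytes)
    try:
        features = extract_features(path)        # step 4
        embedding = embed(path)                  # step 5
        emotion = classify(features, embedding)  # step 6
        attributions = explain(features)         # step 7
    finally:
        os.remove(path)  # the temp file is only needed during processing
    # Step 8: hypothetical response shape, not the project's documented schema
    return json.dumps(
        {"emotion": emotion, "features": features, "attributions": attributions}
    )
```

Passing the stages in as callables keeps the sketch independent of librosa, wav2vec2, and SHAP, so each stage can be swapped for a stub when testing the flow.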

Frontend

Next.js 14 App Router for file-system routing and API endpoints.
React 18 functional components with hooks.
Tailwind CSS JIT classes for styling.
Framer Motion for the animated recorder visualisation.
Web Audio API to capture and stream microphone input.

Backend

Streamlit 1.x app running on port 8501.
HuggingFace wav2vec2-lg-xlsr-en-speech-emotion-recognition model (≈330 M parameters).
Fallback heuristic classifier so that predictions remain available offline, when the HuggingFace model cannot be used.
Feature extraction with librosa, numpy, and custom DSP utilities.
SHAP explainability for per-feature attributions.
Optional RL fine-tuning pipeline using PPO in human_voice_ai.rl.
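
The fallback idea above can be sketched as follows: try the model first and, if it is missing or fails, fall back to a rule based on cheap signal features. This is an illustration under stated assumptions, not the project's classifier. The function names, the two features (RMS energy and zero-crossing rate), the thresholds, and the labels are hypothetical; the real backend extracts its features with librosa.

```python
import math
from typing import Callable, Dict, Optional, Sequence


def basic_features(samples: Sequence[float]) -> Dict[str, float]:
    """Two cheap time-domain features: RMS energy and zero-crossing rate."""
    n = len(samples)
    if n == 0:
        return {"rms": 0.0, "zcr": 0.0}
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between neighbouring samples
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return {"rms": rms, "zcr": crossings / max(n - 1, 1)}


def heuristic_emotion(features: Dict[str, float]) -> str:
    """Placeholder rule; the thresholds are illustrative, not tuned values."""
    if features["rms"] < 0.05:
        return "calm"
    return "excited"


def classify(
    samples: Sequence[float],
    model: Optional[Callable[[Sequence[float]], str]] = None,
) -> Dict[str, str]:
    """Prefer the model; use the heuristic if it is unavailable or raises."""
    if model is not None:
        try:
            return {"label": model(samples), "source": "model"}
        except Exception:
            pass  # e.g. weights could not be downloaded while offline
    return {"label": heuristic_emotion(basic_features(samples)), "source": "heuristic"}
```

Returning a `source` field alongside the label lets the UI indicate when a prediction came from the heuristic rather than the model.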

DevOps & Observability

Shell scripts deploy-local.sh and deploy-stable.sh install dependencies, check that the required ports are free, and start the services.
Docker support via Dockerfile & docker-compose.yml for containerised deployment.
GitHub Actions (planned) for CI, running unit & integration tests.
Coloured terminal output for quick status inspection during local runs.
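
The port check and coloured status output can be illustrated in Python. The deployment scripts themselves are shell scripts whose contents are not shown here, so this is only a sketch of the same idea: `port_is_free` and `status_line` are hypothetical helpers, and port 3000 for Next.js is an assumed default (8501 is the Streamlit port stated above).

```python
import socket

# ANSI escape codes for coloured terminal output
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def status_line(name: str, port: int) -> str:
    """Format a green line for a free port and a red line for a busy one."""
    free = port_is_free(port)
    colour = GREEN if free else RED
    state = "free" if free else "in use"
    return f"{colour}{name}: port {port} is {state}{RESET}"


if __name__ == "__main__":
    # 3000 is an assumed Next.js port; 8501 is the Streamlit port
    for name, port in (("Next.js", 3000), ("Streamlit", 8501)):
        print(status_line(name, port))
```

Binding to the port, rather than connecting to it, also detects a process that holds the port but is not yet accepting connections.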

Explore additional design documents in the repository.