System Architecture
End-to-End Dataflow
The diagram below illustrates the journey of an audio sample through every subsystem. Each numbered step is elaborated in subsequent sections.
Browser (Next.js)        API Route                 Streamlit Backend                         HuggingFace
───────────────────      ────────────────────      ──────────────────────────────────────    ───────────
1. Mic Recording ──────▶ 2. POST /process-audio ─▶ 3. Save temp WAV
                                                   │ 4. Feature Extraction (MFCC, spectral)
                                                   │ 5. wav2vec2 Embedding
                                                   │ 6. Emotion Classifier
                                                   │ 7. SHAP Explainer
                                                   ▼
                         8. JSON Results ◀──────────
9. UI Visualisation ◀────
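Steps 3 and 8 of the diagram can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual handler: the function names `save_temp_wav` and `build_response`, the 16 kHz mono 16-bit input format, and the JSON field names are all assumptions made for the example.

```python
import json
import os
import tempfile
import wave


def save_temp_wav(pcm_bytes: bytes, sample_rate: int = 16000) -> str:
    """Step 3 (sketch): write raw 16-bit mono PCM to a temporary WAV file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave.open reopens the file by path
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return path


def build_response(path: str, emotion: str, confidence: float) -> str:
    """Step 8 (sketch): serialise results to JSON; field names are hypothetical."""
    with wave.open(path, "rb") as wav:
        duration_s = wav.getnframes() / wav.getframerate()
    return json.dumps(
        {"emotion": emotion, "confidence": confidence, "duration_s": duration_s}
    )
```

A caller would pass the uploaded audio bytes to `save_temp_wav`, run steps 4 to 7 on the resulting file, and delete the temporary file with `os.remove(path)` once the response has been built.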
Frontend
- Next.js 14 App Router for file-system routing and API endpoints.
- React 18 functional components with hooks.
- Tailwind CSS JIT classes for styling.
- Framer Motion for the animated recorder visualisation.
- Web Audio API to capture and stream microphone input.
Backend
- Streamlit 1.x app running on port 8501.
- HuggingFace wav2vec2-lg-xlsr-en-speech-emotion-recognition model (≈330 M parameters).
- Fallback heuristic classifier to guarantee predictions offline.
- Feature extraction with librosa, numpy, and custom DSP utilities.
- SHAP explainability for per-feature attributions.
- Optional RL fine-tuning pipeline using PPO in human_voice_ai.rl.
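The model-with-fallback arrangement listed above could look roughly like the sketch below. It is an assumption-laden illustration: the heuristic (RMS loudness plus zero-crossing rate), its thresholds, the emotion labels, and the names `heuristic_emotion` and `classify` are hypothetical and are not taken from the project's code.

```python
import math
from typing import Callable, Optional, Sequence


def heuristic_emotion(samples: Sequence[float]) -> str:
    """Hypothetical fallback: coarse label from loudness and zero-crossing rate."""
    if not samples:
        return "neutral"
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings / len(samples)
    if rms < 0.05:  # illustrative threshold, not tuned
        return "neutral"
    return "angry" if zcr > 0.1 else "happy"


def classify(
    samples: Sequence[float],
    model: Optional[Callable[[Sequence[float]], str]] = None,
) -> dict:
    """Try the model first; use the heuristic if it is missing or raises."""
    if model is not None:
        try:
            return {"emotion": model(samples), "source": "model"}
        except Exception:
            pass  # e.g. model weights unavailable while offline
    return {"emotion": heuristic_emotion(samples), "source": "heuristic"}
```

Returning a `source` field alongside the label lets the UI indicate when a prediction came from the heuristic and not from the wav2vec2 model.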
DevOps & Observability
- Shell scripts deploy-local.sh and deploy-stable.sh orchestrate the services, install dependencies, and ensure port availability.
- Docker support via Dockerfile & docker-compose.yml for containerised deployment.
- GitHub Actions (planned) for CI, running unit & integration tests.
- Colored terminal output for quick status inspection during local runs.
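The port-availability check that the deploy scripts perform is done in shell there; the same idea can be sketched in Python as follows. The helper name `port_is_free` is hypothetical, and the check only tests whether something is already accepting TCP connections on the local host.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on a successful connection, an errno otherwise
        return sock.connect_ex((host, port)) != 0
```

For example, `port_is_free(8501)` could be called before starting the Streamlit backend to detect a stale process still holding the port.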
Explore additional design documents in the repository.