# Milestone 2: Local Audio & GUI (The Ears)

**Goal:** Enable the client to process high-quality audio locally and display the interface.
### 1. UI Layout (`client_node/ui`)

- [ ] **Dependencies:** Add `egui`, `eframe`.
- [ ] **AI Context Trap (Eframe + Tokio):** Do NOT use `#[tokio::main]` on the client. `eframe` must own the main thread. Manually build a `tokio::runtime::Runtime`, spawn the background network actors on it, and pass MPSC channels into the `AppState` before calling `eframe::run_native()` (see the sketch after this list).
- [ ] **Architecture:** Create `struct AppState`. Implement the `eframe::App` trait for it.
- [ ] **Layout:** Build the basic classic TeamSpeak UI: left panel (tree view of hardcoded channels), right panel (text chat log).
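Below is a minimal sketch of that wiring, assuming hypothetical `NetEvent`/`NetCommand` message types and the eframe 0.27-style `run_native` signature (on eframe ≥ 0.28 the app-creator closure returns `Ok(Box::new(app))` instead):

```rust
// Assumed Cargo.toml entries: eframe = "0.27", egui = "0.27",
// tokio = { version = "1", features = ["rt-multi-thread", "sync"] }
use tokio::sync::mpsc;

/// Placeholder message types; the real ones come from the networking milestone.
enum NetEvent {
    ChatLine(String),
}
enum NetCommand {
    SendChat(String),
}

struct AppState {
    net_events: mpsc::UnboundedReceiver<NetEvent>,
    net_commands: mpsc::UnboundedSender<NetCommand>,
    chat_log: Vec<String>,
    draft: String,
}

impl eframe::App for AppState {
    fn update(&mut self, ctx: &egui::Context, _frame: &mut eframe::Frame) {
        // Drain pending network events without ever blocking the UI thread.
        while let Ok(NetEvent::ChatLine(line)) = self.net_events.try_recv() {
            self.chat_log.push(line);
        }

        egui::SidePanel::left("channel_tree").show(ctx, |ui| {
            ui.heading("Channels");
            for ch in ["Lobby", "Gaming", "AFK"] {
                ui.label(ch); // hardcoded channel tree for now
            }
        });

        egui::CentralPanel::default().show(ctx, |ui| {
            for line in &self.chat_log {
                ui.label(line);
            }
            ui.text_edit_singleline(&mut self.draft);
            if ui.button("Send").clicked() && !self.draft.is_empty() {
                let msg = std::mem::take(&mut self.draft);
                let _ = self.net_commands.send(NetCommand::SendChat(msg));
            }
        });

        // Events arrive from background tasks, so poll for repaints instead of
        // waiting for user input.
        ctx.request_repaint_after(std::time::Duration::from_millis(100));
    }
}

fn main() -> eframe::Result<()> {
    // No #[tokio::main]: eframe must own the main thread and its event loop.
    let rt = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build tokio runtime");

    let (event_tx, event_rx) = mpsc::unbounded_channel();
    let (cmd_tx, mut cmd_rx) = mpsc::unbounded_channel();

    // Background "network actor" lives on the tokio runtime.
    rt.spawn(async move {
        while let Some(NetCommand::SendChat(msg)) = cmd_rx.recv().await {
            // TODO: hand the message to the transport layer from Milestone 1.
            let _ = event_tx.send(NetEvent::ChatLine(format!("me: {msg}")));
        }
    });

    let app = AppState {
        net_events: event_rx,
        net_commands: cmd_tx,
        chat_log: Vec::new(),
        draft: String::new(),
    };
    eframe::run_native(
        "client_node",
        eframe::NativeOptions::default(),
        Box::new(|_cc| Box::new(app)), // on eframe >= 0.28: Ok(Box::new(app))
    )
}
```

Keeping `rt` on the main stack for the duration of the blocking `run_native` call is what keeps the spawned network actors alive.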
### 2. Audio Capture (`client_node/audio/capture.rs`)

- [ ] **Dependencies:** Add `cpal`, `ringbuf`.
- [ ] **Device Setup:** Use `cpal::default_host().default_input_device()`. Build a stream config specifically requesting `48,000 Hz` and `1 channel` (Mono).
- [ ] **Headless Abstraction:** Hide the `cpal` instantiation behind a trait so the CI test suite can inject deterministic "sine wave" `f32` vectors instead of requiring a physical microphone (see the trait sketch below).
- [ ] **The Producer:** Create a `ringbuf::HeapRb<f32>` (e.g., 4096 capacity). Split it into `(producer, consumer)`.
- [ ] **Hardware Callback:** Inside the `cpal` data callback, write the raw `f32` samples directly into the `producer`. Treat the callback as real-time code: no allocations, no locks, no blocking (see the capture sketch after this list).
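A sketch of the capture path under those constraints, assuming the cpal 0.15-style `build_input_stream` signature (with the trailing timeout argument) and the ringbuf 0.3 `HeapRb`/`push_slice` API; `start_capture` is an illustrative name:

```rust
use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};
use ringbuf::{HeapConsumer, HeapRb};

fn start_capture() -> Result<(cpal::Stream, HeapConsumer<f32>), Box<dyn std::error::Error>> {
    let device = cpal::default_host()
        .default_input_device()
        .ok_or("no default input device")?;

    // Explicitly request 48 kHz mono; real code should check
    // `device.supported_input_configs()` and resample if this is rejected.
    let config = cpal::StreamConfig {
        channels: 1,
        sample_rate: cpal::SampleRate(48_000),
        buffer_size: cpal::BufferSize::Default,
    };

    // Lock-free SPSC ring buffer: the producer half is moved into the audio
    // callback, the consumer half goes to the DSP thread.
    let (mut producer, consumer) = HeapRb::<f32>::new(4096).split();

    let stream = device.build_input_stream(
        &config,
        move |data: &[f32], _: &cpal::InputCallbackInfo| {
            // Real-time safe: copy into the ring buffer, no allocation, no locking.
            // If the DSP thread lags, the surplus samples are silently dropped.
            let _ = producer.push_slice(data);
        },
        |err| eprintln!("input stream error: {err}"),
        None, // cpal 0.15+: optional callback timeout
    )?;
    stream.play()?;

    // Dropping the Stream stops capture, so the caller must keep it alive.
    Ok((stream, consumer))
}
```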
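One possible shape for that headless seam, with assumed names (`AudioSource`, `MicSource`, `SineSource`): CI constructs a `SineSource`, while the real client wraps the ring-buffer consumer from the capture sketch above.

```rust
use ringbuf::HeapConsumer;

/// Seam between the audio hardware and the DSP chain. The real client wraps
/// the ring-buffer consumer; CI injects a deterministic tone instead.
pub trait AudioSource: Send {
    /// Fill `buf` with the next samples, returning how many were written.
    fn read(&mut self, buf: &mut [f32]) -> usize;
}

/// Production source: drains whatever the cpal callback has produced so far.
pub struct MicSource {
    pub consumer: HeapConsumer<f32>,
}

impl AudioSource for MicSource {
    fn read(&mut self, buf: &mut [f32]) -> usize {
        self.consumer.pop_slice(buf)
    }
}

/// Deterministic 440 Hz sine wave for headless CI runs.
#[derive(Default)]
pub struct SineSource {
    phase: f32,
}

impl AudioSource for SineSource {
    fn read(&mut self, buf: &mut [f32]) -> usize {
        for s in buf.iter_mut() {
            *s = self.phase.sin() * 0.5;
            self.phase += 2.0 * std::f32::consts::PI * 440.0 / 48_000.0;
        }
        buf.len()
    }
}
```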
### 3. DSP Chain & VAD (`client_node/audio/dsp.rs`)

- [ ] **Dependencies:** Add `webrtc-audio-processing`.
- [ ] **Thread Spawning:** Spawn a standard `std::thread` (not tokio) to act as the Audio Consumer.
- [ ] **Processing Loop:** Pull chunks of exactly `960` samples (20 ms at 48 kHz) from the `consumer` ring buffer.
- [ ] **Filters:** Pass the 960 samples through the echo cancellation and noise suppression stages of `webrtc-audio-processing`.
- [ ] **Voice Activity Detection (VAD):** Implement the `webrtc` VAD or an amplitude-threshold calculator. If the chunk is "silence", drop it to save bandwidth (see the consumer-loop sketch after this list).
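A sketch of the consumer thread using the amplitude-threshold variant of the VAD; the `webrtc-audio-processing` calls are left as a marked TODO since that crate's frame-size and configuration details are not pinned down here. `AudioSource` is the trait from Section 2; `FRAME`, `spawn_dsp_thread`, and the threshold value are illustrative:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

const FRAME: usize = 960; // 20 ms at 48 kHz, mono
const VAD_RMS_THRESHOLD: f32 = 0.015; // tune empirically

fn spawn_dsp_thread(
    mut source: impl AudioSource + 'static,
    is_transmitting: Arc<AtomicBool>,
    vad_active: Arc<AtomicBool>,
    out_tx: std::sync::mpsc::Sender<[f32; FRAME]>,
) -> std::thread::JoinHandle<()> {
    std::thread::spawn(move || {
        let mut frame = [0.0f32; FRAME];
        // Runs for the lifetime of the process in this sketch.
        loop {
            // Accumulate exactly one 20 ms frame; wait briefly while the
            // capture callback fills the ring buffer.
            let mut filled = 0;
            while filled < FRAME {
                filled += source.read(&mut frame[filled..]);
                if filled < FRAME {
                    std::thread::sleep(std::time::Duration::from_millis(2));
                }
            }

            // TODO: run the frame through webrtc-audio-processing
            // (echo cancellation + noise suppression) here.

            // Push-to-talk gate: drop everything while the key is up.
            if !is_transmitting.load(Ordering::Relaxed) {
                vad_active.store(false, Ordering::Relaxed);
                continue;
            }

            // Amplitude-threshold VAD: compute RMS and drop "silent" frames.
            let rms = (frame.iter().map(|s| s * s).sum::<f32>() / FRAME as f32).sqrt();
            let voiced = rms > VAD_RMS_THRESHOLD;
            vad_active.store(voiced, Ordering::Relaxed);
            if voiced {
                let _ = out_tx.send(frame); // on to the encoder / loopback
            }
        }
    })
}
```

The `vad_active` flag written here doubles as the UI signal described in Section 5.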
### 4. Global Hotkeys / Push-To-Talk (PTT)

- [ ] **Dependencies:** Add `global-hotkey` (or `rdev`).
- [ ] **Event Loop:** Spawn a thread to listen for a specific keycode (e.g., `Mouse4` or `V`).
- [ ] **Integration:** Update an `Arc<AtomicBool>` `is_transmitting` flag. The DSP thread reads this flag; if it is false, it drops the audio chunks (see the listener sketch after this list).
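A sketch using the `rdev` option, whose callback-based `listen` maps directly onto press/release events; `global-hotkey` delivers events over its own channel and would look different. The `V` key and the function name are assumptions:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

use rdev::{listen, Event, EventType, Key};

/// Spawns the PTT listener; `is_transmitting` is the same flag the DSP thread polls.
fn spawn_ptt_listener(is_transmitting: Arc<AtomicBool>) {
    std::thread::spawn(move || {
        // rdev::listen blocks this thread and invokes the callback for every
        // global input event it observes.
        let result = listen(move |event: Event| match event.event_type {
            EventType::KeyPress(Key::KeyV) => is_transmitting.store(true, Ordering::Relaxed),
            EventType::KeyRelease(Key::KeyV) => is_transmitting.store(false, Ordering::Relaxed),
            _ => {}
        });
        if let Err(err) = result {
            eprintln!("global input listener failed: {err:?}");
        }
    });
}
```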
### 5. Local Loopback & UI Bridge

- [ ] **Loopback Thread:** For testing, route the post-DSP 960-sample chunks directly into a `cpal` output stream (speakers) to physically hear the microphone quality and VAD gating.
- [ ] **UI State Bridge:** Use `tokio::sync::mpsc` or a simple `Arc<AtomicBool>` to signal the UI thread when the VAD triggers, so `egui` can draw the green "Active Speaker" dot next to the user's name.
- [ ] **Audio Dumper UI:** Add a checkbox in the `egui` settings panel. When checked, write the 960-sample chunks to `raw_mic.wav` and `post_dsp.wav` using the `hound` crate for local inspection (see the dumper sketch after this list).
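A sketch of that dumper, assuming 32-bit float WAV at 48 kHz mono; `AudioDumper` and its methods are illustrative wrappers around the `hound` API:

```rust
use hound::{SampleFormat, WavSpec, WavWriter};

/// Writes post-DSP frames to disk while the "dump audio" checkbox is ticked.
struct AudioDumper {
    writer: WavWriter<std::io::BufWriter<std::fs::File>>,
}

impl AudioDumper {
    fn create(path: &str) -> Result<Self, hound::Error> {
        let spec = WavSpec {
            channels: 1,
            sample_rate: 48_000,
            bits_per_sample: 32,
            sample_format: SampleFormat::Float,
        };
        Ok(Self { writer: WavWriter::create(path, spec)? })
    }

    /// Append one 960-sample frame.
    fn push_frame(&mut self, frame: &[f32]) -> Result<(), hound::Error> {
        for &s in frame {
            self.writer.write_sample(s)?;
        }
        Ok(())
    }

    /// Flush and finalize the WAV header; call on untick or shutdown.
    fn finish(self) -> Result<(), hound::Error> {
        self.writer.finalize()
    }
}
```

The settings panel would own one `AudioDumper` per file (`raw_mic.wav`, `post_dsp.wav`) and call `finish()` when the checkbox is unticked so the WAV headers get finalized.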