# Low-Level Implementation Plan

## 1. Network Packet Anatomy (The Data Plane)

To minimize latency, we use a custom binary format for UDP voice data instead of JSON or Protobuf.

* **UDP Voice Header (Fixed 16 Bytes):**
  * `u32` (4 bytes): **Session Token.** Generated during the TCP handshake. The server drops any packet whose source IP/port does not match this token.
  * `u64` (8 bytes): **Sequence Number.** Monotonically increasing per user. Essential for the Jitter Buffer to reorder packets.
  * `u32` (4 bytes): **Timestamp.** Measured in audio samples (increments by 960 per 20 ms frame) to handle playback timing.
* **Payload:** Raw Opus-encoded bytes (variable length, typically 60–120 bytes). The bitrate is not hardcoded; it is dictated dynamically by the server's `ChannelConfig` (e.g., 16 kbps for voice, 96 kbps for music bots) when the user joins a room.

---

## 2. Real-Time Audio Pipeline (`client_node/audio`)

Audio threads must be lock-free to prevent stuttering. We use a Single-Producer Single-Consumer (SPSC) ring buffer.

* **Global Hotkeys / Push-to-Talk:**
  * Use `global-hotkey` (or `rdev`) to hook OS-level key presses, allowing PTT even when the window is minimized.
* **Microphone Thread (The Producer):**
  * Initialize `cpal` with a 48 kHz input stream.
  * **Rule:** The hardware callback *must only* push raw `f32` samples into the `ringbuf`. No networking or heavy math is allowed here.
* **DSP/Encoder Thread (The Consumer):**
  * Pull samples from the `ringbuf`.
  * Process via `webrtc_audio_processing` (Echo Cancellation, Noise Suppression, and Voice Activity Detection/VAD). If VAD detects silence, stop transmitting to save bandwidth.
  * Accumulate exactly 960 samples (20 ms).
  * Pass the frame to `audiopus::Encoder`.
  * Send the resulting bytes to the **Network Task** via an asynchronous MPSC channel.
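Tying §1 and §2 together, a minimal std-only sketch of building and parsing a voice datagram. The `VoiceHeader` struct name and big-endian field order are assumptions — the plan fixes the field sizes but not the endianness:

```rust
/// Sketch of the fixed 16-byte voice header from §1. Big-endian (network
/// byte order) is an assumption; the plan does not specify endianness.
struct VoiceHeader {
    session_token: u32,
    sequence: u64,
    timestamp: u32, // in 48 kHz samples; +960 per 20 ms frame
}

impl VoiceHeader {
    const LEN: usize = 16;

    /// Prepend the header to an Opus payload, producing the UDP datagram body.
    fn encode(&self, opus_payload: &[u8]) -> Vec<u8> {
        let mut buf = Vec::with_capacity(Self::LEN + opus_payload.len());
        buf.extend_from_slice(&self.session_token.to_be_bytes());
        buf.extend_from_slice(&self.sequence.to_be_bytes());
        buf.extend_from_slice(&self.timestamp.to_be_bytes());
        buf.extend_from_slice(opus_payload);
        buf
    }

    /// Split an incoming datagram into header and payload (None if too short —
    /// this also filters the 0-byte NAT keep-alives mentioned in §4).
    fn decode(datagram: &[u8]) -> Option<(Self, &[u8])> {
        if datagram.len() < Self::LEN {
            return None;
        }
        let header = VoiceHeader {
            session_token: u32::from_be_bytes(datagram[0..4].try_into().ok()?),
            sequence: u64::from_be_bytes(datagram[4..12].try_into().ok()?),
            timestamp: u32::from_be_bytes(datagram[12..16].try_into().ok()?),
        };
        Some((header, &datagram[16..]))
    }
}
```

In this layout the DSP/encoder thread hands raw Opus bytes to the network task, which stamps the header on just before `send_to`; the decode path rejects anything shorter than 16 bytes before touching the session-token lookup.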
---

## 3. Jitter Buffer & Playback Logic (`client_node/network`)

The Jitter Buffer compensates for unstable internet connections by adding a controlled "latency tax".

* **The Sorting Mechanism:** Incoming UDP packets are inserted into a `BinaryHeap` (min-heap) sorted by **Sequence Number**.
* **The Watermark Strategy:**
  * Wait until the heap contains at least 40 ms (2 frames) of audio before starting playback.
  * This buffer allows late-arriving packets to be inserted in the correct order.
* **Playback Tick:** Every 20 ms, the playback thread pops the next sequence number.
  * **Success:** Decode the packet. Before pushing to the master `cpal` speaker buffer, multiply that user's decoded `f32` samples by their local volume scalar (e.g., 0.5 for 50% volume) to enable **Per-User Volume Control**.
  * **Missing (Packet Loss):** If the sequence number is missing, call `audiopus::Decoder::decode` with a `None` frame to trigger **Packet Loss Concealment (PLC)**, which synthesizes a "guess" of the missing sound.

---

## 4. Server Relay & Routing (`server_node/udp_relay.rs`)

The server acts as a high-speed traffic controller and must be zero-copy where possible.

* **Validation:** Use `tokio::net::UdpSocket`. On receipt, verify the `u32` Session Token against the `DashMap` state.
* **Broadcast Logic:**
  1. Identify the sender's current `ChannelId`.
  2. Retrieve the list of `SocketAddr` for every other user in that channel.
  3. Iterate and send the exact same byte buffer to each address. Use the `bytes` crate to share the buffer via reference counting (`Arc`) instead of cloning it.
* **NAT Keep-Alives:** The server must ignore empty 0-byte UDP packets (used by clients to keep router ports open).
* **TCP Control Lane & Chat Routing:** The TCP router handles synchronized text messages and broadcasts them to users in the same `ChannelId`.
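The Broadcast Logic steps can be sketched with std only. Here `Arc<[u8]>` stands in for the `bytes` crate's `Bytes` type, and the `broadcast` function, channel map, and returned send-queue are hypothetical stand-ins for the real `DashMap` state and `UdpSocket::send_to` calls:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::Arc;

/// Fan a received datagram out to everyone else in the sender's channel.
/// Cloning an `Arc<[u8]>` bumps a refcount instead of copying the audio
/// bytes, mirroring what `bytes::Bytes` does in the real relay.
fn broadcast(
    channels: &HashMap<u32, Vec<SocketAddr>>, // ChannelId -> members (DashMap stand-in)
    channel_id: u32,
    sender: SocketAddr,
    datagram: Arc<[u8]>,
) -> Vec<(SocketAddr, Arc<[u8]>)> {
    let mut outbound = Vec::new(); // stand-in for actual send_to calls
    if let Some(members) = channels.get(&channel_id) {
        for &addr in members {
            if addr != sender {
                outbound.push((addr, Arc::clone(&datagram))); // refcount bump, no copy
            }
        }
    }
    outbound
}
```

The sender is excluded from its own broadcast; every queued send shares the same heap allocation, which is the zero-copy property the plan asks for.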
* **Stateful Auto-Reconnect:** If the TCP socket drops, the client quietly reconnects and submits its existing Session Token to resume its channel presence without forcing a full re-login.
* **Whisper Lists (Direct UDP Routing):** The server supports targeted UDP forwarding. If a packet header contains a `Target_SessionToken`, the server routes the audio strictly to that user, bypassing the standard channel broadcast.

---

## 5. Wasm Plugin ABI (`client_node/plugins`)

Since the Wasm sandbox cannot access host memory directly, we use a shared "mailbox" system.

* **The ABI Pattern:**
  1. Host (Rust) serializes event data (e.g., `OnMessage`) into JSON.
  2. Host allocates a block of memory inside the Wasm instance and writes the JSON there.
  3. Host calls the Wasm function, passing the memory pointer.
  4. Guest (Wasm) processes the event and returns a pointer to its response.
* **Audio Intercepts:** For voice changers, the Host passes a raw `&mut [f32]` buffer to the plugin. The plugin modifies the samples in place before they reach the Opus encoder.

---

## 6. Persistence & State Management (`server_node/database.rs`)

The server uses `sqlx` for compile-time-checked database interaction.

* **Hashing:** Use Argon2id with a salt of at least 16 bytes. Passwords should be hashed with a minimum of 3 passes and 64 MB of memory.
* **Migrations:** On startup, the server checks the `_sqlx_migrations` table. If the code expects a newer schema than the SQLite file has, it applies the `.sql` scripts in order before opening the network ports.
* **Admin API:** The `axum` web server requires a `Bearer` token (JWT) for all sensitive routes (`/api/kick`, `/api/ban`). This token is generated when the Admin logs into the dashboard.
* **Permissions & Access Control:** During TCP `ChannelJoin` events, the server checks the database for `Required_Role` and password locks before permitting entry.
* **Client-Side Persistence (Bookmarks):** The `client_node` maintains a local SQLite or `.toml` file to persist Server Bookmarks (IP, port, password, chosen nickname) so users don't have to manually retype connection details.

---

## 7. Zero-Conf Automation Logic (`scripts/install.sh`)

* **Environment Check:** The script verifies `systemd` availability.
* **Permissioning:** Creates a non-privileged `voiceapp` user to run the binary (security hardening).
* **Auto-Update:** `update.sh` compares the local binary hash against the `latest` release on GitHub via the API. If they differ, it downloads the new binary, replaces the old one, and runs `systemctl restart voice_app`.

---

## 8. Testing & Debugging Strategy

To ensure the real-time audio pipeline and network remain stable during development, several debugging tools are built directly into the workflow, completely avoiding the need for CLI flags or terminal commands.

* **Developer Control Panel:** A dedicated "Testing & Debugging" tab within the `egui` client settings. This provides a purely graphical interface for all diagnostic tools.
* **UI-Driven Audio Dumper:** A toggle in the Developer Panel that records the DSP pipeline stages to `.wav` files (`raw_mic.wav`, `post_dsp.wav`, `post_opus_decode.wav`) so audio-quality degradation can be inspected directly.
* **UI-Driven Chaos Simulator:** Sliders in the Developer Panel that dynamically inject artificial packet loss (%), latency (ms), and packet re-ordering into the outgoing UDP transport layer to stress-test the Jitter Buffer locally.
* **In-App Debug Overlay:** An `egui` diagnostic HUD toggled via a UI button (or `F3`) that overlays real-time metrics: network ping (TCP and UDP), Jitter Buffer depth (ms), packet-loss percentage, and active Opus PLC triggers.
* **Load Test Dashboard:** The server's web admin dashboard (`axum`) will feature a "Stress Test" page. Instead of running terminal scripts, the server admin can click "Spawn 100 Bots", which dynamically spins up headless internal clients that broadcast `.wav` audio to verify the server's UDP routing capacity.
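A std-only sketch of how the Chaos Simulator exercises the §3 jitter buffer: a deterministic drop-and-shuffle pass stands in for the UI sliders (the `chaos_apply`/`playback_order` names and the tiny LCG are hypothetical), and a `BinaryHeap` wrapped in `Reverse` reorders whatever survives, yielding `None` for lost frames at exactly the point where the real client would invoke Opus PLC:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Drop every Nth sequence number and shuffle the rest, emulating the
/// packet-loss and re-ordering sliders. The LCG constants are arbitrary.
fn chaos_apply(packets: &[u64], drop_every_nth: u64, seed: u64) -> Vec<u64> {
    let mut rng = seed;
    let mut out: Vec<u64> = packets
        .iter()
        .copied()
        .filter(|seq| drop_every_nth == 0 || seq % drop_every_nth != 0)
        .collect();
    // Fisher-Yates shuffle driven by the LCG to emulate re-ordering in flight.
    for i in (1..out.len()).rev() {
        rng = rng
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        out.swap(i, (rng >> 33) as usize % (i + 1));
    }
    out
}

/// Drain the jitter heap in sequence order; `None` marks a lost frame,
/// which is where the real client calls the decoder's PLC path.
fn playback_order(received: &[u64], expected: u64) -> Vec<Option<u64>> {
    let mut heap: BinaryHeap<Reverse<u64>> =
        received.iter().map(|&s| Reverse(s)).collect(); // min-heap via Reverse
    let mut played = Vec::new();
    for seq in 0..expected {
        match heap.peek() {
            Some(&Reverse(next)) if next == seq => {
                heap.pop();
                played.push(Some(seq)); // frame arrived: decode normally
            }
            _ => played.push(None), // gap: trigger PLC
        }
    }
    played
}
```

Because the shuffle only reorders (never duplicates), the heap restores the full sequence minus the dropped frames, so a local stress run can assert that gaps appear exactly where packets were dropped, regardless of arrival order.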