TS3-vibed/Documentation/Low_level_plan/Implementation_Plan.md
2026-05-03 10:50:25 +02:00

# Low-Level Implementation Plan
## 1. Network Packet Anatomy (The Data Plane)
To minimize latency, we use a custom binary format for UDP voice data instead of JSON or Protobuf[cite: 1].
* **UDP Voice Header (Fixed 16 Bytes):**
    * `u32` (4 bytes): **Session Token.** Generated during the TCP handshake. The server drops any packet whose source IP/port does not match the address registered for this token[cite: 1].
    * `u64` (8 bytes): **Sequence Number.** Monotonically increasing per user. Essential for the Jitter Buffer to reorder packets[cite: 1].
    * `u32` (4 bytes): **Timestamp.** Measured in audio samples (increments by 960 per 20ms frame at 48kHz) to handle playback timing[cite: 1].
* **Payload:** Raw Opus-encoded bytes (variable length, typically 60–120 bytes). The bitrate is not hardcoded; it is dictated dynamically by the server's `ChannelConfig` (e.g., 16kbps for voice, 96kbps for music bots) when the user joins a room.
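
The header layout above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `VoiceHeader` type, its method names, and the big-endian byte order are all assumptions, since the plan does not fix a wire byte order.

```rust
/// Fixed header size: 4 (token) + 8 (sequence) + 4 (timestamp) bytes.
pub const HEADER_LEN: usize = 16;
/// Samples in one 20 ms frame at 48 kHz.
pub const SAMPLES_PER_FRAME: u32 = 960;

/// Hypothetical in-memory form of the 16-byte UDP voice header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VoiceHeader {
    pub session_token: u32,
    pub sequence: u64,
    pub timestamp: u32,
}

impl VoiceHeader {
    /// Serialize to 16 bytes. Big-endian is an assumption of this sketch.
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0..4].copy_from_slice(&self.session_token.to_be_bytes());
        out[4..12].copy_from_slice(&self.sequence.to_be_bytes());
        out[12..16].copy_from_slice(&self.timestamp.to_be_bytes());
        out
    }

    /// Split a datagram into header and Opus payload.
    /// Returns `None` when the datagram is shorter than the header.
    pub fn decode(datagram: &[u8]) -> Option<(VoiceHeader, &[u8])> {
        if datagram.len() < HEADER_LEN {
            return None;
        }
        let (head, payload) = datagram.split_at(HEADER_LEN);
        let mut token = [0u8; 4];
        let mut seq = [0u8; 8];
        let mut ts = [0u8; 4];
        token.copy_from_slice(&head[0..4]);
        seq.copy_from_slice(&head[4..12]);
        ts.copy_from_slice(&head[12..16]);
        let header = VoiceHeader {
            session_token: u32::from_be_bytes(token),
            sequence: u64::from_be_bytes(seq),
            timestamp: u32::from_be_bytes(ts),
        };
        Some((header, payload))
    }

    /// Header for the following frame: sequence +1, timestamp +960 samples.
    /// The u32 timestamp is allowed to wrap around.
    pub fn next(&self) -> VoiceHeader {
        VoiceHeader {
            session_token: self.session_token,
            sequence: self.sequence + 1,
            timestamp: self.timestamp.wrapping_add(SAMPLES_PER_FRAME),
        }
    }
}
```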
---
## 2. Real-Time Audio Pipeline (`client_node/audio`)
Audio threads must be "lock-free" to prevent stuttering. We use a Single-Producer Single-Consumer (SPSC) ring buffer[cite: 1].
* **Global Hotkeys / Push-to-Talk:**
    * Use `global-hotkey` (or `rdev`) to hook OS-level key presses, allowing PTT even when minimized[cite: 1].
* **Microphone Thread (The Producer):**
    * Initialize `cpal` with a 48kHz input stream[cite: 1].
    * **Rule:** The hardware callback *must only* push raw `f32` samples into the `ringbuf`. No networking or heavy math allowed here[cite: 1].
* **DSP/Encoder Thread (The Consumer):**
    * Pull samples from `ringbuf`.
    * Process via `webrtc_audio_processing` (Echo Cancellation, Noise Suppression, and Voice Activity Detection/VAD). If VAD detects silence, stop transmitting to save bandwidth[cite: 1].
    * Accumulate exactly $960$ samples ($20\text{ms}$)[cite: 1].
    * Pass to `audiopus::Encoder`.
    * Send resulting bytes to the **Network Task** via an asynchronous MPSC channel[cite: 1].
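
The "accumulate exactly 960 samples" step on the consumer side can be sketched as follows. The hardware callback delivers chunks of arbitrary size, so the consumer has to re-slice them into fixed frames before encoding. `FrameAccumulator` is a hypothetical name, and the sketch deliberately leaves out `ringbuf`, the DSP stage, and the Opus encoder. It also allocates, which is acceptable on the consumer thread but would not be inside the `cpal` callback.

```rust
/// Samples in one 20 ms frame at 48 kHz.
pub const FRAME_SAMPLES: usize = 960;

/// Hypothetical helper for the DSP/Encoder thread: collects samples pulled
/// from the ring buffer and hands back complete 960-sample frames.
pub struct FrameAccumulator {
    pending: Vec<f32>,
}

impl FrameAccumulator {
    pub fn new() -> Self {
        Self { pending: Vec::with_capacity(FRAME_SAMPLES * 2) }
    }

    /// Append a chunk of samples and return every frame that is now complete.
    /// Leftover samples stay buffered for the next call.
    pub fn push(&mut self, samples: &[f32]) -> Vec<[f32; FRAME_SAMPLES]> {
        self.pending.extend_from_slice(samples);
        let mut frames = Vec::new();
        while self.pending.len() >= FRAME_SAMPLES {
            let mut frame = [0.0f32; FRAME_SAMPLES];
            frame.copy_from_slice(&self.pending[..FRAME_SAMPLES]);
            self.pending.drain(..FRAME_SAMPLES);
            frames.push(frame);
        }
        frames
    }

    /// Number of samples waiting for the next frame to fill up.
    pub fn buffered(&self) -> usize {
        self.pending.len()
    }
}
```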
---
## 3. Jitter Buffer & Playback Logic (`client_node/network`)
The Jitter Buffer compensates for unstable internet connections by adding a controlled "latency tax"[cite: 1].
* **The Sorting Mechanism:** Incoming UDP packets are inserted into a `BinaryHeap` sorted by **Sequence Number**[cite: 1]. Rust's `BinaryHeap` is a max-heap by default, so entries are wrapped in `std::cmp::Reverse` to get Min-Heap behaviour.
* **The Watermark Strategy:**
    * Wait until the heap contains at least $40\text{ms}$ (2 frames) of audio before starting playback[cite: 1].
    * This buffer allows late-arriving packets to be inserted in the correct order[cite: 1].
* **Playback Tick:** Every $20\text{ms}$, the playback thread pops the next sequence number.
    * **Success:** Decode the packet. Before pushing to the master `cpal` speaker buffer, multiply the specific user's `f32` decoded array by their local volume scalar (e.g., 0.5 for 50% volume) to enable **Per-User Volume Control**.
    * **Missing (Packet Loss):** If the sequence number is missing, call `audiopus::Decoder::decode` with a `None` frame to trigger **Packet Loss Concealment (PLC)**, which synthesizes a "guess" of the missing sound[cite: 1].
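
A minimal sketch of the heap, watermark, and playback tick described above, using only the standard library. The `Packet`, `Tick`, and `JitterBuffer` names are hypothetical, and the Opus decoder is left out: `Tick::Conceal` marks the point where the caller would request PLC from the decoder. Stale and duplicate packets are simply discarded, which is a simplification this sketch assumes.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Two 20 ms frames = 40 ms watermark before playback starts.
const WATERMARK_FRAMES: usize = 2;

/// `sequence` is the first field so the derived `Ord` compares it first.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Packet {
    pub sequence: u64,
    pub payload: Vec<u8>,
}

/// Result of one 20 ms playback tick.
#[derive(Debug, PartialEq)]
pub enum Tick {
    /// Watermark not reached yet; play nothing.
    Buffering,
    /// Expected packet is present; decode this payload.
    Play(Vec<u8>),
    /// Expected packet is missing; ask the decoder for concealment.
    Conceal,
}

pub struct JitterBuffer {
    heap: BinaryHeap<Reverse<Packet>>, // Reverse turns the max-heap into a min-heap
    next_seq: Option<u64>,
    started: bool,
}

impl JitterBuffer {
    pub fn new() -> Self {
        Self { heap: BinaryHeap::new(), next_seq: None, started: false }
    }

    /// Insert an incoming packet; packets older than the playhead are dropped.
    pub fn insert(&mut self, packet: Packet) {
        if let Some(next) = self.next_seq {
            if packet.sequence < next {
                return;
            }
        }
        self.heap.push(Reverse(packet));
    }

    /// Called every 20 ms by the playback thread.
    pub fn tick(&mut self) -> Tick {
        if !self.started {
            if self.heap.len() < WATERMARK_FRAMES {
                return Tick::Buffering;
            }
            self.started = true;
            self.next_seq = self.heap.peek().map(|Reverse(p)| p.sequence);
        }
        let expected = match self.next_seq {
            Some(seq) => seq,
            None => return Tick::Buffering,
        };
        self.next_seq = Some(expected + 1);

        // Discard duplicates of packets that were already played.
        while self.heap.peek().map_or(false, |Reverse(p)| p.sequence < expected) {
            self.heap.pop();
        }
        let ready = self.heap.peek().map_or(false, |Reverse(p)| p.sequence == expected);
        if ready {
            if let Some(Reverse(p)) = self.heap.pop() {
                return Tick::Play(p.payload);
            }
        }
        Tick::Conceal
    }
}

/// Per-user volume: scale decoded samples before mixing into the output.
pub fn apply_volume(samples: &mut [f32], volume: f32) {
    for s in samples.iter_mut() {
        *s *= volume;
    }
}
```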
---
## 4. Server Relay & Routing (`server_node/udp_relay.rs`)
The server acts as a high-speed traffic controller. It must be "Zero-Copy" where possible.
* **Validation:** Use `tokio::net::UdpSocket`. On receipt, verify the `u32 Session Token` against the `DashMap` state[cite: 1].
* **Broadcast Logic:**
    1. Identify the sender's current `ChannelId`[cite: 1].
    2. Retrieve the list of `SocketAddr` for every other user in that channel[cite: 1].
    3. Iterate and send the exact byte buffer to each address. Use the `bytes` crate so that every send shares one reference-counted buffer (much like an `Arc`) instead of cloning the payload[cite: 1].
* **NAT Keep-Alives:** The server must ignore empty 0-byte UDP packets (used by clients to keep router ports open)[cite: 1].
* **TCP Control Lane & Chat Routing:** The TCP router handles synchronized text messages and broadcasts them to users in the same `ChannelId`[cite: 1].
* **Stateful Auto-Reconnect:** If the TCP socket drops, the client quietly reconnects and submits its existing `Session Token` to resume its channel presence without forcing a full re-login[cite: 1].
* **Whisper Lists (Direct UDP Routing):** The server supports targeted UDP forwarding. If a packet header contains a `Target_SessionToken`, the server routes the audio strictly to that user, bypassing the standard channel broadcast.
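
The validation and broadcast steps above can be sketched as a pure routing function. This is an illustration under several assumptions: a plain `HashMap` stands in for `DashMap`, `Arc<[u8]>` stands in for `bytes::Bytes`, the token is read big-endian from the first four bytes, and the `Relay` and `Session` types are hypothetical. Socket I/O and whisper routing are omitted.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::Arc;

pub type ChannelId = u32;

/// Hypothetical per-user state created during the TCP handshake.
pub struct Session {
    pub addr: SocketAddr,
    pub channel: ChannelId,
}

/// Hypothetical relay state: session token -> session.
pub struct Relay {
    sessions: HashMap<u32, Session>,
}

impl Relay {
    pub fn new() -> Self {
        Self { sessions: HashMap::new() }
    }

    pub fn register(&mut self, token: u32, addr: SocketAddr, channel: ChannelId) {
        self.sessions.insert(token, Session { addr, channel });
    }

    /// Decide where a datagram goes. Returns one shared buffer plus the
    /// addresses to send it to, or `None` when the packet must be ignored.
    pub fn route(&self, from: SocketAddr, datagram: &[u8]) -> Option<(Arc<[u8]>, Vec<SocketAddr>)> {
        // Empty datagrams are NAT keep-alives; anything shorter than the
        // 16-byte header is malformed.
        if datagram.len() < 16 {
            return None;
        }
        let token = u32::from_be_bytes([datagram[0], datagram[1], datagram[2], datagram[3]]);
        let sender = self.sessions.get(&token)?;
        // Drop packets whose source address does not match the token.
        if sender.addr != from {
            return None;
        }
        let targets: Vec<SocketAddr> = self
            .sessions
            .iter()
            .filter(|(t, s)| **t != token && s.channel == sender.channel)
            .map(|(_, s)| s.addr)
            .collect();
        // One allocation; each send can clone the Arc instead of the bytes.
        Some((Arc::from(datagram), targets))
    }
}
```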
---
## 5. Wasm Plugin ABI (`client_node/plugins`)
Since the Wasm sandbox cannot access host memory directly, we use a shared "mailbox" system.
* **The ABI Pattern:**
    1. Host (Rust) serializes event data (e.g., `OnMessage`) into JSON[cite: 1].
    2. Host allocates a block of memory inside the Wasm instance and writes the JSON there[cite: 1].
    3. Host calls the Wasm function, passing the memory pointer[cite: 1].
    4. Guest (Wasm) processes and returns a pointer to its response[cite: 1].
* **Audio Intercepts:** For voice changers, the Host passes a raw `&mut [f32]` buffer to the plugin. The plugin modifies the samples "in-place" before they reach the Opus encoder[cite: 1].
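
The four-step mailbox handshake can be modelled without a Wasm runtime by treating the guest's linear memory as a plain byte vector. This is only a simulation of the pointer/length pattern, not a real plugin host: `GuestMemory`, `guest_on_event`, `host_dispatch`, and the packed `u64` return convention are all assumptions of this sketch. In a real setup the allocator would be a function exported by the guest module.

```rust
/// Stand-in for a Wasm instance's linear memory with a bump allocator.
pub struct GuestMemory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl GuestMemory {
    pub fn new(size: usize) -> Self {
        Self { bytes: vec![0; size], next_free: 0 }
    }

    /// Reserve `len` bytes and return their offset (the "pointer").
    pub fn alloc(&mut self, len: usize) -> Option<u32> {
        let start = self.next_free;
        let end = start.checked_add(len)?;
        if end > self.bytes.len() {
            return None;
        }
        self.next_free = end;
        Some(start as u32)
    }

    pub fn write(&mut self, ptr: u32, data: &[u8]) -> Option<()> {
        let start = ptr as usize;
        let end = start.checked_add(data.len())?;
        self.bytes.get_mut(start..end)?.copy_from_slice(data);
        Some(())
    }

    pub fn read(&self, ptr: u32, len: u32) -> Option<&[u8]> {
        let start = ptr as usize;
        let end = start.checked_add(len as usize)?;
        self.bytes.get(start..end)
    }
}

/// Pack a pointer and a length into the single integer a guest call returns.
pub fn pack(ptr: u32, len: u32) -> u64 {
    ((ptr as u64) << 32) | len as u64
}

pub fn unpack(value: u64) -> (u32, u32) {
    ((value >> 32) as u32, value as u32)
}

/// Hypothetical guest export (step 4): reads the event, writes a response
/// into its own memory, and returns where the response lives.
pub fn guest_on_event(mem: &mut GuestMemory, ptr: u32, len: u32) -> Option<u64> {
    let event = mem.read(ptr, len)?.to_vec();
    let mut reply = b"{\"ack\":".to_vec();
    reply.extend_from_slice(&event);
    reply.push(b'}');
    let out = mem.alloc(reply.len())?;
    mem.write(out, &reply)?;
    Some(pack(out, reply.len() as u32))
}

/// Host side (steps 1-3): copy the JSON in, call the guest, read the reply.
pub fn host_dispatch(mem: &mut GuestMemory, event_json: &str) -> Option<String> {
    let ptr = mem.alloc(event_json.len())?;
    mem.write(ptr, event_json.as_bytes())?;
    let packed = guest_on_event(mem, ptr, event_json.len() as u32)?;
    let (out_ptr, out_len) = unpack(packed);
    let bytes = mem.read(out_ptr, out_len)?;
    String::from_utf8(bytes.to_vec()).ok()
}
```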
---
## 6. Persistence & State Management (`server_node/database.rs`)
The server uses `sqlx` for compile-time safe database interaction[cite: 1].
* **Hashing:** Use `Argon2id` with a salt of at least 16 bytes. Passwords should be hashed with a minimum of $3$ passes and $64\text{MB}$ of memory[cite: 1].
* **Migrations:** On startup, the server checks the `_sqlx_migrations` table. If the code expects a newer schema than the SQLite file has, it applies the `.sql` scripts in order before opening the network ports[cite: 1].
* **Admin API:** The `axum` web server requires a `Bearer` token (JWT) for all sensitive routes (`/api/kick`, `/api/ban`). This token is generated when the Admin logs into the dashboard[cite: 1].
* **Permissions & Access Control:** During TCP `ChannelJoin` events, the server checks the database for `Required_Role` and password locks before permitting entry[cite: 1].
* **Client-Side Persistence (Bookmarks):** The `client_node` maintains a local SQLite or `.toml` file to persist Server Bookmarks (IP, Port, Password, chosen Nickname) so users don't have to manually type connection details.
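
The startup migration check described in the Migrations bullet boils down to "apply every script whose version is not yet recorded, in ascending order, before listening". The sketch below shows that ordering logic in isolation. It does not use `sqlx`; the `Migration` struct, the function names, and the `apply` callback are hypothetical placeholders for reading `_sqlx_migrations` and executing the `.sql` files.

```rust
/// Hypothetical description of one `.sql` migration script.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Migration {
    pub version: i64,
    pub name: &'static str,
}

/// Migrations that are available in the code but not yet recorded as
/// applied, sorted by ascending version.
pub fn pending_migrations(applied: &[i64], available: &[Migration]) -> Vec<Migration> {
    let mut todo: Vec<Migration> = available
        .iter()
        .filter(|m| !applied.contains(&m.version))
        .cloned()
        .collect();
    todo.sort_by_key(|m| m.version);
    todo
}

/// Run all pending migrations in order and record them as applied.
/// Returns how many were applied; the caller opens network ports afterwards.
pub fn run_startup_migrations<F: FnMut(&Migration)>(
    applied: &mut Vec<i64>,
    available: &[Migration],
    mut apply: F,
) -> usize {
    let todo = pending_migrations(&applied[..], available);
    for migration in &todo {
        apply(migration); // placeholder for executing the script
        applied.push(migration.version);
    }
    todo.len()
}
```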
---
## 7. Zero-Conf Automation Logic (`scripts/install.sh`)
* **Environment Check:** Script verifies `systemd` availability[cite: 1].
* **Permissioning:** Creates a non-privileged `voiceapp` user to run the binary (security hardening)[cite: 1].
* **Auto-Update:** `update.sh` compares the local binary hash against the `latest` release on GitHub via the API. If different, it downloads, replaces, and runs `systemctl restart voice_app`[cite: 1].
---
## 8. Testing & Debugging Strategy
To ensure the real-time audio pipeline and network remain stable during development, several specific debugging tools are built directly into the workflow, completely avoiding the need for CLI flags or terminal commands.
* **Developer Control Panel:** A dedicated "Testing & Debugging" tab within the `egui` client settings. This provides a purely graphical interface for all diagnostic tools.
* **UI-Driven Audio Dumper:** A toggle in the Developer Panel that instantly records and writes the DSP pipeline streams to `.wav` files (`raw_mic.wav`, `post_dsp.wav`, `post_opus_decode.wav`) so that audio quality degradation can be inspected at each stage.
* **UI-Driven Chaos Simulator:** Sliders in the Developer Panel that dynamically inject artificial packet loss (%), latency (ms), and packet re-ordering into the outgoing UDP transport layer to stress-test the Jitter Buffer locally.
* **In-App Debug Overlay:** An `egui` diagnostic HUD toggled via a UI button (or `F3`) that overlays real-time metrics: Network Ping (TCP and UDP), Jitter Buffer depth (ms), packet loss percentage, and active Opus PLC triggers.
* **Load Test Dashboard:** The Server's web admin dashboard (`axum`) will feature a "Stress Test" page. Instead of running terminal scripts, the server admin can click "Spawn 100 Bots", which dynamically spins up headless internal clients that broadcast `.wav` audio to verify the server's UDP routing capacity.
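
The per-packet decision behind the Chaos Simulator sliders could look roughly like this. It is a sketch under stated assumptions: `ChaosConfig`, `Action`, and `decide` are hypothetical names, a small xorshift generator replaces a real RNG crate so results are reproducible from a seed, and re-ordering is modelled as one extra frame (20 ms) of delay so that a later packet overtakes the delayed one.

```rust
/// Hypothetical slider values from the Developer Panel.
pub struct ChaosConfig {
    pub loss_percent: u8,    // 0..=100
    pub latency_ms: u32,     // fixed added delay
    pub reorder_percent: u8, // 0..=100
}

/// What the transport layer should do with one outgoing packet.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Drop,
    Send { delay_ms: u32 },
}

/// Tiny deterministic PRNG (xorshift64) used only for this sketch.
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        Self(seed.max(1)) // state must never be zero
    }

    /// Pseudo-random value in 0..100.
    pub fn next_percent(&mut self) -> u8 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % 100) as u8
    }
}

/// Decide the fate of one packet according to the current slider values.
pub fn decide(cfg: &ChaosConfig, rng: &mut XorShift) -> Action {
    if rng.next_percent() < cfg.loss_percent {
        return Action::Drop;
    }
    // Delaying a packet by one frame lets its successor arrive first.
    let extra = if rng.next_percent() < cfg.reorder_percent { 20 } else { 0 };
    Action::Send { delay_ms: cfg.latency_ms.saturating_add(extra) }
}
```

Because the generator is seeded, a failing Jitter Buffer scenario can be replayed with the same seed and slider values.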