# Low-Level Implementation Plan
## 1. Network Packet Anatomy (The Data Plane)
To minimize latency, we use a custom binary format for UDP voice data instead of JSON or Protobuf[cite: 1].
* **UDP Voice Header (Fixed 16 Bytes):**
    * `u32` (4 bytes): **Session Token.** Generated during the TCP handshake. The server drops any packet whose token is unknown or whose source IP/port does not match the address bound to that token[cite: 1].
    * `u64` (8 bytes): **Sequence Number.** Monotonically increasing per user. Essential for the Jitter Buffer to reorder packets[cite: 1].
    * `u32` (4 bytes): **Timestamp.** Measured in audio samples (increments by 960 per 20 ms frame at 48 kHz) to handle playback timing[cite: 1].
* **Payload:** Raw Opus-encoded bytes (variable length). The bitrate is not hardcoded; it is dictated dynamically by the server's `ChannelConfig` (e.g., 16 kbps for voice, 96 kbps for music bots) when the user joins a room. With $20\text{ms}$ frames, a payload is roughly $\text{bitrate} \times 0.02\text{s} / 8$ bytes: about 40 bytes at 16 kbps and about 240 bytes at 96 kbps.
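
The following is a minimal sketch of how the 16-byte header could be serialized and parsed. It assumes big-endian (network) byte order, which this plan does not specify. `VoiceHeader`, `HEADER_LEN` and `SAMPLES_PER_FRAME` are illustrative names, not part of the project.

The timestamp is a `u32` counted in 48 kHz samples, so it wraps after $2^{32} / 48000 \approx 89\,478$ s (about 24.9 hours). Comparisons on it should therefore use wrapping arithmetic.

```rust
/// Fixed header size: u32 token + u64 sequence + u32 timestamp.
pub const HEADER_LEN: usize = 16;
/// Samples per 20 ms frame at 48 kHz (mono): the timestamp step per packet.
pub const SAMPLES_PER_FRAME: u32 = 960;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VoiceHeader {
    pub session_token: u32,
    pub sequence: u64,
    pub timestamp: u32,
}

impl VoiceHeader {
    /// Builds one datagram: 16-byte header (big-endian, an assumption) + Opus payload.
    pub fn encode(&self, opus_payload: &[u8]) -> Vec<u8> {
        let mut buf = Vec::with_capacity(HEADER_LEN + opus_payload.len());
        buf.extend_from_slice(&self.session_token.to_be_bytes());
        buf.extend_from_slice(&self.sequence.to_be_bytes());
        buf.extend_from_slice(&self.timestamp.to_be_bytes());
        buf.extend_from_slice(opus_payload);
        buf
    }

    /// Splits a datagram into header and payload. Anything shorter than the
    /// header (including 0-byte NAT keep-alives) yields `None`.
    pub fn decode(datagram: &[u8]) -> Option<(VoiceHeader, &[u8])> {
        if datagram.len() < HEADER_LEN {
            return None;
        }
        let mut token = [0u8; 4];
        let mut seq = [0u8; 8];
        let mut ts = [0u8; 4];
        token.copy_from_slice(&datagram[0..4]);
        seq.copy_from_slice(&datagram[4..12]);
        ts.copy_from_slice(&datagram[12..16]);
        let header = VoiceHeader {
            session_token: u32::from_be_bytes(token),
            sequence: u64::from_be_bytes(seq),
            timestamp: u32::from_be_bytes(ts),
        };
        Some((header, &datagram[HEADER_LEN..]))
    }
}
```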
---
## 2. Real-Time Audio Pipeline (`client_node/audio`)
Audio threads must be "lock-free" to prevent stuttering. We use a Single-Producer Single-Consumer (SPSC) ring buffer[cite: 1].
* **Global Hotkeys / Push-to-Talk:**
    * Use `global-hotkey` (or `rdev`) to hook OS-level key presses, allowing PTT even when the client is minimized[cite: 1].
* **Microphone Thread (The Producer):**
    * Initialize `cpal` with a 48 kHz input stream[cite: 1].
    * **Rule:** The hardware callback *must only* push raw `f32` samples into the `ringbuf`. No networking, locking, memory allocation, or heavy math is allowed here[cite: 1].
* **DSP/Encoder Thread (The Consumer):**
    * Pull samples from the `ringbuf`.
    * Process them via `webrtc_audio_processing` (echo cancellation, noise suppression, and voice activity detection/VAD). If VAD detects silence, stop transmitting to save bandwidth[cite: 1].
    * Accumulate exactly $960$ samples ($20\text{ms}$ at $48\text{kHz}$, mono)[cite: 1].
    * Pass each frame to `audiopus::Encoder`.
    * Send the resulting bytes to the **Network Task** via an asynchronous MPSC channel[cite: 1].
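
The producer/consumer split can be sketched with the standard library alone. This is not the `ringbuf` crate's API. `SampleRing` and `FrameAccumulator` are hypothetical stand-ins that show what a lock-free SPSC buffer must guarantee:

* Only the producer ever stores `tail`.
* Only the consumer ever stores `head`.
* Release/Acquire ordering publishes each sample to the other side.

The sketch also shows how the consumer slices the sample stream into 960-sample frames. The `on_frame` callback marks where the `webrtc_audio_processing` and `audiopus::Encoder` calls would go. Mono input is assumed.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::sync::Arc;

/// 20 ms of mono audio at 48 kHz.
pub const FRAME_SAMPLES: usize = 960;

/// Lock-free single-producer/single-consumer ring of f32 samples. Each slot
/// holds the sample's bit pattern in an AtomicU32, so no `unsafe` is needed.
pub struct SampleRing {
    slots: Vec<AtomicU32>,
    head: AtomicUsize, // next slot to read; only the consumer stores it
    tail: AtomicUsize, // next slot to write; only the producer stores it
}

impl SampleRing {
    pub fn new(capacity: usize) -> Arc<SampleRing> {
        // One extra slot stays empty so "full" and "empty" are distinguishable.
        let slots = (0..capacity + 1).map(|_| AtomicU32::new(0)).collect();
        Arc::new(SampleRing { slots: slots, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) })
    }

    /// Producer (mic callback): never blocks, locks or allocates. Returns how
    /// many samples fit; the rest are dropped if the consumer has fallen behind.
    pub fn push_slice(&self, samples: &[f32]) -> usize {
        let len = self.slots.len();
        let head = self.head.load(Ordering::Acquire);
        let mut tail = self.tail.load(Ordering::Relaxed);
        let mut written = 0;
        for &s in samples {
            let next = (tail + 1) % len;
            if next == head {
                break; // ring is full
            }
            self.slots[tail].store(s.to_bits(), Ordering::Relaxed);
            tail = next;
            written += 1;
        }
        // Release publishes the slot writes above to the consumer.
        self.tail.store(tail, Ordering::Release);
        written
    }

    /// Consumer (DSP thread): pops one sample, or `None` if the ring is empty.
    pub fn pop(&self) -> Option<f32> {
        let head = self.head.load(Ordering::Relaxed);
        if head == self.tail.load(Ordering::Acquire) {
            return None;
        }
        let sample = f32::from_bits(self.slots[head].load(Ordering::Relaxed));
        // Release hands the slot back to the producer only after it was read.
        self.head.store((head + 1) % self.slots.len(), Ordering::Release);
        Some(sample)
    }
}

/// Consumer-side accumulator that turns the sample stream into 960-sample frames.
pub struct FrameAccumulator {
    frame: Vec<f32>,
}

impl FrameAccumulator {
    pub fn new() -> FrameAccumulator {
        FrameAccumulator { frame: Vec::with_capacity(FRAME_SAMPLES) }
    }

    /// Drains every available sample and calls `on_frame` once per complete
    /// frame; in the real pipeline that is where DSP/VAD and Opus would run.
    pub fn drain<F: FnMut(&[f32])>(&mut self, ring: &SampleRing, mut on_frame: F) {
        while let Some(sample) = ring.pop() {
            self.frame.push(sample);
            if self.frame.len() == FRAME_SAMPLES {
                on_frame(&self.frame[..]);
                self.frame.clear();
            }
        }
    }
}
```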
---
## 3. Jitter Buffer & Playback Logic (`client_node/network`)
The Jitter Buffer compensates for an unstable internet connection (network jitter) by adding a controlled "latency tax"[cite: 1].
* **The Sorting Mechanism:** Each remote speaker gets their own buffer, because sequence numbers are per user. Incoming UDP packets are inserted into a `BinaryHeap` ordered by **Sequence Number**. Rust's `BinaryHeap` is a max-heap, so entries are wrapped in `std::cmp::Reverse` to turn it into a min-heap[cite: 1].
* **The Watermark Strategy:**
    * Wait until the heap contains at least $40\text{ms}$ (2 frames) of audio before starting playback[cite: 1].
    * This buffer allows late-arriving packets to be inserted in the correct order[cite: 1].
* **Playback Tick:** Every $20\text{ms}$, the playback thread checks whether the packet with the next expected sequence number is at the top of the heap.
    * **Success:** Pop and decode the packet. Before pushing to the master `cpal` speaker buffer, multiply the specific user's `f32` decoded array by their local volume scalar (e.g., 0.5 for 50% volume) to enable **Per-User Volume Control**.
    * **Missing (Packet Loss):** If the expected sequence number is missing, call `audiopus::Decoder::decode` (or `decode_float` for `f32` output) with `None` as the input packet to trigger **Packet Loss Concealment (PLC)**. PLC synthesizes a "guess" of the missing sound[cite: 1]. Packets that arrive after their slot has already been played or concealed are discarded.
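
Below is a sketch of the per-sender jitter buffer. It rests on these assumptions:

* Payloads are plain `Vec<u8>`.
* The watermark is counted in frames.
* An empty heap (for example, because the sender's VAD stopped transmitting) sends the buffer back to the buffering state rather than concealing indefinitely.

`JitterBuffer`, `Tick` and `apply_volume` are illustrative names. A `Conceal` result marks where the Opus decoder would be called with no input packet.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// What the playback thread should do for the next 20 ms slot.
#[derive(Debug, PartialEq)]
pub enum Tick {
    /// Below the watermark (or after an underrun): output silence.
    Buffering,
    /// The expected packet is present: decode this Opus payload.
    Play(u64, Vec<u8>),
    /// The expected packet is missing: run Opus PLC for this sequence number.
    Conceal(u64),
}

/// One buffer per remote speaker, since sequence numbers are per user.
pub struct JitterBuffer {
    // `Reverse` turns std's max-heap into a min-heap on the sequence number.
    heap: BinaryHeap<Reverse<(u64, Vec<u8>)>>,
    next_seq: Option<u64>,
    watermark_frames: usize,
    playing: bool,
}

impl JitterBuffer {
    pub fn new(watermark_frames: usize) -> JitterBuffer {
        JitterBuffer { heap: BinaryHeap::new(), next_seq: None, watermark_frames: watermark_frames, playing: false }
    }

    pub fn push(&mut self, seq: u64, payload: Vec<u8>) {
        // A packet whose slot was already played or concealed is useless.
        if let Some(next) = self.next_seq {
            if seq < next {
                return;
            }
        }
        self.heap.push(Reverse((seq, payload)));
    }

    /// Called by the playback thread every 20 ms.
    pub fn tick(&mut self) -> Tick {
        if !self.playing {
            if self.heap.is_empty() || self.heap.len() < self.watermark_frames {
                return Tick::Buffering;
            }
            // (Re)start playback at the oldest buffered packet.
            self.next_seq = self.heap.peek().map(|r| (r.0).0);
            self.playing = true;
        }
        let expected = self.next_seq.expect("set when playback starts");
        // Discard duplicates of slots that were already handled.
        while self.heap.peek().map_or(false, |r| (r.0).0 < expected) {
            self.heap.pop();
        }
        let top_seq = match self.heap.peek() {
            Some(r) => (r.0).0,
            None => {
                // Underrun: re-arm the watermark instead of concealing forever.
                self.playing = false;
                return Tick::Buffering;
            }
        };
        self.next_seq = Some(expected + 1);
        if top_seq == expected {
            let Reverse((seq, payload)) = self.heap.pop().expect("peeked above");
            Tick::Play(seq, payload)
        } else {
            Tick::Conceal(expected)
        }
    }
}

/// Per-user volume control on decoded samples (0.5 = 50%).
pub fn apply_volume(samples: &mut [f32], volume: f32) {
    for s in samples.iter_mut() {
        *s *= volume;
    }
}
```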
---
## 4. Server Relay & Routing (`server_node/udp_relay.rs`)
The server acts as a high-speed traffic controller. It must be "Zero-Copy" where possible.
* **Validation:** Use `tokio::net::UdpSocket`. On receipt, verify the `u32` Session Token against the `DashMap` state and check that the packet's source address matches the one bound to that token[cite: 1].
* **Broadcast Logic:**
    1. Identify the sender's current `ChannelId`[cite: 1].
    2. Retrieve the list of `SocketAddr` for every other user in that channel[cite: 1].
    3. Iterate and send the exact byte buffer to each address. Use the `bytes` crate: a `Bytes` handle is reference-counted, so cloning it for each destination or send task bumps a counter instead of copying the payload[cite: 1].
* **NAT Keep-Alives:** The server must ignore empty 0-byte UDP packets, which clients use to keep router ports open[cite: 1]. Likewise, any datagram shorter than the 16-byte header cannot be valid voice data and is dropped.
* **TCP Control Lane & Chat Routing:** The TCP router handles text chat messages and broadcasts them to users in the same `ChannelId`[cite: 1].
* **Stateful Auto-Reconnect:** If the TCP socket drops, the client quietly reconnects and submits its existing `Session Token` to resume its channel presence without forcing a full re-login[cite: 1].
* **Whisper Lists (Direct UDP Routing):** The server supports targeted UDP forwarding. If a packet carries a `Target_SessionToken`, the server routes the audio strictly to that user, bypassing the standard channel broadcast. The fixed 16-byte header in Section 1 has no field for this token, so whisper packets need a header extension.
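
The validation and fan-out rules can be modeled without `tokio` or `DashMap`. In this std-only sketch, `HashMap` stands in for the `DashMap` session table and `Arc<[u8]>` stands in for `bytes::Bytes`. The `route` function only computes destinations; it does not call `send_to`.

The fixed header has no `Target_SessionToken` field, so the whisper target is passed in as a separate parameter. `Session`, `route` and the big-endian token encoding are assumptions.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::Arc;

pub const HEADER_LEN: usize = 16;

/// Hypothetical per-session state (stand-in for a DashMap entry).
pub struct Session {
    pub addr: SocketAddr,
    pub channel_id: u32,
}

/// Returns (destination, shared buffer) pairs for one received datagram.
pub fn route(
    sessions: &HashMap<u32, Session>,
    from: SocketAddr,
    datagram: &[u8],
    whisper_target: Option<u32>,
) -> Vec<(SocketAddr, Arc<[u8]>)> {
    // 0-byte keep-alives and runts shorter than the header are ignored.
    if datagram.len() < HEADER_LEN {
        return Vec::new();
    }
    let mut token_bytes = [0u8; 4];
    token_bytes.copy_from_slice(&datagram[0..4]);
    let token = u32::from_be_bytes(token_bytes);
    // The token must exist and be bound to the address the packet came from.
    let sender = match sessions.get(&token) {
        Some(s) if s.addr == from => s,
        _ => return Vec::new(),
    };
    // One allocation per datagram; each destination only bumps a refcount.
    let shared: Arc<[u8]> = Arc::from(datagram);
    match whisper_target {
        Some(target) => sessions
            .get(&target)
            .map(|t| vec![(t.addr, Arc::clone(&shared))])
            .unwrap_or_default(),
        None => sessions
            .iter()
            .filter(|&(&tok, s)| tok != token && s.channel_id == sender.channel_id)
            .map(|(_, s)| (s.addr, Arc::clone(&shared)))
            .collect(),
    }
}
```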
---
## 5. Wasm Plugin ABI (`client_node/plugins`)
Since the Wasm sandbox cannot access host memory directly, we use a shared "mailbox" system.
* **The ABI Pattern:**
    1. Host (Rust) serializes event data (e.g., `OnMessage`) into JSON[cite: 1].
    2. Host allocates a block of memory inside the Wasm instance's linear memory (typically by calling an allocator function exported by the guest) and writes the JSON there[cite: 1].
    3. Host calls the Wasm function, passing the memory pointer and the byte length[cite: 1].
    4. Guest (Wasm) processes the event and returns a pointer (and length) to its response, which the host reads back out of linear memory[cite: 1].
* **Audio Intercepts:** For voice changers, the Host cannot hand the plugin a host-side `&mut [f32]` directly. Instead, it copies the samples into a buffer in the plugin's linear memory and the plugin modifies them "in-place". The Host then copies them back before they reach the Opus encoder[cite: 1].
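
The host cannot simply lend the plugin a pointer into host memory. Both the JSON mailbox and the audio intercept therefore reduce to copying bytes into and out of the guest's linear memory.

The sketch below models that memory as a `Vec<u8>` with a bump allocator instead of using a real Wasm runtime. `GuestMemory`, `post_event`, the `(ptr << 32) | len` response packing and the other names are hypothetical conventions, not an established ABI.

```rust
/// Std-only stand-in for a plugin instance's linear memory. In a real host the
/// bytes would be the instance's exported memory and `alloc` a guest export.
pub struct GuestMemory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl GuestMemory {
    pub fn new(size: usize) -> GuestMemory {
        // Offset 0 is kept unused so it can act as a null pointer.
        GuestMemory { bytes: vec![0; size], next_free: 8 }
    }

    /// Models a hypothetical guest-exported bump allocator `alloc(len) -> ptr`.
    pub fn alloc(&mut self, len: usize) -> Option<u32> {
        let ptr = (self.next_free + 7) & !7; // 8-byte alignment
        let end = ptr.checked_add(len)?;
        if end > self.bytes.len() {
            return None;
        }
        self.next_free = end;
        Some(ptr as u32)
    }

    pub fn write(&mut self, ptr: u32, data: &[u8]) {
        let start = ptr as usize;
        self.bytes[start..start + data.len()].copy_from_slice(data);
    }

    pub fn read(&self, ptr: u32, len: u32) -> &[u8] {
        let start = ptr as usize;
        &self.bytes[start..start + len as usize]
    }
}

/// Steps 1-3: copy the serialized JSON event into guest memory and return
/// the (ptr, len) pair the host passes to the plugin's handler.
pub fn post_event(mem: &mut GuestMemory, json: &str) -> Option<(u32, u32)> {
    let ptr = mem.alloc(json.len())?;
    mem.write(ptr, json.as_bytes());
    Some((ptr, json.len() as u32))
}

/// Step 4, assuming the guest packs its response as (ptr << 32) | len.
pub fn unpack_response(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}

/// Audio intercept, part 1: copy samples into guest memory as little-endian
/// f32 (WebAssembly linear memory is little-endian).
pub fn copy_samples_in(mem: &mut GuestMemory, samples: &[f32]) -> Option<u32> {
    let ptr = mem.alloc(samples.len() * 4)?;
    for (i, s) in samples.iter().enumerate() {
        mem.write(ptr + (i * 4) as u32, &s.to_le_bytes());
    }
    Some(ptr)
}

/// Audio intercept, part 2: after the plugin edited the buffer in place,
/// copy the samples back into the host's buffer.
pub fn copy_samples_out(mem: &GuestMemory, ptr: u32, out: &mut [f32]) {
    let raw = mem.read(ptr, (out.len() * 4) as u32);
    for (sample, chunk) in out.iter_mut().zip(raw.chunks_exact(4)) {
        *sample = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
}
```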
---
## 6. Persistence & State Management (`server_node/database.rs`)
The server uses `sqlx` for compile-time safe database interaction[cite: 1].
* **Hashing:** Use `Argon2id` with a random salt of at least 16 bytes. Passwords should be hashed with a minimum of $3$ passes and $64\text{MiB}$ of memory (Argon2's memory cost is specified in KiB, i.e. $65536$)[cite: 1].
* **Migrations:** On startup, the server checks the `_sqlx_migrations` table. If the code expects a newer schema than the SQLite file has, it applies the pending `.sql` scripts in version order before opening the network ports[cite: 1].
* **Admin API:** The `axum` web server requires a `Bearer` token (JWT) for all sensitive routes (`/api/kick`, `/api/ban`). This token is issued when the Admin logs into the dashboard[cite: 1].
* **Permissions & Access Control:** During TCP `ChannelJoin` events, the server checks the database for the channel's `Required_Role` and password lock before permitting entry[cite: 1].
* **Client-Side Persistence (Bookmarks):** The `client_node` maintains a local SQLite or `.toml` file to persist Server Bookmarks (IP, Port, Password, chosen Nickname), so users don't have to type connection details manually.
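
The `ChannelJoin` permission check can be written as a small, testable decision function. In this sketch, the role ladder and the `ChannelRow` fields are hypothetical. The `verify` callback is also hypothetical: it stands in for an Argon2id verification against the stored hash. The real values would come from the database query.

```rust
/// Hypothetical role ladder; `derive(PartialOrd, Ord)` ranks variants in declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Guest,
    Member,
    Moderator,
    Admin,
}

/// The columns the server would load for the target channel (illustrative names).
pub struct ChannelRow {
    pub required_role: Role,
    /// Stored Argon2id hash if the channel is password-locked.
    pub password_hash: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum JoinDecision {
    Allowed,
    InsufficientRole,
    PasswordRequired,
    WrongPassword,
}

/// Decides a `ChannelJoin` request. `verify(password, stored_hash)` stands in
/// for an Argon2id verification call.
pub fn check_join<F>(user_role: Role, channel: &ChannelRow, supplied: Option<&str>, verify: F) -> JoinDecision
where
    F: Fn(&str, &str) -> bool,
{
    if user_role < channel.required_role {
        return JoinDecision::InsufficientRole;
    }
    match (&channel.password_hash, supplied) {
        (None, _) => JoinDecision::Allowed,
        (Some(_), None) => JoinDecision::PasswordRequired,
        (Some(hash), Some(password)) => {
            if verify(password, hash.as_str()) {
                JoinDecision::Allowed
            } else {
                JoinDecision::WrongPassword
            }
        }
    }
}
```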
---
## 7. Zero-Conf Automation Logic (`scripts/install.sh`)
* **Environment Check:** The script verifies that `systemd` is available[cite: 1].
* **Permissioning:** Creates a non-privileged `voiceapp` user to run the binary (security hardening)[cite: 1].
* **Auto-Update:** `update.sh` compares the local binary's hash against the `latest` release on GitHub via the GitHub API. If they differ, it downloads the new binary, replaces the old one, and runs `systemctl restart voice_app`[cite: 1].
---
## 8. Testing & Debugging Strategy
To keep the real-time audio pipeline and network stable during development, several debugging tools are built directly into the application, so no CLI flags or terminal commands are needed.
* **Developer Control Panel:** A dedicated "Testing & Debugging" tab within the `egui` client settings. This provides a purely graphical interface for all diagnostic tools.
* **UI-Driven Audio Dumper:** A toggle in the Developer Panel that records the DSP pipeline streams to `.wav` files (`raw_mic.wav`, `post_dsp.wav`, `post_opus_decode.wav`), so audio quality degradation can be inspected stage by stage.
* **UI-Driven Chaos Simulator:** Sliders in the Developer Panel that dynamically inject artificial packet loss (%), latency (ms), and packet reordering into the outgoing UDP transport layer to stress-test the Jitter Buffer locally.
* **In-App Debug Overlay:** An `egui` diagnostic HUD, toggled via a UI button (or `F3`), that overlays real-time metrics: network ping (TCP and UDP), Jitter Buffer depth (ms), packet loss percentage, and active Opus PLC triggers.
* **Load Test Dashboard:** The server's web admin dashboard (`axum`) will feature a "Stress Test" page. Instead of running terminal scripts, the server admin can click "Spawn 100 Bots". This spins up headless internal clients that broadcast `.wav` audio to verify the server's UDP routing capacity.
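
The following is a minimal sketch of the Chaos Simulator's core. It assumes packets leave every 20 ms and the sliders map to three settings: a loss percentage, a base latency, and a random extra delay (jitter). Reordering is not a separate setting here; it emerges whenever the random delay exceeds the 20 ms packet spacing.

`ChaosSettings`, `simulate` and the xorshift64 generator are illustrative. The xorshift64 generator is used instead of a `rand` dependency so runs are reproducible from a seed. A real implementation would sit in the UDP send path and hold packets on timers.

```rust
/// Hypothetical slider values from the Developer Panel.
pub struct ChaosSettings {
    pub loss_percent: u32, // 0..=100
    pub base_latency_ms: u64,
    pub jitter_ms: u64, // extra random delay; values above 20 ms cause reordering
}

/// Small deterministic PRNG (Marsaglia xorshift64) for reproducible runs.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> XorShift64 {
        // xorshift must not start from 0, or it stays at 0 forever.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Applies loss, latency and jitter to packets sent every 20 ms and returns
/// (arrival_ms, seq) pairs in the order the receiver would see them.
pub fn simulate(seqs: &[u64], s: &ChaosSettings, rng: &mut XorShift64) -> Vec<(u64, u64)> {
    let mut arrivals = Vec::new();
    for (i, &seq) in seqs.iter().enumerate() {
        let sent_at = i as u64 * 20;
        // Modulo bias is negligible for a debugging tool.
        if rng.next_u64() % 100 < s.loss_percent as u64 {
            continue; // packet dropped
        }
        let jitter = if s.jitter_ms == 0 { 0 } else { rng.next_u64() % (s.jitter_ms + 1) };
        arrivals.push((sent_at + s.base_latency_ms + jitter, seq));
    }
    arrivals.sort(); // by arrival time, ties broken by sequence number
    arrivals
}
```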