# Technical Specifications & Standards

## 1. Database Architecture (Self-Hosted State)

The server uses an embedded, file-based database for persistent storage, allowing the server to ship as a single binary.

* **Core Library:** `sqlx` with SQLite. (Strictly using `sqlx` macros for compile-time query verification; all queries are parameterized, which prevents SQL injection.)
* **Cryptography Standard:** `rust-argon2` for password hashing. Passwords are never stored or transmitted in plain text.
* **High-Level Schema Map:**
  * `Users`: ID, Username, Argon2_Hash, Global_Role.
  * `Channels`: ID, Parent_ID (for nesting), Name, Is_Voice, Required_Role.
  * `Bans`: IP_Address, User_ID, Expiry_Date.

## 2. WebAssembly (Wasm) Plugin API

The plugin system must be universally accessible (polyglot) but strictly sandboxed.

* **Core Library:** `extism`. (Extism is preferred over raw Wasmtime for this use case because it automatically handles passing strings and byte arrays between the host and the plugin, avoiding manual memory-pointer arithmetic.)
* **Data Exchange Standard:** All data passed between the Rust host and the Wasm guest is serialized as JSON.
* **The API Boundary (what plugins CAN do):**
  * *Read-Only State:* Plugins can query the current channel layout and user list.
  * *Intercept Audio:* Plugins can request the raw `f32` audio buffer *before* Opus encoding to apply voice effects.
  * *Inject Chat:* Plugins can send text messages via the bot/plugin API.
* **The API Boundary (what plugins CANNOT do):** File-system access and raw network-socket access are strictly denied at the Wasm runtime level.

## 3. UI and Concurrency Architecture (The Actor Pattern)

`egui` redraws the screen every frame, typically 60 times per second. If a frame waits on a network packet, the app freezes. We must use the **Actor Pattern** to keep the UI and network layers isolated.

* **Core Libraries:** `eframe` (for `egui`) and `tokio::sync`.
* **The Downstream (UI to Network):** Uses `tokio::sync::mpsc` (Multi-Producer, Single-Consumer).
  When a user clicks "Connect," the UI sends an enum variant `UiAction::Connect(ip)` down the channel and immediately returns to drawing the screen.
* **The Upstream (Network to UI):** Uses `tokio::sync::watch`. The background Tokio network task holds the "master state" (who is speaking, who is in which channel) and pushes updates into the `watch` channel. The UI simply reads the latest value from this channel on every frame and draws it.

## 4. Audio Engine & DSP Standards

Real-time audio requires strict mathematical constraints to guarantee low latency.

* **Core Libraries:** `audiopus` (Opus compression), `cpal` (hardware I/O), and `webrtc-audio-processing` (Rust bindings for Google's WebRTC DSP).
* **DSP Pipeline (crucial for preventing echo):** Raw mic audio -> WebRTC Noise Suppression -> WebRTC Acoustic Echo Cancellation (AEC) -> Opus encoder -> UDP socket.
* **Mathematical Standards:**
  * **Sample Rate:** Strictly locked to `48,000 Hz` (48 kHz), the Opus full-band standard.
  * **Frame Size:** Strictly locked to `20 milliseconds`. At 48 kHz this is exactly `960 samples` per frame. Opus requires exact frame boundaries, so a frame can never be larger or smaller.
  * **Channels:** Microphone input is captured in `Mono` (1 channel). Speaker output is played in `Stereo` (2 channels, allowing for 3D positional audio later).
  * **Bitrate:** Variable Bitrate (VBR) targeting `48 kbps`, which delivers clear voice at roughly 6 KB/s of bandwidth.

## 5. Network Transport & NAT Traversal Strategy

Home routers drop unsolicited incoming UDP traffic, so we must define how voice packets survive NAT (Network Address Translation) firewalls.

* **UDP Hole Punching Standard:**
  * The client must send a tiny, empty "keep-alive" UDP packet to the server every 5 seconds. This keeps the NAT mapping on the user's router open so the server's incoming voice packets aren't blocked.
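A minimal sketch of this keep-alive, using blocking `std::net` sockets for brevity (the real client would drive the same logic from an async `tokio` task on the mandated 5-second interval; the loopback "server" here is a stand-in for the voice server's public address):

```rust
use std::net::UdpSocket;

/// One empty datagram is enough to refresh the router's NAT mapping:
/// the payload is irrelevant, only the recent outbound traffic matters.
fn send_keep_alive(socket: &UdpSocket, server: &str) -> std::io::Result<usize> {
    socket.send_to(&[], server)
}

fn main() -> std::io::Result<()> {
    // Stand-in "server" on loopback so the sketch is self-contained.
    let server = UdpSocket::bind("127.0.0.1:0")?;
    let server_addr = server.local_addr()?.to_string();

    let client = UdpSocket::bind("127.0.0.1:0")?;

    // The real client would repeat this from a background task:
    //   loop { send_keep_alive(&client, &server_addr)?;
    //          std::thread::sleep(std::time::Duration::from_secs(5)); }
    send_keep_alive(&client, &server_addr)?;

    // The server sees a zero-byte datagram and learns the client's mapping.
    let mut buf = [0u8; 8];
    let (len, from) = server.recv_from(&mut buf)?;
    println!("keep-alive: {len} bytes from {from}");
    assert_eq!(len, 0);
    Ok(())
}
```

The empty payload is deliberate: a NAT router refreshes a UDP mapping on any outbound datagram, regardless of its contents.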
* **UDP Payload Structure:** Every UDP packet must begin with a strict binary header before the Opus payload:
  * `[Session Token: u32]` (who is sending this?)
  * `[Sequence Number: u64]` (what order is this packet in?)
  * `[Timestamp: u64]` (when was this spoken?)
  * `[Encrypted Opus Data: Vec<u8>]`

## 6. Cryptography & Security Standards

Audio must be encrypted so Internet Service Providers (or attackers on public Wi-Fi) cannot listen to private voice channels.

* **Core Libraries:** `rustls` (for TCP TLS) and `chacha20poly1305` (for UDP payload encryption).
* **The Handshake Protocol:**
  1. The client connects via TCP, which is secured by standard TLS.
  2. The server generates a unique, temporary symmetric encryption key (ChaCha20) for that specific user session.
  3. The server sends this key to the client over the secure TCP lane.
  4. Both the client and server use this key to rapidly encrypt and decrypt the UDP voice packets.

## 7. Audio Playout Strategy (The Jitter Buffer)

UDP packets do not arrive in the exact order they were sent: some arrive fast, some arrive slow, and some arrive out of order. Playing them the instant they arrive makes the audio crackle and pop.

* **Standard Requirement: The Jitter Buffer.**
  * **Implementation Rule:** The receiving client must hold incoming UDP packets in a priority queue sorted by `Sequence Number` for a minimum of `40 milliseconds` (two 20 ms frames of audio) before handing them to the `cpal` speaker thread.
  * **Missing Packet Logic:** If a sequence number is still missing after the 40 ms wait, the client must trigger the `audiopus` decoder's built-in Packet Loss Concealment (PLC) to synthesize the missing audio and prevent a hard static pop.

## 8. Observability and Debugging

When the self-hosted server crashes on a Linux VPS, you cannot use print statements to figure out why. You need structured, asynchronous logging.

* **Core Libraries:** `tracing` and `tracing-subscriber`.
* **Implementation Rule:** Do not use `println!()`. All state changes, network drops, and database queries must be logged using `tracing::info!`, `tracing::warn!`, or `tracing::error!`. The server must write these logs to a rolling `.log` file on the host machine.