# General Concept: Rust Voice Communication App

## 1. Core Philosophy

The application operates on a "Switchboard and Walkie-Talkie" model, designed for instant, drop-in voice communication.

* **The Switchboard (Server):** A central routing hub. It maintains the blueprint of all channels and tracks which users are in which rooms. It **does not** process audio; it strictly relays data to the correct destinations.
* **The Walkie-Talkies (Clients):** The desktop applications. They capture local microphone input, compress it, send it to the server, and decompress incoming audio for playback.

## 2. The User Experience (Core Features)

* **Persistent Room List:** A static hierarchy of voice channels displayed on the side panel.
* **Drop-In Audio:** No ringing or answering. Users click a room and are immediately broadcasting and receiving audio.
* **Text Chat:** A synchronized text channel for every voice room, allowing users to share links and messages with current occupants.
* **Active Speaker Indicators:** Visual cues (e.g., green outlines) next to user avatars that illuminate when voice data is being transmitted.
* **Hardware Controls:** Easily accessible global Mute (microphone) and Deafen (headphones) toggles.

## 3. The Two-Lane Network Architecture

To keep the UI responsive while preventing robotic, lagging audio, the app uses two simultaneous network streams:

* **The Control Lane (TCP):** Slower but reliable and ordered. Used for text messages, channel movements, authentication, and state updates. Ensures critical data is never lost.
* **The Voice Lane (UDP):** Fast but unreliable. Sends compressed audio packets continuously. If a packet is lost or arrives too late, the client skips it and moves on to the newest data, prioritizing real-time delivery over perfect quality to prevent audio delay.

## 4. Cross-Platform Strategy

The app is natively compiled for Windows, macOS, and Linux from a single Rust codebase.
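One common way a single Rust codebase adapts to each target is conditional compilation. The sketch below is a hypothetical illustration, not the app's actual code: `native_audio_backend` is an invented helper that reports which audio API a given build would sit on top of. In practice a crate such as `cpal` makes this choice internally.

```rust
/// Hypothetical helper: names the native audio API expected on the
/// platform this binary was compiled for.
fn native_audio_backend() -> &'static str {
    // `cfg!` is evaluated at compile time, so each build keeps one branch.
    if cfg!(target_os = "windows") {
        "WASAPI"
    } else if cfg!(target_os = "macos") {
        "CoreAudio"
    } else if cfg!(target_os = "linux") {
        "ALSA/PulseAudio"
    } else {
        "unsupported"
    }
}

fn main() {
    println!("audio backend: {}", native_audio_backend());
}
```

The same `target_os` predicates can gate whole modules with `#[cfg(...)]`, which is how platform-specific code is usually kept out of the builds that do not need it.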
* **Audio I/O:** Handled via the `cpal` crate, which interfaces with WASAPI (Windows), CoreAudio (Mac), and ALSA/PulseAudio (Linux).
* **User Interface:** Powered by `egui` and `eframe`, rendering natively via the system's preferred graphics API (DirectX, Metal, Vulkan/OpenGL).
* **Global Hotkeys:** Handled via OS-specific global hotkey APIs to capture Push-to-Talk events even when the application is minimized.

## 5. WebAssembly (Wasm) Plugin System

A secure, language-agnostic extension framework that allows users to modify the client's behavior without altering the core Rust binary.

* **The Sandbox:** Plugins run inside an isolated Wasm runtime. A malicious or broken plugin can crash its own sandbox but cannot crash the main voice client or access unauthorized system files.
* **Language Agnostic:** Users can write plugins in any language that can target WebAssembly, such as Rust or Go, or Python and JavaScript through toolchains that bundle an interpreter, and ship them as a `.wasm` file.
* **Event Hooks:** The core application broadcasts specific triggers into the sandbox (e.g., `OnUserJoinChannel`, `OnAudioFrameCaptured`), allowing plugins to react to network events, manipulate local audio streams (e.g., voice changers), or automate chat functions.

## 6. Audio DSP Pipeline (Quality Control)

Raw microphone input is inherently messy. Before audio is compressed and sent to the network, it passes through a local Digital Signal Processing (DSP) chain to clean it up.

* **Acoustic Echo Cancellation (AEC):** Prevents the user's microphone from re-broadcasting audio coming from their own speakers.
* **Noise Suppression:** Filters out background noise (e.g., keyboard clacking, computer fans) using a lightweight algorithm (like WebRTC DSP or RNNoise).
* **Voice Activity Detection (VAD) / Noise Gate:** Automatically stops transmitting network packets when the user is not actively speaking, which saves significant bandwidth.

## 7. Identity and Authorization

The system employs a strict Role-Based Access Control (RBAC) architecture to maintain order within the server.

* **The Hierarchy:** Users are assigned roles (e.g., Guest, Member, Moderator, Admin) which dictate their permissions.
* **Channel Permissions:** Specific rooms can be locked behind passwords or restricted to specific roles.
* **Moderation Tools:** Authorized users can send `Kick`, `Ban`, or `ServerMute` commands, which the server enforces by dropping the target's network connections or ignoring their UDP packets.

## 8. Security and Resiliency

The application is designed to survive hostile network conditions and protect user privacy.

* **Stateful Auto-Reconnect:** If the TCP control lane drops due to a network hiccup, the client enters a "Reconnecting" state. It silently attempts to re-establish the connection and re-join the user's previous voice channel without requiring user interaction.
* **Encryption-in-Transit:** All TCP control traffic (text chat, logins) is wrapped in TLS.
* **Voice Encryption:** UDP voice packets are encrypted with a lightweight symmetric cipher (like ChaCha20-Poly1305 or AES-GCM), using a key exchanged securely during the initial TCP handshake, preventing packet-sniffing on public networks.

## 9. Self-Hosting and Web Administration

The server is designed for decentralized, user-hosted deployment. It compiles into a single, standalone executable that requires no external dependencies (no separate web servers or database installations).

* **Tri-Port Architecture:** The single server binary binds to three ports simultaneously:
    * TCP Control Lane (Client connections)
    * UDP Voice Lane (Audio routing)
    * HTTP Web Server (Admin dashboard)
* **The Web Dashboard (`axum`):** A lightweight, embedded web server provides a visual interface for server owners to manage their instance from any web browser.
* **Embedded Assets (`rust-embed`):** The entire HTML/CSS/JS frontend for the admin dashboard is compiled directly into the Rust server binary. The server hosts its own admin panel from memory.
* **Admin REST API:** The web dashboard communicates with the core server via secure HTTP endpoints (e.g., `GET /api/users`, `POST /api/kick/:id`), protected by standard JWT authentication. This API interacts directly with the live concurrent state (`DashMap`) of the voice server.
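As a rough illustration of how the admin API and the voice server might share live state, here is a minimal sketch. It makes several assumptions: `ServerState`, `join`, `list_users`, and `kick` are hypothetical names, users are keyed by a plain `u32`, and a std-only `RwLock<HashMap>` stands in for `DashMap` so the example has no external dependencies. HTTP routing and JWT checks are left out.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Live state shared by the voice router and the admin API.
/// Maps a user ID to the name of the room that user is in.
/// Assumption: `RwLock<HashMap>` is a std-only stand-in for `DashMap`.
#[derive(Clone, Default)]
struct ServerState {
    users: Arc<RwLock<HashMap<u32, String>>>,
}

impl ServerState {
    /// Called by the control lane when a user enters a room.
    fn join(&self, id: u32, room: &str) {
        self.users.write().unwrap().insert(id, room.to_string());
    }

    /// What a `GET /api/users` handler could return: IDs in ascending order.
    fn list_users(&self) -> Vec<u32> {
        let mut ids: Vec<u32> = self.users.read().unwrap().keys().copied().collect();
        ids.sort_unstable();
        ids
    }

    /// What a `POST /api/kick/:id` handler could call.
    /// Returns true if the user was present and has been removed.
    fn kick(&self, id: u32) -> bool {
        self.users.write().unwrap().remove(&id).is_some()
    }
}

fn main() {
    let state = ServerState::default();
    // Cloning copies the `Arc` handle, so both values see the same map.
    let admin_view = state.clone();

    state.join(1, "Lobby");
    state.join(2, "Lobby");

    println!("kicked user 1: {}", admin_view.kick(1));
    println!("remaining users: {:?}", state.list_users());
}
```

Because every clone of `ServerState` points at the same map, a kick issued from the dashboard is visible to the routing code immediately. `DashMap` offers a similar shared-map model with finer-grained locking, which matters more once many voice packets consult the state concurrently.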