General Concept: Rust Voice Communication App
1. Core Philosophy
The application operates on a "Switchboard and Walkie-Talkie" model, designed for instant, drop-in voice communication.
- The Switchboard (Server): A central routing hub. It maintains the blueprint of all channels and tracks which users are in which rooms. It does not process audio; it strictly relays data to the correct destinations.
- The Walkie-Talkies (Clients): The desktop applications. They capture local microphone input, compress it, send it to the server, and decompress incoming audio for playback.
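The switchboard's bookkeeping can be sketched as a small routing table. This is a minimal illustration using only the standard library; the names Switchboard, UserId, and RoomId are hypothetical and not taken from the actual codebase. The server never inspects audio here: it only answers the question "who else is in the sender's room?".

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical identifier types; the design does not specify concrete ones.
pub type UserId = u32;
pub type RoomId = u32;

/// Tracks which users are in which rooms and decides where a packet is relayed.
#[derive(Default)]
pub struct Switchboard {
    rooms: HashMap<RoomId, HashSet<UserId>>,
    location: HashMap<UserId, RoomId>,
}

impl Switchboard {
    /// Move a user into a room, leaving any room they occupied before.
    pub fn join(&mut self, user: UserId, room: RoomId) {
        self.leave(user);
        self.rooms.entry(room).or_default().insert(user);
        self.location.insert(user, room);
    }

    /// Remove a user from whatever room they occupy, if any.
    pub fn leave(&mut self, user: UserId) {
        if let Some(room) = self.location.remove(&user) {
            if let Some(members) = self.rooms.get_mut(&room) {
                members.remove(&user);
            }
        }
    }

    /// Everyone who should receive a packet from `sender`: the other
    /// occupants of the sender's room, sorted so the result is deterministic.
    pub fn recipients(&self, sender: UserId) -> Vec<UserId> {
        let members = match self
            .location
            .get(&sender)
            .and_then(|room| self.rooms.get(room))
        {
            Some(m) => m,
            None => return Vec::new(),
        };
        let mut out: Vec<UserId> = members
            .iter()
            .copied()
            .filter(|&u| u != sender)
            .collect();
        out.sort_unstable();
        out
    }
}
```

A relay loop would call recipients(sender) for each incoming voice packet and forward the untouched bytes to each returned user.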
2. The User Experience (Core Features)
- Persistent Room List: A static hierarchy of voice channels displayed on the side panel.
- Drop-In Audio: No ringing or answering. Users click a room and are immediately broadcasting and receiving audio.
- Text Chat: A synchronized text channel for every voice room, allowing users to share links and messages with current occupants.
- Active Speaker Indicators: Visual cues (e.g., green outlines) next to user avatars that illuminate when voice data is being transmitted.
- Hardware Controls: Easily accessible global Mute (microphone) and Deafen (headphones) toggles.
3. The Two-Lane Network Architecture
To guarantee a responsive UI while preventing robotic, lagging audio, the app utilizes two simultaneous network streams:
- The Control Lane (TCP): Reliable and ordered, at the cost of extra latency whenever a lost packet must be retransmitted. Used for text messages, channel movements, authentication, and state updates, where no message may be lost.
- The Voice Lane (UDP): Low latency, but with no delivery or ordering guarantees. Compressed audio packets are sent continuously. If a packet is lost or arrives too late, the client skips it and moves on to the newest data, prioritizing real-time delivery over perfect quality to prevent audio delay.
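The "skip it and move to the newest data" rule on the voice lane can be sketched with a wrapping sequence number. The 16-bit sequence field and the VoiceReceiver type below are assumptions for illustration; the design does not define the voice packet layout.

```rust
/// Returns true when sequence number `a` is newer than `b`.
/// The 16-bit space is treated as circular so the counter may wrap past 65535.
pub fn is_newer(a: u16, b: u16) -> bool {
    let diff = a.wrapping_sub(b);
    diff != 0 && diff < 0x8000
}

/// Hypothetical per-sender receive state on the client.
#[derive(Default)]
pub struct VoiceReceiver {
    last_played: Option<u16>,
    /// Count of packets rejected because they were late or duplicated.
    pub dropped_late: u32,
}

impl VoiceReceiver {
    /// Returns true if the packet should be decoded and played.
    /// Late or duplicate packets are rejected instead of being waited for,
    /// so playback never stalls on missing data.
    pub fn accept(&mut self, seq: u16) -> bool {
        match self.last_played {
            Some(last) if !is_newer(seq, last) => {
                self.dropped_late += 1;
                false
            }
            _ => {
                self.last_played = Some(seq);
                true
            }
        }
    }
}
```

A gap in the sequence (for example 65535 followed by 1) is simply accepted; a real client would likely hand the gap to the codec's loss concealment, which is outside this sketch.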
4. Cross-Platform Strategy
The app is natively compiled for Windows, macOS, and Linux from a single Rust codebase.
- Audio I/O: Handled via the cpal crate to interface with WASAPI (Windows), CoreAudio (macOS), and ALSA/PulseAudio (Linux).
- User Interface: Powered by egui and eframe, rendering natively via the system's preferred graphics API (DirectX, Metal, Vulkan/OpenGL).
- Global Hotkeys: Handled via OS-specific hotkey registration hooks to capture Push-to-Talk events even when the application is minimized.
5. WebAssembly (Wasm) Plugin System
A secure, language-agnostic extension framework that allows users to modify the client's behavior without altering the core Rust binary.
- The Sandbox: Plugins run inside an isolated Wasm runtime. A malicious or broken plugin can crash its own sandbox but cannot crash the main voice client or access unauthorized system files.
- Language Agnostic: Users can write plugins in Python, JavaScript, Go, or Rust, compiling them down to a .wasm file.
- Event Hooks: The core application broadcasts specific triggers into the sandbox (e.g., OnUserJoinChannel, OnAudioFrameCaptured), allowing plugins to react to network events, manipulate local audio streams (e.g., voice changers), or automate chat functions.
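Because a Wasm module and its host exchange data through plain numbers and bytes in linear memory, each event hook needs some flat byte representation before it can cross into the sandbox. The sketch below shows one possible encoding for the two hooks named above. The PluginEvent type, the tag values, and the field layout are all assumptions for illustration, and no Wasm runtime is involved.

```rust
/// Hypothetical host-side view of the two example hooks.
#[derive(Debug, Clone, PartialEq)]
pub enum PluginEvent {
    UserJoinChannel { user_id: u32, channel_id: u32 },
    AudioFrameCaptured { samples: Vec<i16> },
}

impl PluginEvent {
    /// Flatten the event to bytes: one tag byte, then little-endian fields.
    pub fn encode(&self) -> Vec<u8> {
        let mut buf = Vec::new();
        match self {
            PluginEvent::UserJoinChannel { user_id, channel_id } => {
                buf.push(1);
                buf.extend_from_slice(&user_id.to_le_bytes());
                buf.extend_from_slice(&channel_id.to_le_bytes());
            }
            PluginEvent::AudioFrameCaptured { samples } => {
                buf.push(2);
                for s in samples {
                    buf.extend_from_slice(&s.to_le_bytes());
                }
            }
        }
        buf
    }

    /// Parse bytes back into an event; returns None for unknown tags
    /// or payloads of the wrong length.
    pub fn decode(bytes: &[u8]) -> Option<PluginEvent> {
        let (&tag, body) = bytes.split_first()?;
        match tag {
            1 if body.len() == 8 => Some(PluginEvent::UserJoinChannel {
                user_id: u32::from_le_bytes([body[0], body[1], body[2], body[3]]),
                channel_id: u32::from_le_bytes([body[4], body[5], body[6], body[7]]),
            }),
            2 if body.len() % 2 == 0 => Some(PluginEvent::AudioFrameCaptured {
                samples: body
                    .chunks_exact(2)
                    .map(|c| i16::from_le_bytes([c[0], c[1]]))
                    .collect(),
            }),
            _ => None,
        }
    }
}
```

Rejecting malformed input in decode matters on the return path as well: bytes coming back from a plugin (for example a modified audio frame) should be treated as untrusted.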
6. Audio DSP Pipeline (Quality Control)
Raw microphone input is inherently messy. Before audio is compressed and sent to the network, it must pass through a local Digital Signal Processing (DSP) chain to ensure professional voice quality.
- Acoustic Echo Cancellation (AEC): Prevents the user's microphone from re-broadcasting audio coming from their own speakers.
- Noise Suppression: Filters out background noise (e.g., keyboard clacking, computer fans) using a lightweight algorithm (such as WebRTC's audio processing module or RNNoise).
- Voice Activity Detection (VAD) / Noise Gate: Automatically stops transmitting network packets when the user is not actively speaking, so no bandwidth is spent on silence.
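The last stage of the chain, the noise gate, can be sketched as a simple energy threshold with a short "hangover" so that word endings are not clipped. This is a deliberately simplified stand-in for a real VAD; the NoiseGate type, the threshold, and the hangover length are illustrative assumptions, not tuned values.

```rust
/// Root-mean-square level of one frame of samples in the range [-1.0, 1.0].
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Hypothetical energy-based gate, applied once per captured frame.
pub struct NoiseGate {
    threshold: f32,
    hangover_frames: u32,
    remaining: u32,
}

impl NoiseGate {
    pub fn new(threshold: f32, hangover_frames: u32) -> Self {
        NoiseGate { threshold, hangover_frames, remaining: 0 }
    }

    /// Returns true when the frame should be encoded and sent.
    /// Loud frames open the gate and reset the hangover counter; quiet
    /// frames keep it open until the counter runs out.
    pub fn should_transmit(&mut self, frame: &[f32]) -> bool {
        if rms(frame) >= self.threshold {
            self.remaining = self.hangover_frames;
            true
        } else if self.remaining > 0 {
            self.remaining -= 1;
            true
        } else {
            false
        }
    }
}
```

The same boolean could also drive the active speaker indicator, since it reflects whether voice data is currently being transmitted.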
7. Identity and Authorization
The system employs a strict Role-Based Access Control (RBAC) architecture to maintain order within the server.
- The Hierarchy: Users are assigned roles (e.g., Guest, Member, Moderator, Admin) which dictate their permissions.
- Channel Permissions: Specific rooms can be locked behind passwords or restricted to specific roles.
- Moderation Tools: Authorized users have the network authority to send Kick, Ban, or ServerMute commands, which the server enforces by dropping the target's network connections or ignoring their UDP packets.
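A server-side permission check for these commands might look like the sketch below. The role names come from the hierarchy above, but the policy itself (moderators may kick and mute, only admins may ban, and nobody may act on a peer or superior) is an assumption chosen for illustration.

```rust
/// Roles in ascending order of privilege; the derived ordering follows
/// declaration order, so Guest < Member < Moderator < Admin.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Guest,
    Member,
    Moderator,
    Admin,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    Kick,
    Ban,
    ServerMute,
}

/// Minimum role needed to issue each command (assumed policy).
pub fn required_role(cmd: Command) -> Role {
    match cmd {
        Command::Kick | Command::ServerMute => Role::Moderator,
        Command::Ban => Role::Admin,
    }
}

/// The actor must hold the required role and must strictly outrank the target.
pub fn is_authorized(actor: Role, target: Role, cmd: Command) -> bool {
    actor >= required_role(cmd) && actor > target
}
```

The check belongs on the server, before any connection is dropped or any UDP packets are ignored, because a modified client could send any command it likes.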
8. Security and Resiliency
The application is designed to survive hostile network conditions and protect user privacy.
- Stateful Auto-Reconnect: If the TCP control lane drops due to a network hiccup, the client enters a "Reconnecting" state. It silently attempts to re-establish the connection and rejoin its previous voice channel without requiring user interaction.
- Encryption-in-Transit: All TCP control traffic (text chat, logins) is wrapped in TLS.
- Voice Encryption: UDP voice packets are encrypted with a lightweight authenticated symmetric cipher (such as ChaCha20-Poly1305 or AES-GCM). The session key is exchanged during the initial TCP handshake, under the protection of TLS, which prevents packet sniffing on public networks.
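The stateful auto-reconnect behaviour described above can be sketched as a small state machine plus a retry delay. The ConnState type and the exponential backoff with a cap are assumptions; the design only says the client retries silently and rejoins its previous channel.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
/// Overflow in either the shift or the multiplication falls back to the cap.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}

/// Hypothetical client connection state. The channel is remembered while
/// reconnecting so it can be rejoined without user interaction.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConnState {
    Connected { channel: u32 },
    Reconnecting { rejoin_channel: u32, attempt: u32 },
}

impl ConnState {
    /// The TCP control lane dropped.
    pub fn on_disconnect(self) -> ConnState {
        match self {
            ConnState::Connected { channel } => {
                ConnState::Reconnecting { rejoin_channel: channel, attempt: 0 }
            }
            other => other,
        }
    }

    /// A reconnect attempt failed; count it so the next delay grows.
    pub fn on_attempt_failed(self) -> ConnState {
        match self {
            ConnState::Reconnecting { rejoin_channel, attempt } => ConnState::Reconnecting {
                rejoin_channel,
                attempt: attempt.saturating_add(1),
            },
            other => other,
        }
    }

    /// The control lane is back; return to the remembered channel.
    pub fn on_reconnected(self) -> ConnState {
        match self {
            ConnState::Reconnecting { rejoin_channel, .. } => {
                ConnState::Connected { channel: rejoin_channel }
            }
            other => other,
        }
    }
}
```

With a 500 ms base and a 30 s cap, the delays run 0.5 s, 1 s, 2 s, 4 s and so on until they reach 30 s. Adding random jitter to each delay would help avoid many clients retrying in lockstep after a server restart.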
9. Self-Hosting and Web Administration
The server is designed for decentralized, user-hosted deployment. It compiles into a single, standalone executable that requires no external dependencies (no separate web servers or database installations).
- Tri-Port Architecture: The single server binary binds to three ports simultaneously:
  - TCP Control Lane (Client connections)
  - UDP Voice Lane (Audio routing)
  - HTTP Web Server (Admin dashboard)
- The Web Dashboard (axum): A lightweight, embedded web server provides a visual interface for server owners to manage their instance from any web browser.
- Embedded Assets (rust-embed): The entire HTML/CSS/JS frontend for the admin dashboard is compiled directly into the Rust server binary. The server hosts its own admin panel from memory.
- Admin REST API: The web dashboard communicates with the core server via secure HTTP endpoints (e.g., GET /api/users, POST /api/kick/:id), protected by standard JWT authentication. This API interacts directly with the live concurrent state (DashMap) of the voice server.
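The tri-port startup can be sketched with the standard library's blocking sockets. This only illustrates binding all three ports from one process; the real server would presumably hand the HTTP listener to axum and use an async runtime, and the ServerSockets and bind_all names are hypothetical.

```rust
use std::io;
use std::net::{TcpListener, UdpSocket};

/// The three sockets owned by the single server binary.
pub struct ServerSockets {
    /// TCP control lane (client connections).
    pub control: TcpListener,
    /// UDP voice lane (audio routing).
    pub voice: UdpSocket,
    /// HTTP listener for the admin dashboard.
    pub admin: TcpListener,
}

/// Bind all three ports up front so that a port conflict is reported at
/// startup rather than after clients have begun connecting.
/// Passing port 0 asks the OS to pick a free port.
pub fn bind_all(
    host: &str,
    control_port: u16,
    voice_port: u16,
    admin_port: u16,
) -> io::Result<ServerSockets> {
    Ok(ServerSockets {
        control: TcpListener::bind((host, control_port))?,
        voice: UdpSocket::bind((host, voice_port))?,
        admin: TcpListener::bind((host, admin_port))?,
    })
}
```

Binding the admin listener to a loopback address by default, and exposing it only on request, would be a cautious choice for a dashboard that can kick and ban users.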