Add remaining project files

sam
2026-05-03 10:50:25 +02:00
parent 302fdb5459
commit 989d3bcc9f
16 changed files with 644 additions and 0 deletions

@@ -0,0 +1,74 @@
# General Concept: Rust Voice Communication App
## 1. Core Philosophy
The application operates on a "Switchboard and Walkie-Talkie" model, designed for instant, drop-in voice communication.
* **The Switchboard (Server):** A central routing hub. It maintains the blueprint of all channels and tracks which users are in which rooms. It **does not** process audio; it strictly relays data to the correct destinations.
* **The Walkie-Talkies (Clients):** The desktop applications. They capture local microphone input, compress it, send it to the server, and decompress incoming audio for playback.
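
The switchboard's routing role can be sketched as a membership table that answers one question: who should receive a packet from this sender? This is a minimal, std-only illustration. The `Switchboard` type, the `UserId`/`RoomId` aliases, and the method names are assumptions, not part of the design.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical identifiers; the design does not define concrete types.
pub type UserId = u32;
pub type RoomId = u32;

/// Tracks which users are in which rooms. It never inspects audio payloads.
#[derive(Default)]
pub struct Switchboard {
    rooms: HashMap<RoomId, HashSet<UserId>>,
    location: HashMap<UserId, RoomId>,
}

impl Switchboard {
    /// Moves a user into `room`, removing them from any previous room.
    pub fn join(&mut self, user: UserId, room: RoomId) {
        self.leave(user);
        self.rooms.entry(room).or_default().insert(user);
        self.location.insert(user, room);
    }

    /// Removes a user from whatever room they occupy, if any.
    pub fn leave(&mut self, user: UserId) {
        if let Some(room) = self.location.remove(&user) {
            if let Some(members) = self.rooms.get_mut(&room) {
                members.remove(&user);
            }
        }
    }

    /// Returns everyone who should receive a packet sent by `sender`:
    /// all other occupants of the sender's room, sorted for determinism.
    pub fn recipients(&self, sender: UserId) -> Vec<UserId> {
        let mut out: Vec<UserId> = self
            .location
            .get(&sender)
            .and_then(|room| self.rooms.get(room))
            .map(|members| members.iter().copied().filter(|&u| u != sender).collect())
            .unwrap_or_default();
        out.sort_unstable();
        out
    }
}
```

The server would call `recipients` for each incoming voice datagram and forward the bytes unchanged, which matches the "relay only" rule above.
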
## 2. The User Experience (Core Features)
* **Persistent Room List:** A static hierarchy of voice channels displayed on the side panel.
* **Drop-In Audio:** No ringing or answering. Users click a room and are immediately broadcasting and receiving audio.
* **Text Chat:** A synchronized text channel for every voice room, allowing users to share links and messages with current occupants.
* **Active Speaker Indicators:** Visual cues (e.g., green outlines) next to user avatars that illuminate when voice data is being transmitted.
* **Hardware Controls:** Easily accessible global Mute (microphone) and Deafen (headphones) toggles.
## 3. The Two-Lane Network Architecture
To guarantee a responsive UI while preventing robotic, lagging audio, the app utilizes two simultaneous network streams:
* **The Control Lane (TCP):** Higher latency, but reliable and ordered. Used for text messages, channel movements, authentication, and state updates. Ensures critical data is delivered in order, or that the connection fails detectably so the client can recover.
* **The Voice Lane (UDP):** Low latency, but with no delivery or ordering guarantees. Sends compressed audio packets continuously. If a packet is lost or arrives late, the client skips it and moves on to the newest data, prioritizing real-time delivery over perfect quality to prevent audio delay.
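
One way the voice lane's "newest data wins" rule could work is to tag each datagram with a sequence number and reject anything that is not newer than the last accepted packet. The wire layout below (a 4-byte big-endian sequence number followed by the payload) and the names `VoicePacket` and `VoiceReceiver` are assumptions for illustration; the design does not specify a packet format.

```rust
/// Hypothetical wire layout: 4-byte big-endian sequence number,
/// followed by the compressed audio payload.
#[derive(Debug, PartialEq)]
pub struct VoicePacket {
    pub seq: u32,
    pub payload: Vec<u8>,
}

impl VoicePacket {
    /// Serializes the packet into a single UDP datagram body.
    pub fn encode(&self) -> Vec<u8> {
        let mut buf = Vec::with_capacity(4 + self.payload.len());
        buf.extend_from_slice(&self.seq.to_be_bytes());
        buf.extend_from_slice(&self.payload);
        buf
    }

    /// Parses a datagram; returns `None` if it is too short to hold a header.
    pub fn decode(datagram: &[u8]) -> Option<VoicePacket> {
        if datagram.len() < 4 {
            return None;
        }
        let seq = u32::from_be_bytes([datagram[0], datagram[1], datagram[2], datagram[3]]);
        Some(VoicePacket { seq, payload: datagram[4..].to_vec() })
    }
}

/// Accepts only packets that are newer than the last accepted one.
#[derive(Default)]
pub struct VoiceReceiver {
    last_seq: Option<u32>,
}

impl VoiceReceiver {
    /// Returns `true` if the packet should be played, `false` if it is a
    /// duplicate or arrived after a newer packet was already accepted.
    pub fn accept(&mut self, packet: &VoicePacket) -> bool {
        let is_newer = match self.last_seq {
            None => true,
            Some(last) => {
                // Wrapping comparison: "newer" means ahead of `last` by less
                // than half the sequence space, so wrap-around still works.
                let diff = packet.seq.wrapping_sub(last);
                diff != 0 && diff < u32::MAX / 2
            }
        };
        if is_newer {
            self.last_seq = Some(packet.seq);
        }
        is_newer
    }
}
```

A real client would more likely use a small jitter buffer than a strict "newest only" rule; this sketch shows only the discard decision.
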
## 4. Cross-Platform Strategy
The app is natively compiled for Windows, macOS, and Linux from a single Rust codebase.
* **Audio I/O:** Handled via the `cpal` crate, which wraps WASAPI (Windows), CoreAudio (macOS), and ALSA (Linux, where PulseAudio and PipeWire are typically reached through their ALSA compatibility layers).
* **User Interface:** Powered by `egui` and `eframe`, which render through a GPU backend: `glow` (OpenGL) or `wgpu` (Direct3D 12, Metal, or Vulkan, depending on the platform).
* **Global Hotkeys:** Handled via OS-specific keyboard hooks and hotkey APIs to capture Push-to-Talk events even when the application is minimized or unfocused.
## 5. WebAssembly (Wasm) Plugin System
A secure, language-agnostic extension framework that allows users to modify the client's behavior without altering the core Rust binary.
* **The Sandbox:** Plugins run inside an isolated Wasm runtime. A malicious or broken plugin can crash its own sandbox but cannot crash the main voice client or access unauthorized system files.
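* **Language Agnostic:** Users can write plugins in any language with a WebAssembly target (e.g., Rust, Go, or C/C++) and compile them to a `.wasm` file. Interpreted languages such as Python and JavaScript can also be used, but they typically require bundling an interpreter into the module.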
* **Event Hooks:** The core application broadcasts specific triggers into the sandbox (e.g., `OnUserJoinChannel`, `OnAudioFrameCaptured`), allowing plugins to react to network events, manipulate local audio streams (e.g., voice changers), or automate chat functions.
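
The host side of the event hooks might look roughly like the sketch below. No Wasm runtime is involved here: the `Plugin` trait is a stand-in for a sandboxed module, and an `Err` return models a trap inside the sandbox. The event names come from the list above; `PluginHost`, `HalfVolume`, `Broken`, and all method names are hypothetical.

```rust
/// Events named after the hooks listed in the design.
pub enum PluginEvent<'a> {
    OnUserJoinChannel { user: &'a str, channel: &'a str },
    OnAudioFrameCaptured { samples: &'a mut [f32] },
}

/// Stand-in for a sandboxed Wasm module (hypothetical interface).
pub trait Plugin {
    fn name(&self) -> &str;
    /// An `Err` models a trap or crash inside the plugin's sandbox.
    fn handle(&mut self, event: &mut PluginEvent) -> Result<(), String>;
}

pub struct PluginHost {
    plugins: Vec<Box<dyn Plugin>>,
}

impl PluginHost {
    pub fn new() -> Self {
        PluginHost { plugins: Vec::new() }
    }

    pub fn register(&mut self, plugin: Box<dyn Plugin>) {
        self.plugins.push(plugin);
    }

    pub fn plugin_count(&self) -> usize {
        self.plugins.len()
    }

    /// Delivers the event to every plugin in registration order. A plugin
    /// that fails is unloaded; the host and the other plugins keep running.
    /// Returns the names of the plugins that were unloaded.
    pub fn dispatch(&mut self, event: &mut PluginEvent) -> Vec<String> {
        let mut unloaded = Vec::new();
        self.plugins.retain_mut(|plugin| match plugin.handle(event) {
            Ok(()) => true,
            Err(_) => {
                unloaded.push(plugin.name().to_string());
                false
            }
        });
        unloaded
    }
}

/// Example plugin: a trivial "voice changer" that halves the volume.
pub struct HalfVolume;

impl Plugin for HalfVolume {
    fn name(&self) -> &str {
        "half-volume"
    }
    fn handle(&mut self, event: &mut PluginEvent) -> Result<(), String> {
        if let PluginEvent::OnAudioFrameCaptured { samples } = event {
            for s in samples.iter_mut() {
                *s *= 0.5;
            }
        }
        Ok(())
    }
}

/// Example plugin that always fails, to exercise the isolation path.
pub struct Broken;

impl Plugin for Broken {
    fn name(&self) -> &str {
        "broken"
    }
    fn handle(&mut self, _event: &mut PluginEvent) -> Result<(), String> {
        Err("simulated trap".to_string())
    }
}
```

In a real implementation the isolation would come from the Wasm runtime itself rather than from a `Result`; the sketch only shows the dispatch and unload policy.
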
## 6. Audio DSP Pipeline (Quality Control)
Raw microphone input is inherently messy. Before audio is compressed and sent to the network, it must pass through a local Digital Signal Processing (DSP) chain to ensure professional voice quality.
* **Acoustic Echo Cancellation (AEC):** Prevents the user's microphone from re-broadcasting audio coming from their own speakers.
* **Noise Suppression:** Filters out background noise (e.g., computer fans, keyboard clacking) using a lightweight algorithm (such as the WebRTC audio processing module or RNNoise).
* **Voice Activity Detection (VAD) / Noise Gate:** Automatically stops transmitting voice packets when the user is not actively speaking, so a silent client uses almost no upstream voice bandwidth.
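
The noise gate at the end of the chain can be approximated with a simple energy threshold plus a short "hangover" so that word endings are not clipped. This is a deliberately naive sketch, not the WebRTC or RNNoise approach; the `NoiseGate` type, its parameters, and the threshold values used in any example are assumptions.

```rust
/// Energy-based gate: transmit while the frame's RMS level is above a
/// threshold, and for a few frames afterwards (the "hangover").
pub struct NoiseGate {
    threshold_rms: f32,
    hangover_frames: u32,
    frames_since_voice: u32,
}

impl NoiseGate {
    /// `threshold_rms` is in the same units as the samples (e.g. -1.0..=1.0).
    pub fn new(threshold_rms: f32, hangover_frames: u32) -> Self {
        NoiseGate {
            threshold_rms,
            hangover_frames,
            // Start "closed": pretend voice was last heard long ago.
            frames_since_voice: u32::MAX,
        }
    }

    /// Root-mean-square level of one frame of samples.
    fn rms(frame: &[f32]) -> f32 {
        if frame.is_empty() {
            return 0.0;
        }
        let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
        (sum_sq / frame.len() as f32).sqrt()
    }

    /// Returns `true` if this frame should be encoded and sent.
    pub fn should_transmit(&mut self, frame: &[f32]) -> bool {
        if Self::rms(frame) >= self.threshold_rms {
            self.frames_since_voice = 0;
            true
        } else {
            self.frames_since_voice = self.frames_since_voice.saturating_add(1);
            self.frames_since_voice <= self.hangover_frames
        }
    }
}
```

With 10 ms frames, a hangover of 20 to 30 frames is a plausible starting point, though the right value would need tuning by ear.
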
## 7. Identity and Authorization
The system employs a strict Role-Based Access Control (RBAC) architecture to maintain order within the server.
* **The Hierarchy:** Users are assigned roles (e.g., Guest, Member, Moderator, Admin) which dictate their permissions.
* **Channel Permissions:** Specific rooms can be locked behind passwords or restricted to specific roles.
* **Moderation Tools:** Authorized users have the network authority to send `Kick`, `Ban`, or `ServerMute` commands, which the server enforces by dropping the target's network connections or ignoring their UDP packets.
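
A server-side permission check for the moderation commands could be as small as the sketch below. The role names and command names come from the text above; the specific policy (moderators may kick and mute, only admins may ban, and nobody may act on an equal or higher role) is an assumption chosen for illustration.

```rust
/// Roles in ascending order of privilege. The derived `Ord` follows
/// declaration order, so `Role::Guest < Role::Admin`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Guest,
    Member,
    Moderator,
    Admin,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    Kick,
    Ban,
    ServerMute,
}

impl Command {
    /// Hypothetical policy: the minimum role allowed to issue each command.
    pub fn required_role(self) -> Role {
        match self {
            Command::Kick | Command::ServerMute => Role::Moderator,
            Command::Ban => Role::Admin,
        }
    }
}

/// Decides whether `actor` may apply `command` to `target`.
pub fn authorize(actor: Role, target: Role, command: Command) -> Result<(), &'static str> {
    if actor < command.required_role() {
        return Err("insufficient role for this command");
    }
    if actor <= target {
        return Err("cannot moderate a user of equal or higher role");
    }
    Ok(())
}
```

The check runs on the server, so a modified client cannot grant itself extra permissions by sending commands it should not have.
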
## 8. Security and Resiliency
The application is designed to survive hostile network conditions and protect user privacy.
* **Stateful Auto-Reconnect:** If the TCP control lane drops due to a network hiccup, the client enters a "Reconnecting" state. It silently attempts to re-establish the connection and rejoin the user's previous voice channel without requiring user interaction.
* **Encryption-in-Transit:** All TCP control traffic (text chat, logins) is wrapped in TLS.
* **Voice Encryption:** UDP voice packets are encrypted with a lightweight authenticated cipher (such as ChaCha20-Poly1305 or AES-GCM), using a symmetric session key exchanged over the TLS-protected TCP handshake, preventing packet-sniffing on public networks.
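
The auto-reconnect behaviour can be modelled as a small state machine that remembers which channel to rejoin, paired with a capped exponential backoff between attempts. The state names, the `u32` channel id, and the timing constants (500 ms base, 30 s cap) are assumptions; the design only says the client retries silently and rejoins the previous channel.

```rust
use std::time::Duration;

/// Client connection state. `rejoin` carries the channel the user was in
/// when the control lane dropped, so it can be restored afterwards.
#[derive(Debug, Clone, PartialEq)]
pub enum ConnState {
    Connected { channel: Option<u32> },
    Reconnecting { attempt: u32, rejoin: Option<u32> },
}

/// Capped exponential backoff: 500 ms * 2^attempt, at most 30 s.
/// Both constants are illustrative, not taken from the design.
pub fn backoff_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 500;
    const MAX_MS: u64 = 30_000;
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_millis(BASE_MS.saturating_mul(factor).min(MAX_MS))
}

impl ConnState {
    /// The TCP control lane dropped: remember the current channel.
    pub fn on_disconnect(self) -> ConnState {
        match self {
            ConnState::Connected { channel } => {
                ConnState::Reconnecting { attempt: 0, rejoin: channel }
            }
            other => other,
        }
    }

    /// A reconnect attempt failed: bump the attempt counter.
    pub fn on_attempt_failed(self) -> ConnState {
        match self {
            ConnState::Reconnecting { attempt, rejoin } => {
                ConnState::Reconnecting { attempt: attempt.saturating_add(1), rejoin }
            }
            other => other,
        }
    }

    /// The connection is back: restore the remembered channel.
    pub fn on_reconnected(self) -> ConnState {
        match self {
            ConnState::Reconnecting { rejoin, .. } => ConnState::Connected { channel: rejoin },
            other => other,
        }
    }
}
```

A production client would probably add random jitter to the delay so that many clients do not reconnect at the same instant after a server restart.
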
## 9. Self-Hosting and Web Administration
The server is designed for decentralized, user-hosted deployment. It compiles into a single, standalone executable that requires no external dependencies (no separate web servers or database installations).
* **Tri-Port Architecture:** The single server binary binds to three ports simultaneously:
    * TCP Control Lane (Client connections)
    * UDP Voice Lane (Audio routing)
    * HTTP Web Server (Admin dashboard)
* **The Web Dashboard (`axum`):** A lightweight, embedded web server provides a visual interface for server owners to manage their instance from any web browser.
* **Embedded Assets (`rust-embed`):** The entire HTML/CSS/JS frontend for the admin dashboard is compiled directly into the Rust server binary. The server hosts its own admin panel from memory.
* **Admin REST API:** The web dashboard communicates with the core server via secure HTTP endpoints (e.g., `GET /api/users`, `POST /api/kick/:id`), protected by standard JWT authentication. This API interacts directly with the live concurrent state (`DashMap`) of the voice server.
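
To illustrate how the admin endpoints might act on the live server state, the sketch below routes the two example requests against a shared user table. It deliberately avoids external crates: `Arc<RwLock<HashMap>>` stands in for `DashMap`, plain string matching stands in for `axum` routing, and JWT verification is assumed to have happened before `handle_admin` is called. The function name, the `LiveUsers` alias, and the response bodies are all hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Std-only stand-in for the `DashMap` of live users: user id -> display name.
pub type LiveUsers = Arc<RwLock<HashMap<u32, String>>>;

/// Handles an already-authenticated admin request.
/// Returns an (HTTP status code, response body) pair.
pub fn handle_admin(users: &LiveUsers, method: &str, path: &str) -> (u16, String) {
    let segments: Vec<&str> = path.trim_matches('/').split('/').collect();
    match (method, segments.as_slice()) {
        // GET /api/users: list connected users, sorted by name.
        ("GET", ["api", "users"]) => {
            // `unwrap` panics only if another thread panicked while
            // holding the lock; acceptable for a sketch.
            let guard = users.read().unwrap();
            let mut names: Vec<&str> = guard.values().map(|n| n.as_str()).collect();
            names.sort_unstable();
            (200, names.join(","))
        }
        // POST /api/kick/:id: remove the user from the live state.
        ("POST", ["api", "kick", id]) => match id.parse::<u32>() {
            Ok(id) => match users.write().unwrap().remove(&id) {
                Some(name) => (200, format!("kicked {}", name)),
                None => (404, "no such user".to_string()),
            },
            Err(_) => (400, "invalid user id".to_string()),
        },
        _ => (404, "unknown route".to_string()),
    }
}
```

Because the dashboard and the voice server share the same table, a kick issued from the browser takes effect immediately, with no separate database in between.
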