Why not WebRTC with Opus + NIP-29? I debated the idea with my agent and here's the summary. Keen to hear your thoughts on this.
For a NIP-29 group chat app, the cleanest “voice chat” implementation is:
1) Transport: WebRTC (real-time, NAT traversal, battle-tested)
Why: you get P2P audio with congestion control, jitter buffers, echo cancellation, device access, and NAT traversal (ICE/STUN/TURN) out of the box.
Codec: Opus (use this unless you have a very specific reason not to). Every WebRTC browser is required to support it, and it is designed for low-latency interactive audio like VoIP / in-game chat.
https://developer.mozilla.org/en-US/docs/Web/Media/Guides/Formats/WebRTC_codecs
2) Signalling: use Nostr for session setup (offers/answers/ICE candidates)
Nostr is great as the signalling plane (exchange SDP + ICE candidates), while WebRTC carries the media.
A common approach:
When a user taps “Join voice” in a NIP-29 room, you publish encrypted signalling messages (offer/answer/candidates) to the other participant(s) (or to a coordinator) via Nostr.
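To make the signalling step concrete, here is a minimal sketch of wrapping an offer/answer/candidate in an unsigned Nostr event. The kind number (25050, picked from the ephemeral range so relays need not store it), the payload field names, and the injected `encrypt` function are all assumptions for illustration, not part of any NIP. Signing and publishing are left to whatever Nostr library the app already uses.

```typescript
// Hypothetical signalling payloads; the field names are assumptions.
type SignalPayload =
  | { type: "offer" | "answer"; sdp: string }
  | { type: "candidate"; candidate: string; sdpMid: string | null; sdpMLineIndex: number | null };

// Unsigned Nostr event: id and sig are added later by the app's signer.
interface UnsignedEvent {
  kind: number;
  pubkey: string;
  created_at: number;
  tags: string[][];
  content: string;
}

// Assumption: an ephemeral kind (20000-29999) chosen only for this sketch.
const VOICE_SIGNAL_KIND = 25050;

// `encrypt` is injected (for example a NIP-44 implementation); it is not defined here.
function buildSignalEvent(
  senderPubkey: string,
  recipientPubkey: string,
  groupId: string,
  payload: SignalPayload,
  encrypt: (plaintext: string, recipientPubkey: string) => string,
  now: number = Math.floor(Date.now() / 1000),
): UnsignedEvent {
  return {
    kind: VOICE_SIGNAL_KIND,
    pubkey: senderPubkey,
    created_at: now,
    tags: [
      ["p", recipientPubkey], // who should receive this message
      ["h", groupId],         // NIP-29 group the call belongs to
    ],
    content: encrypt(JSON.stringify(payload), recipientPubkey),
  };
}
```

On the receiving side the flow would be the mirror image: decrypt `content`, parse the JSON, and hand the offer/answer or candidate to the local `RTCPeerConnection`.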
There’s already discussion/precedent for doing WebRTC signalling over Nostr DMs.
https://github.com/nostr-protocol/nips/issues/771
3) Topology decision (this is the real product choice)
A) Small groups (2–6 people): Mesh P2P
Everyone connects to everyone: each client uploads N−1 streams, for N×(N−1) one-way streams in total.
Pros: simplest infra (no media server), very “sovereign”.
Cons: scales poorly; uplink dies fast as people join.
B) Anything beyond small groups: SFU (Selective Forwarding Unit)
Each client sends one upstream audio stream to the SFU; SFU forwards to others.
Pros: scales to larger rooms; better UX on mobile/weak uplinks.
Cons: you run infra (but it can be self-hosted; still fits the Nostr ethos if you keep it modular).
If you want “Discord-like” rooms, you want an SFU.
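A rough way to see why mesh runs out of uplink is to count streams per client. The sketch below does just that; the 32 kbit/s per-stream figure is an assumed ballpark for Opus voice and ignores RTP/UDP overhead, and the function names are made up for this example.

```typescript
interface StreamLoad {
  uploadsPerClient: number;   // audio streams each client sends
  downloadsPerClient: number; // audio streams each client receives (at most)
}

// Mesh: every client sends its audio to every other client directly.
function meshLoad(participants: number): StreamLoad {
  const peers = Math.max(participants - 1, 0);
  return { uploadsPerClient: peers, downloadsPerClient: peers };
}

// SFU: one upload per client; the SFU forwards the other participants' streams.
function sfuLoad(participants: number): StreamLoad {
  const peers = Math.max(participants - 1, 0);
  return { uploadsPerClient: participants > 0 ? 1 : 0, downloadsPerClient: peers };
}

// Assumption: roughly 32 kbit/s per Opus voice stream, overhead ignored.
function uplinkKbps(load: StreamLoad, perStreamKbps: number = 32): number {
  return load.uploadsPerClient * perStreamKbps;
}

// Total one-way streams in a full mesh: N x (N - 1).
function meshTotalStreams(participants: number): number {
  return participants * Math.max(participants - 1, 0);
}
```

For a 6-person room this gives 5 uploads per client in a mesh (about 160 kbit/s under the assumption above) versus 1 upload (about 32 kbit/s) with an SFU, and the gap widens linearly as people join.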
4) Practical codec settings (sane defaults)
With Opus in WebRTC you typically ship:
48 kHz Opus, variable bitrate, 20 ms frames (WebRTC defaults are usually fine)
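Enable DTX (discontinuous transmission: the encoder sends almost nothing during silence) for bandwidth savings in speech rooms; it is off by default and is requested with usedtx=1 on the Opus fmtp line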
For “music mode” rooms, allow toggling off aggressive noise suppression/AGC (WebRTC defaults are voice-optimised)
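The two toggles above could be sketched like this. `enableOpusDtx` rewrites an SDP string so the Opus `a=fmtp` line carries `usedtx=1`, and `audioProcessingFor` returns the standard audio constraint flags you would pass as `audio` to `navigator.mediaDevices.getUserMedia`. The function names and the "music mode" flag are hypothetical, and SDP munging is only one way to request DTX; treat this as a starting point rather than the way to do it.

```typescript
// Returns a copy of the SDP with usedtx=1 appended to the Opus fmtp line.
// If there is no Opus rtpmap or no matching fmtp line, the SDP is returned unchanged.
function enableOpusDtx(sdp: string): string {
  const rtpmap = sdp.match(/^a=rtpmap:(\d+) opus\/48000/im);
  if (!rtpmap) return sdp;
  const payloadType = rtpmap[1];
  const fmtpLine = new RegExp(`^a=fmtp:${payloadType} [^\\r\\n]*`, "m");
  return sdp.replace(fmtpLine, (line) =>
    /usedtx=/.test(line) ? line : `${line};usedtx=1`,
  );
}

// Subset of the standard audio track constraints used by getUserMedia.
interface AudioProcessing {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

// Voice rooms keep the voice-optimised processing on; music rooms turn it off.
// Keeping echo cancellation on in music mode is a judgment call for this sketch.
function audioProcessingFor(musicMode: boolean): AudioProcessing {
  return {
    echoCancellation: true,
    noiseSuppression: !musicMode,
    autoGainControl: !musicMode,
  };
}
```

In a browser you would apply `enableOpusDtx` to the description's `sdp` before calling `setLocalDescription`, and call something like `getUserMedia({ audio: audioProcessingFor(isMusicRoom) })`, where `isMusicRoom` is whatever room setting your app exposes.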
5) Encryption & identity
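WebRTC media is encrypted (DTLS-SRTP). In a mesh that is peer to peer; with an SFU it is hop-by-hop, so the SFU can see the audio unless you add an end-to-end layer on top.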
Nostr keys can authenticate who is allowed into the voice room (tie it to NIP-29 membership/moderation rules). NIP-29 is explicitly about relay-managed closed groups.
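As a sketch of that gate: before answering a signalling message, check the sender's pubkey against the group's member list. This assumes the relay publishes a NIP-29 members list event (kind 39002 with a `d` tag for the group id and `p` tags for members) and that event signatures were already verified elsewhere; the helper names are made up for this example.

```typescript
interface NostrEventLike {
  kind: number;
  pubkey: string;
  tags: string[][];
}

// NIP-29 group members list, published by the relay.
const GROUP_MEMBERS_KIND = 39002;

// Extracts member pubkeys, or an empty list if the event is not the
// members list for the given group. Signature checks are assumed done.
function memberPubkeys(membersEvent: NostrEventLike, groupId: string): string[] {
  const isForGroup =
    membersEvent.kind === GROUP_MEMBERS_KIND &&
    membersEvent.tags.some((t) => t[0] === "d" && t[1] === groupId);
  if (!isForGroup) return [];
  return membersEvent.tags
    .filter((t) => t[0] === "p" && typeof t[1] === "string")
    .map((t) => t[1]);
}

// Only members may take part in the room's voice session.
function mayJoinVoice(senderPubkey: string, members: string[]): boolean {
  return members.indexOf(senderPubkey) !== -1;
}
```

A client (or SFU) would drop offers from any pubkey for which `mayJoinVoice` returns false, and re-check when the members list changes so removals take effect.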
https://github.com/nostr-protocol/nips/blob/master/29.md
TL;DR implementation pick
WebRTC + Opus for audio
Nostr for signalling (offers/answers/candidates), encrypted
Mesh for tiny rooms, SFU for real rooms