Voice Gateway Versioning
| Version | Status | WebSocket URL Append | Change |
|---|---|---|---|
| 8 | Recommended | ?v=8 | Added server message buffering; missed messages re-delivered on resume |
| 7 | Available | ?v=7 | Added channel options opcode |
| 6 | Available | ?v=6 | Added code version opcode |
| 5 | Available | ?v=5 | Added video sink wants opcode |
| 4 | Available | ?v=4 | Changed speaking status to bitmask from boolean |
| 3 | Deprecated | ?v=3 | Added video |
| 2 | Deprecated | ?v=2 | Changed heartbeat reply to Heartbeat ACK opcode |
| 1 | Deprecated | ?v=1 | Initial version |
Connecting to Voice
Step 1: Retrieve Voice Server Information
Send an Opcode 4 Gateway Voice State Update to the main Gateway to join a voice channel. The Gateway responds with two events:

- Voice State Update, which contains the session_id
- Voice Server Update, which contains the token and endpoint

Bot users respect the voice channel’s user limit, if set. When the voice channel is full, you will not receive the Voice State Update or Voice Server Update events. Having the MOVE_MEMBERS permission bypasses this limit.
Step 2: Establish the Voice WebSocket Connection
Connect to wss://ENDPOINT?v=8. The endpoint from Voice Server Update does not include wss://, so you must prepend it yourself.
Send Opcode 0 Identify with your credentials.
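
As a rough illustration of this step, the sketch below builds the WebSocket URL and an Identify message. The helper names are hypothetical, and the field names `server_id`, `user_id`, `session_id`, and `token` are assumptions based on the commonly documented Identify payload, since this section does not list them.

```python
import json


def voice_ws_url(endpoint: str, version: int = 8) -> str:
    """Prepend the scheme (the endpoint arrives without wss://) and pin the gateway version."""
    return f"wss://{endpoint}?v={version}"


def identify_payload(server_id: str, user_id: str, session_id: str, token: str) -> str:
    """Serialize an Opcode 0 Identify message (field names are assumptions, see lead-in)."""
    return json.dumps({
        "op": 0,
        "d": {
            "server_id": server_id,
            "user_id": user_id,
            "session_id": session_id,  # from the Voice State Update event
            "token": token,            # from the Voice Server Update event
        },
    })
```

The session_id and token come from the two events received in Step 1, which is why both must arrive before the voice WebSocket can be opened.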
Heartbeating
After connecting, the voice server sends Opcode 8 Hello with the heartbeat interval. Since voice gateway version 8, heartbeat messages must include seq_ack, the sequence number of the last numbered message received from the gateway. See Buffered Resume for details.
Establishing a Voice UDP Connection
Using the ip and port from the Opcode 2 Ready payload, open a UDP connection. Optionally perform IP Discovery to determine your external IP and port, then send Opcode 1 Select Protocol.
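
A minimal sketch of the Select Protocol message follows. The nested `protocol`/`data` layout and the `address`, `port`, and `mode` field names are assumptions drawn from the commonly documented payload shape, and the function name is hypothetical.

```python
import json


def select_protocol_payload(address: str, port: int, mode: str) -> str:
    """Serialize an Opcode 1 Select Protocol message for a UDP voice connection."""
    return json.dumps({
        "op": 1,
        "d": {
            "protocol": "udp",
            "data": {
                "address": address,  # external IP, e.g. from IP Discovery
                "port": port,        # external UDP port
                "mode": mode,        # one of the encryption mode keys
            },
        },
    })
```

The address and port here are the externally visible values, which can differ from the local socket address when the client sits behind NAT.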
Transport Encryption and Sending Voice
Voice data must be encoded with Opus (stereo, 48 kHz sample rate) and sent as an RTP header followed by the encrypted Opus audio.
Encryption Modes
| Mode | Key | Nonce | Status |
|---|---|---|---|
| AEAD AES256-GCM (RTP Size) | aead_aes256_gcm_rtpsize | 32-bit incremental integer, appended to payload | Available (Preferred) |
| AEAD XChaCha20 Poly1305 (RTP Size) | aead_xchacha20_poly1305_rtpsize | 32-bit incremental integer, appended to payload | Available (Required) |
| XSalsa20 Poly1305 Lite (RTP Size) | xsalsa20_poly1305_lite_rtpsize | 32-bit incremental integer, appended to payload | Deprecated |
| AEAD AES256-GCM | aead_aes256_gcm | 32-bit incremental integer, appended to payload | Deprecated |
| XSalsa20 Poly1305 | xsalsa20_poly1305 | Copy of RTP header | Deprecated |
| XSalsa20 Poly1305 Suffix | xsalsa20_poly1305_suffix | 24 random bytes | Deprecated |
| XSalsa20 Poly1305 Lite | xsalsa20_poly1305_lite | 32-bit incremental integer, appended to payload | Deprecated |
The Opcode 4 Session Description sent by the voice server includes the secret_key, a 32-byte array used for transport encryption.
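
One way to act on the mode table is a fixed preference order over the non-deprecated modes, sketched below. It assumes the server advertises its supported modes as a list of key strings and delivers secret_key as a JSON array of integers. Both function names are hypothetical.

```python
from typing import Iterable, Sequence

# Non-deprecated modes from the table, preferred first.
PREFERRED_MODES = (
    "aead_aes256_gcm_rtpsize",
    "aead_xchacha20_poly1305_rtpsize",
)


def choose_mode(offered: Iterable[str]) -> str:
    """Return the most preferred non-deprecated mode that the server offers."""
    offered_set = set(offered)
    for mode in PREFERRED_MODES:
        if mode in offered_set:
            return mode
    raise ValueError("server offered no supported encryption mode")


def secret_key_bytes(secret_key: Sequence[int]) -> bytes:
    """Convert the secret_key integer array to bytes and check its length."""
    key = bytes(secret_key)
    if len(key) != 32:
        raise ValueError(f"expected a 32-byte key, got {len(key)} bytes")
    return key
```

Falling back to the XChaCha20 mode reflects its "Required" status in the table: a client that implements it always has a usable option.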
Voice Packet Structure
| Field | Type | Size |
|---|---|---|
| Version + Flags | Single byte value of 0x80 | 1 byte |
| Payload Type | Single byte value of 0x78 | 1 byte |
| Sequence | Unsigned short (big endian) | 2 bytes |
| Timestamp | Unsigned integer (big endian) | 4 bytes |
| SSRC | Unsigned integer (big endian) | 4 bytes |
| Encrypted audio | Binary data | n bytes |
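
The fixed 12-byte header in the table maps directly onto a big-endian struct layout. The sketch below packs it with Python's `struct` module. Encryption is out of scope here, so `encrypted_audio` is assumed to be already encrypted, and the function names are hypothetical.

```python
import struct

# Version+Flags (1), Payload Type (1), Sequence (2), Timestamp (4), SSRC (4), big endian.
RTP_HEADER = struct.Struct(">BBHII")


def rtp_header(sequence: int, timestamp: int, ssrc: int) -> bytes:
    """Pack the 12-byte voice packet header, wrapping the counters to their field widths."""
    return RTP_HEADER.pack(0x80, 0x78, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)


def voice_packet(sequence: int, timestamp: int, ssrc: int, encrypted_audio: bytes) -> bytes:
    """Concatenate the header with audio that has already been encrypted."""
    return rtp_header(sequence, timestamp, ssrc) + encrypted_audio
```

Assuming 20 ms Opus frames at 48 kHz, a sender would typically advance the sequence by 1 and the timestamp by 960 samples per packet.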
Speaking
Before sending any audio, you must send at least one Opcode 5 Speaking payload to set the initial speaking mode and update your SSRC. The speaking flags are a bitfield:

| Flag | Value | Meaning |
|---|---|---|
| Microphone | 1 << 0 | Normal transmission of voice audio |
| Soundshare | 1 << 1 | Transmission of context audio for video, no speaking indicator |
| Priority | 1 << 2 | Priority speaker, lowering audio of other speakers |
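
The flags combine with bitwise OR, which an `IntFlag` enum expresses naturally. In this sketch the `speaking`, `delay`, and `ssrc` field names are assumptions based on the commonly documented Speaking payload, and `delay` defaults to 0 as this section advises for bots.

```python
import json
from enum import IntFlag


class SpeakingFlags(IntFlag):
    MICROPHONE = 1 << 0  # normal transmission of voice audio
    SOUNDSHARE = 1 << 1  # context audio for video, no speaking indicator
    PRIORITY = 1 << 2    # priority speaker, lowers audio of other speakers


def speaking_payload(flags: SpeakingFlags, ssrc: int, delay: int = 0) -> str:
    """Serialize an Opcode 5 Speaking message with the combined flag value."""
    return json.dumps({
        "op": 5,
        "d": {"speaking": int(flags), "delay": delay, "ssrc": ssrc},
    })
```

For example, `SpeakingFlags.MICROPHONE | SpeakingFlags.PRIORITY` serializes as the integer 5.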
The delay property should be set to 0 for bots.
Voice Data Interpolation
When there is a break in the sent data, do not simply stop transmitting. Send five frames of silence (0xF8, 0xFF, 0xFE) before stopping to avoid unintended Opus interpolation with subsequent transmissions.
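
A small sketch of this rule: wrap the outgoing stream of Opus frames so that five silence frames always follow the last real frame. The generator name is hypothetical, and the frames it yields still need to be encrypted and packetized like any other audio.

```python
from typing import Iterable, Iterator

# The three-byte Opus silence frame given above.
SILENCE_FRAME = bytes([0xF8, 0xFF, 0xFE])


def with_trailing_silence(frames: Iterable[bytes], count: int = 5) -> Iterator[bytes]:
    """Yield every frame from the source, then `count` silence frames."""
    yield from frames
    for _ in range(count):
        yield SILENCE_FRAME
```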
Resuming Voice Connection
When your client detects a severed connection, open a new WebSocket and send Opcode 7 Resume.
Buffered Resume
Since voice gateway version 8, the gateway can resend messages that were lost during a disconnect. To support this:

- The gateway includes a seq field on messages that may need re-delivery.
- Clients must track the last seq value received.
- Clients must include seq_ack in both Heartbeat and Resume payloads.

If no numbered messages have been received yet, seq_ack can be omitted or set to -1. The gateway handles sequence number wrap-around automatically.
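
The bookkeeping described here can be sketched as a small tracker that records the latest seq and stamps it onto outgoing Heartbeat and Resume messages. The class is hypothetical, and the payload field names other than seq and seq_ack (`t`, `server_id`, `session_id`, `token`) are assumptions based on the commonly documented payloads.

```python
import json
from typing import Any, Dict


class SeqTracker:
    """Remembers the last seq received and builds payloads that acknowledge it."""

    def __init__(self) -> None:
        self.last_seq = -1  # -1 means no numbered message has been seen yet

    def observe(self, message: Dict[str, Any]) -> None:
        """Record the seq of an incoming message; unnumbered messages are ignored."""
        seq = message.get("seq")
        if seq is not None:
            self.last_seq = seq

    def heartbeat(self, nonce: int) -> str:
        """Serialize an Opcode 3 Heartbeat carrying seq_ack."""
        return json.dumps({"op": 3, "d": {"t": nonce, "seq_ack": self.last_seq}})

    def resume(self, server_id: str, session_id: str, token: str) -> str:
        """Serialize an Opcode 7 Resume carrying seq_ack."""
        return json.dumps({
            "op": 7,
            "d": {
                "server_id": server_id,
                "session_id": session_id,
                "token": token,
                "seq_ack": self.last_seq,
            },
        })
```

The tracker stores the most recent value rather than the maximum, since the gateway is stated to handle wrap-around itself.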
IP Discovery
Most routers on the Internet mask UDP ports through NAT. Use IP discovery to find your external IP and port for receiving voice. Send the following UDP packet (all numeric fields are big endian) to your voice port:

| Field | Description | Size |
|---|---|---|
| Type | 0x1 for request, 0x2 for response | 2 bytes |
| Length | Message length excluding Type and Length fields (value 70) | 2 bytes |
| SSRC | Your SSRC as an unsigned integer | 4 bytes |
| Address | Null-terminated string in response | 64 bytes |
| Port | Unsigned short | 2 bytes |
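
The packet layout in the table totals 74 bytes, with 70 of them counted by the Length field. The sketch below builds a request and parses a response with Python's `struct` module. The function names are hypothetical, and the address is assumed to be ASCII text.

```python
import struct
from typing import Tuple

# Type (2), Length (2), SSRC (4), Address (64), Port (2), all big endian: 74 bytes.
DISCOVERY = struct.Struct(">HHI64sH")


def build_request(ssrc: int) -> bytes:
    """Build a discovery request; the address and port fields are left zeroed."""
    return DISCOVERY.pack(0x1, 70, ssrc, b"", 0)


def parse_response(packet: bytes) -> Tuple[str, int]:
    """Extract the external (ip, port) pair from a discovery response."""
    if len(packet) != DISCOVERY.size:
        raise ValueError(f"expected {DISCOVERY.size} bytes, got {len(packet)}")
    kind, length, _ssrc, address, port = DISCOVERY.unpack(packet)
    if kind != 0x2 or length != 70:
        raise ValueError("not an IP discovery response")
    # The address is a null-terminated string inside a fixed 64-byte field.
    ip = address.split(b"\x00", 1)[0].decode("ascii")
    return ip, port
```

In practice the request would be sent with `socket.sendto` on the UDP socket opened for voice, and the reply read with `recvfrom` using a buffer of at least 74 bytes.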
End-to-End Encryption (DAVE Protocol)
Since September 2024, Discord has been migrating voice and video in DMs, Group DMs, voice channels, and Go Live streams to end-to-end encryption (E2EE) via the DAVE protocol. The most thorough documentation of the protocol is the Protocol Whitepaper, and Discord’s open-source library libdave can assist your implementation. When a call is E2EE, all members exchange keys via a Messaging Layer Security (MLS) group. This group derives per-sender ratcheted media keys that are used to encrypt and decrypt media frames.
Binary WebSocket Messages
Some DAVE protocol opcodes are sent as binary WebSocket messages rather than JSON. Binary messages have the following format:

| Field | Description | Size |
|---|---|---|
| Sequence Number | OPTIONAL (server → client only) big-endian uint16 sequence number | 2 bytes |
| Opcode | Unsigned integer opcode value | 1 byte |
| Payload | Binary message payload (format defined by opcode) | Variable bytes |
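
A minimal parser for this framing is sketched below. It assumes that every server-to-client binary message carries the two-byte sequence number and that client-to-server messages never do. The type and function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional


class BinaryMessage(NamedTuple):
    seq: Optional[int]  # present only on server-to-client messages
    opcode: int
    payload: bytes


def parse_binary_message(data: bytes, from_server: bool = True) -> BinaryMessage:
    """Split a binary WebSocket message into sequence number, opcode, and payload."""
    offset = 0
    seq = None
    if from_server:
        if len(data) < 2:
            raise ValueError("message too short for a sequence number")
        (seq,) = struct.unpack_from(">H", data, 0)  # big-endian uint16
        offset = 2
    if len(data) <= offset:
        raise ValueError("message too short for an opcode")
    return BinaryMessage(seq, data[offset], data[offset + 1:])
```

The payload is returned unparsed because its format depends on the opcode.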
Indicating DAVE Protocol Support
Include the highest DAVE protocol version you support in Opcode 0 Identify as max_dave_protocol_version. Sending 0 or omitting the field indicates no DAVE protocol support.
The voice gateway specifies the selected version in Opcode 4 Session Description under dave_protocol_version.
Protocol Transitions
Transitions occur when upgrading or downgrading E2EE, changing protocol versions, or when the MLS group changes. The flow:

- Server sends a prepare transition opcode (Opcode 21, 24, 29, or 30)
- Client prepares local state for the transition
- Client sends Opcode 23 DAVE Transition Ready
- When all participants are ready (or a timeout is reached), server sends Opcode 22 DAVE Execute Transition
- Media senders begin using the new protocol context
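
The flow above amounts to a two-phase handshake: remember what each transition will switch to, acknowledge it, and apply it only on execute. The sketch below models only that bookkeeping. The class is hypothetical, the `transition_id` field name is an assumption, and the real preparation work (MLS state, key ratchets) is omitted.

```python
import json
from typing import Dict


class TransitionTracker:
    """Tracks prepared transitions and applies them when the server executes them."""

    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}  # transition_id -> protocol version
        self.active_version = 0            # 0 means no E2EE

    def prepare(self, transition_id: int, protocol_version: int) -> str:
        """Record a prepared transition and return the Opcode 23 ready message."""
        self.pending[transition_id] = protocol_version
        return json.dumps({"op": 23, "d": {"transition_id": transition_id}})

    def execute(self, transition_id: int) -> bool:
        """Apply a prepared transition on Opcode 22; return False if it is unknown."""
        version = self.pending.pop(transition_id, None)
        if version is None:
            return False
        self.active_version = version
        return True
```

Keeping the old context active until execute arrives matches the flow: senders switch only after the server signals that all participants are ready or the timeout has passed.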
Downgrade (Protocol Version 0)
Announced via Opcode 21 DAVE Prepare Transition. Occurs when a non-DAVE client joins the call. After execution, senders stop sending E2EE-formatted media.
Protocol Version Change and Upgrade
Announced via Opcode 24 DAVE Prepare Epoch, which includes the epoch for the upcoming MLS epoch.

- epoch = 1 means a new MLS group is being created. Participants must:
  - Prepare a local MLS group with parameters for the DAVE protocol version
  - Generate and send Opcode 26 DAVE MLS Key Package
- epoch > 1 means the protocol version of the existing MLS group is changing

MLS Group Changes
When the participants of the MLS group must change, existing members receive Opcode 29 DAVE MLS Announce Commit Transition and new members receive Opcode 30 DAVE MLS Welcome. Both include the transition ID and the binary MLS Commit or Welcome message.

Existing members apply the commit to progress their local MLS group state, then send Opcode 23 DAVE Transition Ready. Welcomed members send the same opcode after successfully joining the group from the Welcome message.
Audio Frame E2EE
Transport encryption operates at the packet level. DAVE E2EE operates at the frame level: the full contents of OPUS frames are end-to-end encrypted using AES128-GCM.
Payload Format
| Field | Description | Size |
|---|---|---|
| E2EE OPUS Frame | Ciphertext for E2EE OPUS frame | Variable bytes |
| AES-GCM Auth. Tag | Truncated AES128-GCM AEAD Authentication Tag | 8 bytes |
| ULEB128 Nonce | ULEB128 synchronization nonce | Variable bytes |
| ULEB128 Unencrypted Ranges | ULEB128 offset/length pairs of unencrypted data | Variable bytes |
| Supplemental Data Size | Unsigned integer bytes size of supplemental data | 1 byte |
| Magic Marker | 0xFAFA marker to assist with protocol frame identification | 2 bytes |
The ULEB128 unencrypted ranges field is empty (0 bytes) for OPUS frames because the full contents are encrypted.
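
Two of the fields in the payload table use ULEB128, a variable-length encoding that stores 7 bits per byte and sets the high bit on every byte except the last. The sketch below shows a generic encoder and decoder plus a check for the trailing magic marker. It is not a full frame parser, and the function names are hypothetical.

```python
from typing import Tuple


def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def uleb128_decode(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one ULEB128 value; return (value, offset just past it)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]  # raises IndexError on truncated input
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def has_dave_marker(frame: bytes) -> bool:
    """Check whether a frame ends with the 0xFAFA magic marker."""
    return frame[-2:] == b"\xfa\xfa"
```

Returning the next offset from the decoder makes it straightforward to read consecutive values, such as the nonce followed by any offset/length pairs.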