Voice connections operate similarly to the Gateway connection, but use a different set of payloads and a separate UDP-based connection for voice data transmission. Because UDP is used for both receiving and transmitting voice data, your client must be able to receive UDP packets, even through a firewall or NAT (see UDP Hole Punching). Discord voice servers also support IP Discovery to help clients find their external UDP IP/port for NAT traversal.

Voice Gateway Versioning

Versions below 4 and the default version behavior were discontinued as of November 18th, 2024. Connections without a version or with a version less than 4 are rejected. Always specify ?v=8 (recommended) in your WebSocket URL.
Version | Status | WebSocket URL Append | Change
8 | Recommended | ?v=8 | Added server message buffering; missed messages re-delivered on resume
7 | Available | ?v=7 | Added channel options opcode
6 | Available | ?v=6 | Added code version opcode
5 | Available | ?v=5 | Added video sink wants opcode
4 | Available | ?v=4 | Changed speaking status to bitmask from boolean
3 | Deprecated | ?v=3 | Added video
2 | Deprecated | ?v=2 | Changed heartbeat reply to Heartbeat ACK opcode
1 | Deprecated | ?v=1 | Initial version

Connecting to Voice

Step 1: Retrieve Voice Server Information

Send an Opcode 4 Gateway Voice State Update to the main Gateway to join a voice channel:
{
  "op": 4,
  "d": {
    "guild_id": "41771983423143937",
    "channel_id": "127121515262115840",
    "self_mute": false,
    "self_deaf": false
  }
}
The Gateway responds with two events, both of which you must receive before proceeding:
  1. Voice State Update — contains a session_id
  2. Voice Server Update — contains the token and endpoint
{
  "t": "VOICE_SERVER_UPDATE",
  "s": 2,
  "op": 0,
  "d": {
    "token": "my_token",
    "guild_id": "41771983423143937",
    "endpoint": "sweetwater-12345.discord.media:2048"
  }
}
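Because the two events can arrive in either order, it helps to collect them in one place and only proceed once everything is present. The following is a minimal sketch, not an official client: the `VoiceHandshake` class and its method names are hypothetical, and it assumes the Voice State Update dispatch is named `VOICE_STATE_UPDATE` and carries a `user_id` field that lets you ignore other members' state changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceHandshake:
    """Collects the values needed to open the voice WebSocket (hypothetical helper)."""
    user_id: str
    session_id: Optional[str] = None
    token: Optional[str] = None
    endpoint: Optional[str] = None

    def on_dispatch(self, event: dict) -> None:
        name = event.get("t")
        data = event.get("d") or {}
        if name == "VOICE_STATE_UPDATE":
            # Assumption: voice states for other users also arrive, so filter on our own ID.
            if data.get("user_id") == self.user_id:
                self.session_id = data.get("session_id")
        elif name == "VOICE_SERVER_UPDATE":
            self.token = data.get("token")
            self.endpoint = data.get("endpoint")

    @property
    def ready(self) -> bool:
        # Proceed to the voice WebSocket only once both events have been seen.
        return None not in (self.session_id, self.token, self.endpoint)
```

Feed every Gateway dispatch into `on_dispatch` and check `ready` before moving on to the next step.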
Bot users respect the voice channel’s user limit, if set. When the voice channel is full, you will not receive the Voice State Update or Voice Server Update events. Having MOVE_MEMBERS permission bypasses this limit.
Never cache or save voice server results — Discord’s voice platform is widely distributed and servers change. When switching channels within the same guild, the endpoint may remain the same but the token changes. You cannot reuse the previous session during a channel change.

Step 2: Establish the Voice WebSocket Connection

Connect to wss://ENDPOINT?v=8. The endpoint from Voice Server Update does not include wss:// — you must prepend it manually. Send Opcode 0 Identify with your credentials:
{
  "op": 0,
  "d": {
    "server_id": "41771983423143937",
    "user_id": "104694319306248192",
    "session_id": "my_session_id",
    "token": "my_token",
    "max_dave_protocol_version": 1
  }
}
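As a rough illustration of this step, the sketch below builds the WebSocket URL from the raw endpoint and serializes the Identify payload. The function names are hypothetical; only the URL shape and the payload fields come from the text above.

```python
import json

RECOMMENDED_VERSION = 8
MINIMUM_VERSION = 4  # versions below 4 are rejected


def voice_gateway_url(endpoint: str, version: int = RECOMMENDED_VERSION) -> str:
    """Prepend wss:// to the endpoint and append the version query string."""
    if version < MINIMUM_VERSION:
        raise ValueError("voice gateway versions below 4 are rejected")
    return f"wss://{endpoint}?v={version}"


def identify_payload(server_id: str, user_id: str, session_id: str,
                     token: str, max_dave_protocol_version: int = 0) -> str:
    """Serialize Opcode 0 Identify; 0 signals no DAVE protocol support."""
    return json.dumps({
        "op": 0,
        "d": {
            "server_id": server_id,
            "user_id": user_id,
            "session_id": session_id,
            "token": token,
            "max_dave_protocol_version": max_dave_protocol_version,
        },
    })
```

For the example event above, `voice_gateway_url("sweetwater-12345.discord.media:2048")` yields `wss://sweetwater-12345.discord.media:2048?v=8`.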
The voice server responds with Opcode 2 Ready, containing your SSRC, UDP endpoint, and supported encryption modes:
{
  "op": 2,
  "d": {
    "ssrc": 1,
    "ip": "127.0.0.1",
    "port": 1234,
    "modes": ["aead_aes256_gcm_rtpsize", "aead_xchacha20_poly1305_rtpsize"],
    "heartbeat_interval": 1
  }
}
The heartbeat_interval in the Opcode 2 Ready payload is erroneous and should be ignored. Use the heartbeat_interval value from the Opcode 8 Hello payload instead.

Heartbeating

After connecting, the voice server sends Opcode 8 Hello with the heartbeat interval:
{
  "op": 8,
  "d": {
    "heartbeat_interval": 41250
  }
}
Send Opcode 3 Heartbeat at that interval to keep the connection alive.
{
  "op": 3,
  "d": {
    "t": 1501184119561,
    "seq_ack": 10
  }
}
Since voice gateway version 8, heartbeat messages must include seq_ack — the sequence number of the last numbered message received from the gateway. See Buffered Resume for details.
The server acknowledges with Opcode 6 Heartbeat ACK:
{
  "op": 6,
  "d": {
    "t": 1501184119561
  }
}
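The sketch below shows one way to derive the heartbeat period from Hello and build the heartbeat payload. It assumes `heartbeat_interval` is expressed in milliseconds and uses the current Unix time in milliseconds as the `t` value, which is an illustrative choice; the helper names are hypothetical.

```python
import time
from typing import Optional


def heartbeat_interval_seconds(hello: dict) -> float:
    """Read the interval from Opcode 8 Hello (never from Opcode 2 Ready)."""
    if hello.get("op") != 8:
        raise ValueError("expected Opcode 8 Hello")
    # Assumption: the interval is given in milliseconds.
    return hello["d"]["heartbeat_interval"] / 1000.0


def heartbeat_payload(seq_ack: int, now_ms: Optional[int] = None) -> dict:
    """Build Opcode 3 Heartbeat, including seq_ack as required since version 8."""
    t = int(time.time() * 1000) if now_ms is None else now_ms
    return {"op": 3, "d": {"t": t, "seq_ack": seq_ack}}
```

A client would typically sleep for `heartbeat_interval_seconds(hello)` between sends and compare the `t` echoed in Opcode 6 Heartbeat ACK against the value it sent.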

Establishing a Voice UDP Connection

Using the ip and port from the Opcode 2 Ready payload, open a UDP connection. Optionally perform IP Discovery to determine your external IP and port, then send Opcode 1 Select Protocol:
{
  "op": 1,
  "d": {
    "protocol": "udp",
    "data": {
      "address": "127.0.0.1",
      "port": 1337,
      "mode": "aead_aes256_gcm_rtpsize"
    }
  }
}

Transport Encryption and Sending Voice

Voice data must be encoded with Opus (stereo, 48kHz sample rate) and sent with an RTP header followed by encrypted Opus audio.

Encryption Modes

Mode | Key | Nonce | Status
AEAD AES256-GCM (RTP Size) | aead_aes256_gcm_rtpsize | 32-bit incremental integer, appended to payload | Available (Preferred)
AEAD XChaCha20 Poly1305 (RTP Size) | aead_xchacha20_poly1305_rtpsize | 32-bit incremental integer, appended to payload | Available (Required)
XSalsa20 Poly1305 Lite (RTP Size) | xsalsa20_poly1305_lite_rtpsize | 32-bit incremental integer, appended to payload | Deprecated
AEAD AES256-GCM | aead_aes256_gcm | 32-bit incremental integer, appended to payload | Deprecated
XSalsa20 Poly1305 | xsalsa20_poly1305 | Copy of RTP header | Deprecated
XSalsa20 Poly1305 Suffix | xsalsa20_poly1305_suffix | 24 random bytes | Deprecated
XSalsa20 Poly1305 Lite | xsalsa20_poly1305_lite | 32-bit incremental integer, appended to payload | Deprecated
The deprecated encryption modes were discontinued as of November 18th, 2024. The voice gateway will reject connections using deprecated modes. You must support aead_xchacha20_poly1305_rtpsize. Prefer aead_aes256_gcm_rtpsize when available.
The nonce must be stripped from the payload before encrypting and before decrypting the audio data.
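The mode preference described above can be expressed as a small selection function over the `modes` list from Opcode 2 Ready. This is a sketch with a hypothetical function name, not part of any official library.

```python
from typing import Iterable

# Ordered by preference: AES256-GCM when offered, XChaCha20 as the required fallback.
PREFERRED_MODES = (
    "aead_aes256_gcm_rtpsize",
    "aead_xchacha20_poly1305_rtpsize",
)


def choose_encryption_mode(offered: Iterable[str]) -> str:
    """Pick the most preferred non-deprecated mode the server offers."""
    offered_set = set(offered)
    for mode in PREFERRED_MODES:
        if mode in offered_set:
            return mode
    raise ValueError("server offered no supported encryption mode")
```

The returned string is the value to send as `mode` in Opcode 1 Select Protocol.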
After selecting your mode, the voice server responds with Opcode 4 Session Description containing the secret_key (32-byte array) used for transport encryption:
{
  "op": 4,
  "d": {
    "mode": "aead_aes256_gcm_rtpsize",
    "secret_key": [251, 100, 11, "..."],
    "dave_protocol_version": 1
  }
}

Voice Packet Structure

Field | Type | Size
Version + Flags | Single byte value of 0x80 | 1 byte
Payload Type | Single byte value of 0x78 | 1 byte
Sequence | Unsigned short (big endian) | 2 bytes
Timestamp | Unsigned integer (big endian) | 4 bytes
SSRC | Unsigned integer (big endian) | 4 bytes
Encrypted audio | Binary data | n bytes
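The fixed 12-byte header in the table above maps directly onto a big-endian `struct` layout. The sketch below packs only the header; encryption of the audio is left out, and the function name is hypothetical.

```python
import struct

# Version+Flags (1), Payload Type (1), Sequence (2), Timestamp (4), SSRC (4), all big endian.
RTP_HEADER = struct.Struct(">BBHII")


def build_rtp_header(sequence: int, timestamp: int, ssrc: int) -> bytes:
    """Pack the 12-byte voice packet header, wrapping the counters to their field widths."""
    return RTP_HEADER.pack(
        0x80,
        0x78,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc,
    )
```

The encrypted audio is appended after these 12 bytes to form the complete packet.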

Speaking

Before sending any audio, you must send at least one Opcode 5 Speaking payload to set the initial speaking mode and update your SSRC. The speaking flags are a bitfield:
Flag | Value | Meaning
Microphone | 1 << 0 | Normal transmission of voice audio
Soundshare | 1 << 1 | Transmission of context audio for video, no speaking indicator
Priority | 1 << 2 | Priority speaker, lowering audio of other speakers
{
  "op": 5,
  "d": {
    "speaking": 5,
    "delay": 0,
    "ssrc": 1
  }
}
If you send voice data without first sending an Opcode 5 Speaking payload, you will be disconnected with an invalid SSRC error.
The delay property should be set to 0 for bots.
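Since the speaking value is a bitfield, an `IntFlag` is a convenient way to combine flags without hand-computing the integer. The enum and helper below are an illustrative sketch with hypothetical names.

```python
from enum import IntFlag


class SpeakingFlags(IntFlag):
    MICROPHONE = 1 << 0  # normal transmission of voice audio
    SOUNDSHARE = 1 << 1  # context audio for video, no speaking indicator
    PRIORITY = 1 << 2    # priority speaker, lowers audio of other speakers


def speaking_payload(flags: SpeakingFlags, ssrc: int, delay: int = 0) -> dict:
    """Build Opcode 5 Speaking; delay stays 0 for bots."""
    return {"op": 5, "d": {"speaking": int(flags), "delay": delay, "ssrc": ssrc}}
```

The value 5 in the example payload above corresponds to `SpeakingFlags.MICROPHONE | SpeakingFlags.PRIORITY`.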

Voice Data Interpolation

When there is a break in the sent data, do not simply stop transmitting. Send five frames of silence (0xF8, 0xFF, 0xFE) before stopping to avoid unintended Opus interpolation with subsequent transmissions.
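One way to make this hard to forget is to wrap the outgoing Opus frame stream so the silence frames are always appended when it ends. This is a minimal sketch; the generator name is hypothetical, and each yielded frame still needs the usual RTP header and transport encryption.

```python
from typing import Iterable, Iterator

SILENCE_FRAME = bytes([0xF8, 0xFF, 0xFE])
TRAILING_SILENCE_FRAMES = 5


def with_trailing_silence(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield every Opus frame, then five silence frames before the stream stops."""
    yield from frames
    for _ in range(TRAILING_SILENCE_FRAMES):
        yield SILENCE_FRAME
```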

Resuming Voice Connection

When your client detects a severed connection, open a new WebSocket and send Opcode 7 Resume:
{
  "op": 7,
  "d": {
    "server_id": "41771983423143937",
    "session_id": "my_session_id",
    "token": "my_token",
    "seq_ack": 10
  }
}
On success, the voice server responds with Opcode 9 Resumed:
{
  "op": 9,
  "d": null
}
If the resume fails (e.g. invalid session), the WebSocket closes with a Voice Close Event Code. Follow the full Connecting to Voice flow to reconnect.

Buffered Resume

Since voice gateway version 8, the gateway can resend messages that were lost during a disconnect. To support this:
  • The gateway includes a seq field on messages that may need re-delivery.
  • Clients must track the last seq value received.
  • Include seq_ack in both Heartbeat and Resume payloads.
{
  "op": 5,
  "d": {
    "speaking": 0,
    "delay": 0,
    "ssrc": 110
  },
  "seq": 10
}
If no sequenced messages have been received, seq_ack can be omitted or set to -1. The gateway handles sequence number wrap-around automatically.
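Tracking the last sequence number can be as simple as the sketch below. The `SequenceTracker` class is hypothetical; it starts at -1 to represent "nothing received yet" and stores whatever `seq` arrives last, leaving wrap-around to the gateway as described above.

```python
class SequenceTracker:
    """Remembers the last seq seen on sequenced voice gateway messages."""

    def __init__(self) -> None:
        self.last_seq = -1  # -1 means no sequenced message has been received

    def observe(self, message: dict) -> None:
        # Only some messages carry "seq"; ignore the rest.
        seq = message.get("seq")
        if seq is not None:
            self.last_seq = seq

    def resume_payload(self, server_id: str, session_id: str, token: str) -> dict:
        """Build Opcode 7 Resume with the current seq_ack."""
        return {
            "op": 7,
            "d": {
                "server_id": server_id,
                "session_id": session_id,
                "token": token,
                "seq_ack": self.last_seq,
            },
        }
```

Call `observe` on every decoded JSON message and reuse `last_seq` as the `seq_ack` in heartbeats as well.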

IP Discovery

Most routers on the Internet mask UDP ports through NAT. Use IP discovery to find your external IP and port for receiving voice. Send the following UDP packet (all numeric fields are big endian) to your voice port:
Field | Description | Size
Type | 0x1 for request, 0x2 for response | 2 bytes
Length | Message length excluding Type and Length fields (value 70) | 2 bytes
SSRC | Your SSRC as an unsigned integer | 4 bytes
Address | Null-terminated string in response | 64 bytes
Port | Unsigned short | 2 bytes
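The packet layout above totals 74 bytes (4 bytes of Type and Length plus the 70-byte body). The sketch below packs a request and parses a response with `struct`; the function names are hypothetical, and it assumes the request leaves Address and Port zeroed and that the address is ASCII text.

```python
import struct
from typing import Tuple

# Type (2), Length (2), SSRC (4), Address (64), Port (2), all big endian.
DISCOVERY = struct.Struct(">HHI64sH")
BODY_LENGTH = 70


def build_discovery_request(ssrc: int) -> bytes:
    """Build the 74-byte request; struct pads the empty address with null bytes."""
    return DISCOVERY.pack(0x1, BODY_LENGTH, ssrc, b"", 0)


def parse_discovery_response(packet: bytes) -> Tuple[str, int]:
    """Return the external (address, port) reported by the voice server."""
    kind, length, _ssrc, address, port = DISCOVERY.unpack(packet)
    if kind != 0x2 or length != BODY_LENGTH:
        raise ValueError("not an IP discovery response")
    # The address is a null-terminated string inside a fixed 64-byte field.
    return address.split(b"\x00", 1)[0].decode("ascii"), port
```

The returned address and port are the values to send as `address` and `port` in Opcode 1 Select Protocol.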

End-to-End Encryption (DAVE Protocol)

Since September 2024, Discord has been migrating voice and video in DMs, Group DMs, voice channels, and Go Live streams to end-to-end encryption (E2EE) via the DAVE protocol.
Starting March 1st, 2026, Discord will only support E2EE calls for all audio and video conversations in DMs, GDMs, voice channels, and Go Live streams. Implement DAVE support as soon as possible.
The most thorough documentation on the DAVE protocol is available in the Protocol Whitepaper. Discord’s open-source library libdave can assist your implementation. When a call is E2EE, all members exchange keys via a Messaging Layer Security (MLS) group. This group derives per-sender ratcheted media keys to encrypt/decrypt media frames.

Binary WebSocket Messages

Some DAVE protocol opcodes are sent as binary WebSocket messages rather than JSON. Binary messages have the following format:
Field | Description | Size
Sequence Number | OPTIONAL (server → client only) big-endian uint16 sequence number | 2 bytes
Opcode | Unsigned integer opcode value | 1 byte
Payload | Binary message payload (format defined by opcode) | Variable bytes
Sequence numbers are only sent server → client. All server-sent binary opcodes include a sequence number, used when resuming.
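Because the sequence number is present only on server-sent messages, parsing and building use different layouts. The following sketch illustrates both directions; the names are hypothetical, and the payload is returned as opaque bytes since its format depends on the opcode.

```python
import struct
from typing import NamedTuple


class BinaryMessage(NamedTuple):
    seq: int
    opcode: int
    payload: bytes


def parse_server_binary(data: bytes) -> BinaryMessage:
    """Parse a server -> client binary message: uint16 seq, uint8 opcode, payload."""
    if len(data) < 3:
        raise ValueError("binary message too short")
    seq, opcode = struct.unpack_from(">HB", data)
    return BinaryMessage(seq, opcode, data[3:])


def build_client_binary(opcode: int, payload: bytes) -> bytes:
    """Build a client -> server binary message, which carries no sequence number."""
    return bytes([opcode]) + payload
```

The `seq` from parsed messages should feed the same `seq_ack` tracking used for JSON messages.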

Indicating DAVE Protocol Support

Include the highest DAVE protocol version you support in Opcode 0 Identify as max_dave_protocol_version. Sending 0 or omitting the field indicates no DAVE protocol support. The voice gateway specifies the selected version in Opcode 4 Session Description under dave_protocol_version.
Clients must retain backwards-compatibility with all non-discontinued DAVE protocol versions. The voice gateway selects the lowest shared protocol version for the call.

Protocol Transitions

Transitions occur when upgrading/downgrading E2EE, changing protocol versions, or when the MLS group changes. The flow:
  1. Server sends a prepare transition opcode (Opcode 21, 24, 29, or 30)
  2. Client prepares local state for the transition
  3. Client sends Opcode 23 DAVE Transition Ready
  4. When all participants are ready (or a timeout is reached), server sends Opcode 22 DAVE Execute Transition
  5. Media senders begin using the new protocol context
A downgrade from E2EE is announced via Opcode 21 DAVE Prepare Transition. It occurs when a non-DAVE client joins the call. After the transition executes, senders stop sending E2EE-formatted media.
A protocol version change or a new MLS group is announced via Opcode 24 DAVE Prepare Epoch, which includes the epoch value for the upcoming MLS epoch:
  • epoch = 1 means a new MLS group is being created. Participants must:
    • Prepare a local MLS group with parameters for the DAVE protocol version
    • Generate and send Opcode 26 DAVE MLS Key Package
  • epoch > 1 means the protocol version of the existing MLS group is changing
When the participants in the MLS group must change, existing members receive Opcode 29 DAVE MLS Announce Commit Transition and new members receive Opcode 30 DAVE MLS Welcome. Both include the transition ID and the binary MLS Commit or Welcome message.
Existing members apply the commit to advance their local MLS group state, then send Opcode 23 DAVE Transition Ready. Welcomed members send the same opcode after successfully joining the group from the Welcome message.
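The prepare, ready, execute sequence can be modeled as a small tracker that holds prepared state until the server executes the transition. This is a sketch under stated assumptions: the `TransitionTracker` class is hypothetical, the caller extracts the transition ID from the payload itself (no payload field names are assumed here), and `context` stands in for whatever local state you prepared.

```python
from typing import Any, Dict, Optional

PREPARE_OPCODES = frozenset({21, 24, 29, 30})
OP_EXECUTE_TRANSITION = 22
OP_TRANSITION_READY = 23


class TransitionTracker:
    """Holds prepared protocol contexts until the server executes them."""

    def __init__(self) -> None:
        self.pending: Dict[int, Any] = {}
        self.active: Optional[Any] = None

    def on_prepare(self, opcode: int, transition_id: int, context: Any) -> int:
        """Store the prepared context and return the opcode to reply with."""
        if opcode not in PREPARE_OPCODES:
            raise ValueError(f"opcode {opcode} does not announce a transition")
        self.pending[transition_id] = context
        return OP_TRANSITION_READY

    def on_execute(self, transition_id: int) -> bool:
        """Switch to the prepared context when Opcode 22 arrives; False if unknown."""
        if transition_id not in self.pending:
            return False
        self.active = self.pending.pop(transition_id)
        return True
```

Media senders would read `active` when encrypting each frame, so the switch takes effect only after `on_execute` succeeds.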

Audio Frame E2EE

Transport encryption operates at the packet level. DAVE E2EE operates at the frame level — the full contents of OPUS frames are end-to-end encrypted using AES128-GCM.

Payload Format

Field | Description | Size
E2EE OPUS Frame | Ciphertext for E2EE OPUS frame | Variable bytes
AES-GCM Auth. Tag | Truncated AES128-GCM AEAD Authentication Tag | 8 bytes
ULEB128 Nonce | ULEB128 synchronization nonce | Variable bytes
ULEB128 Unencrypted Ranges | ULEB128 offset/length pairs of unencrypted data | Variable bytes
Supplemental Data Size | Size of the supplemental data in bytes (unsigned integer) | 1 byte
Magic Marker | 0xFAFA marker to assist with protocol frame identification | 2 bytes
The ULEB128 unencrypted ranges field is empty (0 bytes) for OPUS frames because the full contents are encrypted.
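ULEB128 is the standard unsigned LEB128 variable-length integer encoding: seven value bits per byte, least significant group first, with the high bit set on every byte except the last. The sketch below shows a generic encoder and decoder; the function names are hypothetical and nothing here is specific to libdave.

```python
from typing import Tuple


def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)         # final group, high bit clear
            return bytes(out)


def uleb128_decode(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one ULEB128 value; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated ULEB128 value")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Returning the next offset lets a parser read the nonce and then continue into the fields that follow it.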

Key Derivation

Each sender has a ratcheted per-sender key, with a new ratchet created per MLS group epoch. The initial secret is an exported 16-byte secret from the MLS group. Keys are retrieved via a generation counter derived from the most-significant byte of the 4-byte nonce. See the Sender Key Derivation section of the protocol whitepaper for full details.
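Treating the truncated nonce as a 32-bit unsigned integer, the generation is simply its top byte. The helper below is a small sketch with a hypothetical name; it covers only the generation lookup, not the ratchet itself, which the whitepaper defines.

```python
def generation_from_nonce(truncated_nonce: int) -> int:
    """Return the key generation: the most significant byte of the 4-byte nonce."""
    if not 0 <= truncated_nonce <= 0xFFFFFFFF:
        raise ValueError("truncated nonce must fit in 4 bytes")
    return truncated_nonce >> 24
```

For example, every nonce from `0x01000000` through `0x01FFFFFF` maps to generation 1.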

Nonce

The protocol uses at most a 4-byte truncated nonce, expanded to the required 12-byte AES-GCM nonce by setting the 8 most significant bytes to zero.

Authentication Tag

The AES128-GCM authentication tag is truncated to 8 bytes. If your implementation always returns the full 16-byte tag, keep the first 8 bytes and discard the remaining 8.