Pallas Athena

Implementing WebRTC Applications in Python Part I: Session Description Protocol

Objective

The purpose of this article is to introduce a series of blog posts I'm writing on Web Real-Time Communications (known as WebRTC). WebRTC embodies a set of free and open standards that enable devices connected over the Internet to communicate in real time. Taken together, WebRTC standards and protocols enable voice and video calls, live streaming, file sharing and much, much more.

A detailed discussion of all the standards used in WebRTC would be quite an undertaking -- definitely too much to cover in a single blog post. So I'm dividing this large topic into a series in which I'll explore implementations of these standards in the context of developing a streaming-media reference application using Python.

Introduction

If there's one thing I've learned over the years, it's the importance of standards. This became strikingly clear to me in the early '90s as the Internet and World Wide Web exploded into mainstream awareness. Back then, web pages were still in their infancy, and few people outside of universities or research institutions even knew about email. Then, almost overnight, the World Wide Web became "the next big thing," unleashing world-shaking disruptions and economic transformations.

If I were to ask, "Who invented the Internet?" I'd likely get a range of answers, with few people today remembering a specific name. Back when I used to teach my course, Exploring Internet Development, at Boston College, I would conduct an informal experiment. At the start of each semester I'd ask my students: "Who here has heard of O.J. Simpson?" Nearly every hand would go up. Then I'd follow with: "Who's ever heard of Tim Berners-Lee?" The room would fall silent.

And yet, Berners-Lee's contribution to humanity is arguably on par with Gutenberg's invention of the printing press. By introducing HTML (a groundbreaking standard for text-based markup), and creating the first web browser, Berners-Lee revolutionized how the world accesses and consumes information. Unlike Gutenberg's printing press though, Berners-Lee's invention wasn't a patentable mechanical device but rather a set of standards. Open standards to be precise. These standards became the foundation of an interconnected world and helped propel the Information Age to unprecedented heights.

Through the development of these standards, Berners-Lee and many others paved the way for the emergence of the World Wide Web and the worldwide adoption and further development of the Internet. The world simply would not be where it is today without the widespread adoption of standards.

Much more recently, the importance of standards was brought back home to me when I undertook the architecture and implementation of a system enabling peer-to-peer (P2P) communications for a company building security devices interconnected over the Internet. While I'm not at liberty to get into the specifics, suffice it to say that an effort that might have taken months to complete took merely a handful of days thanks to the recent development and application of a relatively new set of WWW standards -- those revolving around Web Real-Time Communications, or WebRTC.

WebRTC: Emerging Standards for Peer-to-Peer Communications

So, what exactly is WebRTC? [1] WebRTC (Web Real-Time Communications) is not a single standard but rather a set of open standards, protocols and APIs that transcends any one particular product or platform. Its purpose is to enable Internet-connected users to communicate in real time by defining protocols for streaming data among interconnected peers (P2P communications).

P2P Illustrated
Figure 1. The P2P network architecture enabled by WebRTC.

Figure 1 illustrates the P2P network communications architecture enabled by WebRTC. Network-connected devices can communicate as peers over the WWW with protocols designed to enable Network Address Translation (NAT) and firewall traversal, using specialized relay servers where necessary. WebRTC supports a vast range of use cases, from simple messaging across the Internet of Things to streaming video, screen sharing, real-time collaboration tools and multi-player gaming -- the possibilities are endless.

Officially standardized in 2021 through efforts conducted under the auspices of the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), WebRTC is now supported by most major browsers and communications platforms, which greatly facilitates development efforts revolving around network communications.

WebRTC Protocols and Standards

Among others, core WebRTC protocols include:

  • SDP (Session Description Protocol): used to exchange information about the media streams (audio, video, data) being transmitted, including codecs, bitrates, and encryption parameters,

  • ICE (Interactive Connectivity Establishment): a framework enabling WebRTC peers to discover and connect with each other, even when they are behind different types of NAT devices or firewalls,

  • STUN (Session Traversal Utilities for NAT): helps peers determine public IP addresses and ports enabling the establishment of direct channels for communication, and

  • TURN (Traversal Using Relays around NAT): provides a standardized means to create relay servers that can be used as a fallback option when direct connections are not possible due to strict NAT configurations.

Additional standards include:

  • RTP (Real-time Transport Protocol): the underlying protocol for transmitting media over the Internet, along with

  • VP8, VP9 and Opus: video (VP8, VP9) and audio (Opus) codecs used for encoding and decoding a/v streams.

That's a lot of protocols! But, taken together, they support and facilitate the development of a vast range of application use-cases!

Session Description Protocol

Again, a comprehensive discussion of all the standards associated with WebRTC is too much for a single blog post, so instead I'll cover a number of standards over the course of this series. To kick things off, I'll open the discussion with an exploration of SDP [2], which is fundamental to all WebRTC applications.

Session Description Protocol

Session Description Protocol (SDP) is a well-defined format for conveying sufficient information to discover and participate in a multimedia session.

SDP is fundamental inasmuch as it paves the way for any sort of P2P data transmission over the Internet. Used together with the offer/answer model, SDP supports a handshake: the procedure that enables devices to negotiate the parameters of the transmission. I'm a visual thinker, so to better understand the protocol I've created a diagram illustrating how the SDP handshake works.

Figure 2. Illustration of the SDP signalling protocol.

Let's call the peers participating in the exchange senders and receivers. The WebRTC session begins with the exchange of information between the peers (referred to as signaling) in the form of SDP offers and answers. These structured text documents contain all the meta-information about a multimedia session, such as the media types, codecs, transport protocols, and other relevant parameters. If you're curious about the format, have a look at the sample document, which I captured using a Python test GUI I whipped up for this effort. I've added it as an appendix.

Figure 2 illustrates the protocol.

  1. The sender creates an offer which includes information the receiver will need to receive the transmission. The sender sets the offer as its local description and then sends it to the receiver.

  2. On receiving the offer, the receiver sets it as its remote description and generates an answer which it sets as its local description.

  3. The receiver then sends the answer to the sender, which sets it as its remote description.
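
The three steps above can be sketched without any WebRTC library at all. The following is a toy model, not aiortc: the Peer class, its method names and the placeholder "SDP" strings are all hypothetical stand-ins chosen for illustration. The only thing it tracks is which description each peer holds and the resulting signaling state, whose names ("stable", "have-local-offer", "have-remote-offer") follow the signalingState values in the W3C specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Description:
    """Stand-in for a session description: a type plus an SDP body."""
    type: str  # "offer" or "answer"
    sdp: str   # placeholder text, not real SDP


class Peer:
    """Toy peer that only records descriptions and signaling state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local: Optional[Description] = None
        self.remote: Optional[Description] = None
        self.signaling_state = "stable"

    def create_offer(self) -> Description:
        return Description("offer", f"session proposed by {self.name}")

    def create_answer(self) -> Description:
        if self.remote is None or self.remote.type != "offer":
            raise RuntimeError("an answer requires a remote offer")
        return Description("answer", f"session accepted by {self.name}")

    def set_local(self, desc: Description) -> None:
        self.local = desc
        self.signaling_state = (
            "have-local-offer" if desc.type == "offer" else "stable"
        )

    def set_remote(self, desc: Description) -> None:
        self.remote = desc
        self.signaling_state = (
            "have-remote-offer" if desc.type == "offer" else "stable"
        )


def handshake(sender: Peer, receiver: Peer) -> None:
    # Step 1: the sender creates an offer and sets it as its local description.
    offer = sender.create_offer()
    sender.set_local(offer)
    # Step 2: the receiver stores the offer as remote, then answers.
    receiver.set_remote(offer)
    answer = receiver.create_answer()
    receiver.set_local(answer)
    # Step 3: the sender stores the answer as its remote description.
    sender.set_remote(answer)


sender, receiver = Peer("sender"), Peer("receiver")
handshake(sender, receiver)
print(sender.signaling_state, receiver.signaling_state)
```

After the exchange each peer's local description is the other peer's remote description, and both peers are back in the "stable" state.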

With this in mind, we can start looking at a series of Python code examples embodying these concepts.

Coding WebRTC Applications in Python Example 1: Signaling

WebRTC signaling objects and methods are defined on the RTCPeerConnection interface. The interface defines local and remote session description attributes (encapsulated in the offer and answer documents).

A Python implementation of the RTCPeerConnection interface is available in the aiortc package. aiortc is a Python library that provides WebRTC functionality using the asyncio framework. It allows for asynchronous management of WebRTC connections, handling signaling, and managing media streams. I'll be relying on this package over the course of this blog series.

The first set of abstractions I find useful when working with aiortc are Python classes encapsulating sender and receiver attributes and methods. Both sender and receiver objects will hold RTCPeerConnection references. To start things off, I defined two classes, DataSender and DataReceiver, for simple data transmission over a data channel. Later in this series I'll do a deep dive into transmission involving streaming media.

The DataSender Class

The following sample code includes fragments from the DataSender class relevant to the present discussion.

from aiortc import RTCPeerConnection


class DataSender:
    '''
    This class encapsulates signalling associated with a
    WebRTC 'sender' (i.e., the object associated with
    transmitting data from *producer* to *consumer*).
    '''

    def __init__(self):
        self.pc = RTCPeerConnection()
        # continue initialization ...

    async def handle_offer_request(self):
        # Create the data channel before the offer so that the
        # offer's SDP advertises it.
        self.data_channel = self.pc.createDataChannel("nn channel 1")
        offer = await self.pc.createOffer()
        await self.pc.setLocalDescription(offer)
        # Return the connection's local description (an
        # RTCSessionDescription) for delivery to the receiver.
        return self.pc.localDescription

    async def handle_answer(self, answer_description):
        # The receiver's answer becomes the sender's remote
        # description, completing the exchange.
        await self.pc.setRemoteDescription(answer_description)

    # Continue DataSender custom methods...

As illustrated in Figure 2, WebRTC signaling starts with an 'offer' from the sender. In this implementation, the DataSender class is responsible for handling "sender side" signaling events. The relevant methods are handle_offer_request and handle_answer.

  1. handle_offer_request assumes it will receive a request from some entity for an offer describing the nature of the transmission it is prepared to send [4]. DataSender handles the request by:

    • generating an offer using its instance of RTCPeerConnection, and

    • setting the returned offer description object as the RTCPeerConnection's localDescription attribute.

    Important

    Notice how handle_offer_request returns the offer encapsulated in an aiortc RTCSessionDescription object. The offer will be used later by the receiver class to generate an SDP answer.

  2. handle_answer. Once the offer is sent to the receiver object, the receiver generates an answer, which is sent back to DataSender. handle_answer completes the SDP exchange by setting the answer description as the sender's remote description for the session.

The above fragments encapsulate a basic signaling example from the sender side of the exchange. Next let's look at the receiver side.

Tip

The reader may have noticed that this simplified example uses only one RTCPeerConnection instance per exchange. The example can be readily extended to handle multiple recipients by adding further RTCPeerConnection instances. In other words, the cardinality of sender to recipient RTCPeerConnection instances is 1:N.
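
One way to picture the 1:N arrangement is a registry that keeps one connection per recipient. The sketch below is an assumption about how that might be organized, not code from the DataSender class: ConnectionRegistry, recipient_id and the factory parameter are hypothetical names, and the factory is left generic so that the block runs without aiortc installed. In practice one would presumably pass RTCPeerConnection as the factory.

```python
from typing import Callable, Dict, Generic, TypeVar

C = TypeVar("C")


class ConnectionRegistry(Generic[C]):
    """Keeps exactly one connection object per recipient (1:N)."""

    def __init__(self, factory: Callable[[], C]) -> None:
        self._factory = factory
        self._connections: Dict[str, C] = {}

    def get(self, recipient_id: str) -> C:
        # Create the connection lazily the first time a recipient appears.
        if recipient_id not in self._connections:
            self._connections[recipient_id] = self._factory()
        return self._connections[recipient_id]

    def drop(self, recipient_id: str) -> None:
        # Forget a recipient; closing the connection is the caller's job.
        self._connections.pop(recipient_id, None)

    def __len__(self) -> int:
        return len(self._connections)


# Usage with a placeholder factory; with aiortc this would be
# ConnectionRegistry(RTCPeerConnection).
registry = ConnectionRegistry(factory=object)
first = registry.get("camera-1")
second = registry.get("camera-2")
print(len(registry), first is registry.get("camera-1"))
```

Each recipient then gets its own offer/answer exchange over its own connection, while the sender keeps a single place to look connections up and discard them.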

The DataReceiver Class

Next we have the DataReceiver class.

from aiortc import RTCPeerConnection


class DataReceiver:
    '''
    This class encapsulates signalling associated with a
    WebRTC 'receiver' (i.e., the object associated with
    consuming data from a producer).
    '''

    def __init__(self):
        self.pc = RTCPeerConnection()
        # continue initialization ...

    async def handle_offer(self, offer_description):
        # The sender's offer becomes the receiver's remote description.
        await self.pc.setRemoteDescription(offer_description)
        # Generate an answer and set it as the local description.
        answer = await self.pc.createAnswer()
        await self.pc.setLocalDescription(answer)
        # Return the local description (an RTCSessionDescription)
        # for delivery back to the sender.
        return self.pc.localDescription

    # Continue DataReceiver custom methods...

Again we use an RTCPeerConnection to handle the SDP. Here, we handle the sender's offer by:

  1. Setting it as the receiver RTCPeerConnection's remoteDescription,

  2. Generating an answer and setting it as the localDescription, and

  3. Returning the answer encapsulated in an RTCSessionDescription object so that it can be sent back to the sender.

So the receiver class encapsulates handling an SDP exchange on the receiver side of the transmission.
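
Neither class says how a description travels from one peer to the other, and WebRTC deliberately leaves that transport to the application. A common convention is to send the description's type and sdp fields as JSON. The sketch below assumes that convention: encode_description and decode_description are hypothetical helpers, and SessionDescription is a stand-in dataclass mirroring the sdp and type attributes of aiortc's RTCSessionDescription so that the block runs without aiortc installed.

```python
import json
from dataclasses import dataclass


@dataclass
class SessionDescription:
    """Stand-in with the same two fields as an RTCSessionDescription."""
    sdp: str
    type: str


def encode_description(desc: SessionDescription) -> str:
    """Serialize a description for whatever signaling transport is in use."""
    return json.dumps({"type": desc.type, "sdp": desc.sdp})


def decode_description(message: str) -> SessionDescription:
    """Rebuild a description from a received signaling message."""
    payload = json.loads(message)
    if payload.get("type") not in ("offer", "answer"):
        raise ValueError(f"unexpected description type: {payload.get('type')!r}")
    return SessionDescription(sdp=payload["sdp"], type=payload["type"])


# Round trip: what the sender encodes, the receiver can decode unchanged.
offer = SessionDescription(sdp="v=0\r\ns=-\r\n", type="offer")
wire_message = encode_description(offer)
assert decode_description(wire_message) == offer
print(wire_message)
```

With aiortc, the decoded fields would be passed to RTCSessionDescription(sdp=..., type=...) before calling handle_offer or handle_answer.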

Discussion

In the above examples, we see a very basic implementation of an SDP exchange defined using aiortc RTCPeerConnection instances on Python sender and receiver classes. If you're new to WebRTC, and especially if you're new to networking concepts in general, that may seem like a lot to take in! But as the examples show, aiortc provides nice implementations to handle a lot of the low-level details required to generate the offer/answer session descriptions. You just have to know how to use them and what to expect when you do so!

Since it may be a lot to digest I'll leave off for now and pick up from here in subsequent posts where I'll get into STUN, NAT traversal using TURN, and ultimately streaming media. But before leaving off, it's well worth saying a few words about RTCPeerConnection state.

RTCPeerConnection State Changes

WebRTC peers transition through many states over the life cycle of a connection. It's very important to understand these states and associated state-transition triggers. The RTCPeerConnection interface defines a high-level read-only property, connectionState, which can be used to inspect the state of a peer connection over the course of its life cycle for purposes of development, error handling and troubleshooting. The following diagram illustrates the possible states reflected by this property.

RtcPeerConnection States
Figure 3. RTCPeerConnection object states.

  1. new: The connection object has been created but there is not yet any network activity associated with it.

  2. connecting: The underlying ICE and DTLS transports are in the process of establishing a connection between the peers.

  3. connected: A connection between peers has been successfully negotiated and is operational.

  4. failed: The connection could not be established.

  5. disconnected: The connection is temporarily disconnected due to network issues.

  6. closed: The connection is closed.

When a change in connection state is triggered a connectionstatechange event is dispatched to the RTCPeerConnection object owning the connection. The following code fragment shows how to handle the event using an aiortc.RTCPeerConnection instance.

# Given an RTCPeerConnection instance, pc, define a callback
# to handle connection-state transitions ...

@pc.on("connectionstatechange")
async def on_connection_state_change():
    print(f"Connection state changed: {pc.connectionState}")
    if pc.connectionState == "connected":
        # handle transition to connected state...
        pass
    elif pc.connectionState == "failed":
        # handle transition to failed state...
        await pc.close()
    elif pc.connectionState == "disconnected":
        # handle transition to disconnected state...
        pass
    elif pc.connectionState == "closed":
        # handle transition to closed state (may require
        # release of allocated resources...)
        pass

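Notice how the handler closes the connection on transition to the 'failed' state and reserves the 'closed' branch for "cleanup" such as releasing allocated resources. Handling each transition explicitly keeps the application robust and helps prevent resource leaks.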
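
The same event can also be used to wait for a connection to settle before sending data. The following is a minimal sketch under stated assumptions: wait_for_connected is a hypothetical helper, and FakePeerConnection is a stand-in that exposes only connectionState and an on decorator so that the block runs without aiortc installed. It has not been exercised here against a real aiortc connection.

```python
import asyncio


class FakePeerConnection:
    """Hypothetical stand-in exposing `connectionState` and `on`, for demo only."""

    def __init__(self) -> None:
        self.connectionState = "new"
        self._handlers = {}

    def on(self, event):
        def register(handler):
            self._handlers.setdefault(event, []).append(handler)
            return handler
        return register

    async def move_to(self, state: str) -> None:
        # Change state, then notify every registered handler.
        self.connectionState = state
        for handler in self._handlers.get("connectionstatechange", []):
            await handler()


async def wait_for_connected(pc, timeout: float) -> bool:
    """Return True once pc reports "connected", False on failure or timeout."""
    settled = asyncio.Event()
    final_states = ("connected", "failed", "closed")

    @pc.on("connectionstatechange")
    async def on_change() -> None:
        if pc.connectionState in final_states:
            settled.set()

    # The connection may already have settled before we subscribed.
    if pc.connectionState in final_states:
        settled.set()
    try:
        await asyncio.wait_for(settled.wait(), timeout)
    except asyncio.TimeoutError:
        return False
    return pc.connectionState == "connected"


async def demo() -> bool:
    pc = FakePeerConnection()
    waiter = asyncio.create_task(wait_for_connected(pc, timeout=1.0))
    await asyncio.sleep(0)  # let the waiter register its handler
    await pc.move_to("connecting")
    await pc.move_to("connected")
    return await waiter


result = asyncio.run(demo())
print(result)
```

Treating "failed" and "closed" as settled alongside "connected" means the caller gets a prompt answer either way instead of waiting out the full timeout.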

Summary and Next Steps

In summary, this blog post is the first in a series exploring WebRTC application development in Python. The scope of this post was limited to covering SDP -- arguably the most fundamental protocol in WebRTC since it sets the stage for many types of data transmission between peers. In this post we saw:

  1. A high-level description of SDP,

  2. Sample Python code implementing SDP signaling using aiortc, and

  3. A discussion of RTCPeerConnection states over the life cycle of a WebRTC session.

Subsequent articles in this series will explore network traversal (using STUN and TURN), media streaming, and real-time integration of machine learning in Python WebRTC applications.

End notes

  1. Over the course of researching my needs for a WebRTC project (in which I'm engaged at the time of this writing) I came across a "blog post" purporting to explain WebRTC. The problem is the post contains a lot of misinformation -- information which has the potential to mislead decision makers and impact progress on the development and adoption of standards and supporting technologies. Consequently, I feel the need here to "set the record straight" regarding some key points.

    The blog post in question asserts that WebRTC is "...not a protocol"; instead, it is a "project". Actually, half the point of the present post is that WebRTC is a set of protocols and standards, open standards, designed to support the development of decentralized (P2P) web-based communications. The article goes on to assert that the WebRTC "project" is owned by Google. The fact that WebRTC standards are open means that they are not "owned" by any one individual or company. WebRTC standards and protocols may be adopted by anyone anywhere who is willing and able to provide standards-compliant implementations. Implying otherwise insults the invaluable effort and work of individuals and organizations who contribute to open standards and the development of open software systems from which all companies across the board benefit.

  2. SDP: Session Description Protocol.

  3. Electron is a framework for building beautiful cross-platform desktop applications using HTML, JavaScript and CSS.

  4. SDP in and of itself does not specify a mechanism for requesting an offer. A framework for doing so is defined in RFC 3264 (commonly referred to as the offer/answer model). For present purposes, assume an external entity issues an offer request. Later in the series I'll be covering cases where the external entity is a user device (e.g., a desktop computer).

Appendix 1: Sample SDP Documents

This appendix provides a sample SDP document to help the reader gain a better understanding of the SDP format. The document comprises an offer defining parameters for information exchange using a WebRTC data channel.

Sample SDP Offer

v=0
o=- 3939468060 3939468060 IN IP4 0.0.0.0
s=-
t=0 0
a=group:BUNDLE 0
a=msid-semantic:WMS *
m=application 47419 DTLS/SCTP 5000
c=IN IP4 192.168.1.158
a=mid:0
a=sctpmap:5000 webrtc-datachannel 65535
a=max-message-size:65536
a=candidate:92f48e2b25b0cf96833e74c0e6d4b612 1 udp 2130706431 192.168.1.158 47419 typ host
a=candidate:0d0b6cc018d79c453c3f8b96c8c6a899 1 udp 1694498815 71.184.100.169 47419 typ srflx raddr 192.168.1.158 rport 47419
a=end-of-candidates
a=ice-ufrag:WnTS
a=ice-pwd:TNMCpvqGZ4terU2c1tbn6w
a=fingerprint:sha-256 AB:FA:49:9D:BD:DA:63:82:E6:C4:CA:D2:06:8C:15:6D:A5:2D:B6:3D:32:6A:F7:B0:EE:FF:82:5D:3A:9B:B0:88
a=setup:actpass

Analysis

SDP documents are a text-based machine-readable format defined so as to provide sufficient information for network data transmission of a range of media types. Without getting too deep into the details, cursory examination of the specimen reveals some key properties of the format:

  • Media Type: The media type offered in the transmission (in this sample, an application stream carrying a webrtc-datachannel).

  • IP Address and Port: Key for transmission over the internet, participating peers must provide IP and port addressing.

  • Transport Protocols: Required to set up media transport mechanisms.

  • ICE (Interactive Connectivity Establishment) Candidates: Used for network traversal.

  • DTLS Fingerprints: SHA-256 fingerprints provided for DTLS authentication.

In summary, this sample SDP offer describes a WebRTC data channel. It specifies the network parameters, security mechanisms, and data channel capabilities required for the transfer. The offeror is proposing to establish a secure data channel using DTLS and SCTP protocols. The ICE parameters facilitate network traversal to establish the connection.
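
Because every SDP line has the form of a one-letter field, an equals sign and a value, the properties listed above can be pulled out with very little code. The sketch below is a deliberately naive line splitter, not a compliant SDP parser: parse_sdp and parse_media_line are hypothetical helpers, the input is a shortened copy of the sample offer, and the distinction between session-level and media-level sections is ignored.

```python
from collections import defaultdict

# Shortened copy of the sample offer above.
SAMPLE_OFFER = """\
v=0
o=- 3939468060 3939468060 IN IP4 0.0.0.0
s=-
t=0 0
m=application 47419 DTLS/SCTP 5000
c=IN IP4 192.168.1.158
a=mid:0
a=sctpmap:5000 webrtc-datachannel 65535
a=candidate:92f48e2b25b0cf96833e74c0e6d4b612 1 udp 2130706431 192.168.1.158 47419 typ host
a=setup:actpass
"""


def parse_sdp(text: str) -> dict:
    """Group SDP lines by their one-letter field; attributes are split on ':'."""
    fields = defaultdict(list)
    attributes = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:]
        if key == "a":
            name, _, attr_value = value.partition(":")
            attributes[name].append(attr_value)
        else:
            fields[key].append(value)
    return {"fields": dict(fields), "attributes": dict(attributes)}


def parse_media_line(value: str) -> dict:
    """Split an m= line into media type, port, transport protocol, and formats."""
    media, port, proto, *formats = value.split()
    return {"media": media, "port": int(port), "proto": proto, "formats": formats}


parsed = parse_sdp(SAMPLE_OFFER)
media = parse_media_line(parsed["fields"]["m"][0])
candidate = parsed["attributes"]["candidate"][0].split()

print(media)
# Candidate address, port, and type (the token after "typ").
print(candidate[4], candidate[5], candidate[7])
```

Read this way, the m= line yields the media type, port and transport protocol, while the a=candidate line yields the address, port and candidate type used for network traversal.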

This type of SDP offer is commonly used in WebRTC applications for various purposes, including:

  • file transfer,
  • real-time messaging, and
  • custom protocol implementation.

Resources

  1. W3C WebRTC Specification

  2. Session Description Protocol

  3. RFC3264: An Offer/Answer Model with the Session Description Protocol (SDP)

  4. aiortc

  5. RTCPeerConnection interface

  6. RTCPeerConnection: connectionState property