Live Media Streaming
A lot has changed since the internet came into the picture. This global network of computer networks, now largely reached over wireless links, offers ubiquitous, multimodal, interactive communication. Before the internet, the usual way to move data was to copy it from one machine onto an auxiliary storage device and then load it onto the second machine. Today, content can be placed on a server where anyone with the appropriate access rights can reach it, which is why the internet is arguably the most significant technological transformation to date.
The internet solved several problems but introduced many more along the way. For example, the developer community had to agree on universal data transfer formats, data had to be compressed, security vulnerabilities had to be addressed, and communication had to be secured. That is how data marshaling/unmarshaling, communication protocols, OAuth, TLS, and codecs (for compression/decompression) came about.
Then came the age of data/media streaming, in which data is published at one end and subscribed to at the other. After that came live streaming, and then two-way video communication. Streaming protocols were needed to meet these requirements.
A streaming protocol is a set of standards for delivering media across the web. Because raw media content tends to be large, codecs are used to compress and decompress it. Streaming protocols operate within the layers of the OSI model, and the layer that deserves particular attention here is the transport layer, which is responsible for transmitting content to the end user or device. Content can be transmitted in two ways: over TCP or over UDP. The main difference is that TCP requires the communicating devices to establish a connection before transferring data, while UDP skips this step. In practice, UDP usually delivers small pieces of information with lower latency than TCP. However, this comes at a price. Without handshakes and acknowledgments between the devices, data is not guaranteed to arrive in order, and the receiving side may not get some pieces at all, which can result in minor quality issues. There are several live streaming protocols, each serving its own purpose. Two of the most popular are RTMP and WebRTC.
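To make the UDP trade-off concrete, here is a minimal sketch of what a receiver has to do when packets can arrive out of order or not at all. It does not open real sockets. The `Packet` class, its `seq` field, and the `reassemble` helper are hypothetical names made up for this illustration, and it assumes the sender numbers its packets from zero.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int        # sequence number assigned by the sender (hypothetical field)
    payload: bytes


def reassemble(received: list[Packet], expected_count: int) -> tuple[list[bytes], list[int]]:
    """Order packets by sequence number and report which ones never arrived."""
    by_seq = {p.seq: p.payload for p in received}   # duplicates collapse to one entry
    ordered = [by_seq[s] for s in sorted(by_seq)]
    missing = [s for s in range(expected_count) if s not in by_seq]
    return ordered, missing


# The sender emitted packets 0..4; the network shuffled them and dropped number 3.
arrived = [Packet(2, b"C"), Packet(0, b"A"), Packet(4, b"E"), Packet(1, b"B")]
ordered, missing = reassemble(arrived, expected_count=5)
print(ordered)   # [b'A', b'B', b'C', b'E']
print(missing)   # [3]
```

With TCP, the transport layer does this bookkeeping itself and retransmits the missing piece, which is where the extra latency comes from. With UDP, the application decides whether a gap is worth waiting for or can simply be skipped.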
RTMP -
RTMP stands for Real-Time Messaging Protocol. It has been in the industry for more than 20 years and remains one of the most widely used protocols for media streaming. It maintains a persistent, stable connection and allows for low-latency communication: the standard stream delay is around 5 to 30 seconds, but it can be lowered to two or three seconds. The protocol commonly carries H.264 video and AAC audio. It is TCP-based, which means the chance of data loss is very low, at the cost of some added latency.
WebRTC -
WebRTC is a more modern technology. It uses UDP, which allows for fast but lossy data transfer. It supports the VP8, VP9, H.264, and H.265 video codecs (H.265 support varies by implementation) and the Opus audio codec. WebRTC is a combination of the following building blocks:
Signaling channel: A resource that enables applications to discover, set up, control, and terminate a peer-to-peer connection by exchanging signaling messages. Signaling messages are metadata that two applications exchange to establish peer-to-peer connectivity, which lets them connect securely for peer-to-peer live media streaming. The signaling component typically exposes REST APIs and a set of WebSocket APIs.
Peer: Any device or application that joins the WebRTC session as a participant.
Session Traversal Utilities for NAT (STUN): A protocol used to discover your public address and to determine any restrictions in your router that would prevent a direct connection with a peer. It enables applications located behind a NAT or a firewall to discover their public IP address.
Traversal Using Relays around NAT (TURN): A protocol for relaying media through a server, typically in the cloud, when applications can’t stream media peer-to-peer. The TURN component manages the endpoints that provide this relay.
Session Description Protocol (SDP): A standard for describing the multimedia content of the connection, such as resolution, formats, codecs, and encryption, so that both peers can understand each other once data starts flowing.
SDP Offer: An SDP message sent by the agent (the offerer) that generates a session description to create or modify a session. It describes the desired media communication.
SDP Answer: An SDP message sent by an answerer in response to an offer received from an offerer. The answer indicates which aspects of the offer are accepted, for example, whether all the audio and video streams in the offer are accepted.
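Interactive Connectivity Establishment (ICE): A framework that allows your web browser to connect with peers by collecting candidate addresses, with the help of STUN and TURN, and testing which of them actually work.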
ICE Candidate: An address that a peer can use to communicate, expressed as an IP address and port pair.
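To show how SDP and ICE candidates look on the wire, here is a rough sketch that pulls the media kinds, codec names, and candidate addresses out of a small SDP text. The sample SDP is hand-written for illustration (the addresses come from documentation and private ranges) and is far shorter than what a browser produces. `parse_sdp` is a hypothetical helper, not part of any WebRTC library, and it only understands the three line types shown.

```python
SAMPLE_SDP = """\
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
a=candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
a=candidate:2 1 udp 1686052607 203.0.113.7 61000 typ srflx raddr 192.168.1.10 rport 54321
"""


def parse_sdp(sdp: str) -> dict:
    """Extract media sections, codec names, and ICE candidates from SDP text."""
    media = []        # one entry per m= line
    candidates = []   # (ip, port, candidate type) tuples
    for line in sdp.splitlines():
        if line.startswith("m="):
            kind = line[2:].split()[0]              # "audio" or "video"
            media.append({"kind": kind, "codecs": []})
        elif line.startswith("a=rtpmap:") and media:
            # a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
            codec = line.split(" ", 1)[1].split("/")[0]
            media[-1]["codecs"].append(codec)
        elif line.startswith("a=candidate:"):
            # <foundation> <component> <transport> <priority> <ip> <port> typ <type> ...
            fields = line[len("a=candidate:"):].split()
            ip, port = fields[4], int(fields[5])
            cand_type = fields[fields.index("typ") + 1]
            candidates.append((ip, port, cand_type))
    return {"media": media, "candidates": candidates}


parsed = parse_sdp(SAMPLE_SDP)
print(parsed["media"])
# [{'kind': 'audio', 'codecs': ['opus']}, {'kind': 'video', 'codecs': ['VP8']}]
print(parsed["candidates"])
# [('192.168.1.10', 54321, 'host'), ('203.0.113.7', 61000, 'srflx')]
```

The `host` candidate is the machine's own local address, while the `srflx` (server reflexive) candidate is the public address learned from a STUN server.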
How peer-to-peer communication works in WebRTC -
Let’s take the scenario of two peers, A and B, that both use WebRTC for peer-to-peer, two-way media streaming. What happens when A wants to connect to B?
To connect to B’s application, A’s application will generate an SDP offer that contains what codecs to use, whether this is an audio or video session, and a list of ICE candidates.
To build the list of ICE candidates, A’s application makes a series of requests to a STUN server. The server returns the public IP address and port pair that originated the request. A’s application adds each pair to the list of ICE candidates. In other words, it gathers ICE candidates. Once A’s application has finished gathering ICE candidates, it can produce the SDP offer.
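The STUN exchange in this step is small enough to sketch by hand. The snippet below builds a STUN Binding request header and decodes the XOR-MAPPED-ADDRESS attribute value that carries the public IP/port in the reply, following the message layout in the STUN specification as I understand it. It does not talk to a real server. The helper names are hypothetical, only IPv4 is handled, and the response value is constructed locally to stand in for what a server would send.

```python
from __future__ import annotations

import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442     # fixed value defined by the STUN specification
BINDING_REQUEST = 0x0001      # STUN message type for a Binding request


def build_binding_request(transaction_id: bytes | None = None) -> bytes:
    """20-byte STUN header: type, attribute length (0), magic cookie, transaction ID."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def decode_xor_mapped_address(value: bytes) -> tuple[str, int]:
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    _reserved, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    port = xport ^ (MAGIC_COOKIE >> 16)     # port is XORed with the top 16 cookie bits
    addr = xaddr ^ MAGIC_COOKIE             # address is XORed with the whole cookie
    return socket.inet_ntoa(struct.pack("!I", addr)), port


request = build_binding_request()
print(len(request))   # 20

# Stand-in for a server reply that saw the request come from 203.0.113.7:61000.
value = struct.pack("!BBHI", 0, 0x01, 61000 ^ (MAGIC_COOKIE >> 16), 0xCB007107 ^ MAGIC_COOKIE)
print(decode_xor_mapped_address(value))   # ('203.0.113.7', 61000)
```

The decoded pair is what A’s application would add to its list as a server reflexive candidate.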
Next, A’s application must pass the SDP to B’s application through a signaling channel over which these applications communicate. The transport protocol for this exchange is not specified in the WebRTC standard. It can be performed over HTTPS, secure WebSocket, or any other communication protocol.
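Since the standard leaves the signaling transport open, each application defines its own message format. Below is one possible JSON envelope, purely as an assumption for illustration: the `type` and `sdp` fields mirror how session descriptions are commonly represented, while `from`, `to`, and both helper functions are hypothetical. How the string is actually delivered (HTTPS, WebSocket, or anything else) is outside this sketch.

```python
import json

REQUIRED_FIELDS = {"type", "sdp", "from", "to"}


def encode_signal(kind: str, sdp: str, sender: str, recipient: str) -> str:
    """Wrap an SDP offer or answer in a JSON message for the signaling channel."""
    if kind not in ("offer", "answer"):
        raise ValueError(f"unsupported signaling message type: {kind}")
    return json.dumps({"type": kind, "sdp": sdp, "from": sender, "to": recipient})


def decode_signal(raw: str) -> dict:
    """Parse a signaling message and check that no expected field is absent."""
    message = json.loads(raw)
    absent = REQUIRED_FIELDS - message.keys()
    if absent:
        raise ValueError(f"signaling message is missing fields: {sorted(absent)}")
    return message


wire = encode_signal("offer", "v=0\r\n", sender="A", recipient="B")
received = decode_signal(wire)
print(received["type"], received["from"], received["to"])   # offer A B
```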
Now, B’s application must generate an SDP answer. B’s application follows the same steps A’s application used to build its offer: gathering ICE candidates, and so on. B’s application then needs to return this SDP answer to A’s application.
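One part of building the answer is deciding which of the offered codecs to accept. The sketch below reduces that to a simple filter: B keeps the codecs it supports, in the order A offered them. This is a simplification under stated assumptions. Real SDP negotiation also covers payload types, directions, and codec parameters. The codec lists and the `answer_codecs` helper are invented for the example.

```python
def answer_codecs(offered: list, supported: set) -> list:
    """Keep the offerer's order, dropping codecs the answerer cannot handle."""
    return [codec for codec in offered if codec in supported]


# What A put in its offer, per media kind, in A's order of preference.
offer = {"audio": ["opus", "PCMU"], "video": ["VP9", "VP8", "H264"]}

# What B's application is able to encode and decode (hypothetical capabilities).
b_supports = {"audio": {"opus"}, "video": {"VP8", "H264"}}

answer = {
    kind: answer_codecs(codecs, b_supports.get(kind, set()))
    for kind, codecs in offer.items()
}
print(answer)   # {'audio': ['opus'], 'video': ['VP8', 'H264']}
```

If a media kind ends up with an empty codec list, that stream cannot be accepted, which is the kind of outcome the SDP answer communicates back to A.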
After A and B have exchanged SDPs, they then perform a series of connectivity checks. The ICE algorithm in each application takes a candidate IP/port pair from the list it received in the other party’s SDP and sends it a STUN request. If a response comes back from the other application, the originating application considers the check successful and marks that IP/port pair as a valid ICE candidate.
After connectivity checks are finished on all of the IP/port pairs, the applications negotiate and decide to use one of the remaining, valid pairs. When a pair is selected, media begins flowing between the applications.
If the applications can’t find an IP/port pair that passes connectivity checks, they make STUN requests to the TURN server to obtain a media relay address. A relay address is a public IP address and port that forwards packets to and from the application that set up the relay address. This relay address is then added to the candidate list and exchanged via the signaling channel.
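The text above does not say how one valid pair is chosen over another. The sketch below uses the priority formulas and recommended type preferences from the ICE specification (RFC 8445) as I understand them, which make relayed candidates the last resort. Treat it as an illustration rather than a full ICE agent: the connectivity check results are hard-coded, candidates are reduced to their type, and `select_pair` is a hypothetical helper.

```python
# Recommended type preferences: direct addresses rank highest, relays lowest.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_preference: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_preference << 8) + (256 - component_id)


def pair_priority(controlling: int, controlled: int) -> int:
    """priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def select_pair(checked_pairs: list):
    """Pick the highest-priority pair among those whose connectivity check succeeded.

    checked_pairs holds (local type, remote type, check succeeded) tuples.
    """
    valid = [(local, remote) for local, remote, ok in checked_pairs if ok]
    if not valid:
        return None
    return max(
        valid,
        key=lambda pair: pair_priority(candidate_priority(pair[0]), candidate_priority(pair[1])),
    )


# Direct pairs failed their checks, so only pairs involving a relay remain valid.
checks = [
    ("host", "host", False),
    ("srflx", "srflx", False),
    ("relay", "srflx", True),
    ("relay", "relay", True),
]
print(candidate_priority("host"))   # 2130706431
print(select_pair(checks))          # ('relay', 'srflx')
```

Because the relay type preference is zero, a relayed pair only wins when nothing more direct has passed its check, which matches the fallback behavior described above.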