WebRTC is a technology that allows audio and video connections to be established directly from a web browser. Google released it as an open source project in 2011, and it has been continuously developed and standardized since then. The technology is built into modern browsers such as Opera, Chrome and Firefox. Because it enables communication without any additional devices or software, it is becoming increasingly popular: the browser only needs access to the microphone and, for video, the camera. There is no need to install softphones or to use desk phones. But how does WebRTC actually work?
Architecture of WebRTC
WebRTC (Web Real-Time Communication) is made available in the form of JavaScript APIs. Browsers which support WebRTC enable not only audio and video connections to be established, but also other functions such as chat, screen sharing and file transfers. The APIs in question are:
- MediaStream -> Browser access to the camera and microphone
- RTCPeerConnection -> Audio and video transmission between two peers
- RTCDataChannel -> Transmission of arbitrary data
Not all of these APIs work on every platform, however; some browsers have implemented only parts of them so far. Check the release notes of the individual browsers to find out the current status of each implementation.
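Because support varies, it can help to check at runtime which of the three APIs a browser actually exposes. The following is a minimal sketch of such a check, not a complete compatibility test. The function name detectWebRtcSupport and its scope parameter are assumptions made for this example; only the three API names come from the list above.

```typescript
// Which of the three WebRTC APIs does a given global object expose?
type WebRtcSupport = {
  mediaStream: boolean;
  peerConnection: boolean;
  dataChannel: boolean;
};

// "scope" is a stand-in for the browser's global object (an assumption of
// this sketch), which makes the check easy to try outside a browser too.
function detectWebRtcSupport(scope: object): WebRtcSupport {
  const has = (name: string): boolean =>
    typeof (scope as Record<string, unknown>)[name] === "function";
  return {
    mediaStream: has("MediaStream"),
    peerConnection: has("RTCPeerConnection"),
    dataChannel: has("RTCDataChannel"),
  };
}

// In a browser, pass the global object itself.
const webRtcSupport = detectWebRtcSupport(globalThis);
```

A page could use the result to hide a "share file" button when dataChannel is false, for example, instead of failing later.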
In addition to the APIs mentioned above, WebRTC also requires signaling in order to control sessions, exchange network configurations and negotiate codecs. This signaling is deliberately not part of the WebRTC standard: you can choose which type of signaling you wish to use. Several options are available, including SIP and XMPP. The signaling messages are usually carried over encrypted connections so that only the parties involved can read them. WebSockets are often used for this purpose. (WebSocket is a technology that lets a web server keep a connection to a client open and use it to send asynchronous messages to that client.)
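Since WebRTC does not prescribe a signaling format, every application defines its own messages. The sketch below shows one possible JSON envelope that could be sent over a WebSocket. The message kinds and the function names encodeSignal and decodeSignal are invented for this example and are not part of any standard.

```typescript
// Hypothetical message envelope; WebRTC itself prescribes no wire format.
type SignalingMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string }
  | { kind: "bye" };

function encodeSignal(message: SignalingMessage): string {
  return JSON.stringify(message);
}

// Reads a required string field or fails with a descriptive error.
function requireString(record: Record<string, unknown>, field: string): string {
  const value = record[field];
  if (typeof value !== "string") {
    throw new Error(`signaling field "${field}" must be a string`);
  }
  return value;
}

// Validates untrusted text from the network before it reaches the call logic.
function decodeSignal(text: string): SignalingMessage {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) {
    throw new Error("signaling message must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  const kind = record["kind"];
  if (kind === "offer") return { kind: "offer", sdp: requireString(record, "sdp") };
  if (kind === "answer") return { kind: "answer", sdp: requireString(record, "sdp") };
  if (kind === "candidate") {
    return { kind: "candidate", candidate: requireString(record, "candidate") };
  }
  if (kind === "bye") return { kind: "bye" };
  throw new Error("unknown signaling message kind");
}

// Round trip: what one browser encodes, the other decodes.
const wireText = encodeSignal({ kind: "offer", sdp: "v=0" });
const receivedSignal = decodeSignal(wireText);
```

Validating on receipt is a design choice of this sketch: the signaling channel carries data from another party, so it is treated as untrusted input.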
External Servers for WebRTC
Although WebRTC connections are actually peer-to-peer connections, external servers are often used, at least for establishing a connection. They are:
- STUN servers (Session Traversal Utilities for NAT)
These servers are used to locate one's own public IP address (outside one's own network), in other words "under what address am I seen by the outside world?"
- ICE (Interactive Connectivity Establishment)
Strictly speaking, ICE is not a separate type of server but a procedure that the browsers themselves carry out with the help of STUN and TURN servers (which is why WebRTC configurations list both under "ICE servers"). ICE determines the best possible connection option between two computers. More specifically, it first attempts to connect the computers directly; if that is not possible, it determines the public IP addresses of the computers via STUN and tries those. If this also fails, it falls back to a TURN server.
- TURN servers (Traversal Using Relays around NAT)
If a direct connection between two computers is not possible, TURN servers can be used as relays. All media traffic (SRTP) is then routed through these servers.
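In the browser, these servers are handed to the connection as a configuration object. The sketch below builds such an object. The field names iceServers, urls, username, credential and iceTransportPolicy follow the browser's RTCConfiguration dictionary; the helper buildPeerConfig, the host names under example.org and the credentials are placeholders invented for this example.

```typescript
// Local copies of the relevant parts of the browser's RTCConfiguration
// dictionary, so that the sketch also compiles outside a browser.
type IceServer = { urls: string | string[]; username?: string; credential?: string };
type PeerConfig = { iceServers: IceServer[]; iceTransportPolicy: "all" | "relay" };
type TurnAccount = { url: string; username: string; credential: string };

// Hypothetical helper: one STUN server, an optional TURN relay, and a switch
// that forces all media through the relay.
function buildPeerConfig(
  stunUrl: string,
  turn?: TurnAccount,
  relayOnly = false
): PeerConfig {
  if (!/^stuns?:/.test(stunUrl)) {
    throw new Error("STUN URLs start with stun: or stuns:");
  }
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turn) {
    if (!/^turns?:/.test(turn.url)) {
      throw new Error("TURN URLs start with turn: or turns:");
    }
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  if (relayOnly && !turn) {
    throw new Error("relay-only mode needs a TURN server");
  }
  return { iceServers, iceTransportPolicy: relayOnly ? "relay" : "all" };
}

// Placeholder hosts and credentials; replace them with real servers.
const peerConfig = buildPeerConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo",
  credential: "secret",
});
// In a browser: const pc = new RTCPeerConnection(peerConfig);
```

Note that the configuration contains no separate entry for ICE: the browser runs ICE itself and only needs to know which STUN and TURN servers it may use.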
How are WebRTC connections established?
These tools enable web browsers to establish connections to other WebRTC partners. And this is how it works (in simple terms):
Setting up a signaling channel
- Independently of WebRTC, browser A first sets up a signaling channel to browser B in some way. A web server may be involved, through which the user can select the appropriate remote station, but this does not have to be the case. Browser A may already know the IP address of B and can then establish the signaling connection directly. This “signaling” is not part of the WebRTC standard and is performed independently, in a non-prescribed manner. In any case, the result of this process is an established signaling channel through which the remote stations can exchange messages.
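How such a channel is built is up to the application. As a rough illustration of the web server variant, the sketch below models a server that simply forwards messages between registered peers. It runs in memory only; the class SignalingHub and its methods are invented for this example, and a real deployment would put a network transport such as WebSockets in front of it.

```typescript
type SignalListener = (from: string, payload: string) => void;

// Hypothetical in-memory relay: peers register under an ID and the hub
// forwards each message to the addressee.
class SignalingHub {
  private readonly listeners = new Map<string, SignalListener>();

  join(peerId: string, onMessage: SignalListener): void {
    this.listeners.set(peerId, onMessage);
  }

  leave(peerId: string): void {
    this.listeners.delete(peerId);
  }

  // Returns false when the addressee is not (or no longer) registered.
  send(from: string, to: string, payload: string): boolean {
    const deliver = this.listeners.get(to);
    if (!deliver) return false;
    deliver(from, payload);
    return true;
  }
}

// Browser A and browser B both join; A then reaches B by its ID.
const hub = new SignalingHub();
const inboxOfB: string[] = [];
hub.join("A", () => {});
hub.join("B", (from, payload) => inboxOfB.push(`${from}: ${payload}`));
const delivered = hub.send("A", "B", "hello");
```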
Determining one's own reachable addresses
- The remote stations then determine their so-called “ICE candidates”. ICE candidates are combinations of IP addresses and ports via which the computers can be reached internally or externally, i.e. from outside any firewalls. For this purpose, each browser runs the ICE procedure and queries STUN servers: STUN reveals the browser's own public IP address, and ICE collects and tests the possible access paths.
- A technique known as "hole punching", among others, is also used. When a browser contacts a STUN server, its outbound packet opens a mapping in the NAT or firewall, and the server reports back the public address and port of that mapping. The browser passes this address to the other remote station via the signaling channel. When both remote stations then send packets to each other's mapped addresses, each firewall treats the incoming packets as replies to its own outbound traffic and lets them through.
- The result of this process is therefore a list of IP addresses and ports which can be used for the media channel.
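On the signaling channel, each entry of this list travels as a text line in the SDP candidate format. The sketch below parses such a line into its fields. The function name parseCandidate and the result type are assumptions of this example, and the sample line uses documentation addresses rather than real ones.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

type ParsedCandidate = {
  foundation: string;
  componentId: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
  relatedAddress?: string;
  relatedPort?: number;
};

// Parses "candidate:<foundation> <component> <transport> <priority>
// <address> <port> typ <type> [raddr <addr>] [rport <port>] ...".
function parseCandidate(line: string): ParsedCandidate {
  const match = /^(?:a=)?candidate:(.+)$/.exec(line.trim());
  if (!match) throw new Error("not an ICE candidate line");
  const fields = match[1].split(/\s+/);
  if (fields.length < 8 || fields[6] !== "typ") {
    throw new Error("malformed ICE candidate line");
  }
  const type = fields[7];
  if (type !== "host" && type !== "srflx" && type !== "prflx" && type !== "relay") {
    throw new Error(`unknown candidate type: ${type}`);
  }
  const parsed: ParsedCandidate = {
    foundation: fields[0],
    componentId: Number(fields[1]),
    transport: fields[2].toLowerCase(),
    priority: Number(fields[3]),
    address: fields[4],
    port: Number(fields[5]),
    type,
  };
  // Optional extension attributes follow as key/value pairs.
  for (let i = 8; i + 1 < fields.length; i += 2) {
    if (fields[i] === "raddr") parsed.relatedAddress = fields[i + 1];
    if (fields[i] === "rport") parsed.relatedPort = Number(fields[i + 1]);
  }
  return parsed;
}

// "srflx" marks a public address learned via STUN; raddr/rport give the
// internal address it maps to.
const sampleCandidate = parseCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154"
);
```

The type field shows where a candidate came from: "host" is a local interface address, "srflx" (server reflexive) is the public address reported by a STUN server, and "relay" is an address on a TURN server.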
Agreeing on the media stream
- The two peers then decide on the ICE candidates to be used, i.e. the IP addresses and ports to be used for transmitting media data. This also includes a number of additional connection properties such as the codec to be used. The aim is a peer-to-peer connection that is as fast as possible and avoids detours.
- If, however, none of the combinations of ICE candidates found on the two sides yields a working connection, a direct peer-to-peer connection cannot be established. In that case, a TURN server is brought in as a relay through which the media stream runs.
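The preference for direct paths over relays can be expressed in numbers. The sketch below uses the candidate priority formula and the recommended type preferences from the ICE specification (RFC 8445), then picks the best of the candidate pairs that passed the connectivity checks. It is a simplification: pickBestPair and its types are invented for this example, and a real ICE agent additionally distinguishes controlling and controlled roles and runs the checks itself.

```typescript
type CandidateKind = "host" | "prflx" | "srflx" | "relay";

// Recommended type preferences from RFC 8445: direct addresses rank
// highest, relayed addresses lowest.
const TYPE_PREFERENCE: Record<CandidateKind, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(
  kind: CandidateKind,
  localPreference = 65535,
  componentId = 1
): number {
  return TYPE_PREFERENCE[kind] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// A pair of local and remote candidate that passed the connectivity checks.
type CheckedPair = { local: CandidateKind; remote: CandidateKind };

// Ranks pairs by the lower of the two priorities, then by the higher one.
// This gives the same ordering as the RFC's pair priority formula, apart
// from its final tie-break bit.
function pickBestPair(
  pairs: CheckedPair[]
): { pair: CheckedPair; relayed: boolean } | undefined {
  let best: CheckedPair | undefined;
  let bestLow = -1;
  let bestHigh = -1;
  for (const pair of pairs) {
    const a = candidatePriority(pair.local);
    const b = candidatePriority(pair.remote);
    const low = Math.min(a, b);
    const high = Math.max(a, b);
    if (low > bestLow || (low === bestLow && high > bestHigh)) {
      best = pair;
      bestLow = low;
      bestHigh = high;
    }
  }
  if (!best) return undefined;
  return { pair: best, relayed: best.local === "relay" || best.remote === "relay" };
}

const hostPriority = candidatePriority("host"); // 2130706431
const chosenRoute = pickBestPair([
  { local: "relay", remote: "srflx" },
  { local: "srflx", remote: "srflx" },
]);
```

In this example the pair of two STUN-derived addresses wins, so the media flows peer to peer. If only pairs involving a relay candidate had passed the checks, the result would carry relayed: true, which corresponds to the TURN fallback described above.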
Setup of media connection
- Once the steps above have been completed successfully, the media channels found can be used for communication. This is done with SRTP, the encrypted variant of RTP; WebRTC does not permit unencrypted RTP media.
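To give an impression of what travels over the media channel, the sketch below reads the fixed 12-byte header that every RTP packet starts with. SRTP encrypts the payload but leaves this header readable, so the same layout applies to both. The function name parseRtpHeader and the sample bytes are invented for this example; the field layout follows the RTP specification (RFC 3550).

```typescript
type RtpHeader = {
  version: number;
  padding: boolean;
  extension: boolean;
  csrcCount: number;
  marker: boolean;
  payloadType: number;
  sequenceNumber: number;
  timestamp: number;
  ssrc: number;
};

// Reads the fixed part of an RTP header (network byte order).
function parseRtpHeader(packet: Uint8Array): RtpHeader {
  if (packet.length < 12) {
    throw new Error("an RTP header needs at least 12 bytes");
  }
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const first = view.getUint8(0);
  const second = view.getUint8(1);
  return {
    version: first >> 6,              // always 2 for current RTP
    padding: (first & 0x20) !== 0,
    extension: (first & 0x10) !== 0,
    csrcCount: first & 0x0f,
    marker: (second & 0x80) !== 0,    // e.g. last packet of a video frame
    payloadType: second & 0x7f,       // refers to the negotiated codec
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),          // identifies the media source
  };
}

// Hand-made sample packet: version 2, payload type 96, sequence number 1.
const sampleHeader = parseRtpHeader(
  new Uint8Array([0x80, 0x60, 0x00, 0x01, 0x00, 0x00, 0x00, 0x0a, 0x12, 0x34, 0x56, 0x78])
);
```

The payload type is the link back to the negotiation step: it refers to one of the codecs the two peers agreed on beforehand.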
Possibilities with WebRTC
There are many applications for WebRTC: simple browser-to-browser communication, online games, screen sharing applications or full-fledged attendant solutions. It is generally expected that WebRTC will become widely adopted in the near future.