WebRTC Conference

Version 36 (Dan Pascu, 07/08/2017 12:49 pm)

h1. SylkServer WebRTC Video Conference

https://webrtc-test.sipthor.net

h2. Design

Two types of conferences are supported: ad-hoc conferences and moderated conferences.

h3. Ad-hoc conferences

An ad-hoc conference is a conference where all participants have the same status and no one controls what the other participants are doing. The participants are rendered in a matrix of up to 3x3, depending on how many participants are in the room. The layout switches automatically for everybody as participants join or leave.

The conference room has a fixed total bitrate configured by the server, which can be specified per room or globally with the max_bitrate setting in webrtcgateway.ini (see below). This bitrate is shared by all participants in the room: the more participants there are, the less bitrate each of them uses for the video stream they send, which keeps the total room usage constant at the value configured by max_bitrate. SylkServer adjusts the per-participant bitrate automatically as participants join or leave the room, by dividing the available bitrate among the participants. As a result, each participant sends a fraction of max_bitrate (determined by the number of participants in the room) and always receives a combined total of max_bitrate from all the other participants, no matter how many participants are in the room. The formula to compute the bitrate per participant is shown below:

<pre>
participant_send_bitrate = max_bitrate / max(number_of_participants - 1, 1)
</pre>

This formula ensures that each participant always receives max_bitrate worth of incoming video streams, independent of the number of participants. The traffic sent and received by each party can be expressed as follows (where N is the number of participants and N > 1):

<pre>
participant_sent_traffic     = max_bitrate / (N - 1)
participant_received_traffic = max_bitrate

sylkserver_sent_traffic      = max_bitrate * N            (participant_received_traffic * N)
sylkserver_received_traffic  = max_bitrate * N / (N - 1)  (participant_sent_traffic     * N)
</pre>
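
The formulas above can be tried out with a small Python sketch. It is only an illustration of the arithmetic, not SylkServer's actual code; the function name and the dictionary keys are made up for this example.

```python
def adhoc_room_traffic(max_bitrate, participants):
    """Traffic figures (bits/s) for an ad-hoc room, following the formulas above.

    Hypothetical helper for illustration; not part of SylkServer.
    """
    # Each stream is forwarded to the other N - 1 participants, hence the
    # divisor; max(..., 1) avoids dividing by zero with a single participant.
    send = max_bitrate / max(participants - 1, 1)
    # A participant receives one stream from each of the other participants.
    receive = send * (participants - 1)
    return {
        "participant_sent": send,
        "participant_received": receive,
        "server_sent": receive * participants,
        "server_received": send * participants,
    }


if __name__ == "__main__":
    for name, value in adhoc_room_traffic(2016000, 7).items():
        print(f"{name}: {value:.0f} bits/s")
```

With the default max_bitrate of 2016000 and 7 participants, each participant sends 336000 bits/s, which matches the 336Kb/s per participant quoted in the Measurements section.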

h3. Moderated conferences

A moderated conference is a conference where a moderator decides the flow of the conference. The moderator is the first participant to join the conference. The moderator can see a list of all the participants, can select 1 or 2 of them to be the active speakers, and can mute other participants (audio and/or video). The moderator can also change the active speakers at any time.

The other participants see the selected active speakers in full-sized video and everyone else as thumbnails. They cannot choose which participant to watch: the conference view in their browser is controlled by the moderator, who decides which active speakers everybody else sees in full-sized video.

The active speakers selected by the moderator have their bitrate set to either max_bitrate (for 1 active speaker) or max_bitrate/2 (for 2 active speakers), while everybody else has their bitrate set to a low value (64Kb/s), just enough to represent them as small thumbnails on the other participants' screens.

h2. Features

h3. Ad-hoc conferences

Ad-hoc conferences are best suited for conversations with family and friends, since the bandwidth/bitrate is managed automatically and no dedicated person is needed to control the flow of the conference. However, they can also be used for any other video conference that implies a free-flowing discussion where any participant can jump into the conversation at any time.

h3. Moderated conferences

Moderated conferences are best suited for a business environment, where participants make presentations in front of the other participants and a moderator is assigned to control the flow of the conference and give the microphone to the appropriate participant, while the others just watch the active speaker. They can also be used for a conference where 2 active participants hold a public debate on a subject, while every other participant just watches and possibly asks questions.

h2. Configuration

SylkServer allows the maximum bitrate and the video codec to be configured, globally or per room, with the following settings in the webrtcgateway.ini file:

<pre>
; Maximum video bitrate allowed per sender in a room in bits/s. This value is
; applied to any room that doesn't define its own. The value is any integer
; number between 64000 and 4194304. Default value is 2016000 (~2Mb/s).
; max_bitrate = 2016000

; The video codec to be used by all participants in a room. This value is
; applied to any room that doesn't define its own.
; Possible values are: h264, vp8 and vp9. Default is vp9.
; video_codec = vp9
</pre>
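
As an illustration of the documented limits and defaults, the sketch below reads the two settings with Python's standard configparser module and validates them. It is not how SylkServer itself loads its configuration: the function name is hypothetical, and the [General] section name is an assumption made for this example, since the section is not shown in the fragment above.

```python
import configparser
from typing import Tuple

VALID_CODECS = {"h264", "vp8", "vp9"}


def load_video_settings(ini_text: str, section: str = "General") -> Tuple[int, str]:
    """Return (max_bitrate, video_codec), applying the documented defaults.

    Hypothetical helper; the section name is an assumption.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Lines starting with ';' are comments, so commented-out settings
    # fall back to the defaults documented above.
    max_bitrate = parser.getint(section, "max_bitrate", fallback=2016000)
    video_codec = parser.get(section, "video_codec", fallback="vp9").lower()
    if not 64000 <= max_bitrate <= 4194304:
        raise ValueError(f"max_bitrate out of range: {max_bitrate}")
    if video_codec not in VALID_CODECS:
        raise ValueError(f"unsupported video_codec: {video_codec}")
    return max_bitrate, video_codec


if __name__ == "__main__":
    print(load_video_settings("[General]\nmax_bitrate = 1024000\nvideo_codec = h264\n"))
```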

h2. Client support

h2. Things that were explored

In order to implement bandwidth management and CPU load optimizations we explored several approaches. Some proved fruitful, while others were abandoned or turned out not to be very helpful for our goal.

The original idea was to have each client send two video streams, one low resolution and one high resolution, and let the other participants switch between them as needed (using the high resolution video if the participant was viewed in full, or the low resolution video if the participant was displayed as a thumbnail). As we progressed, we quickly discovered that this setup was a lot more complicated to manage than we had anticipated. Every participant would open 2 sessions to the conference room just to publish their low and high resolution streams, which made them appear duplicated in the conference. Special means had to be employed to associate two such distinct sessions coming from the same device and present them as a single entity. This had to be done in each client, which meant that older clients would not be able to deal with this setup and would display every participant twice.

In addition, this setup would increase the upload bandwidth of each participant by up to 1.5 times, going against the idea of reducing the bandwidth used.

The advantages of this model were the reduced download bandwidth and the reduced CPU utilization that resulted from only having to process one high resolution video stream while all the other video streams were low resolution. These advantages were overshadowed by the higher upload bandwidth, by the more complicated room management required to deal with devices connecting twice per participant, and by the inability of older devices to join such a conference room.

While working on this we also ran into a technical limitation in Firefox, which was unable to provide 2 video streams of different resolutions at the same time. When we tried to obtain 2 video streams, one low resolution and one high resolution, the moment we requested the second stream with a different resolution, the first stream's resolution was updated to match the second and we ended up with 2 streams of the same resolution. We could not overcome this limitation, so in addition to the issues mentioned above we were also facing the prospect of dropping Firefox support and having our solution work only with Chrome.

While contemplating our choices we discovered a mechanism by which a WebRTC client can be constrained to limit its sending bandwidth, and which can be applied dynamically during a call to make the device's sending bitrate high or low as desired, without any need to renegotiate the session. This mechanism uses REMB packets, which are control packets sent through RTCP that make a browser adjust its send bitrate on the fly as requested. The good news was that both Chrome and Firefox supported this. This changed everything: we realized we could use it to build a better solution that was a lot less complicated and more effective.
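
As background on how a bitrate limit travels inside a REMB packet: according to our reading of the REMB Internet-Draft (draft-alvestrand-rmcat-remb), which is not part of this page, the requested bitrate is carried as a 6-bit exponent and an 18-bit mantissa, with bitrate = mantissa * 2^exponent. The sketch below only illustrates that field encoding. It does not build a complete RTCP packet, it is not the code used by SylkServer or Janus, and the function names are hypothetical.

```python
from typing import Tuple

MAX_MANTISSA = (1 << 18) - 1  # the mantissa field is 18 bits wide
MAX_EXPONENT = (1 << 6) - 1   # the exponent field is 6 bits wide


def encode_remb_bitrate(bitrate: int) -> Tuple[int, int]:
    """Split a bitrate (bits/s) into the (exponent, mantissa) pair used by REMB."""
    if bitrate < 0:
        raise ValueError("bitrate must be non-negative")
    exponent, mantissa = 0, bitrate
    # Shift right until the value fits in 18 bits; low-order bits may be lost.
    while mantissa > MAX_MANTISSA:
        mantissa >>= 1
        exponent += 1
    if exponent > MAX_EXPONENT:
        raise ValueError("bitrate too large for the REMB exponent field")
    return exponent, mantissa


def decode_remb_bitrate(exponent: int, mantissa: int) -> int:
    """Inverse of encode_remb_bitrate, up to the precision lost when encoding."""
    return mantissa << exponent


if __name__ == "__main__":
    print(encode_remb_bitrate(2016000))
```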

At the same time we realized that the initial model used by the WebRTC client, where the client displays one participant in full and the others as thumbnails and lets the user switch the full view to another participant by clicking on a thumbnail, was not very useful for a large category of uses, namely group video chats with friends and family. In this case users do not want to click thumbnails and see only one participant at a time; they would rather see all participants at the same time.

As a result of all this, we decided to give up on the original idea of 2 streams of different resolutions per participant and to completely change our model. We came up with the 2 models mentioned before: the ad-hoc conference model and the moderated conference model.

h3. The ad-hoc conference model

The ad-hoc conference model is meant for a group chat with friends and family, where one expects to see all the other participants on the screen at the same time and any participant can jump into the conversation at any time. In this model we decided to display all participants in a matrix, so everyone is visible at all times. Initially the matrix is just 1x1, when there are only 1 or 2 people in the room, but it can grow up to a 3x3 matrix that can accommodate up to 10 participants (9 + yourself). This model fits well with the idea of using REMB to limit the send bitrate: the more participants there are on screen, the smaller their video is, which aligns perfectly with the idea of a constant room bitrate shared by all participants. The more participants, the lower their bitrate, but also the smaller their video frame on screen, which compensates for the reduced quality of their video stream.

In order to compare the bandwidth used by this model with that of the original model we attempted (the one with 2 video streams per participant, one low resolution and one high resolution), let's consider the bitrate used by an HD stream (1280x720) playing at 30fps. This bitrate is ~2.0-2.4Mb/s; let's call it B. We found that for a thumbnail-sized video stream of 180x120 pixels at 30fps the bitrate requirement was still very high, in the range of B/3 to B/2. As a result, in the original model each participant had to send between about 1.3*B and 1.5*B (B for the high resolution stream plus B/3 to B/2 for the thumbnail stream). At the same time, because only one participant was big on screen and all the others were thumbnails, each participant would receive B + (N-1)*B/2 = B*(N+1)/2, where N is the number of participants. In the ad-hoc conference model, as mentioned before, each participant receives B and sends B/(N-1).

To compare these numbers, let's consider B = 2Mb/s and N = 9.

* In the original model, each participant would have sent 1.5*2 = 3Mb/s and would have received 2*(9+1)/2 = 10Mb/s.
* In the ad-hoc conference model, with B set as the room maximum bitrate, each participant would send 2/(9-1) = 0.25Mb/s and would receive 2Mb/s.

These numbers show that the ad-hoc conference model, with its controlled bitrate per participant, manages bandwidth far better than the original model we started with: it is 5 times more efficient in the amount of data received and 12 times more efficient in the amount of data sent.
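
The comparison can be reproduced with a few lines of Python. This is only a sketch of the arithmetic used in the text; the function names and the thumb_fraction parameter (the thumbnail bitrate as a fraction of B, here the upper bound of 1/2) are introduced for this example.

```python
def original_model(b, n, thumb_fraction=0.5):
    """(sent, received) per participant in the two-streams-per-participant model."""
    sent = b * (1 + thumb_fraction)              # full stream plus thumbnail stream
    received = b + (n - 1) * b * thumb_fraction  # formula as given in the text
    return sent, received


def adhoc_model(b, n):
    """(sent, received) per participant in the ad-hoc model, for n > 1."""
    return b / (n - 1), b


if __name__ == "__main__":
    b, n = 2.0, 9  # Mb/s, participants
    old_sent, old_received = original_model(b, n)
    new_sent, new_received = adhoc_model(b, n)
    print(f"original: sent {old_sent} Mb/s, received {old_received} Mb/s")
    print(f"ad-hoc:   sent {new_sent} Mb/s, received {new_received} Mb/s")
    print(f"ratio:    sent {old_sent / new_sent:.0f}x, received {old_received / new_received:.0f}x")
```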

In addition, the ad-hoc conference model provides a much better user experience by allowing all participants to be visible on screen at once.

Another thing we noticed with Chrome, while using VP8 as the codec, was that with more than 3 participants in a room Chrome started to dynamically adjust the resolution of the video being sent, fluctuating between HD and VGA resolution depending on the bitrate it was allowed to use and the amount of movement in the encoded video stream. This was an added bonus: with more participants in a room, and hence a lower bitrate per participant, we expected Chrome to do this more often, so that we could achieve not only improved network bandwidth usage but also lower CPU usage.

Unfortunately Firefox did not behave this way. Firefox maintains the resolution requested when the stream started for the whole duration of the call, regardless of the bitrate limit imposed on it. To compensate for this we tried to request resolution adjustments based on the number of participants in the room, in order to reduce CPU usage as the number of participants increases. Unfortunately this was not successful, because the results are not reliable. Sometimes Firefox switches resolutions without a problem; other times the camera attempts to switch resolutions and does not reopen at the new resolution, which results in the video stream no longer being sent (it freezes on the last frame before the resolution change was attempted). The outcome seems random and we could not determine what causes it or how to fix it. It is also worth mentioning that we noticed this problem with Firefox running on OSX on a Macbook Pro with a built-in camera. We do not know if a similar problem exists with external cameras or on other operating systems (Linux or Windows).

Still, exploring this feature remains an open option, because explicitly requesting a resolution is potentially a much better solution than Chrome's automatic resolution adjustment, giving more predictable and consistent results. Chrome switches resolution based on factors other than just the bitrate and does not seem to do it often enough to be effective. In addition, we found that with VP9 as the codec Chrome does not lower the resolution of a video stream even when the bitrate is as low as 256Kb/s.

h3. The moderated conference model

Since the ad-hoc model is not best suited for every application, we also considered the moderated conference model. In this model a moderator controls the flow of the conference. The moderator is the first participant to join the conference. The moderator can see a list of all the participants, decide who the active speakers are and mute audio/video per participant when needed. Only 1 or at most 2 participants can be active speakers at a time. Participants cannot select which other participants they see on screen: the moderator selects the active speakers that are shown in full on everyone's screens, while all the others are shown as thumbnails.

With 1 active speaker, the conference is suitable for cases where people need to give a speech or show a presentation for others to watch. In this case the moderator simply switches the active speaker, giving each speaker their stage time in turn.

With 2 active speakers at the same time, the conference can be used, for example, for a public debate, where the active speakers debate the subject while the rest of the participants just watch, or ask questions if needed.

In this model, each active speaker has their bitrate limited to max_bitrate / number_of_active_speakers, while everyone else has a very low bitrate (64Kb/s), just enough to be displayed as a thumbnail.

Considering B the bitrate for an HD stream at 30fps, N the number of participants in the conference and AS the number of active speakers:

* Each active speaker sends B/AS
* Everyone else sends a constant 64Kb/s
* Everyone in the room receives B + (N-AS)*64Kb/s

For B = 2Mb/s, N = 10 and AS = 2 we have:

* Each active speaker sends 1Mb/s
* Everyone else sends 64Kb/s = 0.064Mb/s
* Everyone in the room receives 2Mb/s + (10-2) * 0.064Mb/s = 2.512Mb/s
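
The same figures can be computed with a short Python sketch. It only mirrors the formulas above and is not taken from SylkServer; the function and constant names are hypothetical.

```python
THUMBNAIL_BITRATE = 64_000  # bits/s sent by every non-active participant


def moderated_room_bitrates(max_bitrate, participants, active_speakers):
    """Return (speaker_send, thumbnail_send, received) in bits/s."""
    if active_speakers not in (1, 2):
        raise ValueError("a moderated room has 1 or 2 active speakers")
    if participants < active_speakers:
        raise ValueError("more active speakers than participants")
    # The room bitrate is split between the active speakers only.
    speaker_send = max_bitrate / active_speakers
    # Formula from the text: the full room bitrate plus one thumbnail
    # stream for each participant who is not an active speaker.
    received = max_bitrate + (participants - active_speakers) * THUMBNAIL_BITRATE
    return speaker_send, THUMBNAIL_BITRATE, received


if __name__ == "__main__":
    print(moderated_room_bitrates(2_000_000, 10, 2))
```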

These numbers show that the moderated conference model is also a lot more efficient than the original model with 2 streams per participant.

h3. Mobile device considerations

Because mobile devices have both more limited resources and less screen space, we are considering the following technique for small mobile devices:

For both ad-hoc and moderated conferences, the mobile client only displays 1 or at most 2 participants in full view. For a moderated conference they are already chosen by the moderator, while for an ad-hoc conference the user can select the 1-2 participants to be seen. For the other participants the device pauses their video streams and does not show thumbnails, but instead shows them as static icons or just displays them in a list. This not only prevents screen clutter, allowing a more efficient use of the limited screen space, but by pausing the other participants' video streams it also dramatically reduces the device's CPU usage, because it does not need to receive and decode those streams just to display them as thumbnails.

With this technique a mobile device only has to decode and display 1 or at most 2 video streams, which is well within the device's processing capabilities, regardless of how many participants are in the conference room.
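
The selection logic described above could look roughly like the following sketch. It only shows how the participant list might be split into streams to display and streams to pause; the function name and parameters are hypothetical, and no real client API is involved.

```python
def split_streams(participants, selected, max_visible=2):
    """Split participants into (visible, paused) lists for a mobile client.

    `selected` holds the participants chosen by the moderator (moderated
    conference) or by the user (ad-hoc conference).
    """
    # Keep room order and never show more than max_visible streams.
    visible = [p for p in participants if p in selected][:max_visible]
    # Everyone else is shown as an icon or list entry, with video paused.
    paused = [p for p in participants if p not in visible]
    return visible, paused


if __name__ == "__main__":
    room = ["alice", "bob", "carol", "dave"]
    print(split_streams(room, {"carol", "alice"}))
```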

h2. Measurements

These load measurements were done on a 15" Macbook Pro with a 2.3GHz Intel Core i7 CPU, with 7 participants in the room, each using 336Kb/s. They show the CPU usage of the Firefox web browser under these conditions, for the given video codecs and resolutions, which were used by all participants:

<pre>
 * H264/VGA - 150% CPU
 * H264/HD  - 250% CPU
 * VP9/VGA  - 220% CPU
 * VP9/HD   - 350% CPU
</pre>

As far as CPU utilization goes, the most efficient codec is H264 (presumably because it is hardware accelerated on a lot of devices), followed by VP9, with VP8 last.

In a conference with 2 participants, both sending HD video (1280x720 at 30fps), we measured the following CPU load values in Firefox on the same laptop:

<pre>
 * VP8  - 130% CPU
 * VP9  - 100-110% CPU
 * H264 - 50-70% CPU
</pre>

h2. Conclusions

We consider that the ad-hoc and moderated conference models offer much better results than the original two-streams-per-participant idea. Not only do they offer a better and more natural user interface, they also give more control to the server, which can decide both the codec to be used and the bitrate limit per room, thus controlling the quality of the call in a single place.

For now we consider a room with a 2Mb/s bitrate limit using VP9 to be the best compromise between quality and resource usage. For the moment we cannot recommend H264, despite the huge improvement it would provide, especially for mobile clients, because we found compatibility issues there: the mobile client would display a green screen for any incoming H264 video stream.

h2. Remaining tasks

 * sylkserver: control and feedback interface for moderator
 * janus: patch to request full frames when a paused video is resumed
 * Rebuild mobile version

h2. Software that was modified

In order to implement the bandwidth management and CPU load optimizations the following software was modified:

# sylkserver https://github.com/AGProjects/sylkserver
# sylk-webrtc https://github.com/AGProjects/sylk-webrtc
# sylkrtc.js https://github.com/AGProjects/sylkrtc.js
# python-application https://github.com/AGProjects/python-application
# python-sipsimple https://github.com/AGProjects/python-sipsimple