Video Blueprint

Design and implement Ticket [#255]


The goal is to implement support for p2p and multi-party video sessions in the SIPSIMPLE middleware. The VP8 and H.264 codecs must be supported.

The video stream should behave like the audio stream at the transport level: SRTP and ICE should be usable. It must comply with the IMediaStream interface, the same way AudioStream does.


The picture below shows how video support is integrated into SIPSIMPLE, divided into its components:

  • PJSIP: a new transport is needed. Instead of creating a whole new transport, which would require implementing SRTP and ICE again, a transport adapter will be implemented. A transport adapter sits between a real transport and a pjmedia_stream. To the stream, this adapter looks like a media transport, and to the existing media transport, it looks like a stream. The benefit of this approach is that the same adapter can be used for both kinds of media transport, that is, the UDP and ICE media transports. This is exactly the approach that was taken for SRTP. This transport adapter will be responsible for encoding and decoding the video data.
  • Middleware:
      • VideoTransport: the VideoTransport will use RTPTransport to carry the video data and will be responsible for building the SDP for the video stream. A new option will be added to RTPTransport so that it starts the video transport adapter instead of the regular transport when needed.
      • VideoStream: implements the IMediaStream interface. It will expose a pluggable mechanism so that the application layer can access the video data and, for example, display it in a window, similar to ExternalVNCViewer on MSRP streams.
  • Application: the application will receive the video data from the stream and 'paint' it on a window.
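
The transport adapter idea described for PJSIP can be illustrated with a minimal Python sketch. It is not the PJSIP API: LoopbackTransport, VideoTransportAdapter and their methods are hypothetical names, and zlib is only a placeholder for a real VP8 or H.264 codec. The sketch shows the two faces of the adapter: towards the stream it offers the same attach/send calls as a transport, and towards the real transport it registers itself as the receiver, the way a stream would.

```python
import zlib


class LoopbackTransport:
    """Hypothetical stand-in for a real media transport (UDP or ICE)."""

    def __init__(self):
        self.on_receive = None  # callback of whoever is attached to us
        self.sent = []          # packets that would go out on the wire

    def attach(self, on_receive):
        self.on_receive = on_receive

    def send(self, packet):
        self.sent.append(packet)

    def inject(self, packet):
        """Simulate a packet arriving from the network."""
        if self.on_receive is not None:
            self.on_receive(packet)


class VideoTransportAdapter:
    """Sits between a stream and a real transport, encoding and decoding."""

    def __init__(self, transport, encode, decode):
        self._transport = transport
        self._encode = encode
        self._decode = decode
        self._on_frame = None
        # Towards the real transport, the adapter behaves like a stream.
        transport.attach(self._from_network)

    # Towards the stream, the adapter behaves like a media transport.
    def attach(self, on_frame):
        self._on_frame = on_frame

    def send(self, frame):
        self._transport.send(self._encode(frame))

    def _from_network(self, packet):
        if self._on_frame is not None:
            self._on_frame(self._decode(packet))


# zlib stands in for the video codec (assumption for illustration only).
transport = LoopbackTransport()
adapter = VideoTransportAdapter(transport, zlib.compress, zlib.decompress)

received = []
adapter.attach(received.append)
adapter.send(b"raw frame")                     # stream -> adapter -> transport
transport.inject(zlib.compress(b"remote frame"))  # transport -> adapter -> stream
```

Because the adapter depends only on attach and send, the same class could wrap either kind of underlying transport, which is the benefit the text attributes to this approach.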

Video Acquisition

The first approach will be to create a pjmedia_videostream object, which will perform the low-level video acquisition and pass the captured data to transport_video.


These milestones should be achieved in order to get video working:

  • SDP negotiation: make SIPSIMPLE able to generate and negotiate a valid video SDP.
  • Null video: SIPSIMPLE will generate a valid RTP payload with dummy data that will be exchanged after a successful SDP negotiation.
  • Video reception and still image sending: add the ability to receive and display the remote video stream and to generate a valid video stream from a still image.
  • Video acquisition: send real video data.
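
The first two milestones can be sketched as follows. This is an illustration, not SIPSIMPLE code: the function names, the port, the SSRC and the dynamic payload type numbers 96 and 97 are assumptions. The first function builds the video media section of an SDP body offering VP8 and H.264 with the 90 kHz RTP clock used for video, and the second wraps dummy bytes in the 12-byte fixed RTP header defined by RFC 3550. The dummy payload is not decodable video; it only exercises the RTP path.

```python
import struct


def build_video_sdp_media(port, codecs):
    """Build the m=video section of an SDP body.

    codecs is a list of (payload_type, encoding_name) tuples.
    """
    payload_types = " ".join(str(pt) for pt, _ in codecs)
    lines = ["m=video %d RTP/AVP %s" % (port, payload_types)]
    lines += ["a=rtpmap:%d %s/90000" % (pt, name) for pt, name in codecs]
    lines.append("a=sendrecv")
    return "\r\n".join(lines) + "\r\n"


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend the fixed RTP header to a payload."""
    first = 2 << 6  # version 2, no padding, no extension, no CSRC entries
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first, second,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


# Hypothetical values chosen for the example.
sdp = build_video_sdp_media(5004, [(96, "VP8"), (97, "H264")])
packet = build_rtp_packet(96, seq=1, timestamp=3000, ssrc=0x1234,
                          payload=b"\x00" * 20, marker=True)
```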

Encoder and Acquisition Choices

  • libVLC can be used for encoding and decoding the video data. It has ctypes-based Python bindings, which should be used at the application level to display remote video. Acquisition will most likely be done in C. A libVLC shared library will need to be built with all necessary modules statically linked: H.264 encoder/decoder, core modules, etc.
  • ffmpeg may be used instead of libVLC, as it is more lightweight.

Video Mixer

For the multi-party conferencing scenario, the party acting as the mixer must overlay its own video with the inputs from the conference participants' RTP video streams into a new composed screen that is then sent back to the participants. is a multi-layer video mixer C++ with Python bindings

The active speaker must be rendered more prominently than the listening parties. The video overlay must change dynamically by detecting the party that speaks loudest from the RTP audio streams. The active speaker takes the top part of the screen, while the other participants are rendered as thumbnails in a horizontal row.
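
The layout rule above can be sketched in a few lines of Python. The function names, the audio level values and the screen dimensions are hypothetical; how audio levels are measured from the RTP audio streams is left open here. The sketch picks the loudest participant as the active speaker and returns one rectangle per participant as (x, y, width, height), with the active speaker on top and the others in a horizontal row of thumbnails below.

```python
def pick_active_speaker(audio_levels):
    """Return the participant with the highest recent audio level."""
    return max(audio_levels, key=audio_levels.get)


def compute_layout(participants, active, width, height, thumb_height):
    """Map each participant to an (x, y, w, h) rectangle on the mixed screen."""
    others = [p for p in participants if p != active]
    # With nobody else in the conference, the speaker fills the whole screen.
    main_height = height - thumb_height if others else height
    layout = {active: (0, 0, width, main_height)}
    if others:
        thumb_width = width // len(others)
        for index, participant in enumerate(others):
            layout[participant] = (index * thumb_width, main_height,
                                   thumb_width, thumb_height)
    return layout


# Hypothetical audio levels, higher meaning louder.
levels = {"alice": 0.2, "bob": 0.9, "carol": 0.4}
active = pick_active_speaker(levels)
layout = compute_layout(list(levels), active, width=640, height=480,
                        thumb_height=120)
```

A real mixer would probably smooth the audio levels over time so that the layout does not flip on every short noise; that is a design choice this sketch does not address.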

videostream-components.png (13.1 kB) Klaus Darilion, 09/01/2009 11:40 pm

videostream-classes.png (17.9 kB) Klaus Darilion, 09/04/2009 03:37 pm

video_design.png (56.4 kB), 05/07/2010 04:34 pm