What Is In An IP Video Stream?

I am hoping some of the brain power on here can help me (us) understand what makes up an IP Video Stream. I know we have H.264 and MJPEG as compression methods. There is ONVIF, which is supposed to foster compatibility and standardization. Then we have the umpteen protocols listed in the spec sheets - TCP/IP, UDP/IP, RTP(UDP), RTP(TCP), RTCP, RTSP, NTP, HTTP, HTTPS, SSL, DHCP, PPPoE, FTP, SMTP, ICMP, IGMP, SNMPv1/v2c/v3(MIB-2), ARP, DNS, DDNS, QoS, PIM-SM, UPnP, Bonjour...

But if we pull back the "sheets" and deconstruct a typical IP video stream, what is it that makes it all work with different VMS providers and/or browsers? At least with an analog video signal I understand the 1V P-P and YC waveform. Though I have had some experience with, and am familiar with, the OSI 7 layer communications model, I don't completely understand how IP Video streaming communications work. Help me (us) understand the fundamentals here and what to care about and what not to...

Do you know what an API is? What makes cameras work with different VMSes are APIs.

Vendors build APIs for their products. Third parties get those APIs/SDKs and then use them to connect.

Think of ONVIF as a type of API but an open one which lots of companies all agree to use.

See our API / SDK Tutorial.

As part of an integration using an API, certain protocols may be used. In video surveillance, the most common is RTSP (e.g., it is the 'foundation' of ONVIF).
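To make RTSP a bit more concrete, here is a minimal sketch of what a client's first request to a camera could look like on the wire. It only builds the request text; it does not open a connection. The helper name `build_rtsp_request`, the address 192.0.2.10 (a documentation-only address) and the path `/stream1` are assumptions for illustration, since real stream URLs differ by manufacturer.

```python
from typing import Dict, Optional


def build_rtsp_request(method: str, url: str, cseq: int,
                       extra_headers: Optional[Dict[str, str]] = None) -> bytes:
    """Build a plain-text RTSP/1.0 request (hypothetical helper, not a vendor API)."""
    # RTSP requests look a lot like HTTP: a request line, headers, then a blank line.
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# DESCRIBE asks the camera to describe the stream, typically answered with SDP.
# The URL below is made up for the example.
request = build_rtsp_request(
    "DESCRIBE", "rtsp://192.0.2.10/stream1", 1, {"Accept": "application/sdp"}
)
print(request.decode("ascii"))
```

A real session would follow DESCRIBE with SETUP and PLAY, after which the video itself usually arrives as RTP packets rather than over the RTSP text exchange.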

To me, though, what's more important is understanding bandwidth consumption, which has a big impact on server specification, network congestion and storage use. The choice of TCP vs UDP or HTTP vs HTTPS matters far less often than bandwidth load. For background, see: Advanced Camera Bandwidth Test Results, IP Camera Bandwidth / Storage Shootout, Bandwidth vs. Image Quality Shootout, etc.
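As a rough illustration of why bandwidth drives storage, here is a back-of-the-envelope calculation. The function name `storage_gb` and the figures (16 cameras at a constant 2 Mb/s, recorded continuously for 30 days) are made-up assumptions, not test results; real cameras vary their bitrate with scene complexity, motion and lighting.

```python
def storage_gb(bitrate_mbps: float, cameras: int, days: float) -> float:
    """Estimate storage in decimal gigabytes for continuous recording.

    Assumes a constant average bitrate per camera, which is a simplification.
    """
    seconds = days * 24 * 60 * 60
    total_megabits = bitrate_mbps * cameras * seconds
    # megabits -> megabytes (divide by 8) -> gigabytes (divide by 1000)
    return total_megabits / 8 / 1000


# Hypothetical system: 16 cameras, 2 Mb/s each, 30 days of retention.
estimate = storage_gb(2.0, 16, 30)
print(f"{estimate:.0f} GB")
```

Doubling the average bitrate doubles both the network load and the storage requirement, which is why it tends to matter more than the transport choice.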


Any desire/capacity to set up a wiki/glossary of various terms with industry-focused definitions? Member powered but IPVM approved. Min Effort -> Max Effect. Besides, it's just good fun to read glossaries (and indices)!

The best way to dissect your data and see what you're sending is to use Wireshark.

It shows you exactly how a packet is built up and what is used at which OSI layer.
You're asking a really broad question though, as ONVIF, H.264 and RTSP are completely different subjects. You could basically decode it so far back that eventually you reach the Manchester coding.

My advice would be to pick up Wireshark and play around.

Here is a broad way that it works in my own words.

Start with the Internet Protocol (IP). Inside that "protocol tunnel" are various other protocols. For example, video can be compressed with a codec called H.264 or another codec. Audio can be compressed with a codec such as AAC, MP3 and so on. Once the video and audio are encoded, they need to be assembled into a container (just a wrapper, not the coding), which can be unknown or said to be proprietary... but I will list a few that we see daily with other streaming video services, such as .mkv, .avi, .asf and .mp4. Sometimes the same name refers to both a codec and a container, but I won't get into that. Once the container is built, you need to transport it using a transport method like RTP (not the only one). If the client wants to interact with the server, then you would also need a control protocol like RTSP (again, not the only one).

Once this gets to the client, the client unpacks it and so on.
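To show what "unpacking" means at the RTP layer, here is a minimal sketch that reads the 12-byte fixed RTP header from a packet. The sample packet is fabricated for the example; payload type 96 is a dynamic value that cameras often map to H.264 in the session description, but that mapping is an assumption here and should be checked against the actual stream.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header; everything after it is payload."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Network byte order: two single bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # should be 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),     # often flags the last packet of a video frame
        "payload_type": b1 & 0x7F,     # identifies the codec per the session description
        "sequence": seq,               # lets the client detect loss and reordering
        "timestamp": timestamp,
        "ssrc": ssrc,                  # identifies the sending source
    }


# Fabricated packet: version 2, payload type 96, sequence 1, made-up SSRC.
sample = struct.pack("!BBHII", 0x80, 96, 1, 90000, 0x1234ABCD) + b"payload"
header = parse_rtp_header(sample)
print(header)
```

These are the same fields Wireshark shows when it decodes a captured packet as RTP, so the sketch can be compared against a real capture.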

Michael, what are your goals? What do you wish to accomplish with this understanding?

The alphabet soup you list encompasses broad topics in IT, so you're going to have to focus.

Most of the protocols you cite are intrinsic to any network devices communicating with each other. If you want to know what they do, a good primer on networking is in order.

Some of these are about APIs used to install, configure, and operate the IP camera. A lot of that depends on the manufacturer and head end system. Much is normalized by ONVIF these days.

A couple are related to actually streaming the video over the network. Focus on RTP/RTSP.

Thanks for all the great feedback.

Steve - good question on goals. I guess what I really want to know is more about ONVIF and what the real issues are that inhibit compatibility. If you were associated with AV (Audio Video) and the HDMI nightmare in the beginning, there were tons of issues with incompatibility. Took us all a while to decipher the problems (and there were many) and be able to spec reliable systems and diagnose problems.

With IP CCTV, ONVIF seems to be the great hope for wide compatibility between VMS and Cameras. If this is a reasonable assumption then I would like to dig deeper into the details and better understand what I am dealing with. If not then where should I be digging?