H.264 vs MJPEG - Quality and Bandwidth Tested
By: John Honovich, Published on Jul 28, 2010
Encoding video optimally is critical for IP video surveillance systems. Should you choose H.264 or MJPEG? While industry momentum certainly favors H.264, when and how to best use H.264 are important questions.
We believe the 3 key questions in considering H.264 vs MJPEG are:
- How much bandwidth savings does H.264 provide over MJPEG?
- What type of image quality differences can you expect between H.264 and MJPEG?
- What differences in computing load will you experience between H.264 and MJPEG?
This report provides our results and recommendations on the first two questions - bandwidth and image quality.
From our tests, we recommend the following 3 key criteria to understand H.264/MJPEG's impact on quality and bandwidth:
- Determine the complexity of the scene being captured/recorded
- Determine the streaming mode / control of the camera being used
- Determine the ratio of total frames to I frames (for H.264)
The tests reveal no magic numbers - the answer is not 80% bandwidth savings or 10% less quality (or any other single value for X or Y). While H.264 generally reduces bandwidth consumption significantly, the amount depends on multiple factors (including complexity, streaming mode, frame rate and I frame rate). VBR vs CBR selection is especially important, having a large impact on use and performance. Finally, while H.264 can deliver the same visible image quality as MJPEG, depending on the settings you use (especially streaming mode), you can easily generate worse quality.
Inside our premium report, we examine and explain each of these elements in-depth with a series of sample videos and tutorial video screencasts. The video introduction below overviews the approach we took:
Premium members should allocate 1-3 hours to read the full report
All surveillance video is compressed / encoded (whether it is MJPEG, MPEG-4, H.264, etc.). The only question is how much and what type of compression/encoding is performed.
The main difference between H.264 and MJPEG is that MJPEG only compresses individual frames of video while H.264 compresses across frames. For MJPEG, each frame of video is compressed by itself, just as if you were compressing a series of JPEG images manually (ergo Motion JPEG). For H.264, some of the frames are compressed by themselves (called I frames, or intra-coded frames) while most of the frames only record changes from the previous frame (called P frames, or predicted frames). This can save a significant amount of bandwidth compared to MJPEG, which encodes each frame anew.
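The difference between compressing each frame alone and compressing across frames can be illustrated with a toy model. This is only a sketch: it counts stored pixel values rather than performing real JPEG or H.264 coding, and the frame data and function names are made up for illustration.

```python
# Toy model: a "frame" is a list of pixel values. This is NOT real JPEG or
# H.264 coding; it only counts how many values each approach has to store.

def intra_only_cost(frames):
    """MJPEG-like: every frame is stored in full."""
    return sum(len(frame) for frame in frames)


def inter_frame_cost(frames):
    """H.264-like: first frame in full, then only the pixels that changed."""
    cost = len(frames[0])
    for prev, cur in zip(frames, frames[1:]):
        cost += sum(1 for a, b in zip(prev, cur) if a != b)
    return cost


# Hypothetical scene: 100 pixels, 10 frames, 5 pixels change per new frame.
frames = [[0] * 100]
for i in range(1, 10):
    nxt = list(frames[-1])
    for p in range(5):
        nxt[(i * 5 + p) % 100] += 1
    frames.append(nxt)

print("intra only :", intra_only_cost(frames))
print("inter frame:", inter_frame_cost(frames))
```

In this mostly static scene the inter-frame approach stores far fewer values. If every pixel changed in every frame, the two costs would be identical.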
[Note: This description above is for introductory purposes only. It is reductionistic and only mentions elements most common in today's IP video surveillance systems.]
Compression is limited by complexity. All compression depends on discovering patterns and representing those patterns by shorter codes/messages. The more complex or the more seemingly random a pattern is, the less likely it is that the pattern can be compressed (or the harder it is to accomplish this). While H.264 can compress 'more' than MJPEG, this principle is an important element in understanding variances in performance for video encoding.
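The principle can be seen with any general-purpose compressor. The sketch below uses Python's standard `zlib` module, which is a lossless compressor and not a video codec, so it only illustrates the pattern-versus-randomness point. The sample data is invented for the example.

```python
import random
import zlib

size = 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Highly patterned data (think: a plain white wall) ...
patterned = b"white wall " * (size // 11)
# ... versus seemingly random data (think: sensor noise).
noisy = bytes(rng.getrandbits(8) for _ in range(size))

patterned_ratio = len(zlib.compress(patterned)) / len(patterned)
noisy_ratio = len(zlib.compress(noisy)) / len(noisy)

print(f"patterned: {patterned_ratio:.3f} of original size")
print(f"noisy:     {noisy_ratio:.3f} of original size")
```

The patterned input shrinks to a small fraction of its size, while the random input barely compresses at all.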
Stream Analyzer: In this test, we used a free stream analyzer from Avinaptic to examine the H.264 streams. Download the Avinaptic software. In our screencast case study, we examine a test video clip showing variations in complexity and quality (download the stream analyzer test sample).
Criteria 1: Determine the complexity of the scene being captured/recorded
Whether you use MJPEG or H.264, it is important to know the complexity of your scene. However, it is even more important to understand this when using H.264. This is because variation in bit rates for H.264 is more significant than for MJPEG (even though in absolute terms for our tests, H.264 bandwidth consumption was always lower at any complexity).
By scene complexity, we mean how much activity is occurring in the scene of video that you are capturing. For instance, a person talking in front of a white wall is far less 'complex' than a crowded stadium. In general, the more color, shapes, sizes, objects and movements in a scene, the more complex that scene will be.
The more complex a scene is the more bandwidth will be needed to maintain the same quality level. This is inherent to all CODECs.
Equally important, the complexity of a scene can change depending on the time of day or the time of year. For instance, a group of people meeting in a lunch room is a far more complex scene than that same lunch room on Saturday when the office is closed. To maintain the same quality, all CODECs will require more bandwidth for the period when a group of people meet than when the lunch room is unoccupied.
More complex scenes are often the most important scenes within video surveillance as they reflect activity and potentially problems (or at least activities of interest).
In the video below we show how bandwidth consumption differs for a variety of common scenes:
The impact of complexity on bandwidth differs significantly between MJPEG and H.264. In our tests, with MJPEG, the bandwidth needed for the least and most complex scenes only differed by a factor of 3. However, in our tests with H.264, the amount of bandwidth needed varied by about 20 times.
The graph below documents the relationship between scene complexity and bandwidth consumed for MJPEG and H.264 VBR codecs tested:
The H.264 range variance results from differences in how MJPEG and H.264 compress video. H.264's bit rate benefits are maximized with less complex scenes, as these maximize H.264's ability to compress across frames. By contrast, MJPEG does not compress across frames so it gains less from less complex scenes. However, since MJPEG does need to compress individual images and since more complex images still require more bandwidth, MJPEG bandwidth demands do increase, but more modestly than H.264.
It is a fallacy that MJPEG's bandwidth demands are constant or that the stream size does not vary with complexity. Many manufacturers set their MJPEG streams to fixed image sizes, giving the impression that MJPEG is inherently fixed. They can safely do this because the variance in bandwidth size due to complexity is relatively limited for MJPEG. However, this does expose some modest level of quality loss (or bandwidth inefficiency).
Even though H.264 offers significant bandwidth savings across the board, the broad range of needed bitrates for different scene complexities introduces a design problem that was not significant with MJPEG.
Criteria 2: Determine the streaming mode / control of the camera being used
The most important aspect of streaming mode is understanding the use of Constant Bit Rate (CBR) vs. Variable Bit Rate (VBR) for H.264. While our tests show that fixed image size can be used fairly safely for MJPEG without thought or planning, such an approach is risky and problematic for H.264.
With Constant Bit Rate, the IP camera will maintain the same bandwidth level regardless of the scene's complexity. If bandwidth is insufficient to match the complexity, quality will be sacrificed.
With Variable Bit Rate, the IP camera will keep adjusting the bandwidth level to hold the quality steady with the scene's complexity.
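The two modes can be contrasted with a deliberately simplified model. In this sketch the "needed" bit rates are borrowed from the H.264 scene figures reported later in this article, the 1000 Kbps CBR target is a hypothetical setting, and the `stream` function is an illustration rather than how any camera's rate control is actually implemented.

```python
CBR_TARGET_KBPS = 1000  # hypothetical fixed rate configured on the camera

# Bit rate (Kbps) each scene needs to hold quality steady; values taken from
# the H.264 scene results later in this article.
scenes = {
    "daytime indoor": 200,
    "daytime indoor with movement": 790,
    "daytime outdoor": 2630,
}


def stream(needed_kbps, mode, target_kbps=CBR_TARGET_KBPS):
    """Return (bit rate actually sent, what happens to quality)."""
    if mode == "vbr":
        # VBR: the bit rate follows the scene, so quality is held.
        return needed_kbps, "quality held"
    if needed_kbps > target_kbps:
        # CBR with too little bandwidth: something has to give.
        return target_kbps, "quality reduced"
    # CBR with more bandwidth than the scene needs.
    return target_kbps, "quality held, bits to spare"


for name, needed in scenes.items():
    print(name, "| VBR:", stream(needed, "vbr"), "| CBR:", stream(needed, "cbr"))
```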
In the video below, we show how VBR bit rate changes rapidly when the scene changes but with CBR it always stays the same:
Resolution does not ensure quality. This is critical to appreciate and an important practical issue in the use of CBR streaming. All video in surveillance is compressed and that compression loses some of the original data (called lossy compression). What's key here is how 'lossy' the compression is. This is controlled by the quantization level: the higher the level, the lossier the compression and the worse the video will visibly appear.
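The effect of a coarser quantization level can be sketched with plain numbers. This is a toy illustration only: a real H.264 encoder quantizes transform coefficients rather than raw sample values, and the sample values and step sizes here are arbitrary.

```python
def quantize(values, step):
    """Snap each value to the nearest multiple of `step` (coarser = lossier)."""
    return [round(v / step) * step for v in values]


def mean_abs_error(original, step):
    """Average distance between the original values and their quantized form."""
    quantized = quantize(original, step)
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)


samples = [3, 17, 42, 58, 91, 120, 133, 200]  # arbitrary sample values

for step in (1, 16, 64):
    print(f"step {step:>2}: mean error {mean_abs_error(samples, step):.2f}")
```

A larger step leaves fewer distinct values to encode, which costs fewer bits but moves each value further from the original.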
With CBR, if the bandwidth is insufficient, the IP camera will reduce quality. This can be accomplished in two ways - reduce the visible quality of a given frame or reduce the number of frames streamed. Manufacturers vary in what options they provide and what the default choice is. In the video below, we show examples of different manufacturers' naming conventions, defaults and options for CBR quality degradation.
As a side note, with CBR, if the bandwidth level is 'too high' for a low complexity scene, the quantization level will be lowered, providing for a less lossy compression. Technically the bits are not wasted but practically they are, as the reduction in loss often provides no visible benefits for the user/application.
If an IP camera using CBR reduces the visible quality of a given frame, it does so by increasing the quantization level.
Raising the quantization level is at the heart of why H.264 video can look worse than MJPEG video. In the video below, we show H.264 CBR streams at multiple bit rates. Using a stream analyzer, we show how the quantization level varies and how this is correlated with changes in visible video quality.
Should You Use CBR or VBR?
Deciding on whether to use CBR or VBR for H.264 is perhaps the most important question in using H.264. The choice has significant impacts on bandwidth savings, visual quality and infrastructure planning.
If you use CBR, you simplify the planning of your infrastructure - specifically it becomes simple to design and ensure that your IP video works properly over your networking devices (e.g., switches, routers) and with your computers (e.g., servers running VMS software, storage appliances). Your infrastructure needs are a simple multiplication of total streams times stream size.
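As a rough sketch of that multiplication, the function below totals bandwidth for a set of CBR streams and extends the same arithmetic to storage. The camera count, per-stream rate and retention period are hypothetical inputs, and the storage step is an added assumption that ignores container and file system overhead.

```python
def cbr_plan(streams, mbps_per_stream, retention_days):
    """Return (aggregate Mb/s, storage in decimal TB) for fixed-rate streams."""
    total_mbps = streams * mbps_per_stream
    seconds = retention_days * 24 * 60 * 60
    # Mb/s * s = megabits; / 8 = megabytes; / 1,000,000 = terabytes.
    storage_tb = total_mbps * seconds / 8 / 1_000_000
    return total_mbps, storage_tb


# Hypothetical site: 50 cameras at 2 Mb/s each, recordings kept for 30 days.
total_mbps, storage_tb = cbr_plan(streams=50, mbps_per_stream=2.0, retention_days=30)
print(f"{total_mbps:.0f} Mb/s aggregate, {storage_tb:.1f} TB of storage")
```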
The two major downsides of using CBR are (1) potential quality degradation or (2) infrastructure inefficiency. If you set the CBR rate too low for your scene's complexity, you will lose either frames or quality levels (as described above). If you set the CBR rate too high, you will waste storage and require more networking resources than needed for your video.
VBR has the opposite strengths and weaknesses. With VBR, you can be confident that the quality of your video will be maintained. Also, the total amount of storage and bandwidth will be minimized as the bit rate will rise and fall to match the scene's complexity.
The big problem for VBR is that it makes infrastructure planning more difficult. Planners need to estimate and accommodate worst-case scenarios or risk service problems.
While CBR and VBR are the two main approaches for streaming H.264, manufacturers' implementations can modify or combine the two. In our tests, we found 2 'hybrid' approaches that may be of interest:
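To see how far apart average and worst-case planning can be, the sketch below takes the six H.264 scene bit rates reported later in this article and treats them as the range a single VBR camera might produce. The camera count and the equal weighting of scenes are assumptions made purely for illustration.

```python
# H.264 bit rates (Kbps) from the scene complexity results in this article.
scene_kbps = {
    "daytime indoor": 200,
    "daytime indoor with movement": 790,
    "daytime outdoor": 2630,
    "night indoor (1 lux)": 720,
    "night indoor (pitch black)": 2920,
    "night outdoor": 1890,
}

cameras = 50  # hypothetical camera count

average_kbps = sum(scene_kbps.values()) / len(scene_kbps)
peak_kbps = max(scene_kbps.values())

average_total_mbps = cameras * average_kbps / 1000
peak_total_mbps = cameras * peak_kbps / 1000  # every camera peaking at once

print(f"average: {average_total_mbps:.2f} Mb/s")
print(f"peak:    {peak_total_mbps:.2f} Mb/s")
```

Sizing for the average risks dropped video when many scenes get busy at once, while sizing for the peak leaves capacity idle most of the time.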
- Sony's CBR, when set at a bit rate too low for the configured frame rate/resolution, will silently 'over-ride' your bit rate setting and choose a bit rate that can be significantly higher. Our hypothesis is that the encoder determines the bit rate is just too low to provide anywhere close to acceptable quality. While this makes sense, it can be an unpleasant surprise if you have designed your network to meet the configured CBR rate.
- Panasonic offers a bandwidth range with minimum and maximum bit rates. As such, this is a constrained VBR that fits a restricted range of bit rates. This provides some flexibility to handle higher complexity scenes. However, in our tests, this feature did not appear to work as specified.
Criteria 3: Determine the ratio of total frames to I frames
A big part of H.264's power comes from P frames. These are frames that only 'predict' the progress or changes over the last fraction of a second. While the size of P frames may vary (depending on scene complexity/changes), usually P frames are far smaller than I frames. In the video below, we show the size of I and P frames in a stream analyzer:
[Note: there are B frames as well but the overwhelming majority of video surveillance H.264 implementations only use P frames currently.]
The more P frames there are to I frames, the more bandwidth savings H.264 provides. For instance, a 30fps stream with only 1 I frame per second requires far less bandwidth than a 30fps stream with 30 I frames per second (meaning every frame is an I frame and no P frames are streamed). However, both produce savings over MJPEG, though the all I frame stream has a much lower bandwidth reduction compared to MJPEG. In the video below, we demonstrate these tradeoffs:
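The relationship can be written as simple arithmetic. In the sketch below the frame sizes (15 KB per I frame, 1 KB per P frame) are hypothetical values chosen for illustration, not measurements from our tests, and real P frame sizes vary with the scene.

```python
def h264_kbps(fps, i_frames_per_sec, i_frame_kb, p_frame_kb):
    """Estimate stream bit rate in Kbps from frame counts and frame sizes."""
    p_frames_per_sec = fps - i_frames_per_sec
    kilobytes_per_sec = (i_frames_per_sec * i_frame_kb
                         + p_frames_per_sec * p_frame_kb)
    return kilobytes_per_sec * 8  # kilobytes -> kilobits


# Hypothetical frame sizes: 15 KB per I frame, 1 KB per P frame.
one_i_frame = h264_kbps(fps=30, i_frames_per_sec=1, i_frame_kb=15, p_frame_kb=1)
all_i_frames = h264_kbps(fps=30, i_frames_per_sec=30, i_frame_kb=15, p_frame_kb=1)

print(f"30fps, 1 I frame/sec:   {one_i_frame} Kbps")
print(f"30fps, 30 I frames/sec: {all_i_frames} Kbps")
```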
Knowing the ratio of total frames to I frames can be difficult but is quite useful. It depends both on what the IP camera manufacturer supports and what the VMS chooses to implement. In our experience, the most common I frame rate is 1 per second (regardless of how many frames per second total are streamed). However, IndigoVision standardizes on 1 I frame per 4 seconds and Axis Camera Station defaults to 1 I frame per 32 P frames (meaning that if you record at 3 fps, an I frame will only be sent/generated about every 10 seconds).
VMS systems tend to want shorter I frame intervals to improve video playback / display. The longer the I frame interval, the more likely there will be delays or problems in displaying live or recorded video. Since P frames only 'describe' a part of the image, it is generally not possible to display a full image on-screen until an I frame appears. If, for example, an I frame is only generated every 30 seconds, this can create usability problems.
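The time between I frames, which is also the longest a viewer may wait for a full image, follows from the frame rate and the number of P frames per I frame. A minimal sketch, using a hypothetical helper name:

```python
def i_frame_interval_seconds(p_frames_per_i_frame, fps):
    """Seconds between consecutive I frames."""
    frames_per_group = p_frames_per_i_frame + 1  # the I frame plus its P frames
    return frames_per_group / fps


# 1 I frame per 32 P frames, recorded at 3 fps (the Axis Camera Station case).
print(i_frame_interval_seconds(32, 3))
# 1 I frame per second at 30 fps means 29 P frames follow each I frame.
print(i_frame_interval_seconds(29, 30))
```

At 3 fps with 32 P frames per I frame this works out to 11 seconds, in line with the roughly 10 second figure mentioned above.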
Sample Performance With Different Setting Combinations Used
Providing a singular metric or data point on H.264 quality or bandwidth compared to MJPEG is irresponsible. It obscures important design elements that professional users need to appreciate and address.
In the final section of our report, we provide 9 setting combinations that demonstrate different tradeoffs in the key driving parameters of video encoding performance: (1) scene complexity, (2) I frame rate, (3) Frame rate and (4) CBR vs VBR.
Scene Complexity Tradeoffs
Daytime Indoor - H.264 bandwidth was 200 Kbps. By contrast, MJPEG bandwidth was 11.8 Mbps, a 59x difference, and a 98% bandwidth savings. Image quality was similar between the two codecs; we did not note any significant variances.
Daytime Indoor with movement - H.264 bandwidth was 790 Kbps. MJPEG bandwidth was 13.71Mbps, a 17x difference, and a 94% bandwidth savings. No significant image quality differences observed.
Daytime Outdoor - H.264 bandwidth was 2.63 Mbps. By contrast, MJPEG bandwidth was 39.23Mbps, a 15x difference, and a 93% bandwidth savings. No significant differences in image quality observed.
Night Indoor (1 lux) - H.264 bandwidth was 720 Kbps. By contrast, MJPEG bandwidth was 13.27 Mbps, an 18x difference, with a 95% bandwidth savings. Although both images look similar in quality, the MJPEG scene has a little more visible noise, while the H.264 image is a little softer.
Night Indoor (pitch black) - H.264 bandwidth was 2.92 Mbps. By contrast, MJPEG bandwidth was 15.15 Mbps, a 5x difference, and 81% bandwidth savings. A totally black image in itself is not a complex scene, but the random camera noise on screen (which varies between vendors) raises the complexity significantly. Although it is not apparent on the exported clips or screencaps, we witnessed the MJPEG scene suffer from significantly more camera noise than the H.264 scene.
Night Outdoor - H.264 bandwidth was 1.89Mbps. By contrast, MJPEG bandwidth was 17.57 Mbps, a 9x difference, with 89% bandwidth savings. No significant image quality differences observed.
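The difference factors and savings percentages above follow directly from the measured bit rates. The sketch below recomputes them from the figures in this section; the `compare` helper is just an illustration of the arithmetic.

```python
def compare(h264_mbps, mjpeg_mbps):
    """Return (MJPEG/H.264 ratio, % bandwidth saved by H.264), both rounded."""
    ratio = mjpeg_mbps / h264_mbps
    savings_pct = (1 - h264_mbps / mjpeg_mbps) * 100
    return round(ratio), round(savings_pct)


# (H.264 Mbps, MJPEG Mbps) as reported for each scene above.
results = {
    "Daytime Indoor": (0.200, 11.8),
    "Daytime Indoor with movement": (0.790, 13.71),
    "Daytime Outdoor": (2.63, 39.23),
    "Night Indoor (1 lux)": (0.720, 13.27),
    "Night Indoor (pitch black)": (2.92, 15.15),
    "Night Outdoor": (1.89, 17.57),
}

for scene, (h264, mjpeg) in results.items():
    ratio, savings = compare(h264, mjpeg)
    print(f"{scene}: {ratio}x difference, {savings}% savings")
```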
I Frame Rate Tradeoffs
30fps, 30 I frames per second indoor daytime - In this scenario, H.264 bandwidth was 3.48 Mbps. By contrast, MJPEG bandwidth was 11.8 Mbps, a 3x difference. Despite using the maximum I frame ratio, this scenario did not show any visible quality gain, though it still delivered a bandwidth savings of 71% over the MJPEG scenario.
30fps, 1 I frame per second indoor daytime - In this scenario, H.264 bandwidth was 280 Kbps. By contrast, MJPEG bandwidth was 11.8 Mbps, a 42x difference, and a 98% bandwidth savings. This is the default I frame configuration for many camera vendors, and in this scenario we saw no significant variance from MJPEG image quality.
1fps, 1 I frame per second outdoor daytime - In this scenario, H.264 bandwidth was 1.1 Mbps. By contrast, the 1fps MJPEG stream consumed 1.3 Mbps, a 1.2x difference, and a 15% bandwidth savings. No obvious visual differences in quality. Contrast this with the example above for 30fps, 30 I frames. While the ratio of I frames to total frames is the same (1:1), the scene above is indoor daytime - delivering significantly enhanced bandwidth reduction because of the relatively simpler scene.
30fps, 1 I frame per second outdoor daytime - H.264 bandwidth was 2.63 Mbps. By contrast, the 30fps MJPEG stream consumed 39.23 Mbps, a 15x difference, and a 93% bandwidth savings. Quality-wise, there were no obvious visual differences, but the H.264 clip is running slightly fewer frames at 27.99fps.