Are Camera-Side Analytics More Accurate?

After reading several instructive articles on video analytics that detail the pros and cons of camera-side vs. server-side placement, I didn't notice any mention of the effect of running analytics pre-compression vs. post-compression.

Since only the camera (or encoder) can do pre-compression analytics, I would think this could be a major factor in performance, depending on how much compression is involved, of course.

Practically speaking, with no compression artifacts, wouldn't it allow setting a much lower threshold for alarms, say with motion detection?

Related, I have heard that most analytics typically reduce any MP-resolution images to D1 before processing. If this is the case, given two compressed streams from the same camera, one high-res/high-comp and the other low-res/low-comp, would the low-res stream be more accurate, with fewer false alarms (FA), than the high-res one, at VMD for example?

To my understanding, no IP camera does analytics on 'pre-compressed' images / videos.

Can you explain / clarify what IP cameras you have confirmed do analysis on pre-compressed video?

... no IP camera does analytics on 'pre-compressed' images / videos.

Well that would explain why it's not mentioned then.

So, to be clear, are you saying that the camera actually decodes its own H.264 stream, right after encoding, and then performs analytics on the decoded image?

Honestly, it didn't even occur to me that they would compress before analytics. But due to the greater use of SoC architectures, or perhaps because of processing limitations or latency concerns, the camera might have to perform the analytics in parallel with an encoded stream. Maybe there are even better reasons. Thanks for checking the assumption I didn't even know I made.

Though there are at least some major IP camera manufacturers that claim to work on the uncompressed image data.

For instance, Sony's DEPA write-up:

4.2.2 Pre-Compression Image Processing Yields Higher Accuracy

Compression artifacts that generate digital noise and trigger false alarms are an inherent shortcoming of digital security systems that consolidate analytic functions at the back-end. DEPA’s workflow eliminates this design issue by carrying out front-end processing inside the cameras. These analytics take place before video data is compressed for transfer over the network. Consequently, the object data concerned is not affected by digital noise regardless of the compression type and ratio used later.

and DvTel offers this

Because our video analytics are edge-based, they reduce the use of server resources and images are processed before compression, which increases image quality. The bottom line is increased detection capability and fewer false alarms under adverse light conditions.

Finding camera analytics that say they work on compressed images is harder, not because they don't exist but because they are not bragging about it...

"It depends".

There are lots of different applications for analytics and a lot of ways to go about analyzing video. Some do simple pixel movement/differentiation, some do pattern analysis, some are manually calibrated, some are self-learning. There are still many platforms that downsample video to CIF resolution, and a few that can process higher resolutions (up to 720p or 1080p).

I'm not aware of any camera-side analytics that operate on a full uncompressed stream (based on the way the "image" is created in the camera, that would be hard to do). But camera-side processing *can* have the advantage of more control over the stream. If, for example, the camera was built with analytics in mind, the analytics engine could theoretically have access to a dedicated internal "stream", which can be adjusted based on the scene (scale resolution up or down, change compression aspects, and so forth) so that the analytics are not dependent on the parameters of a stream optimized for bandwidth or storage or whatever.

Analyzing live video to detect objects of interest can be somewhat computationally intensive. Given "Moore's Law", it's practical to have a processor embedded in the camera that can handle this process at the edge. You eliminate network issues, and various contentions on a PC (OS issues, resource limitations) by running the analytics within the camera.

There are also hybrid approaches, where objects are detected in-camera, and then meta-data is streamed back to a server to handle the "rules" part of the problem.
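To make the split concrete, here is a rough sketch of the hybrid idea: the camera only reports what it detected, and the server decides whether that matters. Everything in it (the `ObjectMeta` fields, the `Zone` rectangle, the `server_rule` function) is a made-up illustration, not any vendor's actual metadata format or rule engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectMeta:
    """Per-object metadata a camera might stream (hypothetical fields)."""
    label: str
    x: int       # left edge of bounding box, pixels
    y: int       # top edge of bounding box, pixels
    width: int
    height: int


@dataclass
class Zone:
    """Rectangular alarm zone configured on the server (hypothetical)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, px: float, py: float) -> bool:
        return self.left <= px <= self.right and self.top <= py <= self.bottom


def server_rule(objects: List[ObjectMeta], zone: Zone, wanted_label: str) -> List[ObjectMeta]:
    """Return objects of the wanted class whose box centre lies inside the zone."""
    hits = []
    for obj in objects:
        cx = obj.x + obj.width / 2
        cy = obj.y + obj.height / 2
        if obj.label == wanted_label and zone.contains(cx, cy):
            hits.append(obj)
    return hits


# Metadata as it might arrive from the camera for one frame.
camera_metadata = [
    ObjectMeta("person", 100, 120, 40, 80),   # centre (120, 160): inside the zone
    ObjectMeta("vehicle", 110, 130, 60, 40),  # inside the zone, but wrong class
    ObjectMeta("person", 500, 300, 40, 80),   # centre (520, 340): outside the zone
]
alarms = server_rule(camera_metadata, Zone(50, 50, 300, 300), "person")
print(len(alarms))  # 1
```

The point of the split is that the server never touches pixels, so the rules can be changed without re-running detection and the server is not affected by whatever compression the recorded stream uses.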

Having analytics on the camera does not make them "more accurate", but it will *typically* indicate a slightly more refined or advanced system. You may be relatively limited in terms of camera choices, but the software is probably slightly more optimized to the hardware. Still, there are some PC-based analytics that beat some camera-based analytics, so none of this is an absolute.

This is a little bit like asking "Is all-wheel drive better than 4-wheel drive?" The reality is that there can be a LOT of minor distinctions that change what "best" means to you. What's really important is "Do you get the desired results within your budget?".

If you're evaluating analytics options, I would encourage you to look at both types. Pay attention to setup/calibration time (initially and ongoing), ease of tweaking/tuning the system over time, and coverage area/distance per camera (this can be a big one: I've seen some $50 solutions that cover 1/10th the range of a $200 solution, but look cheap on paper).

Good info, B, thanks. Couple questions...

I'm not aware of any camera-side analytics that operate on a full uncompressed stream (based on the way the "image" is created in the camera, that would be hard to do).

I understand why it might not be feasible to work on the full stream, if what you mean by 'full' is full resolution. But I'm having a harder time understanding why the embedded analytic can't extract meta-data from the frame before the lossy compression has created artifacts.

What do you mean exactly by 'based on the way the "image" is created'? Are you referring to a standard CMOS rolling-shutter style readout? The H.264 encoding libraries I've played with work no lower than the frame level: you populate some two-dimensional array/structure with chroma and luminance values and you say encodeFrame(_frameData), or some such construct. So someone as naive as me might conclude that one could use that bitmapped frame as the input to the analytic engine before, or in parallel to, the compression.
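Roughly what I have in mind, as a toy sketch only: the same raw luma frame is handed to a simple frame-differencing analytic first and to the encoder second. The `CameraPipeline` class and `fake_encoder` are hypothetical stand-ins (the "encoder" just packs bytes and does no real compression), and I am not claiming any actual camera firmware is structured this way.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]  # 2-D array of 8-bit luma samples (assumed layout)


def changed_pixels(prev: Frame, curr: Frame, threshold: int) -> int:
    """Count pixels whose luma changed by more than `threshold`."""
    return sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if abs(c - p) > threshold
    )


class CameraPipeline:
    """Feeds each raw frame to the analytic first, then to the encoder."""

    def __init__(self, encode_frame: Callable[[Frame], bytes], threshold: int = 10):
        self.encode_frame = encode_frame  # stand-in for a real encoder call
        self.threshold = threshold
        self.prev: Optional[Frame] = None

    def process(self, frame: Frame) -> Tuple[int, bytes]:
        # The analytic sees the uncompressed samples, so no codec artifacts.
        moved = 0 if self.prev is None else changed_pixels(self.prev, frame, self.threshold)
        self.prev = [row[:] for row in frame]
        # The same frame is then handed to the encoder for the network stream.
        return moved, self.encode_frame(frame)


def fake_encoder(frame: Frame) -> bytes:
    """Placeholder only: packs samples as bytes; performs no real compression."""
    return bytes(v for row in frame for v in row)


pipeline = CameraPipeline(fake_encoder)
still = [[100] * 4 for _ in range(4)]
moving = [row[:] for row in still]
moving[1][2] = 180  # one pixel changes between frames

print(pipeline.process(still)[0])   # 0 (first frame, nothing to compare against)
print(pipeline.process(moving)[0])  # 1
```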

Which camera-side analytics are you aware of that definitely work with the image data AFTER first being compressed? It's hard to tell because those who compress first don't brag. So far, in addition to Sony and DVTEL, Axis claims to provide uncompressed image data to Analytic Plug-ins:

Fredrik Nilsson, general manager of Axis Communications, explains further: “We are not the expert analytics company, but we are experts in partnering and will open up the cameras’ processing power and access to uncompressed video to our video analytics partners...

Random Prediction: In the coming years, deep knowledge of Analytics will be one of the few ways that top integrators will be able to differentiate themselves from the rest of the pack, and thereby justify a sizable premium.

Can you share with us an example of the difference in quality between a 'pre-compression' image and a post-compression image?

Surely there's a big difference between a CIF, VGA and 1080p image. That is a big problem for Axis camera analytics, for example; see: License Plate Recognition Axis App Tested.

Although I haven't an accessible example to post, our ADP system always shows visible compression artifacts, with distinct boundaries between small rectangles across the image. I expected that these rectangular pixel blocks are associated with the H.264 compression algorithm. I also imagined that your readers would have often seen and would be familiar with these sorts of compression-based artifacts.

No, that's just a phantom attack of 9x9 macroblocks from the planet Cosine. ;)

Again, show me the quality difference between the two. Then we can better evaluate whether it has any impact on analytic performance.

In talking with AgentVi for their on-camera-app analytics for Samsung cameras, I was told that their analytics are run on the "raw" video stream before compression. I don't have any documentation, etc. to back this up, just what I was told...

Ok. I am playing with a setup where I am using Milestone Client to simultaneously view two streams from a single Axis M3006 with the same FOV, but different compression levels, one high and one low. With the motion detection sensitivity set the same on both streams, the high-compression stream triggers on wavy artifacting, while the low-compression one is unaffected.
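For anyone who wants to see the mechanism in isolation, here is a toy simulation, not a model of the M3006 or of Milestone's motion detection. It assumes a static gradient scene whose brightness drifts by one level between frames, and it uses plain rounding of pixel values (`quantize`) as a crude stand-in for lossy coding. Real H.264 quantizes transform coefficients per block rather than pixel values, so treat the numbers as illustrative only.

```python
from typing import List

Frame = List[List[int]]


def quantize(frame: Frame, step: int) -> Frame:
    """Round each sample to the nearest multiple of `step` (crude lossy stand-in)."""
    return [[min(255, (v + step // 2) // step * step) for v in row] for row in frame]


def motion_pixels(prev: Frame, curr: Frame, threshold: int) -> int:
    """Count pixels whose value changed by more than `threshold`."""
    return sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if abs(c - p) > threshold
    )


def make_gradient(offset: int, width: int = 32, height: int = 4) -> Frame:
    """Static horizontal gradient; `offset` models a tiny global brightness drift."""
    return [[100 + offset + x for x in range(width)] for _ in range(height)]


frame_a = make_gradient(0)
frame_b = make_gradient(1)  # same scene, one level brighter
THRESHOLD = 5               # same sensitivity for every stream

raw_hits = motion_pixels(frame_a, frame_b, THRESHOLD)
low_comp_hits = motion_pixels(quantize(frame_a, 2), quantize(frame_b, 2), THRESHOLD)
high_comp_hits = motion_pixels(quantize(frame_a, 16), quantize(frame_b, 16), THRESHOLD)

print(raw_hits, low_comp_hits, high_comp_hits)  # 0 0 8
```

With the coarse step, pixels sitting near a rounding boundary jump by a full step when the scene drifts by one level, so they cross the threshold even though nothing moved. With the fine step, or with no quantization at all, the same drift stays under the threshold.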

So far it seems the compression difference has to be quite great for this to occur, as I believe you are suggesting. As I try to quantify the bounds and make it more rigorous, I would ask: is this a valid setup/test? If not, tell me what to add/change so I don't waste time by making something non-instructive; otherwise I'll have something shortly.

I'd recommend default compression vs lowest compression, since default is what most people use. Btw, we already did this here: Resolution vs Compression Tested.

The net/net is, if you lower the compression as much as you can the video looks modestly better and can increase one 'resolution' level, so a lowest compressed 720p video may look like a default 1080p video. But it's not as if default has tons of artifacts and low compression eliminates them. They are pretty close.

Look at our Axis analytics test today. Ostensibly this is on the 'raw' / 'uncompressed' video, but it still has lots of issues, more so than Axis cameras connected to an Avigilon Rialto (using compressed video). The basic issue is that variance in algorithm sophistication is much wider / more significant than compression / resolution differences.