Thanks, John, for the direction; it is very helpful. In recent years M&A activity (e.g. VideoIQ, OV) seems hot, but there has been no technical jump in this field. We once installed VA for smoke detection, but it failed to pass the customer's test process because there were too many preconditions and limitations in the field.
Would not describe M&A activity as hot. VideoIQ was the 'best' or at least one of the best and it did not sell for much. The only really impressive deal (from a valuation perspective) was the OV patents, and that was because Avigilon sees money in enforcement / litigation of patents.
Is there merit in reviewing analytics providers according to their analytics specialisation?
e.g. Search - Tag & Trace (including processing lots of recorded and/or live video) vs. scenario-based Events & Alarms analytics (e.g. virtual trip wire) vs. Business Intelligence (people & vehicle counting, heat maps)
Note: the scenes are very uncrowded. The video is also 3 years old - I'd be keen to know whether this is still considered 'leading edge' or has become mainstream.
Are there people out there having success with 'tag and track' searches in crowded scenes such as train stations, airports, large busy shopping malls? Does it work across live and recorded video, using camera navigation/sequence logic (i.e. tracking a suspect when they leave one camera's field of view and picking them up again from an adjacent camera)?
Are there people out there having success with 'tag and track' searches in crowded scenes such as train stations, airports, large busy shopping malls?
No, they are not.
There are several problems with getting something like this to work in real life. A few key things that make it hard to do this in any manner that an actual customer would be happy with:
1) White-balance and color saturation across multiple cameras. As you start to cover the kinds of scenes you mentioned (train stations, malls, city areas), you run a very high probability of having white balance and lighting affect the overall appearance of the person. The "red jacket" the suspect is wearing might be bright red in one scene, dark burgundy in the next and so forth. The color difference across the cameras may also not be consistently predictable, such as any camera that gets natural light instead of 100% indoor even lighting. This makes it very hard to do even basic cross-camera tracking.
2) Object detail. Using facial recognition is completely out of the question: no system has the pixel density to use face-rec, and that only works from the front anyway. A normal camera system won't have enough pixel density across the entire image for things like this to work well. Customers are reluctant to deploy cameras that would have the detail required to really make this feasible (assuming it existed in the first place, which it does not).
3) Lack of object classification. The majority of the analytics systems out there do not have any true object classifier. Most of their categorization is based on simple scaled size and shape metrics. Because of this they don't really know whether they are looking at a person or just a large upright object. As a subset of this, they also don't know whether they are looking at a full person or just the upper torso of a person (as you'd often have in a crowded scene). This makes it hard to define what the object looks like. Can we look for a guy in a red jacket and yellow pants, or are we only going to see the red jacket? What if we start with the red jacket and in a later image see that he has yellow pants? Can the algorithm recognize that it is the same person, so that we don't start tracking the wrong person?
4) General lack of diversity in a large crowd. This is especially true in colder climates where a lot of people wear outer jackets that are frequently blue, black, or brown (or minor shades thereof). Unless the person being tracked has some VERY unique feature, there is a high probability as you look across cameras that you see additional people with a similar-enough appearance to be confusing. This is amplified by point #1 above.
5) Shifting object appearance. This kind of goes with #3 above. Camera FOVs tend to vary greatly, so the person doesn't have a consistent profile appearance. When the system has trouble even knowing positively whether the object is a person, it's even more difficult when you have a clear full-body profile shot in one view and then a steep overhead shot in another view.
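To put point 1 in concrete terms, here is a minimal Python sketch (standard library only) of how a per-camera color cast changes the apparent color of the same jacket. The jacket color and the per-channel gains are made-up illustrative values, not measurements from any real camera, and real cameras differ in more ways than a simple channel gain.

```python
import colorsys

def apply_camera_gains(rgb, gains):
    """Scale each channel by a per-camera gain and clip to [0, 1]."""
    return tuple(min(1.0, c * g) for c, g in zip(rgb, gains))

def describe(rgb):
    """Return (hue in degrees, saturation, value) for an RGB triple in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return round(h * 360), round(s, 2), round(v, 2)

# Hypothetical "red jacket" as it would look under neutral lighting.
jacket = (0.80, 0.10, 0.10)

# Hypothetical per-camera channel gains (R, G, B).
cameras = {
    "indoor_even": (1.00, 1.00, 1.00),    # neutral reference
    "warm_dim": (0.55, 0.45, 0.40),       # darker scene, jacket reads as burgundy
    "daylight_blue": (0.85, 1.00, 1.60),  # natural light with a blue cast
}

observed = {name: apply_camera_gains(jacket, g) for name, g in cameras.items()}
for name, rgb in observed.items():
    print(name, describe(rgb))
```

A tracker that matches on "hue close to red and bright" would have to tolerate all three readings, and a tolerance that wide also lets in a lot of other people.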
Unfortunately the large multi-camera sites that would probably most benefit from a function like this are exactly the kinds of scenarios that are guaranteed to make it impractical to implement.
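The crowd-diversity problem in point 4 can be put in rough numbers. Assuming, purely for illustration, that each passer-by independently has a probability p of looking similar enough to the target, the chance of at least one look-alike among n people is 1 - (1 - p)^n. The values of p and n below are hypothetical, and the independence assumption is a simplification.

```python
def prob_lookalike(p_similar, crowd_size):
    """Probability that at least one of crowd_size people resembles the target,
    assuming each person matches independently with probability p_similar."""
    return 1.0 - (1.0 - p_similar) ** crowd_size

# Hypothetical 2% per-person chance of a confusingly similar appearance.
for n in (10, 100, 1000):
    print(n, round(prob_lookalike(0.02, n), 3))
```

Even a small per-person chance of confusion adds up quickly once the tracker has to consider everyone passing through a busy concourse.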
You don't hear much about this application in mainstream analytics because it's not practical to implement and deliver reliably. It's not worth the frustration and support costs that come with it when there are more realistic analytics applications.
Though, considering the continual increase in the adoption of ever greater resolution cameras, as well as the plunging price of a pixel, we need to look at these claims in earnest every so often, lest we miss true advances. In addition to the hardware capabilities, software is getting progressively smarter at using less data (sub-pixeling) to accomplish more than before.
For instance, 5 years ago, such videos would likely have to be "staged" like a movie. IMHO, today they could do similar things by controlling the environment unrealistically and cherry picking the footage where the analytic does the "right" thing.
Still not something you would want to buy, but I think they are making real advances...
The cameras have increased in resolution, but that doesn't mean the analytics have.
Well, there are ample VCA solutions that conduct the analytics fully server-side. Those solutions are not limited by the CPU in the camera, and many of them can run video analytics on full resolution. It quickly becomes rather inefficient in most situations though, as the bandwidth and server requirements explode when utilizing the full resolution (which naturally drives cost and complexity).
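As a back-of-the-envelope illustration of why full-resolution server-side analytics gets expensive, the sketch below computes the raw (decoded, uncompressed) pixel rate a server would have to process per camera. The resolutions, the 15 fps frame rate, and the 1.5 bytes per pixel (YUV 4:2:0) figure are assumptions chosen for illustration; network bandwidth will be lower because of compression and depends on the codec and the scene.

```python
def raw_rate_mb_per_s(width, height, fps, bytes_per_pixel=1.5):
    """Uncompressed video data rate in megabytes per second."""
    return width * height * fps * bytes_per_pixel / 1e6

# Hypothetical set of analysis resolutions, all at an assumed 15 fps.
resolutions = {
    "CIF": (352, 288),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "5MP": (2592, 1944),
}

baseline = raw_rate_mb_per_s(352, 288, 15)
for name, (w, h) in resolutions.items():
    rate = raw_rate_mb_per_s(w, h, 15)
    print(f"{name}: {rate:.1f} MB/s per camera, {rate / baseline:.1f}x CIF")
```

Multiply the per-camera figure by the number of cameras on a large site and the server sizing problem becomes obvious.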
In the company that I worked for previously, which developed high-end analytics (OPAX, no longer in business), we did see a range improvement when going to higher resolution, but the improvement is not linear (and naturally it is anyway limited by illumination and weather, factors that get tougher the longer the distance). For most practical purposes, the full resolution is not needed in order to provide reliable detection when we're talking cross-line, moving in area, loitering etcetera...
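One possible geometric contributor to that non-linearity: under a pinhole-camera assumption with a fixed field of view, the distance at which a target still covers a minimum number of pixels scales with the horizontal pixel count, which grows only roughly with the square root of the megapixel count. The field of view, target width, and minimum pixel threshold below are hypothetical, and the sketch ignores illumination, weather, and lens quality.

```python
import math

def max_range_m(h_pixels, hfov_deg, target_width_m, min_pixels_on_target):
    """Distance at which a target of the given width spans min_pixels_on_target
    horizontally, for a pinhole camera with the given horizontal FOV."""
    # Scene width at distance d is 2 * d * tan(hfov / 2), so
    # pixels on target = h_pixels * target_width / scene_width.
    half_fov = math.radians(hfov_deg) / 2.0
    return (h_pixels * target_width_m) / (2.0 * min_pixels_on_target * math.tan(half_fov))

# Hypothetical setup: 60 degree FOV, 0.5 m wide person, 10 pixels needed.
sensors = {"VGA": (640, 480), "1080p": (1920, 1080)}
for name, (w, h) in sensors.items():
    megapixels = w * h / 1e6
    print(name, round(megapixels, 2), "MP,", round(max_range_m(w, 60, 0.5, 10), 1), "m")
```

In this toy setup, going from VGA to 1080p is 6.75 times the pixels but only 3 times the range, before any of the real-world limits are taken into account.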
Full resolution would be more important once we're looking into more advanced analytics that will measure more advanced behaviour or stronger multi-camera tracking of the objects (think something similar to biometrics, but measuring limb-length, walking pattern etcetera...).
The white balance issue is a good one, and something most people don't think about until they actually try to write the software to do this sort of thing. The 'color' of an object changes considerably under different lighting conditions.
Another major challenge is when two objects 'merge' then separate--it's a very difficult problem to determine if they crossed paths or came together and then went in opposite directions.
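A toy 1-D sketch of that ambiguity, in Python: two tracked people walk toward each other, their blobs merge for a couple of frames, and then two blobs reappear. All positions, velocities, and the simple distance-based assignment are hypothetical and only meant to show that two reasonable association rules give opposite answers from the same detections.

```python
def assign(expected, detections):
    """Pick the identity assignment with the lower total distance for two tracks."""
    keep = abs(expected["A"] - detections[0]) + abs(expected["B"] - detections[1])
    swap = abs(expected["A"] - detections[1]) + abs(expected["B"] - detections[0])
    if keep <= swap:
        return {"A": detections[0], "B": detections[1]}
    return {"A": detections[1], "B": detections[0]}

# Positions (metres) just before the blobs merge; A walks right, B walks left.
last_pos = {"A": 4.0, "B": 6.0}
velocity = {"A": 1.0, "B": -1.0}  # metres per frame
frames_merged = 2                 # frames during which only one blob was visible

# Two separate blobs reappear at these positions.
detections = [3.0, 7.0]

# Rule 1: match to the last known position (implies they met and turned back).
nearest = assign(last_pos, detections)

# Rule 2: constant velocity through the merge (implies they crossed paths).
predicted = {k: last_pos[k] + velocity[k] * (frames_merged + 1) for k in last_pos}
const_vel = assign(predicted, detections)

print("nearest last position:", nearest)
print("constant velocity:    ", const_vel)
```

Position alone cannot say which answer is right; the system needs appearance cues to break the tie, and those are exactly what the white balance and crowd issues undermine.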
Also keep in mind there can be a huge difference in analytic performance when you feed a clip to a system with lots of horsepower and give it plenty of time to process the video vs when you have a real time requirement and limited processing power (often the case on a camera or a busy NVR). We did quite a bit of multi camera tracking work a few years ago and found it to be promising with unlimited processing power, less so in the context of marketable platforms.