Subscriber Discussion

What Are The Most Important Problems That We Can Now Solve With Deep Learning? Part 1.

Murat Altuev
Apr 19, 2018
AxxonSoft

In a previous discussion, I explained the reason behind the incredible progress in video analytics: hardware acceleration for deep learning. Imagine now that we could have video analytics working better than humans. What kind of features would be available to us? What are the most important problems we would want to solve?

Some years ago, I had an interesting public discussion with John in one of the threads here. I promoted forensic search as the most important capability, whilst John promoted real-time analytics. His argument was strong: real-time analytics can potentially prevent crime, while forensic search can only investigate it. My argument came from a business perspective: real-time analytics would only serve systems that need to respond to situations. Meanwhile, investigative capabilities would be required for almost every system. Put another way, forensic search is necessary in every system; real-time analytics only in some.

So, let’s see what we can improve in both fields: real-time alerts and forensic search. Let’s start with forensic search. Real-time alerts will be the topic of the next discussion.

Forensic search

As far as forensic search was concerned, it was easy to sell unique features: face search, LPR search, and search by color, size, direction, dwell time, number of objects and so on. These are features you would never expect from a typical VMS or NVR, so this is an area where we can win. All great features to impress, but when it came to real-world installation, there was one big problem. To generate the metadata that enables forensic search, we needed 50 times more resources on the server side. You can connect 1,000 cameras per server without analytics, but just 20 if you want to decompress and analyze all the streams. So, not much business.
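The 1,000-versus-20 gap follows directly from that 50x resource multiplier. A quick back-of-envelope sketch (the 0.1% per-stream recording cost is an illustrative assumption, not a vendor benchmark):

```python
# Back-of-envelope for the "50x" figure above (illustrative numbers, not
# vendor benchmarks): if plain recording takes ~0.1% of a server's CPU per
# stream, decoding and analyzing a stream takes roughly 50x that.

RECORD_COST = 0.001        # fraction of server CPU per recorded stream (assumed)
ANALYTICS_MULTIPLIER = 50  # "50 times more resources" from the text

def max_cameras(analyze: bool) -> int:
    """How many cameras fit on one server at the given per-stream cost."""
    per_stream = RECORD_COST * (ANALYTICS_MULTIPLIER if analyze else 1)
    return round(1.0 / per_stream)

print(max_cameras(analyze=False))  # 1000 cameras, recording only
print(max_cameras(analyze=True))   # 20 cameras with decode + analysis
```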

Hardware acceleration is the key

As I explained in the previous discussion, the deep-learning booster was hardware acceleration. And not only on the server side: the processors in cameras have become very powerful! One such processor, the HiSilicon 3516AV200 chip, has two A7@800MHz cores plus one A17@1.25GHz core – so this chip can run powerful analytics!

What does this mean for forensic search adoption? Metadata can be generated on the camera side with zero additional CPU utilization on the server side. This means you can connect 1,000 cameras to one server – at no additional cost for the inclusion of these highly valuable features. Now, that is business!

For all these years, we had no progress in the user experience of archive search. Until now, just as 20 years ago, users could only choose a camera number and date/time, and play back the video. That was all. But because of hardware-accelerated video analysis on the camera side, users can now access many more tools to increase the efficiency of their work, like TimeCompressor and MomentQuest from AxxonSoft.

To enable these features at zero CPU utilization on the server side, we at AxxonSoft have integrated Bosch IVA analytics, as well as the Axis ACAP and Dahua DHOP modules for open-platform cameras. We have also integrated Hikvision's metadata from thermal cameras, and this approach will be implemented in the new 5 and 7 series, which will be released this year. What this ultimately means is that the solution is already there: powerful forensic search at zero cost! So, finally, we have a dramatically improved user experience when it comes to searching archives. In one of our next discussions, I'll demonstrate how effective this technology is in different scenarios, from policing through to retail.

John Honovich
Apr 19, 2018
IPVM

My argument came from a business perspective: real-time analytics would only serve systems that need to respond to situations. Meanwhile, investigative capabilities would be required for almost every system. Put another way, forensic search is necessary in every system; real-time analytics only in some.

The counter from a business perspective: real-time analytics generate far more value than investigation capabilities.

Contrast:

We stopped the [terrorist|robber|murderer] using our real-time analytics.

We figured out who the [terrorist|robber|murderer] was using our investigation capabilities.

It is good to figure things out after the fact but it is far less valuable than stopping things from occurring.

This is why security end users overwhelmingly value real-time alerting over search. Could you imagine an intrusion system / burglar alarm provider that says "We can't let you know if someone is breaking in but we're great at letting you know who did it afterwards"? This is a real business issue.

The counter to my counter is that it is a lot harder to do real-time analytics accurately, so to your point of what 'we can now solve', investigation is more feasible. But we should not delude ourselves: most security end users strongly prefer stopping incidents, not figuring out who did it afterward.

Mark McRae
Apr 19, 2018
Inaxsys Security Systems

I think that John is correct in that there is far more business opportunity in real-time analytics than in forensic search. We are requested for real-time facial recognition (black list = catch the bad guy; white list = identify the good guy) on a weekly basis. For facial recognition, real-time identification is a more powerful business driver than forensic search (imagine getting an SMS message when your best customer walks through the door...)

The same goes for LPR: being able to open a gate using live LPR (integrated with access control) is more of a business driver than finding the license plate of someone who drove by yesterday.

Massively reducing bandwidth and storage space through neural network live scene analytics is a huge $$ benefit compared to finding the blue pickup truck that drove by yesterday.

I think there is huge business potential for post-recording forensic search in retail analytics (people counting, heat maps, POS integration, etc...)

John Honovich
Apr 19, 2018
IPVM

We are requested for real-time facial recognition (black list = catch the bad guy; white list = identify the good guy) on a weekly basis.

Mark, what do you do in those cases? 

Mark McRae
Apr 20, 2018
Inaxsys Security Systems

Until last week, nothing.

We have an offering of post-recording facial recognition and have been selling it successfully for the last year. We have been promising an "eventual" live facial recognition offering, knowing that we had a solution coming. At ISC West, we demoed our initial offering of live facial recognition (white list/black list with a limited list of faces) and customer reception was very positive (it is very impressive to have my face recognized from 20 feet away in a crowded booth).

It is not fully completed yet, but we should have a "packaged" offering ready to sell by the end of June (I hope).

The "white list" feature is generating WAY more demand than the "black list" feature and we have multiple meetings set up (once the full version is complete) with national retail chain customers who would like to rapidly identify returning VIP customers as they enter the store. Marketing has a lot more money to spend than Security in these retailers.

John Honovich
Apr 20, 2018
IPVM

Mark, thanks!

Question - with the 'post-recording facial recognition', do you charge extra for that? If so, roughly how much? I am curious how willing people are to pay for the feature.

Also, I'd be interested to hear later this year how the whitelist / VIP usage goes in 2 areas: (1) How often and how much does the user care if the system does not notify them of someone on the whitelist? (2) How often and how much does the user care if the system misidentifies someone not on the whitelist as being on the whitelist?

Mark McRae
Apr 20, 2018
Inaxsys Security Systems

MSRP for facial recognition post-analytics is $350 per camera (this includes the actual camera recording license and the facial analytics add-on). This can be added to any camera, keeping in mind that it has a reasonably big impact on the recorder's processor (about 5-7% of the processing power of a 7th-generation i7 with 16 GB of RAM, per camera). The hardware calculation offloading (graphics cards, Movidius, etc...) that M. Altuev is writing about should have a major impact on a small recording machine's ability to do these complex calculations.
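That 5-7% per-camera figure implies a hard ceiling on how many analyzed cameras a single recorder can carry. A quick sketch, where the 20% of CPU reserved for the OS and recording itself is my assumption:

```python
# Capacity implied by the 5-7% per-camera CPU figure above. The 20%
# headroom reserved for the OS and recording itself is an assumption.

def max_analyzed_cameras(per_camera_pct: float, headroom_pct: float = 20.0) -> int:
    """Cameras whose facial post-analytics fit in the remaining CPU budget."""
    return int((100.0 - headroom_pct) // per_camera_pct)

print(max_analyzed_cameras(5.0))  # 16 cameras at the optimistic 5% figure
print(max_analyzed_cameras(7.0))  # 11 cameras at the 7% figure
```

Either way, well under 20 analyzed cameras per recorder, which is why the hardware offloading matters so much.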

To your second question: any false identification would be a problem, even on the White List. It would be embarrassing for the sales manager of a BMW dealership to greet you with "Welcome back, Mr. Honovich!" when you are Mr. Jones and have never set foot in the place...

For this to work, the facial analytics engine will need to be very accurate, practically flawless. The only way to get this accuracy will be to control the acquisition of the image of the person (camera at a precise height to improve accuracy; camera facing the subject and seeing the subject's face and eyes clearly, etc...). Luckily, when doing White List identification, it is not very difficult to correctly place the camera and the subjects will not be trying to hide their faces.

John Honovich
Apr 20, 2018
IPVM

MSRP for facial recognition post-analytics is $350 per camera (this includes the actual camera recording license and the facial analytics add-on).

Mark, what is the delta or just cost for facial analytics? Ergo, if a user just wants camera recording, how much less is that?

Luckily, when doing White List identification, it is not very difficult to correctly place the camera and the subjects will not be trying to hide their faces.

One challenge I've found is wide, open entrances. For stores or areas with wide openings, people can come from different angles/directions, which makes it harder.

Mark McRae
Apr 20, 2018
Inaxsys Security Systems

MSRP for a license without post-recording analytics is $200. So the addition of post-recording facial analytics is $150 per camera.

For the placement of cameras to optimize success: if the view is uncontrolled (people coming in from all sorts of angles, very wide field of view, people wearing sunglasses, people looking down at their phones, etc...), then the results will not be as good. Each site will need to be evaluated and customer expectations will need to be managed. That being said, current results on facial recognition, even with partial facial captures, are surprising in their accuracy.

We'll know better within the next few months as tests of the deep analytics engine progress. I am very enthusiastic about the results so far.

Mark McRae
Apr 20, 2018
Inaxsys Security Systems

This is an example of the results of the real-time facial analytics at the show. The subject tested the engine from multiple angles and with sunglasses and the similarity was over 90% from a static face-on image without sunglasses. The result surprised all of us at the show.

John Honovich
Apr 20, 2018
IPVM

Mark, thanks for sharing the example, that helps!

Keep in mind there is a tradeoff to such a high percentage match - false positives. If it matches at 94% to a sharp-angle, partially obscured face with sunglasses, it may be equally likely to match other people.

And if you are in a public place, there are going to be other guys with sunglasses and facial hair, etc. If it never matches those people to Steve, then you are in good shape. Otherwise, the risk is high false alerts.
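This trade-off can be sketched numerically. Face matchers compare embedding vectors, and a "90%+ similarity" only means something relative to how often strangers also score that high. The random vectors below are stand-ins for real face embeddings, and the noise level is an assumption, but the threshold behavior is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

enrolled = rng.normal(size=128)                           # whitelist template
same_person = enrolled + rng.normal(scale=0.3, size=128)  # noisy re-capture
strangers = rng.normal(size=(1000, 128))                  # unrelated faces

genuine = cosine(enrolled, same_person)
impostors = [cosine(enrolled, s) for s in strangers]

# A looser threshold still matches the genuine capture, but false
# accepts among strangers climb as the threshold drops.
for threshold in (0.10, 0.20, 0.30):
    false_accepts = sum(s >= threshold for s in impostors)
    print(f"threshold {threshold:.2f}: genuine match = {genuine >= threshold}, "
          f"false accepts = {false_accepts}/1000")
```

Loosening the threshold to catch hard captures (sunglasses, sharp angles) is exactly what lets other people with sunglasses and facial hair start matching too.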

Mark McRae
Apr 20, 2018
Inaxsys Security Systems

We'll be doing serious stress testing on this between now and the summer. I have a busy shopping centre that has agreed to be the guinea pig for our tests so we'll have a very good idea of the limitations/opportunities.

One thing we do not want to do is over-promise and under-deliver. Many customers we've spoken to have had a disappointing experience with facial recognition that has not worked as promised in the past.

Brian Karas
Apr 19, 2018
Pelican Zero

I think we may still be a ways off from deep learning solving "important" problems. We are more at the stage of deep learning being able to deliver on the basic promises and expectations of video analytics set 10 years ago (and not even all of those, for example "object left behind" detection).

The problem in current deep learning implementations is that the systems still lack context, which is critical for solving the kinds of problems that are truly challenging.

Deep learning, as used for surveillance video, is effectively a high-speed, highly efficient pattern matcher. The problem is that, when it lacks context, it searches for patterns in places it shouldn't, and will eventually "see" a pattern where none truly exists.

As we try to have deep learning systems find more patterns, and get more nuanced in their detections, error rates have a tendency to creep up. As an example, the Dahua AI "seeing" a young girl in the Axis display shows how this can be a problem. Detecting and analyzing faces is more specific than detecting and analyzing people. While the Dahua system (or any other) would be less likely to falsely detect a full-body appearance of a person, as we train the system to detect "subsets" of a person (faces, limbs, etc.), the chances of error go up dramatically: because the deep learning system is searching for a wider variety of patterns, it is more likely to falsely identify them in everyday scenes.

There are already many examples and publications on adversarial AI, where researchers can manipulate images and objects so that they are reliably misclassified. In other examples, images of "noise" were classified as various objects.

We need to extend deep learning in surveillance to understand the context of a scene, to have a better understanding of what objects might be likely/unlikely/impossible to exist in a given scene before we can begin to solve some of the more challenging problems in surveillance.

To be clear, I am impressed by the advancements of deep learning, I think that on average video analytics are much better today than they were 10 years ago, but not to the point that we should be inviting new challenges before we have fully addressed the previous/existing ones.

Undisclosed #1
Apr 19, 2018
IPVMU Certified

We need to extend deep learning in surveillance to understand the context of a scene, to have a better understanding of what objects might be likely/unlikely/impossible to exist in a given scene before we can begin to solve some of the more challenging problems in surveillance.

Disagree.  Having “a better understanding of what objects might be likely/unlikely/impossible to exist in a given scene” as a prerequisite for computation is how rule-based systems operate, not ones based on deep learning.

Brian Karas
Apr 19, 2018
Pelican Zero

How so? How does a system understanding the environment in which it is operating equate to rules-based systems?

 

Undisclosed End User #2
Apr 19, 2018

That's where AI kicks in instead of rule-based systems, and determines that humans are the main threat to modern Earth.

Undisclosed #1
Apr 19, 2018
IPVMU Certified

The AI will be powerless to mod Earthlings, due to a government mandated “Three Laws” embedded BIOS chip.

Undisclosed #1
Apr 19, 2018
IPVMU Certified

How does a system understanding the environment in which it is operating equate to rules-based systems?

Perhaps I'm misunderstanding you. Let me then just ask:

By what method exactly would you propose to enhance the specific knowledge of what is “likely/unlikely/impossible to exist in a given scene” in current systems?

 

Brian Karas
Apr 19, 2018
Pelican Zero

There are several ways you can use contextual information in/about a scene to improve object classification in DNNs. It has been fairly widely discussed; this paper is one example that is pretty good in terms of understandability:

segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection

Google for something like "improving deep learning accuracy with context" and you should be able to find some other examples.

In simplistic terms, it is a matter of looking at the area surrounding an object and asking "should I expect this object here". The surrounding area could be broad "I appear to be installed in an office lobby, I am unlikely to observe vehicles here", or more narrow "Should I really expect to be seeing the face of a young girl so far above the artificial horizon I have established and so much larger than the average person/face I have observed in this area?"

Like almost any other aspect of deep learning, incorporating contextual data takes more processing power and more development effort, which is partially why I believe it will still be a few more years before we start to see contextual deep learning classification systems in wide use.
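The "should I expect this object here" idea can be sketched as a post-detection plausibility filter. Everything below is a hypothetical illustration: the class names, the horizon value, and the size model are configured scene priors, not anything a real product exposes:

```python
# A minimal sketch of the post-detection "context filter" idea above:
# detections from a DNN are re-checked against simple scene priors
# before being reported. All values here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    y_top: float   # top of bounding box, 0.0 = image top, 1.0 = bottom
    height: float  # box height as a fraction of image height

SCENE = {
    "allowed_labels": {"person", "face"},  # office lobby: vehicles implausible
    "horizon_y": 0.35,                     # faces should start below this line
    "typical_face_height": 0.08,           # average face size seen in this area
}

def plausible(det: Detection, scene=SCENE) -> bool:
    if det.label not in scene["allowed_labels"]:
        return False
    if det.label == "face":
        if det.y_top < scene["horizon_y"]:                 # above the horizon
            return False
        if det.height > 4 * scene["typical_face_height"]:  # far too large
            return False
    return True

dets = [
    Detection("face", y_top=0.5, height=0.07),  # ordinary face -> kept
    Detection("face", y_top=0.1, height=0.6),   # huge face above horizon -> dropped
    Detection("car",  y_top=0.6, height=0.3),   # vehicle in a lobby -> dropped
]
print([plausible(d) for d in dets])  # [True, False, False]
```

The second detection is exactly the "young girl in the Axis display" case: the raw classifier fires, but a scene prior on position and size rejects it.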

 

Undisclosed #1
Apr 21, 2018
IPVMU Certified

Thanks, good info.

I misread your ‘context’ to be something more than just the entire visual information in the scene.

Asaf Birenzvieg
Apr 29, 2018
viisights

Hi John,

viisights' (www.viisights.com) mission and vision is exactly what you have described: bringing understanding of behavior and context to surveillance video streams.

Brian Karas
Apr 19, 2018
Pelican Zero

This is a somewhat long, but easily read, perspective on why deep learning is not yet in a state where it can solve "important problems" (in a general sense):

 

Artificial Intelligence — The Revolution Hasn’t Happened Yet

Undisclosed Integrator #3
Apr 20, 2018

What Are The Most Important Problems That We Can Now Solve With Deep Learning?

Are you a manufacturer with nothing new/functional to advertise at ISC?  Problem solved.  You too can soft launch some partially functional Deep Learning analytic while simultaneously promising magical results.

Undisclosed Manufacturer #4
Apr 20, 2018

Fascinating stuff! 

The one thing that continues to boggle my mind in this whole AI/deep learning discussion is the ability to anticipate.

Is one's ability to take action prior to an event actually occurring a solely human trait, or will silicon one day be able to anticipate?

A ball just rolled out into the street from between two parked cars. I don't see any kids but I know there's a good chance there could be and the stakes of an accident are life threatening. I think I'll slow down considerably to reduce or eliminate the chance of an accident.

It is a human's ability to anticipate that can mitigate risks/threats in real time. Will AI ever get there?

Asaf Birenzvieg
Apr 29, 2018
viisights

"the deep-learning booster was hardware acceleration. And not only on server side. Camera processors in every camera have become very powerful! One of the such processors HiSilicon chip 3516AV200 has two A7@800MHz cores plus one A17@1.25GHz core – so this chip can run powerful analytics!"

Not sure I agree:

First, that the booster for deep learning is HW acceleration: we are actually only at the beginning, where companies like Nvidia and Intel are investing massive resources in better utilization of neural network technology on their chips.

In my opinion, and from my experience, the booster for deep learning is the demand for actionable insights, meaning more advanced analytics, for example: https://www.viisights.com/solutions/

Second, what kind of advanced deep learning analytics can be done on the cameras listed, and what is their cost compared to commodity chips? Although the idea of running advanced analytics at the edge is magical, I don't see it becoming widespread in the next couple of years in deployments that include hundreds or thousands of cameras. I forecast that, two years from now, you will be able to process a 24x7 stream with advanced analytics (i.e. behavioral recognition) on the server side at a cost of $50-100/stream, and then for a 1,000-camera deployment you have a business.
