Healthy Skepticism for Deep Learning Is Prudent

By John Honovich, Published Jul 26, 2017, 09:14am EDT

The hype for deep learning in video surveillance is accelerating.

Between the race to the bottom and dearth of a 'next big thing', certainly pent up demand exists.

But now is the time for a healthy skepticism about when and what deep learning can do. In this note, we examine the hype and real issues that professionals need to consider.

'Revolution' ****

**** *****,*** ******** **** ********** ** "* ********** In ***** *********", *********: 

**** ******** ********** **** ***** **** can ***** ** ******* **.* ******* accuracy ** ******* *****, ***** ************ ******* would ******** ** ******* 95 *******.

************* **** ***** ***** are, **** ***** ***** is *** **** *** the **.* ******* ** being ******** *** *** mentioned.

*** **.*% ****** **********.

Irony ******

*** ***** ** *** the ***** ** *** opening ******** ** *** deep ******** *****:

*** ***** ********* ******** is ***** ******* *** the ************ ****** ** sustained ** * ****** ofpreviously ********* *** **** and delivering too little in the past. [emphasis added]

*** ****** "***** *** cannot ******** *** **** are ********* ** ****** it" ** ********* *********** here.


**** ** ****** ***, 'thought *******' ******** ******* *********** ************ *** ****** *** coming *** **** *** "move ** ****** **** happen ********** *******". ** assured:

** **** ** ** an ******** ********* ***** less **** * **** from *** 

** ********, ** *** both ******** ***** *** *****. We *** ** ** 'entirely ********* *****' **** robots **** ****. *************, for *******, ************** **** ****** * ****** social ***** ****.

Timing ******

** ** **** ** see that ********** *** ************ will ****** ********** ** different ****** **** ******. The ******* **** ********** did *** *** ***** ******.

*** ****** ****** ** predictions.

***** *** / ***** by ***** ****** ********. Bad ********* *** ****. Frustration *** ******** *****.

Deep ******** **********

**** *** * *** important ******* *** * healthy ********** ** **** learning ** *********:

  • ******* ************: **** ******** **** ** **** in ******* ***** ************ applications. ***** **** **** really **** *** ***** ones **** ****** ** still *******. ** ** almost ******* **** *** performance **** **** ************* ****** applications. "**** ********" ** poised ** ** ******** in ********* *** ***** to **********.
  • ***** *********** ********: ***** **** ******** fundamentally ****** **** ********, its *********** **** ** ************* impacted ** **** ** is *** ** ********** **. *** '**** world' ** ******* *** as ***** ************ *** deployed ** ********* *******, regions *** ***** (*.*., people **** *********, ****** differently), ** ** ******* that *********** ******** **** be *****. **** ** not ******* ** *** easily, ******* ** ******* the **** ******** ************ can ** **-******* ** handle ***** *********.
  • ******* ****** ***********: ****** ******* ***** most ******** **** ******* from * *** *********, the ******** *** *********** of **** ******** ***** surveillance ******* ** ***** to **** ************* ***** on *** *********** ***** (which ** ** ***** supply *******), ********* ********* and **** ********* ** videos *** ******* ****** ** train **. **** ** likely ** ****** ** significant *********** ********* ****** vendors **** ** *** of **** *** '**** Learning' ** *** '***'.
  • ******** ***** ******: ****** ***********,*** **** ******** ******** **** significant ******** ***** ******.

Early *******

**** *** *** *****, it ** ********* ** remind ********* **** **** is ***** ***** **** for **** ******** ** video ************. 

* *** ** **** is ***** ***** ** still ********* *****. *** example, ** *** ****** ISC **** **** *****, one **** ******** ******** was ******* *** *** accuracy / *********** ** ***** system. ***** * ****** or ** ** ******** it, *** ***** ****** that *** **** ****** kept ** ******* **. It ****** ***, **** were ****** ******* ** a **** ** *** off ******. ***** ***** ****** will **** ********** **** one *** *** * lot ** **** ******** is **** *****.

Healthy **********

**** **** ********, ** will **** * ****** of ***** *** **** learning ** ****** (**** technically *** ******** ****) for ** ** **** a *** ********** ** video ************. ****** ** early, ********** ***** *** ill-will ******* **** ***** analytic ******** ** *** 2000s, ***** ********* ******** deep ********'* ********** *** the ********** ********* ****** have ** **.

Comments (11)

How do you see intellectual property coming into play regarding the growth and adoption of 'deep learning' in surveillance/security applications?

i.e. can an algorithm be patented?

Intellectual property is a good question.

For example, I never thought 'metadata' or the concept of a 'tripwire' could be patented but then ObjectVideo did it.

I do not know the answer but we will ask around for what different players are claiming (e.g., is deep learning video surveillance 'covered' by AV/OV's patents?). @Karas, see what you find.

i.e. can an algorithm be patented?

Yes, examples of this would be video compression algorithms (e.g.: H.265 Licensing Fees Examined / CEO Interview), Google's PageRank algorithm, etc. Though they are generally not defined as "algorithms" specifically.

Many of the analytics patents revolve around rules (tripwires, etc.) applied to classified objects. Deep Learning in analytics, at least so far, has been largely focused on the classifier portion, doing a better job at identifying what is "human" (vehicle, boat, dog, etc) in the video. There are anomaly detection applications as well, but those are not necessarily doing anything unique (you cannot patent detecting statistical rarity).
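To underline how generic "detecting statistical rarity" is, a few lines of textbook statistics implement it; the z-score threshold and the dwell-time numbers below are purely illustrative, not any vendor's method:

```python
# Flag values that are statistically rare relative to the rest of the
# data: anything more than z_threshold standard deviations from the mean.
def flag_rarities(values, z_threshold=3.0):
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    if std == 0:
        return []  # all values identical, nothing is rare
    return [v for v in values if abs(v - mean) / std > z_threshold]

# e.g., hypothetical object dwell times in seconds, with one anomalous loiterer
dwell_times = [5, 6, 4, 5, 7, 6, 5, 4, 6, 5, 300]
print(flag_rarities(dwell_times))  # [300]
```

This is just simple math applied to any data stream, which is why detecting the rarity itself is not a patentable invention.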

You could patent a specific Deep Learning network implementation, but it would not be a very strong patent. This is very roughly equivalent to patenting a smart phone navigation app. You could do it, but there are so many ways to build a navigation app that someone could build a competing app without infringing your patent at all. Labelled datasets used to train DNNs are not patentable, but there are some cases emerging of them being licensed for commercial use (many are available for non-commercial/research purposes).

You could create a very rich, well-labelled dataset and attempt to license that to various companies for training purposes as a source of revenue, but you could not force someone to use it (or pay you a license fee). Also, if someone used the exact same images, but did the tagging on their own, you could not collect a license fee (this is assuming the images themselves are not owned/copyrighted by you).

Deep Learning is being applied in so many areas to so many products that it is unlikely that blocking patents will come about specifically related to deep learning that impact the security industry.

Where you could see more patent activity in security is in net-new products/applications built on top of capabilities that deep learning makes possible. For example, you might be able to patent a new multi-camera suspect auto-tracking feature in a VMS that is primarily possible due to a DNN making detecting a person, or spotting a consistent clothing pattern possible.

that is a great comment... specifically:

"you cannot patent detecting statistical rarity"

in my limited understanding of 'deep learning' - and clue me in if I am wrong - it appears to me as if what we are actually talking about is pretty simple 'if/then' statement based computations.

What has accelerated the 'deep learning' phenomenon is a dramatic increase in computational speed and power.

The faster these if/then statements can be parsed, the more operational applications will benefit from using 'deep learning' to 'know' things that were not knowable without the computational power required to parse this data.

Am I wrong?


The if/then example is not a bad analogy, but it is also not a perfect analogy. I am not sure if you are still talking about patents/patentability of DNN-based applications, which would factor into how far you want to take the if/then analogy.

Deep Learning is not AI (artificial intelligence); it is less about knowing things that were not knowable, and more about pattern detection/recognition, particularly in cases where humans would not do as well or would tire from the task.

You can train a DNN to detect early signs of cancer, but that does not mean it can learn or know how to implement an effective cancer treatment protocol. 
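To make the distinction with the if/then analogy concrete, here is a toy sketch (invented features, thresholds, and data, not any real product's logic) contrasting a hand-written if/then rule with a classifier whose "rule" is learned from labeled examples:

```python
# Hand-coded rule: a human wrote these thresholds explicitly.
# Features (both hypothetical): blob height in metres, width/height aspect ratio.
def hand_coded_rule(height, aspect):
    return height > 0.8 and aspect < 0.6  # "looks like a person"

# Learned rule: a minimal perceptron. Nobody writes the thresholds;
# weights are adjusted from labeled examples until the data is separated.
def train_perceptron(samples, labels, epochs=50, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# toy training data: (height, aspect) -> 1 = person, 0 = not
samples = [(1.0, 0.4), (1.2, 0.5), (0.9, 0.45), (0.3, 1.2), (0.4, 1.0), (0.25, 0.9)]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_perceptron(samples, labels)

def learned_rule(height, aspect):
    return w[0] * height + w[1] * aspect + b > 0

print(learned_rule(1.1, 0.5), learned_rule(0.35, 1.1))  # True False
```

The learned version behaves like an if/then threshold once trained, which is why the analogy is not bad; but the thresholds come from data rather than from a programmer, which is why it is not perfect either.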

Firstly - I'm not an attorney and the following is not legal advice....

There was a landmark Supreme Court ruling in 2014 (Alice v. CLS Bank) that ruled you cannot patent just an algorithm, or just software. It must be tied to hardware. This is naturally a big deal, and sent shock waves through many software-heavy industries.

Citing examples of pure software patents, like encoding, that predate this ruling to establish the validity of software patents is not relevant. 

Therefore, my understanding is that you cannot patent Deep Learning algorithms; rather, they must be tied to hardware like processors and cameras in a way that is fundamental to the invention.


"I am not sure if you are still talking about patents/patentability of DNN-based applications, which would factor into how far you want to take the if/then analogy."

no, actually I am in agreement - I don't think that detecting statistical rarity is patentable either.  Any 'large-enough' data set can be analyzed/parsed to detect statistical rarity.  It's just simple math.

What piques the interest of customers (I think) is the ability of some 'thing' that can analyze these data sets fast enough to create a practical use case for buying that thing.

As John points out in the piece above, the marketers in our industry seem to be following the same path they took when VCA first emerged - dropping that buzz word into everything they produce.

Increased computational speed accompanied by lowering of traditional power output on a chip will definitely be able to achieve some really cool things.... but if deep learning is over-hyped (as VCA was) as a panacea rather than tying the new capabilities to actually solving existing security problems, imo deep learning has the potential to shoot itself in the foot - like VCA companies originally did before they tied the capabilities to specific security needs.

It is a complex topic, and we are at the early stages.

Where many video analytics products struggled (or outright failed) was in doing accurate object classification. Many products tried various end-around approaches to get to a reliable method of separating human from non-human (pixel blob sizes, blob movement tracking, onboard tilt-sensors coupled with mounting height data to determine rough horizon,  etc.)

If DNN-based approaches can output reliable data on what an object is, then other things can be built on top of that in a simpler/easier/more reliable manner.

In some ways, DNN could be the embedded linux of analytics. Before linux, it was costly to build devices with an embedded operating system, as options were limited, complex, and expensive. Linux solved a stage 1 type of problem for embedded devices, letting companies concentrate on the device/functionality part more than the OS (and no point getting into side-tracks here of cyber security or other things embedded device companies still get wrong).

Applying DNN to analytics could get the industry over a major hurdle, but you still will need strong developers (and marketing/sales) for the downstream stuff that comes after you have figured out what an object in an image is.


I think DNN adversarial attacks should also be mentioned as one of the concerns for the security industry.

It's got enough attention to spin off its own area of research in the DNN community.


Currently, this is more of a theoretical risk than an actual one, particularly for security applications. This could change if DNN-based analytics become far more mainstream and accessible, as explained below.

I can't/won't comment on the "fool a Tesla" examples because I do not know enough about how autonomous vehicle analytics work at a low level.

For security applications, the adversarial attack would typically be some form of digital camouflage, so that a given object (most likely a person) could evade detection by the system. This could be to make the human register as "cat", or to simply evade detection altogether (probably a better scenario to be invisible vs. just mis-categorized).

There are a few things that make it difficult/less rewarding to do an adversarial attack for surveillance:

  1. Security analytics are generally not open source/downloadable/easily accessed for hacking around on. Manufacturers develop their algorithms in-house, with closed teams. The channel control within security, where most of these products cannot be easily bought online, also makes it harder for attackers to get access to a system for R&D/testing purposes. Not saying it is impossible, but it is also less likely a casual hacker has the ability to test against many of these products.

  2. Better analytics take several samples of an object before actually giving it a classification. Security systems have the benefit of generally only caring about moving objects. Also, actually analyzing an image is computationally intensive, so it helps to pre-filter things out of the image and only analyze portions that meet some base criteria. This is generally a process of: look for blob motion->analyze for rough size/shape/aspect ratio criteria->analyze for "predictable motion"->then finally run the blob through the DNN for classification. This means you may need to do more than just inject singular static images into the system to affect it.

  3. Every system behaves differently, so developing an attack for one would likely not work on another, unless they were trained with the same dataset (possible, and likely to be slightly more common in the future if other trends of selling/licensing labeled data sets progress), but overall low odds.
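The staged pre-filtering described in point 2 can be sketched roughly as follows; the blob representation and every threshold here are hypothetical, chosen only to illustrate the "cheap filters first, DNN last" ordering:

```python
# Hypothetical staged analytics pipeline: cheap filters run first, and
# only blobs that survive every stage reach the (expensive) DNN classifier.

def is_moving(blob):
    return blob["speed_px_per_s"] > 2.0          # ignore static scenery

def plausible_size(blob):
    return 40 < blob["height_px"] < 400          # rough person-sized range

def plausible_aspect(blob):
    return 0.2 < blob["width_px"] / blob["height_px"] < 0.8

def predictable_motion(blob):
    return blob["frames_tracked"] >= 5           # seen consistently, not noise

def dnn_classify(blob):
    return "person"                              # stand-in stub for the real DNN

def process(blob):
    for stage in (is_moving, plausible_size, plausible_aspect, predictable_motion):
        if not stage(blob):
            return None                          # filtered out before the DNN
    return dnn_classify(blob)

walker = {"speed_px_per_s": 15, "height_px": 180, "width_px": 70, "frames_tracked": 12}
leaf = {"speed_px_per_s": 30, "height_px": 12, "width_px": 12, "frames_tracked": 2}
print(process(walker), process(leaf))  # person None
```

The practical consequence for an attacker is that a single injected static image never reaches the classifier; the adversarial input has to survive every upstream stage across many frames.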

Therefore, to fool a halfway decent security-based DNN analytics system, you would need to construct some form of visible camo that provided continuous evasion, from multiple angles and while in motion (which would include lighting variations in most scenes). If the system could properly classify the disguised person for even 1 second, it would generate an alarm.

From an external perspective, it is very hard to tell which, if any, analytics are in use (I guess this is also a slight detraction for Avigilon and their prominently-labelled cameras). Carrying out a successful attack with this method would require some degree of inside knowledge and a very highly motivated attacker. In general, these are the kinds of applications for which video analytics, of the sort on the market today, are probably not appropriate for the level of risk; those situations are usually the sort that use hyper-active shoot-first-ask-questions-later armed guards.

Yes, this is a theoretical exercise; how important this problem becomes for the security industry remains to be seen.

I would guess the majority of industry players are using existing pre-trained models (e.g., networks trained on ImageNet) and changing the input and output layers for their needs. I would love to see anyone creating and training their own models.

As a rule of thumb: models that are easy to optimize are easy to perturb. Linear models lack the capacity to resist adversarial perturbation; models trained to model the input distribution are not resistant to adversarial examples; ensembles are not resistant to adversarial examples.

General characteristics of the model, such as linearity of the loss function and limited input features, are what is used to optimize an adversarial attack; therefore the security-by-obscurity argument does not work in this case (there is a paper on this issue).
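The linearity point can be illustrated with a toy fast-gradient-sign example (invented weights and input, not any real detector): for a linear score the gradient with respect to the input is just the weight vector, so a small perturbation against sign(w) on each feature reliably drops the score.

```python
# Minimal illustration of why linearity helps adversarial attacks.
# For a linear score s(x) = w.x + b, the input gradient is exactly w,
# so the worst-case small perturbation is eps * sign(w) per feature.
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def fgsm_perturb(w, x, eps):
    # Move each feature eps in the direction that lowers the score.
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w = [0.5, -0.25, 0.75, -0.5]   # toy "person" detector weights
b = -0.1
x = [0.6, 0.4, 0.5, 0.2]       # toy input, scored as "person"

clean = score(w, b, x)                        # positive -> detected
adv = score(w, b, fgsm_perturb(w, x, eps=0.3))  # pushed negative -> evaded
print(clean > 0, adv > 0)  # True False
```

Deeper networks are not strictly linear, of course, but the research referenced above argues they behave linearly enough for the same attack construction to transfer.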

We have ourselves experienced such issues, with a pre-trained, well-known public model generating false human detections in a loading dock from some pipes on the floor.

To be fair, GANs (generative adversarial networks) are an active area of research because they are a useful training technique, not because of the attack vector.
