New Era Of Video Analytics Using Deep Learning - Movidius

Movidius is introducing an embedded computer vision system that integrates Google's TensorFlow technology, the same framework that powers Google's Vision API. It puts state-of-the-art object detection/classification on a thumb-sized PC.

Rather than a camera just detecting motion and tracking blobs of pixels, the Fathom, using deep learning, can actually describe what is in the image with unprecedented accuracy.

Some mention of it was recently made in the comments of this thread

The bottom line is that deep learning algorithms surpass the performance of any other technique. I wonder if this could represent a workaround for the ObjectVideo patents.

It's a very interesting concept.

There are a lot of machine-learning developments, and in Boston last year there was a job fair and conference focused on computer vision applications.

I think we may start to see more progress in security video analytics from the Google and Facebook projects that are tackling a slightly different problem, but in a way that can be leveraged in security.

I totally agree that deep learning will have a significant impact on video surveillance/analytics.

I am not an expert in deep learning but just for fun, last December I downloaded and installed Nvidia's "Digits" application

Not being an expert in Linux, it took me most of the day to install it and get it running. Once I did, I created several folders on the hard drive and filled them with different categories of images (people, trucks, etc.), then trained it for two hours or so. Once trained, I downloaded additional images (not belonging to the original training set) and it was able to accurately categorize each one, i.e. tell me whether it was a person, a truck, etc. I was able to do this without any programming skill or any real understanding of how deep learning works, so I encourage others on the forum to give it a go.
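For anyone wanting to try the same thing: the folder-per-category layout described above is, as far as I know, how DIGITS-style classification datasets are organized, with the folder name serving as the class label. A minimal sketch of that convention (the folder names here are just examples, and the root directory is a throwaway temp folder):

```python
import os
import tempfile

# Hypothetical dataset root: one sub-folder per class, each filled
# with example images of that class (people, trucks, etc.).
root = tempfile.mkdtemp()
for label in ("person", "truck", "car"):
    os.makedirs(os.path.join(root, label))

# The class list is then simply the sorted set of sub-folder names.
classes = sorted(os.listdir(root))
print(classes)  # ['car', 'person', 'truck']
```

The training tool derives labels from this directory structure, which is why no programming was needed to set up the experiment.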

The good thing is I do not think this basic application of deep learning is blocked by patents (though I could be wrong).

From memory, the problem was that even though the images had been scaled down to about 200 by 200, it still seemed to take the best part of a second to analyse each image. So for an NVR with 60 camera streams at 5 fps, this might be too much. Keep in mind it was using an expensive, power-hungry, heat-generating GPU to do the analysis, which makes things problematic for run-of-the-mill NVR hardware. There are some cloud-based GPU clusters available, and I suspect some of the cloud-based analytics services are using these in combination with DL (just a guess).
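To put rough numbers on that concern, here is the back-of-the-envelope arithmetic (using the approximate ~1 second per image figure I observed, which may vary a lot by model and GPU):

```python
cameras = 60
fps_per_camera = 5
seconds_per_image = 1.0  # rough figure from my DIGITS experiment

# Frames arriving per second across all streams.
frames_per_second_needed = cameras * fps_per_camera  # 300

# How many "GPU-seconds" of work arrive every wall-clock second;
# equivalently, roughly how many such GPUs you would need to keep up.
gpus_needed = frames_per_second_needed * seconds_per_image

print(frames_per_second_needed)  # 300
print(gpus_needed)               # 300.0
```

So at ~1 s/image, one GPU is short by a factor of a few hundred for that NVR scenario, which is why the per-image latency matters so much.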

So I am thinking that for simple tripwire detection and motion detection, where no scene analysis is required, perhaps the traditional approach (as covered by the OV tripwire patent, for example) may be easier to implement at present, as it doesn't require GPU-style dedicated hardware and can be done on a traditional CPU. But with processors like the one referenced in your link above, which could be embedded in the camera directly, one would think that would change.
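To illustrate how cheap the traditional tripwire check is on a plain CPU: at its core it is just a sign test on which side of a line a tracked point sits, flagging a crossing when the sign flips between frames. A minimal sketch (the coordinates and tripwire endpoints are made up, and for simplicity this tests against the infinite line rather than the exact segment):

```python
def side(p, a, b):
    """Signed-area test: > 0 if point p is left of line a->b, < 0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crossed(prev, curr, a, b):
    """True if a tracked point moved from one side of the tripwire to the other."""
    return side(prev, a, b) * side(curr, a, b) < 0

# Hypothetical tripwire from (0, 0) to (10, 0); a blob moving downward crosses it.
wire_a, wire_b = (0, 0), (10, 0)
print(crossed((5, 2), (5, -1), wire_a, wire_b))  # True
print(crossed((5, 2), (5, 1), wire_a, wire_b))   # False
```

That is a handful of multiplications per tracked object per frame, which is why it runs comfortably on ordinary NVR CPUs without any dedicated hardware.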

Well, for one day's work, it is really impressive you got a deep learning test environment going!

Speaking of being realistic about deep learning (processing time, training, etc.), I ran into this paper abstract, more research-oriented, which alludes (towards the end) to the current shortcomings that have to be solved in deep learning.

The paper also argues that we still need the other types of analytics as a complement/potentiator for deep learning. BTW, the author is considered one of the main researchers in computer vision.

Nonetheless, I think we will see many products coming out, not just in surveillance, trying to commercialize deep learning.

I downloaded and installed Nvidia's "Digits" application...

From memory, the problem was that even though the images had been scaled down to about 200 by 200, it still seemed to take the best part of a second to analyse each image.

1, thanks for sharing this. We just did a related post: Hikvision Nvidia Supercomputing Partnership

From what we can tell, the focus is primarily on doing the deep learning on dedicated machines (whether servers or cloud instances running Nvidia GPUs). I guess some 'super' NVR that embeds multiple GPUs could be built in the future, but they do not seem to be mainstream.

Hikvision, e.g., does this in the cloud (private cloud) in China.

From memory, the problem was that even though the images had been scaled down to about 200 by 200, it still seemed to take the best part of a second to analyse each image.

Not being an expert, I could easily have got that wrong...

My understanding is that there are two parts to implementing a deep learning solution:

1) Training. Requires large amounts of data and processing power.

2) Inference, i.e. applying an image to the trained network to get an output. This requires a moderate amount of processing power compared to (1).
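The asymmetry between (1) and (2) shows up even in a toy model: training loops over the data many times adjusting weights, while inference is a single pass through the learned function. A minimal sketch in plain Python (the data, learning rate, and epoch count are all made up; a real network is vastly larger, but the split is the same):

```python
# Toy 2-input perceptron learning the AND function.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
bias = 0.0
lr = 0.1

def predict(x):
    # Inference: one cheap weighted sum plus threshold per input.
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0

# Training: repeated passes over the data with weight updates --
# this is the part that needs the big data sets and processing power.
for _ in range(20):
    for x, target in data:
        err = target - predict(x)
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        bias += lr * err

print([predict(x) for x, _ in data])  # [0, 0, 0, 1]
```

Once the weights are fixed, deploying the model only requires running `predict`, which is why inference hardware can be so much more modest than training hardware.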

Now, with that in mind, it is interesting to look at Nvidia's new M4 GPU.

Nvidia Tesla M40 and M4

From the video (at 2:36) NVIDIA CEO Jen-Hsun Huang states:

"the Tesla M40 is intended for training"

"The Tesla M4 is intended for inferencing or production of that network"

He claims the M4 is capable of "20 images per second per watt". Wow!

At 50 watts and with the ability to fit in a 1U machine, this seems like a prime candidate to include with a standalone VMS, so I think we'll see this happen (although it is hard to find anything regarding pricing on the web).
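Taking those claimed figures at face value, the arithmetic is encouraging for the 60-camera scenario discussed earlier:

```python
images_per_second_per_watt = 20  # claimed M4 inference figure
watts = 50                       # quoted M4 power draw

m4_throughput = images_per_second_per_watt * watts  # images per second
required = 60 * 5                                   # 60 streams at 5 fps

print(m4_throughput)              # 1000
print(required)                   # 300
print(m4_throughput >= required)  # True
```

So on paper a single M4 could keep up with a 60-camera, 5 fps NVR with headroom to spare, though real-world throughput will depend heavily on the network and image size.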

As for the DGX-1s, I think they are primarily meant for research and training rather than inference, but they could probably be used for both, as could GPU clusters.

Thanks for the article. Until I read this, I hadn't thought much about an announcement we made. Here is the product:

FLIR BOSON - Techcrunch: "FLIR and Movidius create the smartest thermal camera out there"

Greg from FLIR

Definitely this is ahead of the curve. Is there an SDK for third party development?

It is really a good example of what I expect to see in the future. The breakthrough in the development of artificial neural networks makes the computer vision area much more "user friendly". You won't need a huge team of experienced pros to recognize something; just take a ready-to-use artificial neural network and use it in your application. We will see the same situation as with the video surveillance market now.

I'm really curious to see where this all goes. 8 years ago, we were all in a tizzy about video analytics, and how the world would never be the same. A lot of new companies popped up, and a lot of investors threw money at them. At the end of the day, it really fizzled, and never delivered on the promises. I'm wondering if Machine Learning will be more of the same.

Let's assume that surveillance systems based on artificial neural networks can identify objects and people in real time, and that the same ANNs are used for cyber surveillance to identify suspicious activities (like searching for information on mass shootings or instructions for an AR-15) and for digital tracking.

  1. Even if it works reliably in most cases, it will be difficult to process all cases in a timely manner.
  2. Privacy concerns - I am not sure the public will agree to the existence of this kind of "Deep Surveillance".
  3. There will always be ways around it (encrypted communications, sabotage, etc.).
  4. It will take a lot of resources to develop and deploy.
  5. And so on.

Machine learning is vital and offers many interesting opportunities to increase security, but I do not think it is the quick answer to current needs.

Thus my interest level goes way down. I'm hoping that others jump on the deep learning bandwagon. We have a test product from another company making use of deep learning inbound for evaluation and will report on what we see. I don't see this particular product as useful to our customers, but the deep learning functionality is fascinating.