Who's Using YOLO For Video Surveillance Analytics?

IPVM

YOLO (You Only Look Once) has become well known in the AI / deep learning field for providing fast object detection.

For a gentle introduction, see the main developer's TED talk:

And here is a YOLO demo in action, again from the main developer:

A few people have mentioned it positively in IPVM discussions.

And while YOLO is fast, that is relative to SSD, R-CNN, etc. For video surveillance hardware dealing with numerous video streams, it would still be very computationally intensive. There is 'Tiny YOLO', but that has significantly decreased accuracy.
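
To make the multi-stream cost concrete, here is a back-of-the-envelope sketch. The camera count, frame rates, and per-GPU throughput are placeholder numbers chosen for illustration, not benchmarks of any YOLO version.

```python
import math

def gpus_needed(num_streams, analyzed_fps_per_stream, detector_fps_per_gpu):
    """Rough count of GPUs needed to keep up with the given streams."""
    total_fps = num_streams * analyzed_fps_per_stream
    return math.ceil(total_fps / detector_fps_per_gpu)

# Hypothetical numbers: 16 cameras at 30 fps, one GPU handling 30 frames/s.
full_rate = gpus_needed(16, 30, 30)
# Same cameras, but analyzing only 5 frames per second per stream.
sampled = gpus_needed(16, 5, 30)
print(full_rate, sampled)
```

Under these assumed figures, analyzing every frame needs one GPU per camera, while sampling frames cuts the count sharply.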

So, question: is anyone using YOLO in production, or does anyone know of a video surveillance company planning to? More broadly, what models are companies using?

Undisclosed Integrator #1

Never heard of it...looks pretty good!

Dori Ribak

RBtec Perimeter Security Systems

There are also multiple versions of YOLO and spin offs of YOLO (darknet).

I believe that almost all of the companies use it in one way or another. It's way too good to try to build object recognition from scratch when the newer versions can run amazing analytics on a Raspberry Pi.

I assume most companies either build a second layer or modify it.

John Honovich

IPVM

In reply to Dori Ribak

spin offs of YOLO (darknet).

No, Darknet is developed by the same developer as YOLO and is used by YOLO as the underlying image classifier.

As for Raspberry Pi and YOLO, I would assume they are using Tiny YOLO, not full YOLO, like this developer's post here, yes/no?

Dori Ribak

RBtec Perimeter Security Systems

Raspberry Pi + Deep Learning home security system

Yeah, they are running Tiny YOLO, but it's a step that shows where this is going. I assume microcomputers can run the full version at low fps.

Also, I've seen a way to network them.

John Honovich

IPVM

In reply to Dori Ribak

Also, I've seen a way to network them.

No doubt you can, but from your reference:

Raspberry Pi camera module or some USB webcam that works with the Pi
A dedicated computer with an Nvidia GPU (I'll be using a GTX 970) running Debian

That's the challenge. Dedicating a computer for each individual video surveillance stream is a big commitment. And, yes, you can use 'bigger' GPUs, e.g., YOLO v3 paper uses a Titan X, but that's still expensive.

One of the things I wonder is whether most surveillance users even need or want the full YOLO covering the complete COCO dataset.

John Moss

I have a team at Lenel S2 that is pretty far along with YOLO experimentation on both server and client side. No question that it is computationally intensive and one has to rely on GPU support to make that possible. Also, the COCO dataset is not exactly what we want in physical security, so a new library has to be built from raw images. That’s the most computationally intensive part of the whole process; it takes a number of days even on our highly provisioned servers.

While it’s pretty cool tech, as with other analytics, a big piece of the commercial puzzle is providing a user interface that makes it easy for a normal person to deploy analytics in a useful way.

In any event, I’d expect YOLO and similar technologies to show up in a lot of places in the industry as they mature.

John Honovich

IPVM

In reply to John Moss

John, good feedback, thanks!

Any guess about where the processing is most likely to end up for this? Camera? Recorder? Separate analytic server? Cloud?

John Moss

At this point in time, the neural network processing is too expensive to put in cameras and be cost effective. Cloud would overcome that issue but would introduce latency that is likely unacceptable. That leaves the server or a front end processor. I don’t love adding gear to the solution, so the present focus is on the server.

Sean Huver

Nov 01, 2018

Defendry

In reply to John Moss

I agree with what John said about the need to construct new and original datasets, and just wanted to add that for applications to work well at scale (e.g. not being swamped with false-positives) one has to go far beyond the number of images per class as seen in COCO, as well as build a large dataset of false positive examples to be used for "negative mining".

There's a vast difference between a deep learning app that appears to work in a 30 second YouTube demo, and a robust model that can be deployed to 1,000+ locations and not swamp a central monitoring station with FPs.
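
As a rough illustration of collecting false positives for negative mining, the sketch below flags detections that match no labelled box of the same class. The dictionary layout, the 0.5 IoU threshold, and the function names are assumptions made for this example, not any particular vendor's pipeline.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def find_false_positives(detections, ground_truth, iou_threshold=0.5):
    """Return detections that match no labelled box of the same class."""
    false_positives = []
    for det in detections:
        matched = any(
            gt["label"] == det["label"]
            and iou(gt["box"], det["box"]) >= iou_threshold
            for gt in ground_truth
        )
        if not matched:
            false_positives.append(det)
    return false_positives

# Hypothetical labelled frame and model output.
ground_truth = [{"label": "person", "box": (10, 10, 50, 100)}]
detections = [
    {"label": "person", "box": (12, 8, 52, 98)},     # overlaps the labelled person
    {"label": "person", "box": (200, 40, 230, 90)},  # nothing labelled here
]
hard_negatives = find_false_positives(detections, ground_truth)
print(hard_negatives)
```

The unmatched detections are the candidates one would review and add to the negative set before retraining.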

Suresh Yendrapalli

Duranc

Yes, at the moment only Tiny YOLO can be ported, using darkflow to convert the YOLO weights to TensorFlow. Alternatively, you can implement YOLOv3 in TensorFlow following https://itnext.io/implementing-yolo-v3-in-tensorflow-tf-slim-c3c55ff59dbe.

Also, Google worked with Raspberry Pi to make TensorFlow work on the Pi 3.

https://medium.com/tensorflow/tensorflow-1-9-officially-supports-the-raspberry-pi-b91669b0aa0

Suresh Yendrapalli

In reply to Suresh Yendrapalli

Duranc

There are a lot of classes in the COCO dataset that are of little use to surveillance users. Instead, we need to collect custom data and train on it.

John Honovich

In reply to Suresh Yendrapalli

IPVM

Suresh,

Thanks. How many classes are you looking to use? It strikes me that, for video surveillance usage, it could be reduced to a handful of classes (person, vehicle, perhaps a few others), certainly reducing computational load and perhaps increasing accuracy. Thoughts?

Suresh Yendrapalli

Duranc

John,

We have custom data labelled for 106 classes, which is being refined constantly by adding more data as we train, as well as by including more classes to eliminate false positives that come up in the camera's scene.

Whether there are fewer or more classes, the computational load depends on which network you choose for training, for example Tiny YOLO vs. the original YOLO weights. At the end of training, it's the weight file that determines how much GPU memory is needed. Tiny YOLO takes 1 GB vs. 2 GB for regular YOLOv3.

As you mentioned in earlier posts, it's a tradeoff between accuracy and speed.
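
One way to see why the class count has little effect on load: in the Darknet YOLOv3 configuration, the class count only changes the number of filters in the convolution feeding each detection layer, set as (classes + 5) * 3 when three anchors are used per scale. A small sketch of that arithmetic, with the function name being mine:

```python
def yolo_output_filters(num_classes, anchors_per_scale=3):
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    return (num_classes + 5) * anchors_per_scale

for classes in (3, 80, 106):
    print(classes, yolo_output_filters(classes))
```

The rest of the network, which is where most of the computation sits, stays the same size regardless of how many classes are trained.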

Undisclosed Manufacturer #2

Solution:

Do the processing at the edge, inside the camera, and just send metadata to the server.
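
A minimal sketch of what "just send metadata" could look like, assuming a JSON message per analyzed frame. The field names, camera ID, and box layout are hypothetical and not taken from any camera vendor's format.

```python
import json

def detection_metadata(camera_id, timestamp, detections):
    """Serialize detections as a compact JSON message instead of video."""
    return json.dumps({
        "camera_id": camera_id,
        "timestamp": timestamp,
        "objects": [
            {"label": d["label"], "confidence": d["confidence"], "box": d["box"]}
            for d in detections
        ],
    })

# Hypothetical single detection; box is [x, y, width, height] in pixels.
message = detection_metadata(
    "cam-01",
    "2018-10-21T12:00:00Z",
    [{"label": "person", "confidence": 0.91, "box": [120, 80, 60, 150]}],
)
print(len(message), "bytes")
```

A message like this is tiny compared to the video frame it describes, which is the appeal of running the detector in the camera.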

John Honovich

In reply to Undisclosed Manufacturer #2

IPVM

That could be a solution, though, e.g., Intel OpenVINO does not support YOLO, so I don't think you can use it on Movidius. Run on Ambarella? HiSilicon?

John Honovich

Oct 21, 2018

IPVM

Here's an interesting benchmark running Tiny YOLO (the relatively low processing version) on Movidius and NVIDIA, results:

Net/net: Even using the relatively big, relatively expensive (by camera standards) NVIDIA P1000, it cannot deliver full frame rate video.

However, Myriad 2 is now the older Movidius version; not sure what the Myriad X could deliver.

Arup Mukherjee

Camect, Inc.

Is not being able to deliver full frame rate a big problem? (I would have guessed that in most scenarios, the object(s) to be detected will occur in several frames.)

John Moss

In reply to Arup Mukherjee

You're right that full frame rate is not a problem. But, remember that YOLO needs a fully rendered scene to analyze, like an I-frame, so GOP size might have to be adjusted. Also, consider that if an object is moving and you're drawing the bounding boxes (or otherwise highlighting the object detected), you'll have to keep up with the movement or it will look odd.

Arup Mukherjee

Camect, Inc.

In reply to John Moss

Your second point about the bounding boxes makes perfect sense... Thanks.

On the first point, I'd expect that it's relatively cheap (compared to the cost of running object recognition) for your video decoder to always have a fully rendered frame available, independent of how the incoming video stream itself is encoded, and that it might already be doing this if the application is doing anything else, like motion detection.
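
To illustrate both points, here is a minimal sketch that runs a detector on every Nth decoded frame and reuses the last boxes in between. The detector is a stand-in and the "frames" are just x positions of a moving object; both are assumptions for the example.

```python
def annotate_stream(frames, detect, every_n=5):
    """Run `detect` on every Nth decoded frame; reuse the last boxes otherwise."""
    last_boxes = []
    annotated = []
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            last_boxes = detect(frame)
        annotated.append((frame, last_boxes))
    return annotated

# Stand-in detector: the "frame" is the x position of a moving object,
# and the box is (x1, y1, x2, y2) around it.
def fake_detect(frame):
    return [(frame, 0, frame + 10, 20)]

result = annotate_stream(range(10), fake_detect, every_n=5)
for frame, boxes in result:
    print(frame, boxes)
```

Between detector runs the drawn box stays where the object was last seen, which is the lag John describes when the object keeps moving.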

Nathan Wheeler