Awesome new IPVM tool guys...!
AI Video Tester Released
IPVM has released the world's first AI video tester that lets you see how various AI models (including from Amazon, Google, Microsoft and YOLO) work on your own video.
Why?
While there is lots of hype about 'AI', 'Deep Learning', 'Neural Networks', 'CNNs', etc., it is hard to know how well they work. Worse, it is difficult to tell how they will work on your own video rather than on marketing demos or generic photos.
How It Works
You can try the AI video tester here by choosing one of our sample videos or, if you are a member, uploading your own video. Below is a sample video from a moderately challenging video surveillance scene analyzed by Amazon Rekognition. Notice it generally does well but periodically thinks it sees a bathtub:
Our Tester also maps out where and when each object is seen, and lets you scan the timeline below to see, frame by frame, which objects are detected, as shown in the gif below:
Use Cases For It
We do not expect everyone to make use of this but here are the main use cases we see:
Research
If you, like us, are doing research into computer vision, this tool is a unique means to quickly and easily gauge the performance of various models. That was the first reason we built it: as we build up our computer vision testing, we wanted a way to do faster and better testing. You will see us using this in upcoming reports.
Education
Most video surveillance professionals know little about AI beyond buzzwords (e.g., 'neural nets' and 'layers') and have little idea how well these models actually perform. This tool makes it easy: instead of spending days learning how to set up each of these models, you simply add whatever videos you want and let the tester run them for you.
For example, many of our beta testers were surprised by the results they saw, including how poorly many models worked in challenging surveillance conditions.
We have an upcoming AI Video Analytics course and this tool will be a core component of the exercises and training.
Comparison
For those looking to use these models in production, this will help directly in making product comparisons. Of course, there is a major limitation: the tester does not yet include any surveillance manufacturers' products.
Future Improvements
For now, we are releasing the foundations of the Tester.
The most obvious improvement is more models / systems, from OCR and LPR offerings to various manufacturers' AI systems. While we will add more, we will never be able to make it all-inclusive since many video surveillance systems either do the analytics inside the camera or do not provide sufficient APIs.
Another improvement we are exploring is to make a simple video analysis tool for exported video, i.e., using one or more of these models to help integrators find people or vehicles from long recorded video clips.
We can also add facial recognition to this by using our Tester and adding on a face database component so members can experiment with different systems on their own video before deploying on site.
And, of course, we are definitely open to suggestions from members for improvements.
Try It Out - Give Us Feedback
Try it out, let us know what you think, questions you have and improvements you want.
Really, really, really could have used a tool like this in 2005, for managing video analytics expectations.
Good luck!
I see the bathtub too. What an awesome tool.
Tried two uploaded videos from a real crime: in the first, a person opens a gate, and in the second, he steals a gator tractor. Of the 4 models for object detection, the first two had a lot of hits off light reflections, including 'mirror' and 'airplane'. Models 3 and 4 did not detect anything.
How can this tool demonstrate how these models can learn?
Hi Robert, these models do all their learning up-front; they don't learn anything new as new videos are uploaded. They are trained with a process called "supervised learning": the model developer starts with a big dataset of videos with known (labeled) objects in them, trains the model on these videos, and then packages that model up / wraps it in a bow. From then on it works the same way on every run of the same video; it doesn't learn more going forward.
Some vendors tout online learning, i.e., learning from new videos as they're uploaded. This might be by way of "unsupervised learning," though I'm not familiar with their approach (something for my radar).
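To make the "train up-front, then frozen" point concrete, here is a minimal Python sketch. It is a toy nearest-centroid classifier, not any of the models in the Tester; the feature values, labels, and names (`train`, `FrozenCentroidModel`) are made up for illustration. The point it tries to show is that training needs labeled examples, and that prediction afterwards never changes the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenCentroidModel:
    """Toy classifier whose parameters are fixed once training finishes."""
    centroids: dict  # label -> tuple of per-feature means

    def predict(self, features):
        # Pick the label whose centroid is closest (squared Euclidean distance).
        def distance(label):
            return sum((f - c) ** 2 for f, c in zip(features, self.centroids[label]))
        return min(self.centroids, key=distance)


def train(labeled_examples):
    """Supervised training: every example arrives with a known label."""
    sums, counts = {}, {}
    for features, label in labeled_examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }
    return FrozenCentroidModel(centroids)


# Hypothetical 2-D features (say, box height and width) with known labels.
training_set = [
    ((1.8, 0.5), "person"),
    ((1.7, 0.6), "person"),
    ((1.5, 4.2), "car"),
    ((1.4, 4.5), "car"),
]
model = train(training_set)
centroids_after_training = dict(model.centroids)

# "Uploading" the same clip twice: predict() only reads the model.
new_clip_features = (1.75, 0.55)
first_run = model.predict(new_clip_features)
second_run = model.predict(new_clip_features)
```

Both runs return the same label, and `model.centroids` is identical before and after. An online-learning system would instead update its parameters as new data arrives.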
+10,000 for such an educational tool. Perfect for showing "AI" capabilities and cutting through the hype.
Great idea! We'll try it for sure. In our experience, such general-purpose networks were trained on scenes that are not typical for CCTV, and they perform worse than specially trained ones.
Hi Tyler,
We have been working with the YOLO model and have been feeding it 1000s of clips. We have 3 metrics for evaluating the AI: Accuracy, Efficiency, and Fatal Error. We feed events transmitted from cameras with analytics into the AI model and then compare its classification with how operators classify it in Immix. There are 4 outcomes: #a, #b, #c, and #d. See below. We would like to see Accuracy >95%, Efficiency at 30% and Fatal Error AI below 5%. So far it is not ready for real time.
Thoughts?
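For readers who want to try this kind of AI-versus-operator comparison, here is a rough Python sketch. The post does not define the four outcomes or the three metrics, so everything below is an assumed reading, not the poster's actual formulas: "accuracy" is the share of events where the AI and the operator agree, "efficiency" is the share of events the AI would dismiss, and "fatal error" is the share of operator-confirmed real events that the AI dismissed. The sample counts are invented purely to exercise the code.

```python
from collections import Counter


def evaluate(events):
    """events: iterable of (ai_says_real, operator_says_real) boolean pairs.

    The four possible pairs stand in for the four outcomes in the post;
    which pair corresponds to #a..#d is not stated there.
    """
    outcomes = Counter(events)
    total = sum(outcomes.values())
    if total == 0:
        raise ValueError("no events to evaluate")

    agree = outcomes[(True, True)] + outcomes[(False, False)]
    dismissed_by_ai = outcomes[(False, True)] + outcomes[(False, False)]
    real_per_operator = outcomes[(True, True)] + outcomes[(False, True)]
    missed_real = outcomes[(False, True)]

    return {
        # ASSUMED definitions, see the note above the code.
        "accuracy": agree / total,
        "efficiency": dismissed_by_ai / total,
        "fatal_error": missed_real / real_per_operator if real_per_operator else 0.0,
    }


# Invented sample: 100 events, not real measurements.
sample = (
    [(True, True)] * 60      # AI and operator both say "real"
    + [(True, False)] * 8    # AI says "real", operator says "false alarm"
    + [(False, False)] * 30  # both say "false alarm"
    + [(False, True)] * 2    # AI dismisses an event the operator calls real
)
metrics = evaluate(sample)
```

With these made-up counts the AI agrees with operators on 90 of 100 events, dismisses 32, and misses 2 of the 62 events operators marked as real.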
This looks great, and hopefully it can help manage the often unrealistic expectations of what computer vision is capable of (especially when the input is from CCTV cameras with heavy compression and bad light). I tried a few of the IPVM videos, but I couldn't get the graph to show up (I am on Firefox).
If I am not mistaken, the algorithms used are all "still frame" algorithms that treat each frame as if it were completely unrelated to the previous one. AFAIK, YOLO doesn't "remember" that it found a person at x,y in the previous frame. It's just so fast that it can draw the classification ROI boxes on 30 fps video (maybe they improved this in v3). There certainly are algos that are designed to use knowledge from the previous frame in the next, but I can't remember which; maybe I'll dig them out one day.
There's a brief video on the NVIDIA deep-stream running on the $99 Jetson nano board. 8 x 1080p @ 30 fps pretty robust object tracking. It's impressive stuff, and perhaps something for IPVM to look into.
https://www.youtube.com/watch?v=Y43W04sMK7I
AFAIK, YOLO doesn't "remember" that it found a person at x,y in the previous frame
You are right! Object tracking based on object detections from YOLO-style networks is a different, non-trivial task.
There's a brief video on the NVIDIA deep-stream running on the $99 Jetson nano board. 8 x 1080p @ 30 fps pretty robust object tracking.
There are no identifiers for each object, just bounding boxes. So I think it's not a tracker but frame-independent object detection. One may count all the people at each moment in time, but it's not possible to trace the path of each person.
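To illustrate the difference between per-frame detection and tracking, here is a small Python sketch. It is a toy greedy matcher based on box overlap (IoU), not how YOLO or DeepStream work internally; the box coordinates, the `min_iou` threshold, and the function names are assumptions for illustration. Per-frame boxes alone give you counts; carrying an ID from frame to frame is what gives you a path.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track(frames, min_iou=0.3):
    """Give each box the ID of the best-overlapping box in the previous frame."""
    next_id = count(1)
    previous = {}  # track id -> box seen in the previous frame
    paths = {}     # track id -> list of (frame index, box)
    for frame_index, boxes in enumerate(frames):
        current = {}
        unmatched = dict(previous)  # previous-frame tracks not yet claimed
        for box in boxes:
            best = max(unmatched, key=lambda tid: iou(unmatched[tid], box), default=None)
            if best is not None and iou(unmatched[best], box) >= min_iou:
                track_id = best
                del unmatched[best]
            else:
                track_id = next(next_id)  # no good overlap: start a new track
            current[track_id] = box
            paths.setdefault(track_id, []).append((frame_index, box))
        previous = current
    return paths


# Hypothetical detector output: one list of boxes per frame, no IDs.
frames = [
    [(10, 10, 50, 90), (200, 20, 240, 100)],
    [(14, 10, 54, 90), (196, 20, 236, 100)],
    [(18, 10, 58, 90)],
]
people_per_frame = [len(boxes) for boxes in frames]  # detection alone: counts only
paths = track(frames)                                # tracking: one path per ID
```

A real tracker also has to handle missed detections, occlusions, and people crossing paths, which is why tracking on top of a detector is a separate, non-trivial problem.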
Is this tool currently discontinued? When is it expected to be reactivated?
Yes, we discontinued it. We do not have any plans to reactivate it for now. It was not used much and it had high ongoing costs. We may reimplement it in the future but no plans currently. My apologies.
Thanks John. Are there any alternatives to gauge the performance of various models?
I don't know. Obviously people can do it themselves one by one but I don't know of any online services that let you try / compare.