More than 20 years ago, the first product we developed was a motion detector. We did not have enough CPU power to compress video or enough hard-disk space to record it, so when our AI detected motion, the computer sent infrared (IR) signals to control a videotape recorder (VCR). Very clever! We saved a lot of tape! But from that moment on, we expected more from video analytics: we wanted it to work more accurately and to tell us more. Yet after 20 years, we are still using almost the same algorithm to detect motion. Why?
There are two reasons. First, it is hard to find a suitable algorithm that does more than motion detection. Second, once you think you have a good idea, it quickly becomes apparent that the algorithm needs 10 times more CPU resources! And for the past 20 years, we have never had enough resources.
Whenever we get a more powerful platform, we want to connect more cameras to it. Today we would rather put 1,500 cameras on one server than spend its resources on an advanced motion detector. Instead, we want the cameras themselves to report motion, and this is an excellent idea: after 20 years, we finally don't need server-based motion detectors. That is real progress.
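For reference, here is a minimal sketch of the kind of classic motion detector discussed above. The post does not say which algorithm it used; simple frame differencing is assumed here as a common example. Frames are plain Python lists of grayscale values, and the function name `motion_detected` and both thresholds are hypothetical, illustrative choices rather than tuned values.

```python
def motion_detected(prev_frame, curr_frame, pixel_threshold=25, area_fraction=0.01):
    """Report motion when enough pixels changed between two grayscale frames.

    Frames are lists of rows of 0-255 intensity values with the same shape.
    The default thresholds are illustrative assumptions, not tuned for any camera.
    """
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same number of rows")
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        if len(prev_row) != len(curr_row):
            raise ValueError("frames must have the same number of columns")
        for old, new in zip(prev_row, curr_row):
            total += 1
            # A pixel counts as "changed" if its brightness moved by more than the threshold.
            if abs(new - old) > pixel_threshold:
                changed += 1
    # Motion means the changed pixels cover at least the given fraction of the frame.
    return total > 0 and changed / total >= area_fraction


# Usage: a static 10x10 dark frame, then the same frame with a bright 3x3 patch.
still = [[0] * 10 for _ in range(10)]
patch = [[200 if x < 3 and y < 3 else 0 for x in range(10)] for y in range(10)]
print(motion_detected(still, patch))  # True: 9 of 100 pixels changed
print(motion_detected(still, still))  # False: nothing changed
```

The detector knows nothing about what moved; it only counts changed pixels. Real products add noise filtering and background modelling on top of this idea, but the principle has stayed much the same.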
The deep learning approach: what is the difference?
Neural networks were invented more than 40 years ago. So what happened to give them such a huge impact on today's society? While NVIDIA was developing parallel computing to accelerate graphics, it turned out that this architecture is extremely good at accelerating neural network calculations! That made these calculations affordable for everyone. Now almost anyone can train a neural network and get robust recognition. You no longer need to invent algorithms, and you no longer compete for the same CPU resources: you have a precise technique for training a network to detect almost anything, and you can then run that network on separate processors. Intelligent!
Why is deep learning better than traditional algorithms?
In the traditional approach, you need to invent an algorithm before you can detect something. And your algorithm cannot be perfect, because you cannot predict every possible situation. The deep learning approach is different: during training you simply show the network pictures and tell it what they are. The neural network adjusts the weights of its neurons and then becomes able to recognize those things. Simple. This is loosely similar to how humans learn. Take face recognition (FR) as an example. Traditional FR algorithms were all based on different ideas about how to decide which faces are similar. In computer terms: which facial parameters correspond to similarity? The distance between the eyes? The shape of the nose? Something else? We do not know. We do not actually know how to compare faces and decide which ones belong to the same person, especially under different conditions! With deep learning and neural networks, we do not need to know: we simply show the network many faces and tell it which ones are the same person. That's it!
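To make "show examples and let the network adjust" concrete, the sketch below trains a single artificial neuron (a perceptron) on labeled examples. It is far simpler than the deep networks used for face recognition, and the two-number feature vectors and labels are made-up toy data, but the principle is the same: nobody writes the decision rule; the weights are nudged whenever a prediction disagrees with its label. The function names `predict` and `train_perceptron` are hypothetical.

```python
def predict(weights, bias, features):
    """Fire (1) if the weighted sum of the inputs plus the bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Learn weights for one artificial neuron from labeled examples.

    samples: list of feature vectors; labels: 0 or 1 for each sample.
    Starting from zero weights, the learning rate only rescales the result.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(samples, labels):
            error = target - predict(weights, bias, features)
            # No hand-written rule: each wrong answer nudges the weights toward the label.
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy data: label 1 for "large" points, 0 for "small" ones.
samples = [[1, 1], [2, 1], [1, 2], [6, 5], [5, 6], [6, 6]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, s) for s in samples])  # [0, 0, 0, 1, 1, 1]
print(predict(weights, bias, [7, 7]))                # 1 (a point it never saw)
```

Swapping in different labeled examples changes what the neuron learns without touching the code, which is the contrast with hand-crafted rules drawn above.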
Note, however, that everything depends on your data set. If it contains a wide range of faces, seen from different angles, at different ages, under different lighting and so on, your network will learn to recognize faces under all of those conditions. You can also train with different image resolutions, something humans are not trained for at all. This means a neural network can, in principle, outperform the human brain, and we can already see evidence of this: a human cannot remember a million faces and instantly sort them by similarity. AI can!
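The "sort by similarity" step can be sketched as follows. A common design, assumed here rather than described in the post, is for a trained network to map each face image to a vector (an embedding) so that faces of the same person land close together; comparing faces then reduces to comparing vectors, for example with cosine similarity. The three-number embeddings and the names in `gallery` are invented for illustration; real networks produce much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, gallery):
    """Return gallery names ordered from most to least similar to the query."""
    return sorted(gallery, key=lambda name: cosine_similarity(query, gallery[name]),
                  reverse=True)


# Hypothetical embeddings; a real network would compute them from face images.
gallery = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 1.0, 0.1],
    "carol": [0.6, 0.6, 0.5],
}
query = [1.0, 0.2, 0.0]
print(rank_by_similarity(query, gallery))  # ['alice', 'carol', 'bob']
```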
What to expect next?
Following NVIDIA's surprising results, we are seeing very good progress in building neural acceleration into different kinds of hardware. Intel acquired Movidius, and its vision processors work great! HiSilicon announced new camera chips with embedded neural acceleration processors. What does this mean? Every camera will carry a trained neural network that does what you want it to do. So, after 20 years without progress in motion detection, we have made an incredible jump: cameras will be able to detect almost anything, even better than the human eye! Yes, it is hard to believe. But before the computer age, who believed Turing's prediction that machines would be able to calculate better than humans? Today, no one doubts it.
Still, the fundamental question remains unanswered: what can we really do with this new technology? What benefits can we gain? What features can we explore? In which areas can security become more efficient? I'll share my thoughts in the next discussion.