There were some reasonable doubts about whether DL facial recognition would work in conditions closer to real life. To support my first post, here is an example I can share with IPVM members that I hope is much closer to what we will see on the market soon.
Because of privacy issues, I cannot use real footage, but this clip is part of what is probably the most realistic dataset.
The fragment used includes 25 people. The source dataset has a frame rate of 30 fps and an image resolution of 800x600 pixels.
The PC used is an Acer V3-772G-9820 laptop (Intel Core i7-4712MQ 2.30 GHz, NVIDIA GeForce GTX 850M, 8 GB DDR3L SDRAM).
The watch list consists of 13K-plus images. 29 of them are named ID### and are mug shots of people from the dataset. The other images in the watch list are from the Faces in the Wild dataset and are named WRONG(#####).
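To make the naming convention concrete, here is a minimal Python sketch of how watch list entries could be sorted into enrolled mug shots and distractors by name. This is only an illustration: the function names, the sample names, and the assumption that `#` stands for any number of digits are mine, not part of the engine I tested.

```python
import re

# Naming convention from the post: ID### for mug shots of people in the
# dataset, WRONG(#####) for distractor faces. Assumption: '#' is any digit
# and the number of digits may vary.
ENROLLED = re.compile(r"^ID\d+$")
DISTRACTOR = re.compile(r"^WRONG\(\d+\)$")


def entry_kind(name: str) -> str:
    """Classify a watch list entry by its name (hypothetical helper)."""
    if ENROLLED.match(name):
        return "enrolled"
    if DISTRACTOR.match(name):
        return "distractor"
    return "unknown"


def count_kinds(names):
    """Count how many entries fall into each category."""
    counts = {"enrolled": 0, "distractor": 0, "unknown": 0}
    for name in names:
        counts[entry_kind(name)] += 1
    return counts


# Made-up sample names, only to show the idea.
sample = ["ID007", "WRONG(00412)", "ID021", "WRONG(12999)", "misc"]
print(count_kinds(sample))
```

The practical point of the naming is that any hit on a WRONG(#####) entry is a false match by construction, so it is easy to spot on screen.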
The previous generation of facial recognition engines that I had a chance to test, including on this dataset, had issues with using mug shots (in most cases I had to use cropped faces from other video fragments made with the same camera, i.e. images had to be from one source) and could more or less reliably identify around 5 people in this video fragment.
The new DL-based generation of FR can identify 22 of 25 on a top desktop PC, and on my laptop it can reliably identify 16 in this video fragment (with the screen capture running I got a worse result and had to stop identification to show you the results).
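For those who prefer percentages, the counts in this post work out as follows. This is just the arithmetic in Python; the labels are mine, and "around 5" for the previous generation is an approximate figure.

```python
TOTAL_PEOPLE = 25  # people in the video fragment

# Counts quoted in the post; the previous-generation figure is approximate.
identified = {
    "previous generation (approx.)": 5,
    "DL, laptop": 16,
    "DL, top desktop PC": 22,
}


def identification_rate(found: int, total: int = TOTAL_PEOPLE) -> float:
    """Share of people in the fragment that were identified."""
    return found / total


for label, found in identified.items():
    rate = identification_rate(found)
    print(f"{label}: {found}/{TOTAL_PEOPLE} = {rate:.0%}")
```

So roughly 20% before, 64% on the laptop and 88% on the desktop, on this one fragment only.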
Of course, it is not perfect and it makes mistakes. Moreover, many things still need a lot of effort, especially resource consumption and processing speed, but the progress and the results I see are very promising.
"There were some reasonable doubts whether DL facial recognition will work in the conditions closer to real."
In your video, the people are walking / looking straight ahead in a hallway in a narrow FoV. These results may very well be much better than the old results, but the scenario tested is a rather optimistic one compared to the 'real' 'conditions' end users expect. Yes/no?
John, I have seen many different real conditions; some were better and some were worse. Anyway, it's a rhetorical question: the better the material you have, the better / more reliable the results you will get. That is true for any part of video surveillance. Compared to LPR, FR is obviously more demanding because it needs more detail (facial features are not dark lines on a light background), but the idea is the same. If you cannot see the details of a license plate number / face, then it is hard to expect that the computer will recognize it.
The video fragment used is a good example and not really an optimistic one. It contains faces that are angled, low quality, dark, small and so on.
Here is one of the best faces of a person from that video, the one that gets the best result. It has roughly 60 pixels between the eyes in this image.
As you can see, it gets 9 of 10 on the 13K dataset. If we are less picky and accept results with an accuracy level of 6 of 10, then we can get results on worse images (by size, quality, angle etc.) like this
But the worse the quality we deal with, the lower the threshold we use, the less reliable the results we get, and we start to get more false matches. However, most of the false matches look similar to the person, and I see far fewer obvious mistakes (man vs woman etc.).
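The trade-off with the threshold can be sketched in a few lines of Python. Everything here is hypothetical: the list of matches is made up for illustration, and I am assuming the "N of 10" level can be treated as a simple score that is compared against a cut-off, which may not be how a given engine works internally.

```python
# Hypothetical candidate matches: (score on the 0-10 scale, match was correct).
# These numbers are invented to illustrate the trade-off, not measured.
MATCHES = [
    (9, True), (9, True), (8, True), (7, True), (7, False),
    (6, True), (6, False), (6, False), (5, False),
]


def accepted_at(threshold, matches=MATCHES):
    """Return (true_matches, false_matches) among matches scoring >= threshold."""
    accepted = [ok for score, ok in matches if score >= threshold]
    true_matches = sum(1 for ok in accepted if ok)
    false_matches = len(accepted) - true_matches
    return true_matches, false_matches


for threshold in (9, 6):
    true_matches, false_matches = accepted_at(threshold)
    print(f"threshold {threshold}/10: {true_matches} true, {false_matches} false")
```

With these made-up numbers, dropping the cut-off from 9 to 6 picks up more correct identifications on the poor images but also lets false matches through, which is the behaviour I describe above.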
New DL FR engines work very well. They won't find a black cat in a dark room, but with reasonable material they will provide a good result.
The main issue, as I mentioned previously, is still performance. But taking into account the number of teams working on it right now, I am pretty sure that we will see products from different vendors this year.
I know teams and companies that work on recognition of facial expressions, age, gender, race etc., but I cannot comment on it because I did not test it myself. And frankly speaking, facial recognition is first of all about verification and identification of a person.
Probably I can add that those tasks are more complex than FR itself because, for example:
emotions - signs of emotions are not so obvious, so it needs either a huge amount of study material from all over the world (I do not know of any large datasets, I have seen only small and limited ones) or detection of micro-movements (top hardware for each face... sounds expensive), or probably both.
age - the main issue with age is that we do not have enough datasets (images of the same people 5 - 10 - 15 - 20 years apart) to train and test reliable age identification.
I did not test it. If the IPVM test shows that it is not good, then it probably has some issues.
I can be wrong. The number of companies in this area is growing pretty fast and I do not have a goal to know all of them. However, personally, I do not think that it is possible.. reasonable to expect an all-in-one camera with good facial recognition for $200, so far. Even the previous generation of FR in most cases needs serious computing resources, which means an additional ~$50-60 board (small quantity price). DL versions will probably need something like an Nvidia Tegra, plus the cost of the FR license or the cost of development if it is your own. Maybe it will happen when somebody creates an SoC.
I do not know what the price will be for the DL versions from T1. But in 2010-2015 the price tag for a real-time facial recognition solution from T1 players was around $6,000 on average per camera (many nuances, and not for home use). There were similar products with $2-3K price tags. On the reasonable low end, prices were in the hundreds per camera, but you would have to know what to use to avoid issues. Of course, it is possible to take open source facial recognition code and use it for free, but the quality will be.. let's say average.