There were some reasonable doubts about whether deep learning (DL) facial recognition would work in conditions closer to the real world. So, to support my first post, here is an example I can share with IPVM members that I hope is much closer to what we will see on the market soon.
Because of privacy issues, I cannot use real footage, but this clip is part of what is probably the most realistic dataset available.
The fragment used includes 25 people. The source dataset has a frame rate of 30 fps and an image resolution of 800x600 pixels.
The PC used is an Acer V3-772G-9820 laptop (Intel Core i7-4712MQ 2.30 GHz, NVIDIA GeForce GTX 850M, 8 GB DDR3L SDRAM).
The watch list consists of more than 13K images. 29 of them are named ID### and are mug shots of people from the dataset. The other images in the watch list come from the Faces in the Wild dataset and are named WRONG(#####).
The previous-generation facial recognition engines that I had a chance to test, including on this dataset, had problems working with mug shots. In most cases I had to use cropped faces from other video fragments recorded with the same camera, i.e., the images had to come from one source. Even then, they could more or less reliably identify only around 5 people in this video fragment.
The new DL-based generation of facial recognition can identify 22 of the 25 people in this video fragment on a top-end desktop PC, and on my laptop it can reliably identify 16. With screen capture running, I got a worse result and had to stop identification to show you the results.
Of course, it is not perfect: it makes mistakes, and many things still need a lot of work, especially resource consumption and processing speed. Still, the progress and results I see are very promising.