I may have missed this in other analytics testing, but how about facial recognition of people of color? Perhaps NA and Europe may detect them better, but in Asia there are far fewer people of color and perhaps less testing on them.
Also, people of color in low light... I assume a lighter complexion is much easier to "see."
Awesome reporting. I'm curious to know whether, if you were to take photos at different angles and upload those into the database, you would get more matches, say, when people are looking to the right or left as they walk down the hall.
Having multiple reference images, with a person turning their head in various directions, does help compared to a single reference. Whether the database structure handles multiple images may be an issue.
I'm curious to know whether, if you were to take photos at different angles and upload those into the database, you would get more matches, say, when people are looking to the right or left as they walk down the hall.
I tested this using a picture of my head tilted down and a picture of me looking at the camera, and it didn't make a difference which reference image I used; it would still show a ~90% similarity rating if my face was clearly seen during the walk.
Even when walking past the camera with my head tilted down, the similarity rating dropped, similar to what we show in the report, but it was the same for both images.
John, you may wish to look at the Facepro system by Panasonic. The exercise that you mentioned is not necessary. Facepro uses deep learning AI instead of machine learning. Hence, it is able to handle more variations of face angles.
I'm curious how this stacks up against other companies doing this (the Hikvision report was definitely enlightening), such as Anyvision or Gorilla. Even the Axis Demographic, any thoughts on a competitive shoot out?
With all the press Amazon facial recognition got, I'm curious to see what passes for "average" in the industry. I've dealt with customers that are looking for 50% accuracy just to get security guard's attention, and other customers where under 90% is useless.
Good job IPVM team. This technology comes off as a fairly simplistic 2D approach with all the inherent drawbacks of uncontrolled pose and lighting. For biometric surveillance applications, I no longer consider simple 2D to be viable. Now with controlled and repeatable enrollment, followed by database search of same controlled images? Sure. But I am sceptical that even "Deep Learning" and "AI" can deliver a sub 1% EER in an unconstrained surveillance engagement. Better sensors (e.g., RGB-D) are required to achieve the kind of results that will be acceptable and practical.
Good report! I would like to see a comparison between embedded facial recognition solutions, like Dahua, Hikvision and Axis (with Ayonix). It would be great! Ayonix claims to do 3D FR embedded in any Artpec6 Axis camera.
Great information...thank you. Is it generally felt then that facial recognition technology is in its infancy and not yet ready for primetime? Is Dahua's technology a good first step in the right direction? Most innovative technology follows the "technology gap" theory; is this the first stage, typified by early adopters?
Does this seem like this is the future of camera technology development? When analytics started to come of age, it became more reliable and easier to deploy, yet it still does not appear to create great confidence among most end users.
If analytics, coupled with facial recognition, reach high confidence levels, it's not hard to believe that every system will require these features.
Interesting report - and highlights that facial recognition (applied appropriately) is steadily coming into the mainstream if you can now get a workable if limited capability on a $1k camera from a vendor like Dahua.
Some of the issues here (such as lower confidence for angled faces) are true for most of the current generation of lower-end facial recognition using fairly standard cameras - and that's where it comes down to your application, i.e.:
- do you have cooperative/ compliant subjects in an overt capture mode (who can be directed to look at camera in a controlled way, eg. for allowing access to whitelist), or non-cooperative / unaware subjects (covert capture, for catching criminals and for finding blacklist)?
- is the intended purpose more for information / data collection / general security (where false negatives aren't a big deal, and you can play with confidence levels), or is it for a biometric authentication / identification application where false positives are intolerable?
On the first point, even the best facial recognition system can be defeated in an uncontrolled scene if people know the camera is there or really want to avoid it. There are some novel approaches for drawing the subject's attention toward the camera (the most notable being the Dubai immigration aquarium!) that can address this to some degree.
On the second point, you'd never use a camera like this - or most standoff CCTV-based facial recognition systems - in isolation for a critical biometric authentication application (I'd hope!).
Hi, I ran the same test with the same model recently. My firmware version is a bit later than the one I see in Rob Kilpatrick's test. First of all, the results are correct. What is not correct is the usage. The test shows where real users will make mistakes when using such biometric systems, exactly as IPVM did. In fact my test was done under controlled conditions to understand why real results differ so much from what is claimed. The main reason is that users believe this recognition works under any conditions. Unfortunately it does not; it has a quite strictly limited range for resolution/shutter/angle/time in scene/threshold.
In this I fully agree with Skip Cusack's comment: biometrics of this kind need at least a minimally controlled environment. To Skip Cusack: I have to say I was very surprised by how precise the system can be. As far as my testing goes, I have to believe that under a controlled environment such AI really can reach a 1% EER. However, only under genuinely controlled conditions. For example, the idea of loading the same face multiple times under slightly different conditions does make sense and improves the results a lot.
My results (shortened as much as possible): Positive recognition (all people in the database): from a dataset of 736 frontal faces, 98.9%. Faces up to a horizontal angle of 22.5 deg still 99%. However, at angles of 45 deg or worse it goes downhill; trying to compare faces turned to 75 deg is nonsense. (Note: 90 deg means you are looking left or right, 0 deg is straight into the camera.)
Negative recognition (NONE of the presented people in the database), 452 faces: 96% (meaning 18 faces were wrongly matched as persons in the database).
Both positive and negative recognition depend on a correctly set Similarity Threshold. Yes, Dahua's default is 82, but for a real application you MUST set your own. It is a VERY sensitive number for getting reasonable results. The results above used 85. After testing it was clear that the ideal value to separate negative and positive recognition would be 87. Here are the negative-recognition results (meaning no one on the display was in the database) you will get if the Similarity Threshold is set too high or too low:
- 71: 20% (meaning 80% were wrongly matched to the database at this threshold)
- 80: 55%
- 82: 83.7%
- 85: 96.7%
- 87: 100% (however, this cannot be used for positive recognition, because the threshold for people in the DB cannot be set so high)
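The trade-off behind that threshold sweep can be reproduced in miniature. A minimal sketch, using made-up genuine/impostor similarity scores on Dahua's 0-100 scale (not the raw data from the test above): raising the threshold cuts false accepts but starts rejecting genuine faces.

```python
# Hypothetical illustration of sweeping a similarity threshold:
# false reject rate (genuine faces scoring below the threshold) vs
# false accept rate (impostor faces scoring at or above it).

def sweep_thresholds(genuine, impostor, thresholds):
    """Return {threshold: (false_reject_rate, false_accept_rate)}."""
    results = {}
    for t in thresholds:
        frr = sum(1 for s in genuine if s < t) / len(genuine)
        far = sum(1 for s in impostor if s >= t) / len(impostor)
        results[t] = (frr, far)
    return results

# Made-up similarity scores, only for illustration.
genuine = [95, 92, 90, 88, 86, 85, 84, 91, 93, 89]   # same person pairs
impostor = [70, 75, 80, 82, 84, 86, 78, 72, 81, 83]  # different person pairs

for t, (frr, far) in sweep_thresholds(genuine, impostor, [80, 82, 85, 87]).items():
    print(f"threshold {t}: FRR={frr:.0%}  FAR={far:.0%}")
```

With real deployments the score lists come from enrollment trials on site, which is exactly why a fixed factory default like 82 cannot be trusted.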
Yes, adding the same person multiple times improves the recognition rate, but it must be combined with increasing the Similarity Threshold to reach a higher precision rate.
So why does the real test land at only about 90%?
Because I found these things have a very significant effect: resolution, shutter, and time in scene. The resolution of the captured face really cannot be below the recommended 150x150 pixels. Do not forget the camera is only 2MP, so when recording at longer distances it is easy to end up with much smaller faces. In practice it means lenses with f > 10mm must be used, which also means the FOV for this camera cannot be wider than about 2.9-3.3m.
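That FOV limit can be sanity-checked with simple geometry. A back-of-envelope sketch, assuming the camera's 1920 px horizontal resolution and a detected face box of roughly 0.25 m (head plus margin); both figures are my assumptions, not from the camera datasheet:

```python
# Rough check of the "FOV no wider than ~3 m" rule for 150 px faces.
# 1920 px sensor width and a ~0.25 m face box are assumed values.

def face_pixels(sensor_px_width, fov_width_m, face_width_m=0.25):
    """Pixels across a face for a given horizontal field of view."""
    return sensor_px_width * face_width_m / fov_width_m

def max_fov_for(min_face_px, sensor_px_width=1920, face_width_m=0.25):
    """Widest FOV (in metres) that still yields min_face_px across a face."""
    return sensor_px_width * face_width_m / min_face_px

print(f"{face_pixels(1920, 3.2):.0f} px across a face at 3.2 m FOV")
print(f"{max_fov_for(150):.1f} m is the widest scene for 150 px faces")
```

Under these assumptions the widest workable scene comes out around 3.2 m, consistent with the 2.9-3.3 m range quoted above.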
Shutter: I evaluated the influence of motion blur and found that even quite small blur has a horribly high influence on results. In fact the results show that a shutter of 1/100s or shorter is necessary! So it will not work under IR on distant walking persons; not for color reasons, but for blur reasons, because at night the camera works with a much longer shutter.
In practice these two things rule out using the camera at more than about 10 deg from the movement axis, because a larger difference between walking direction and camera angle makes the blur much worse. Results with the camera installed at the ceiling will logically be poor. This is why using existing cameras as a source for FR will be problematic. Yes, for such cases you can build datasets of people seen from above.
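The off-axis effect can be estimated too. A sketch under assumed values (1.4 m/s walking speed, the ~3.2 m FOV and 1920 px width from above): only the transverse component of motion smears across pixels, so blur grows quickly as the camera angle departs from the movement axis.

```python
import math

# Rough motion-blur estimate for a walking subject, illustrating why
# installations much beyond ~10 deg off the movement axis struggle.
# Walking speed, FOV, and resolution are assumed, not measured, values.

def blur_pixels(speed_mps, angle_deg, exposure_s, fov_m=3.2, sensor_px=1920):
    """Blur (in pixels) from the transverse component of the subject's motion."""
    transverse = speed_mps * math.sin(math.radians(angle_deg))
    return transverse * exposure_s * (sensor_px / fov_m)

for angle in (10, 45, 90):
    px = blur_pixels(1.4, angle, 1 / 100)  # 1.4 m/s walk, 1/100 s shutter
    print(f"{angle:2d} deg off-axis: {px:.1f} px blur")
```

Even at 1/100 s, a subject crossing the frame at 90 deg smears over several pixels, while a near-head-on approach stays under about 1.5 px, which matches the point about ceiling and side-mounted cameras performing poorly.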
BTW, building such a database is not as complicated as some commenters complain (though even Dahua could make it more comfortable, I agree). All snapshots are stored on your computer as long as you have the GUI open, and then you can take them, sort, label, and upload them into the database. A simple script solves it. So if you know all YOUR people will be recorded from the ceiling, populate the database with images of all of them looking down.
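A sketch of what that "simple script" might look like: group the GUI's saved snapshots into one folder per person, ready for labelling and upload. The filename convention "personid_timestamp.jpg" is my assumption; adapt the parsing to whatever your GUI actually writes.

```python
from pathlib import Path
import shutil

# Hypothetical snapshot sorter: moves "alice_20180823.jpg" style files
# from a flat snapshot folder into per-person subfolders.

def sort_snapshots(src_dir, dst_dir):
    """Move snapshots into one folder per person, keyed by filename prefix."""
    src, dst = Path(src_dir), Path(dst_dir)
    moved = 0
    for snap in sorted(src.glob("*.jpg")):
        person = snap.stem.split("_")[0]      # "alice_20180823" -> "alice"
        target = dst / person
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(snap), str(target / snap.name))
        moved += 1
    return moved
```

From there, each per-person folder maps naturally onto one database entry with multiple reference images, which is exactly the multi-pose enrollment discussed earlier in the thread.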
Time in scene also plays a role. Even though the camera can catch 16 faces at once, each face needs to stay in the picture for at least 0.7s, and during that time it must stay within the DOF (yes, again: shutter/iris/light)! If not, you get a blurred image, if any. Tested extensively; shorter times destroy the results.
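The 0.7s dwell requirement translates directly into a path-length constraint. A trivial check, again assuming a 1.4 m/s walking speed (my assumption):

```python
# How much of the subject's path must stay inside the frame and DOF
# during the minimum 0.7 s capture window. Walking speed is assumed.

def travel_during_dwell(speed_mps=1.4, dwell_s=0.7):
    """Distance a subject covers during the minimum capture time."""
    return speed_mps * dwell_s

print(f"{travel_during_dwell():.2f} m of the path must stay in frame and in focus")
```

So roughly a metre of sharp, in-focus walking path is needed per subject, which is another reason tight corridors with head-on cameras work best.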
Anyway, even though deploying FR cameras is not as easy as using an IPC with motion detection, it is still unbelievably useful, definitely for Police. I have to note that the mentioned IVSS and IVS series really have interesting capabilities. I also have to note, especially for security work, that the software does not only provide a YES or NO result. It can also answer the question "Show me persons similar to this one in my photo," and the operator can in seconds change the similarity threshold and limit the number of results. So it does not matter whether a person was recognized at 80% or 85%, because the operator can additionally limit the area, time, etc. and will get a limited set of persons really similar to the one searched for. Checking a few dozen snapshots instead of a thousand hours of video is easy!
BTW, on the question of face color or race: I can assure you it does not play any role here. It is a matter of the training dataset for the neural network, and current datasets are a big mix of all races. Yes, the vendor can get a lot of things wrong, but this is not one of them.
How I tested: photo datasets used in the CV/FR community are publicly available. The results above are for the classical FERET database. The datasets were played on a monitor with the camera watching it.
The version I had under test:
Device Type: IPC-HF8242F-FR
System Version: 2.622.0000000.9.R, Build Date: 2018-08-23
WEB Version: V22.214.171.1241486
ONVIF Version: 16.12 (V126.96.36.1994996)
Algorithm Version: S(1.7.2(V1.5.0))-K(31391)
Safe Baseline Version: V1.3
Richard, thanks for the detailed feedback. You mention:
The test shows where real users will make mistakes when using such biometric systems
The main reason is that users believe this recognition works under any conditions. Unfortunately it does not; it has a quite strictly limited range for resolution/shutter/angle/time in scene/threshold.
While I agree that enforcing a 'strictly limited range' will improve performance, overwhelmingly 'real users' are not going to accept that. Confining the system to the 'strictly limited range' increases complexity and cost in setup and restricts how and where it can be used. Your thoughts?
The price of the equipment already limits who is interested. So I guess 'real users' are those for whom such a system makes real sense for that money. My opinion is to tell others that the technology is in fact great, but has application limits. There is no reason to inflate AI bubbles, but on the other hand there is no reason to say the thing is crap. Real installers, as with every IVS system, should first think about the installation position for it. And I hope that can save some of them from an application disaster.
It will be the same for every future system called "AI": marketing departments are usually not able to explain the application details because they do not understand them, and the documentation is very weak, sometimes intentionally, so as not to frighten the customer away...
For example, it is a pity I did not get my hands on that Hikvision AI system which you tested as unsuccessful. With an understanding of how the designer set the constraints and trained the neural networks, I am convinced it is possible to make it work too ;-)) Yes, you will answer that the system should be more user friendly; unfortunately in CV that means more GPU/VPU power, which means higher cost. So we are where we are.
My thoughts are that engineers working on security systems should find a way to apply it reasonably, simply to get some "feeling" for what the segment has started calling 'AI'.
And above all, to stop marketing departments from inflating the capabilities of such systems with unacceptable "short cuts".