Facial Recognition Test Results Shared

Take a look at Cognitec FaceVACS-VideoScan and Artec ID Broadway 3D.

We have performed some extensive facial recognition tests with the former. The results showed a satisfactory recognition rate of approximately 60%; that is, subjects of interest were recognized in 6 out of 10 attempts. It is important to mention that many factors can influence the facial recognition rate, such as lighting conditions and subjects not looking straight at the camera, among others. With regard to assistance, Cognitec staff members were very helpful, providing us with tips on how to tune our configuration.

We have not evaluated the latter, but we have seen it in action. It looked very impressive, especially when people (unsuccessfully) tried to fool the system, for instance by holding up a picture or wearing dark sunglasses. The vendor also claims to be able to recognize one person per second, or 60 faces per minute. However, we cannot confirm this claim because we have not tested it.

NOTICE: This comment was moved from an existing discussion: Questions On The Video Analytics Industry


Tiago, thanks for sharing. A few questions:

  • How many people/subjects were on the watchlist? I have found that larger watchlists result in much lower accuracy, simply because the more people you are comparing against, the more likely subjects will look similar.
  • How often did you get false positives? I.e., John Doe walks past the system but it matches him to Bob Smith.
  • Any details on the setup? Camera type? Mounting height? Angle? Etc.

Hi John, I'll be glad to share our findings with you.

Our primary purpose was to evaluate the FaceVACS-VideoScan (software version 4.8.0 -- today's version is 5.1) for video surveillance (and not for access control).

For our evaluation we placed our video capture device (a uEye GigE IDS UI-6250SE IP camera coupled with a Pentax 1/2" C-mount 8-48mm F1.0 manual zoom lens) next to the coffee corner of our Security Management department. The coffee corner sees frequent movement of employees and visitors. The camera was placed facing the corridor that gives access to the coffee corner, which results in people facing the camera frontally.

The camera's settings were as follows:

• Aperture size: F/2
• Lens focal length: 24 mm
• Focus distance: 6 meters

With these camera settings, the closest and furthest distances of acceptable sharpness are 5.27 m and 6.96 m, respectively, giving a total depth of field of approximately 1.70 m. The illuminance within the depth of field (between 5 and 7 meters) was above 1000 lux. It is worth highlighting that the IP camera was connected to a PC workstation (Intel Core i5-2500 3.30GHz processor, 4 GB RAM, 250 GB hard disk, Windows 7 Enterprise 64-bit) through a Gigabit switch.
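For anyone who wants to sanity-check those depth-of-field figures, here is a rough sketch using the standard thin-lens formulas. The circle of confusion (~0.0066 mm) is an assumption chosen so that the output lands near the quoted near/far limits; it is not a value reported in the test.

```python
def hyperfocal_mm(focal_mm, f_number, coc_mm):
    """Hyperfocal distance from the standard DoF approximation."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm

def dof_limits_m(focal_mm, f_number, coc_mm, subject_m):
    """Near and far limits of acceptable sharpness, in meters."""
    H = hyperfocal_mm(focal_mm, f_number, coc_mm)
    s = subject_m * 1000.0  # subject distance in mm
    near = H * s / (H + (s - focal_mm))
    far = H * s / (H - (s - focal_mm))
    return near / 1000.0, far / 1000.0

# f = 24 mm, f/2, focused at 6 m; CoC of 0.0066 mm is an assumption
near, far = dof_limits_m(24, 2.0, 0.0066, 6.0)
print(round(near, 2), round(far, 2))  # roughly 5.28 and 6.95
```

The small difference from the quoted 5.27 m / 6.96 m comes down to whichever circle-of-confusion value the original calculation used.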

61 Nedap employees participated in our evaluation by allowing their pictures to be enrolled in the watchlist database. We deliberately divided the enrollment pictures into 2 categories: Optimal and Standard. Optimal enrollment pictures (27 employees in total) consisted of photos of our employees' faces taken following Cognitec's guidelines (e.g., frontal pose, proper lighting, etc.). Standard enrollment pictures (34 employees in total), on the other hand, consisted of ordinary photos (Facebook or LinkedIn style) of our employees' faces. These ordinary photos did not fully follow the aforementioned guidelines, as is also the case for most photos of wanted criminals or suspects. The optimal category contained 3 photos of each employee with different poses (1 frontal and the other 2 slightly to the sides), whereas the standard category contained only 1 photo (mostly frontal pose) per employee. All photos were stored in a Microsoft SQL Server 2008 R2 database.

After enrolling the images of our employees on the watchlist database, we observed their movement during a period of 24 hours. During this period we observed:

1. The number of correct matches: the evidence image correctly matches the enrolled subject.

2. The number of false matches: the evidence image incorrectly matches a certain enrolled subject.

3. The number of missing matches: the evidence image contained the face of an enrolled subject, but there was no match. That is, the score value was below the global threshold value set up for the purpose of recognition.

We then grouped all evidence images of a certain subject into events. An event consisted of two or more evidence images of the same subject during a period of time. The period of time is determined by the moment the subject enters the camera’s angle of view (i.e., first snapshot is taken) until the moment he/she leaves it (i.e., last snapshot is taken).

For an event to be considered successful (i.e., to result in a correct match), at least one snapshot belonging to that event must be successful. For example, suppose a subject walked into the camera's angle of view and 10 snapshots of this person were taken; if at least one snapshot resulted in a correct match, the entire event is counted as a correct match.
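As a rough sketch, the event grouping and success rule described above could look like this. The 2-second gap threshold and the data layout are illustrative assumptions, not details from the actual test setup:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Snapshot:
    timestamp: float           # seconds since start of observation
    matched_id: Optional[str]  # enrolled subject ID, or None if below threshold

def group_into_events(snapshots, max_gap_s=2.0):
    """Split a time-ordered snapshot stream into events at large time gaps."""
    events, current = [], []
    for snap in sorted(snapshots, key=lambda s: s.timestamp):
        if current and snap.timestamp - current[-1].timestamp > max_gap_s:
            events.append(current)
            current = []
        current.append(snap)
    if current:
        events.append(current)
    return events

def event_is_correct(event, true_id):
    # One correct snapshot is enough to count the whole event as correct.
    return any(s.matched_id == true_id for s in event)
```

With this rule, an event containing 10 snapshots where only 1 matched correctly still counts as one correct event.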

Out of 155 events registered during the 24-hour period, 93 events (~60.00%) were correctly matched, 47 events (~30.32%) were missing alarms, and 15 events (~9.68%) were false alarms.

If you have any further questions, do not hesitate to ask!

Tiago,

Thanks. A few questions:

Was the shallow Depth of Field any issue? Did that limit the number of images captured or cause too few quality ones?

Did you find a difference in accuracy/matching for optimal vs standard enrollment pictures?

Was the shallow Depth of Field any issue? Did that limit the number of images captured or cause too few quality ones?

Not really. We could see some faces being out of focus (or blurred) when entering the depth of field as well as when leaving it. However, within the depth of field there were plenty of sharp snapshots that would enable recognition. It is worth mentioning that the camera was set to a frame rate of approximately 15 FPS. If we take into account that humans tend to walk at about 1.40 meters/second and that our total depth of field was approximately 1.70 meters, this would result in approximately 18 snapshots taken within the depth of field. Out of these 18 snapshots, around 3 would contain a face out of focus.
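That snapshot estimate works out as follows, using the walking speed and frame rate stated in the comment:

```python
dof_m = 1.70        # total depth of field, meters
walk_speed = 1.40   # typical walking speed, m/s
fps = 15            # camera frame rate

time_in_dof = dof_m / walk_speed  # ~1.21 s spent inside the DoF
snapshots = time_in_dof * fps     # ~18 frames captured in that window
print(round(snapshots))           # 18
```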

Did you find a difference in accuracy/matching for optimal vs standard enrollment pictures?

Absolutely. Around 66% of the missing alarm events were related to images containing the faces of employees who had standard enrollment images on the watchlist, whereas 34% were related to employees who had optimal enrollment images. I would like to point out, though, that the main reason for a missing alarm was that the subject was not being cooperative, for example, walking while looking down (instead of straight at the camera).

"the main reason for a missing alarm was that the subject was not being cooperative"

How dare they! :)

This is perhaps the toughest issue I found. By the nature of this approach, you cannot force people. Moreover, many people have a tendency to look down. And worst of all, people committing crimes, or wanted for them, tend to be very careful to avoid looking at any cameras.

Maybe the use of "honeypots" could help! I've heard of some cases where people are not explicitly asked to look straight at the camera but implicitly do so, because the surveillance cameras are hidden behind one-way windows or multimedia displays.

Tiago,

It seems to me you are not giving face recognition in general a fair test. I agree with John: the camera type, mounting height, angle, light, etc., make a big difference in a computer's ability to identify the face in the picture.

We have been working with and testing most face recognition algorithms available in the market, and when provided with the correct conditions we get better than 99.7% recognition accuracy, even with very large databases of enrolled people.

Sharar, I am not saying the test is unfair! :) I am just trying to get a sense of how it's being used. His conditions may simply be more demanding on the technology than yours.

For instance, if you are getting 99.7% accuracy with an analog camera pointed 45 degrees down into crowds of people in a train station against a 100,000-person watchlist, that sounds too good to be true.

So Sharar, I will ask you the same questions - what is your setup where you are getting 99.7% accuracy?

"...and when provided with the correct conditions we get better results then 99.7%"

Big deal. Real-world 'correct conditions' rarely exist outside of very specific applications (like border crossings, airport security lines, etc.).

Yes, 'correct conditions' rarely exist, but they do if a system such as access control is installed correctly.

As for camera resolution: the resolution of the camera itself is not what matters most. The magic number is 100 pixels between the eyes of the identified person, at a maximum 15º face angle from the camera, with enough light on the whole face.
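To relate that guideline to a concrete setup, here is a rough thin-lens estimate of the pixels between the eyes for a given lens and distance. The interpupillary distance (~63 mm) and the pixel pitch (4.4 µm) below are assumptions for illustration, not specs confirmed by anyone in this thread:

```python
def pixels_between_eyes(focal_mm, distance_mm, ipd_mm=63.0, pixel_pitch_mm=0.0044):
    """Approximate image-plane eye spacing in pixels (thin-lens model)."""
    image_side_mm = focal_mm * ipd_mm / distance_mm  # eye spacing on the sensor
    return image_side_mm / pixel_pitch_mm

# e.g., a 24 mm lens at 6 m with an assumed 4.4 um pixel pitch
px = pixels_between_eyes(24, 6000)
print(round(px))  # about 57 pixels
```

Under those assumptions, such a setup would fall short of the 100-pixel guideline; halving the distance or roughly doubling the focal length would get it there.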

The problem with testing face recognition on a population of people that is not trying to be identified is that you cannot account for their face pose. When a subject looks to the side or down, away from the camera, the recognition values drop. This face angle is never part of the test results, as it is very hard, if not impossible, to measure.

Sharar, are you using facial recognition for access control or for video surveillance? I am trying to understand your application. The context is critical here (for both a 99.7% and a 60% report).

The results of greater than 99.7% accuracy are under access control conditions and, no less important, with people who want to be identified quickly; the speed of recognition in this case is 8 people per second per server.

I believe that is for access control. With access control, the people are cooperative, presenting their faces. With video surveillance, the whole point is to analyze people without them knowing. This, in itself, makes the surveillance application far harder.

I totally agree. Discussing face recognition in general, without dividing it into the two different categories of access control and identification from standard surveillance cameras, is pointless; they are almost two completely different technologies with the same name.

From my personal experience, trying to put a general recognition success percentage on video coming from cameras that were installed for surveillance has no real value. The face position in relation to the camera is the most dominant factor.

Hello Shahar,

We performed our test for video surveillance purposes (subject is mostly not cooperative), not for access control (subject is cooperative). We briefly tested the facial recognition software developed by Cognitec for verification purposes (1:1 comparison), and the accuracy was aligned with yours, that is, around 99%.

In any case, when dealing with biometrics, a discussion of false rejects has no substance without the false accept information.

Only the two of them together make any sense.

Agreed. A system that is "99.9% accurate" but also generates loads of false positives for people who are incorrectly matched to a watchlist has serious problems.
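To make that concrete, a minimal sketch of the two error rates computed side by side. The counts here are invented for illustration only:

```python
def error_rates(false_accepts, impostor_attempts, false_rejects, genuine_attempts):
    far = false_accepts / impostor_attempts  # false accept rate
    frr = false_rejects / genuine_attempts   # false reject rate
    return far, frr

# Hypothetical counts: a system that rarely rejects genuine users
# can still false-match impostors at a rate the FRR alone hides.
far, frr = error_rates(50, 10_000, 1, 1_000)
print(far, frr)  # 0.005 and 0.001
```

Quoting only one of the two numbers (as a single "accuracy" figure does) says nothing about where the matching threshold was set.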

"That is, in 6 out of 10 attempts subjects of interest were recognized."

Does that mean the subjects were recognized individually or just being recognized as a face?

Does that mean the subjects were recognized individually or just being recognized as a face?

The latter. I didn't mean to say that each individual had 10 attempts, of which an average of 6 resulted in a successful recognition of the individual. I meant to say that out of 155 events (i.e., subjects, at random, walking toward the surveillance camera), 93 of them (i.e., 60%) resulted in a correct recognition (or match, if you prefer); 6 out of 10 is just a simplification.