We (TechnoAware) are releasing next week our own new platform for face recognition, implementing the state of the art among the best available algorithms: if you are interested, I can share information and our US contacts.
Even if things have definitely improved compared with years ago, we must still be very careful about the conditions and the practical goals required for a valuable face recognition implementation!
Sorry, but things like recognizing a criminal with a partially covered face from the reflection in a person's glasses (as they show in CSI), or recognizing a person in a crowd from a camera 100 m away, or other nonsense that some vendors still claim, are still not possible! ;)
Let me just recall the basic requirements we still MUST meet in order to get the best performance from a modern (serious) face recognition product:
1) The face MUST be collaborative: consciously or unconsciously, but (of course!) it still has to be collaborative. There is still a lot of confusion in the market about the term "non-collaborative recognition". "Collaborative", by technical definition, just means that the face must be as frontal as possible (no more than 20°-25° of inclination for good performance), well contrasted, and possibly visible for several frames (5-10). When a camera is installed on a ceiling, at an angle, or by an escalator, and the person is not asked to look consciously at the camera, some people call that "non-collaborative". But in practice the person still does look toward the camera, even if unconsciously; if the face does not look at the camera at all, or if the person covers or turns away their face, there is by definition no way to recognize it! Right?
2) Theory still strongly recommends 80-100 pixels (real pixels, not stretched!) eye-to-eye in the processed image: the required resolution and focal length follow as a direct consequence of this simple requirement.
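To make point 2 concrete, here is a minimal sketch of how the eye-to-eye pixel count follows from the camera geometry. It assumes a simple pinhole model, a face looking straight at the camera, and an average eye-to-eye distance of about 63 mm; the function names and the example camera (1920 pixels wide, 90° horizontal field of view) are illustrative assumptions, not data from any specific product.

```python
import math

# ASSUMPTION: average adult eye-to-eye (interpupillary) distance, in metres.
IPD_M = 0.063


def eye_to_eye_pixels(h_res_px, hfov_deg, distance_m, ipd_m=IPD_M):
    """Estimate the eye-to-eye pixel count for a frontal face.

    h_res_px   : horizontal resolution of the processed image (real pixels)
    hfov_deg   : horizontal field of view of the lens, in degrees
    distance_m : distance between camera and face, in metres
    """
    # Width of the scene covered by the image at the given distance.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    # Pixels per metre at that distance, multiplied by the eye-to-eye distance.
    return h_res_px * ipd_m / scene_width_m


def max_scene_width_m(h_res_px, target_px=80, ipd_m=IPD_M):
    """Widest scene (in metres) that still gives target_px eye-to-eye."""
    return h_res_px * ipd_m / target_px


if __name__ == "__main__":
    # Hypothetical camera: 1920 px wide, 90 degree lens, face at 1 m and at 5 m.
    for d in (1.0, 5.0):
        print(f"{d:.0f} m: {eye_to_eye_pixels(1920, 90.0, d):.1f} px eye-to-eye")
    print(f"max scene width for 80 px: {max_scene_width_m(1920):.2f} m")
```

Under these assumptions a 1920-pixel-wide image can cover a scene only about 1.5 m wide if you want 80 pixels eye-to-eye, which is why a wide-angle overview camera rarely meets the requirement.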
3) Then, what is the real practical need for this tool in this place? Does the specific scenario by definition require high performance? Or, for that specific need, is even 1 hit in 100 already a valuable result compared with having nothing?
Because it is not that nothing works if you don't respect the 80-100 pixels of resolution or the 20°-25° of inclination: simply, the farther you move from these requirements, the lower the performance you can expect. And of course it also depends on the complexity of the scenario, the quality of the camera, the lighting, and so on. It's like when, for perimeter intrusion detection by VCA, someone claims to be able to detect a person even when it is just 1 pixel. Well, what's the problem? Everybody can do it, but only in such easy scenarios where you can bet that 99% of the time, when just one pixel "moves", it is because of a person ;)
Let me just give you 2 real practical examples to better explain what I mean by point 3.
First, imagine you are in a big railway station and you want a notification when a "criminal" (according to a provided database of persons) passes a point covered by a camera. What is the state of the art for this application in that station today? An operator watches some tens of monitors in a room, or a guard just walks around the station: with some luck, sometimes one of them spots a known suspect and they catch him. "Sometimes" in that station means 1-2 per day, for example. OK, let's install automatic face recognition and assume we can correctly recognize just 1 in 100 persons (the ones who happen to look, unconsciously, straight at one of the cameras along the path), among 10,000 persons per hour passing there. Let's assume that 1 in 100 of the recognized persons is actually among the suspects, that is 1 in 10,000 of all the people passing by. This would mean catching 8 persons per day (considering just 8 rush hours per day: 80,000 people, 800 usable faces, 8 matches). That is much more than the 1-2 of the current state of the art, and at an affordable cost compared with the value of getting 8 instead of 2 per day.
In this case you do not even need to stress too much about the installation: of course, let's put the best possible camera in the best possible position in order to capture as many "unconscious" faces as possible; but it is not a tragedy if we don't perfectly respect the requirements, because even just a 1% hit rate could already have great value compared with the current state of the art for that application in that scenario!
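The arithmetic of the railway station example can be sketched in a few lines. Note the assumption: the 1/10,000 figure is read here as the fraction of all passers-by who are both captured with a usable face (1/100) and present in the database (1/100 of those), since that is the reading under which the numbers give 8 per day. All names and values are illustrative.

```python
def expected_hits_per_day(persons_per_hour, hours_per_day,
                          capture_rate, watchlist_rate):
    """Expected number of database matches per day.

    capture_rate   : fraction of passers-by whose face is usable (1 in 100)
    watchlist_rate : ASSUMED fraction of usable faces that belong to a
                     person in the database (1 in 100)
    """
    persons_per_day = persons_per_hour * hours_per_day
    usable_faces = persons_per_day * capture_rate
    return usable_faces * watchlist_rate


if __name__ == "__main__":
    hits = expected_hits_per_day(
        persons_per_hour=10_000,  # figure from the example
        hours_per_day=8,          # only the rush hours are counted
        capture_rate=1 / 100,
        watchlist_rate=1 / 100,
    )
    manual_baseline = 2           # upper end of the 1-2 catches per day
    print(f"expected automatic hits per day: {hits:.1f}")
    print(f"ratio over manual baseline: {hits / manual_baseline:.1f}x")
```

Changing capture_rate shows directly how much a better camera position is worth in this scenario: the expected hits scale linearly with it.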
Second, imagine an access control system where the door opens only if the person is correctly recognized as one of the allowed ones. Well, in that case you need to be much more careful about strictly respecting the requirements, because a hit rate of just 70%-80% would be very annoying for managing the access process. On the other hand, in this case you can even improve the rate by implementing a physical process that forces the person to collaborate: for example, just by keeping the door blocked until a face has been recognized. The person must collaborate as much as possible if he wants to enter.
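As a rough illustration of why keeping the door blocked helps, here is a small sketch of the cumulative acceptance rate when the person simply tries again. It assumes that each attempt succeeds independently with the same probability, which is a simplification (a person who adjusts their pose will usually do better on the second try, while a badly placed camera will fail repeatedly); the function name is hypothetical.

```python
def cumulative_acceptance(p_single, attempts):
    """Probability of at least one successful recognition in `attempts` tries.

    ASSUMPTION: attempts are independent and share the same per-attempt
    success probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    # Per-attempt hit rates in the 70%-80% range mentioned in the example.
    for p in (0.70, 0.75, 0.80):
        rates = [cumulative_acceptance(p, n) for n in (1, 2, 3)]
        print(p, [f"{r:.3f}" for r in rates])
```

Even under this simplified model, a 75% per-attempt rate gets above 98% within three attempts, at the price of a slower passage through the door.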
So, as usual, what is needed is "just" competence, expertise and good sense in applying (or not!) the correct approach to the specific needs and scenarios.
I remain available for further insights, if needed.