Demo Of Facial Recognition Based On Deep Learning

Hi guys, we discussed facial recognition in the "Avigilon post", and that was probably not the best place for such a discussion. I have recorded a simple video to give some visual backing to my words about the progress in facial recognition, and to show that it works now, today; it is not a question of years.

Please note that it is not a professional recording. I made it on an old office desktop PC (which also shows that it runs on modest hardware), it is not an official test, and I was not trying to cover every possible situation.

Hope that you will enjoy it.

P.S. The results are not normalized. In general, for this video the threshold is 0.6: a result below 0.6 is not a match, 0.6 to 0.7 is a match, and above 0.7 is a true match in most cases.
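For clarity, the score banding described above can be sketched as a tiny function. The 0.6/0.7 cut-offs are the ones quoted for this particular video; they are not universal constants, and real systems tune them per deployment:

```python
def classify_score(score, match_threshold=0.6, strong_threshold=0.7):
    """Map a raw (unnormalized) similarity score to a decision.

    Bands follow the description above: below 0.6 is not a match,
    0.6 to 0.7 is a match, and above 0.7 is a true match in most cases.
    """
    if score < match_threshold:
        return "no match"
    if score <= strong_threshold:
        return "match"
    return "true match"
```

Raising `match_threshold` trades missed matches for fewer false alarms, which is the knob discussed later in this thread.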


Konstantin, thanks for sharing.

You say:

"And I wanted to show that it works now/today and it is not a question of years."

We need to be more precise about what works. You showed an extremely limited set of matches, comparing against your own face in a 3-foot field of view while looking at the camera.

To quote myself from the Avigilon debate, I argued:

"environmental conditions undermining it and sheer number of people to watch make it nowhere near practical"

You have not shown anything close to that working (i.e., poor quality images from cameras in the wild and the sheer number of people).

In demos, you can try matching one to one, or one to a small group, but you have to begin to make a credible case for doing this at scale. I really want to emphasize at scale, because that is the key engineering challenge.

Let's say we want to implement real-time facial surveillance outside the Notre Dame Cathedral in Paris. First, consider how many people and how many faces go past there every day. It is easily tens of thousands, especially since the same person will likely be picked up on multiple cameras. Every face that goes past a camera gets compared to every face on the watchlist. I am sure there are quite a number of suspected terrorists and other enemies of the state that the French would like to know about.

Now combine tens of thousands of faces a day with hundreds (at least) of people on a watchlist. How many times is the top match for a person walking by going to be a false match? 1 a day, 10 a day, 100 a day? This starts to add up pretty quickly. How many suspects are going to be missed? Even if you get a hit on the watchlist (you can never be certain; you'll need a human operator to try to confirm), how long will it take you to verify that it is a match? Will you be able to track / find the person, etc.?
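The arithmetic behind this "adds up quickly" point can be sketched in a few lines. The traffic volume, watchlist size, and per-comparison false-positive rate below are made-up illustrative numbers, not measurements from any real system:

```python
def expected_false_alarms(faces_per_day, watchlist_size, fpr):
    """Expected daily false alarms, assuming every passing face is
    compared to every watchlist entry and comparison errors are
    independent (a simplification, but enough to show the scaling)."""
    return faces_per_day * watchlist_size * fpr

# Hypothetical numbers: 30,000 faces/day past the cameras,
# a 500-person watchlist, one-in-a-million false positives
# per comparison:
daily = expected_false_alarms(30_000, 500, 1e-6)
# roughly 15 false alarms a day, before any operator review
```

Even a very good per-comparison error rate multiplies out to a steady stream of alerts once both factors grow, which is exactly the operator-load question raised above.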

There is no doubt a person can sit in front of a camera and run matches like you did, but that does not begin to prove large-scale real-time surveillance. What do you have that proves this at the scale it would face in production?

John, first of all, I wanted to show that it now works much better on things like angles, image quality, differences in age, etc. Second, should I present the back of my head instead of my face? :)

Why do you always talk about the extreme case of working with a crowd in real time? Don't you see any other opportunities to get value from facial recognition?

Your case does not make sense to discuss, because in real life nobody reasonable will try to use FR on a chaotic crowd in an open space. There are entrances, aisles, subway exits, ATMs, vending machines, ticket offices, etc. Usually the ways people enter the area of interest are well known, and moreover they can be managed.

Also, total screening can hardly be used for this type of place. Are there people watching this area now? How many cameras do they watch simultaneously? How many response units do they have in the field? I suppose that in most cases it is a limited number of operators responsible for tens of different cameras, with no directly connected response units in the field. Do you think it makes sense to identify known terrorists in real time when the end user cannot react to a positive match anyway? Airports and train stations are a different story, but not public places like "outside the Notre Dame Cathedral in Paris".

The number of false matches depends on many factors; do you really expect an exact percentage for an abstract question? That kind of information can be gathered more or less reliably only after the full implementation of the system on site. But the vital thing is that it is adjustable: if the response team, for example, can handle 100 alerts a day, then the whole system can be tuned to provide them with the best matches within that limit. That means they get up to 100 opportunities to identify real threats.
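The "tune the system to the team's capacity" idea can be sketched as choosing the alert threshold from a sample of recent top-match scores. The function and its input data below are hypothetical illustrations, not any vendor's actual tuning procedure:

```python
def threshold_for_budget(day_scores, max_alerts):
    """Lowest alert threshold that keeps one day's alerts within budget.

    `day_scores` is a sample of top-match scores observed over a day
    (invented data); alerts would fire on scores >= the returned value.
    Ties are ignored for simplicity.
    """
    ranked = sorted(day_scores, reverse=True)
    if len(ranked) <= max_alerts:
        return ranked[-1]  # everything already fits within the budget
    return ranked[max_alerts - 1]  # alert only on the top `max_alerts`
```

With yesterday's scores as a proxy for today's traffic, this caps the alert stream at what the response team can actually review, at the cost of missing lower-scoring true matches.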

And regarding missed suspects: how many offenders/terrorists can a regular human being identify in a crowd if the blacklist consists of thousands or even tens of thousands of people (in fact, it is the same for a 100-person watchlist)? I think the answer is close to zero. FR gives an opportunity to get a result better than zero even in such conditions, because deciding on a possible match is much easier than trying to scan the crowd.

The glass is half full, John. If everybody waits for a 100% reliable product, there will never be progress in any area.

"How many offenders/terrorists can a regular human being identify in a crowd if the blacklist consists of thousands or even tens of thousands of people (in fact, it is the same for a 100-person watchlist)? I think the answer is close to zero. FR gives an opportunity to get a result better than zero even in such conditions, because deciding on a possible match is much easier than trying to scan the crowd."

Now we are getting somewhere, because this is what needs to be addressed. Matching Stallone pictures is a parlor trick, which does not even get to what you are faced with in a real situation.

Let's say the facial recognition system gives you more of a chance at getting matches than human beings watching monitors. Sure, but at what cost? And with what return?

Explain to me the costs of doing facial surveillance in the way you propose (include new cameras, optimization, networking, new servers, real time staff, etc.) in a football stadium and then we can continue this discussion.

Now we are getting somewhere, because this is what needs to be addressed. Matching Stallone pictures is a parlor trick, which does not even get to what you are faced with in a real situation.

Do you really think I could use non-public images in a public video? If you send me your images, I can do the same thing with them.

Explain to me the costs of doing facial surveillance in the way you propose (include new cameras, optimization, networking, new servers, real time staff, etc.) in a football stadium and then we can continue this discussion.

Why would I do it? For the pleasure of getting sarcastic comments from your side? Thank you, but I am ready to deal with picky partners only when they pay me.

The bottom line is that it is now possible to get matches in cases like these:

I have been working with facial recognition since 2007, and to me these results feel like magic. My intention was to share them with the community. We are on the edge of big changes. And I do believe that it is no longer a question of whether it will be used or be useful. It is like switching from analog to IP: it is inevitable, and it will be a fatal mistake to ignore it.

"possible to get matches in cases like these"

The problem is when you do this over large numbers of people on a watchlist and large numbers of people passing one after another (as you would in production). The chances that other people match higher are significant, and the probability of operator overload is real.
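The "other people match higher" risk grows with watchlist size, and can be put in rough probabilistic terms. The per-comparison false-positive rate below is an arbitrary illustrative value, and the independence assumption is a deliberate simplification:

```python
def p_any_false_match(fpr, watchlist_size):
    """Probability that at least one watchlist entry falsely matches a
    single innocent passerby, assuming independent comparisons with a
    fixed per-comparison false-positive rate (a simplification)."""
    return 1.0 - (1.0 - fpr) ** watchlist_size

# With a hypothetical 1-in-1,000 per-comparison false-positive rate
# and a 1,000-person watchlist, well over half of innocent passersby
# would trigger at least one false match.
```

This is why a demo error rate that looks tiny one-to-one can still swamp operators once the watchlist and the daily foot traffic are both large.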

What you are doing with these parlor tricks / demos is showing that facial verification works (i.e., take a photo of a person, verify that it is the same person as in a reference photo). That is well-established technology, broadly used, but it is far different from doing real-time facial surveillance against the general public.

As for cost, my point is that doing this is very expensive, both capital (cameras, commissioning, servers, software, etc.) and operational (people to check, in real time, every time there is a match / alert). Once you take the accuracy issues at scale and the costs of deployment, and compare them to the number of criminals caught, terrorists stopped, etc., it is hard for almost anyone to justify.

Overload: Is there a difference if you ask an operator to watch one camera or ten? It is a decision the customer makes: have tighter security, or accept some risk and reduce costs. The same goes for FR: you can adjust the system to get more alerts or fewer. It is manageable. And do not forget that without FR the operator usually does not have this information at all.

Tricks: these "tricks" are not possible with the current generation of FR without 3D modelling / pose adjustment, and even then the results are not good for most products on the market. If you want to see the product for video surveillance as I see it, let me know and I will send you a link and some additional information privately. I cannot share it publicly because of privacy and legal restrictions.

Cost: you like to mention terrorists. Do you want to hear the question "What is the cost of a saved life?" I will not ask it, because I see other benefits besides government anti-terror usage. Anyway,

  • hardware - good cameras are no longer a rare or expensive thing; processing hardware is an additional cost, but it can in some cases be shared with the VMS/Access Control, and usually you do not need to process the streams from all cameras, only from a few,
  • software - yes, it is an additional cost, but it will not be a disaster, especially because it will not be used for all cameras,
  • operational - if we talk about real time only: if you have operators and response teams, then you simply get a tool that makes their work more focused (see my point above); if you do not have them, then you cannot react to anything in real time, and that is not an issue of adding FR to the table.

While the Notre Dame test case is a fascinating one (and a valid one), I suspect that Facebook could (if it chose to do so) conduct some really fascinating tests. Presumably Facebook makes use of friend connections and the like when choosing the facial images that it compares, but I wonder if anyone in Facebook is experimenting with unfiltered facial comparisons against Facebook's huge image database? Facebook's "population" is larger than the populations of most nations, so the results of such searches would provide some valuable data. Of course, it is not in Facebook's interest to provide the results of any such tests to anyone...

It will be difficult to check the results and gather statistics. Public tests on the Labeled Faces in the Wild dataset are close to what you want to see.

Konstantin, just for fun I was messing around with this image:

Almost all of these people have public headshots of decent quality available online, since they are "journalists" and executives.

Assuming I register the headshots into a face rec DB along with 80 other random non-matching control headshots, what percent of the 20 or so fully pictured faces would you estimate will match?

What available software should I use to get that result?

@John, what percent do you think can be matched?

There can be only speculation without an experiment. Let me ask you a similar question: assume I give this photo plus photos of 80 random people (i.e., a 100-person watchlist) to a security guard and put him at a post in front of an entrance to something. He will see no more than 5 faces at any moment. How many people on the watchlist can he identify if all 20 of our people of interest pass him within an hour, within a day, within a week?

There can be only speculation without an experiment.

Of course, that's the point. What do you speculate will happen here? I'll be glad to show the headshots in the database and the cropped heads from the picture, and to use software that is reasonably available today.

For instance were you to say 90% and John to say 10%, and it did 80%, John might want to reconsider his views or at least refine the experiment. Or vice versa.

Consider that if I am a purchaser of such software, I myself must speculate to some degree on whether it will actually work. I cannot test every face beforehand.

I could speculate on your question, but unless you plan on performing it, it will just be opinion that is unlikely to persuade anyone entrenched in their own beliefs.

You can give a range, e.g. 50%-75%?

If you have material you want to try, then I can run it and provide you with the results. If you have a project in mind, then I am ready to discuss it (but not with an anonymous person). Otherwise, sorry, but it would be stupid to give you any range other than 0-100%. I cannot even see these people's faces well enough to say anything.

Konstantin, this discussion surrounds the BRS theme of spending CAPEX to assist a security officer in the decision process of an "anomaly". Are you aware John was a product manager for a leading Facial Recognition manufacturer? I would not question his experience on the subject.

U3 - How is facial recognition connected with "anomalies" and BRS?

I understood your argument to be that although it is not 100% reliable, it is still better than nothing, and that it will bring people to a security officer's attention and/or allow further investigation. I agree, if that was your argument.

I also understand the gap between demos with a limited database and a controlled environment and the real world, which is outdoors and uncontrolled, as John claimed. It's getting better all the time, as The Beatles claimed.

"It's getting better all the time"

"It can't get much worse."

It is strange to compare facial recognition to behavioural analytics, or to consider it something for the far future. There are many companies that develop facial recognition or embed/use it for different purposes. What are they all doing now if there is no market for it?

Facial recognition is already used for real-time identification. Here is an example of it that is not government related. There is such a thing as a self-exclusion list for addicted gamblers. And casinos need to identify these gamblers and stop them. There can be 100, 1,000, or even tens of thousands of gamblers on a watchlist. So, to solve this task, they either need to ask all visitors for ID or identify people on the watchlist manually. How many people on a watchlist can a security guard identify by watching people enter? Don't you think it is much more efficient when an operator receives alerts with possible matches and two images side by side? They do, and it is not a unique case.

I am not going to convince anybody to use facial recognition. If you do not see any value in it then it is your choice and your opinion. This topic was started to share information on the current not future progress in the facial recognition development. All I want to say is that personally I am impressed with the results and the new FR solutions will be at least twice more accurate in the described and other tasks.

Don't you think it is much more efficient when an operator receives alerts with possible matches and two images side by side?

I guess that would depend on the images...


Good matches, aren't you satisfied with them?

And casinos need to identify these gamblers and stop them. There can be 100, 1,000, or even tens of thousands of gamblers on a watchlist. So, to solve this task, they either need to ask all visitors for ID or identify people on the watchlist manually

Are you speaking of this system in Ontario, by any chance?

“To some degree, I probably wanted to get caught,” Christina explains. Torn between her overwhelming urge to gamble at the slots at the Windsor casino, which had wiped out her modest savings and destroyed her credit, and her determination to stop, the 50-year-old Windsor woman had volunteered to be banned from all racetracks and casinos in the province. She agreed to be charged with trespassing if she entered any of them.

A facial recognition system at the casino was on the watch for Christina, and over 17,000 Ontario residents have signed the same agreement. But lonely and overwhelmed, having lost an inner argument with herself, she didn’t find it hard to slip back in. “I was going in there regularly for two years before they caught me,” she remembers.

“When I went in there I’d put my head down, or be on the phone, or something when I went in, so they can’t see your whole face. Sunglasses.”

Sometimes casinos can be in denial too...

So what? It proves my point that there is such a need and there are solutions installed.

I cannot discuss the Ontario casino case, but

1. Yes, it is possible to game a system if you really want to.

2. Usually the watch lists that end users have are full of .... like these

(I hope you understand that these are random images from Google)

It proves my point that there is such a need and there are solutions installed.

It proves most of your words:

There was a need. There was an installation. But it's not a solution, is it?

Sunglasses. Gamblers can be so sneaky sometimes.

You know, I am not ready to be held accountable for every installation of FR in the world (money upfront, please).

Thinking it over, maybe it could be made to work!

Perhaps by installing cameras in a few well-chosen but fixed spots where you could get a close-up frontal shot: slot machines, for instance. Kinda like something I read about recently...

No, this won't work. Try to think more; maybe you can find a good one.

Maybe just issue the self-excluded an RFID card that they need to swipe in order to be denied access?

Keep trying

I've got it!

You could detect the "micro-movements" on relapsing gamblers' faces, like you describe here.

You don't even need to enroll them!

Boring, no fantasy at all. Sorry, but I am not going to waste my time on your comments anymore.

You really need to keep more of an open mind, Konstantin.

Toshiba video analytics on a chip: