99.9997% Accurate Amazon Facial Rekognition Falsely Matches 28
By: John Honovich, Published on Jul 27, 2018
A new facial recognition controversy has arisen, with the ACLU cleverly using Amazon's Rekognition service to falsely match 28 Congresspeople to mugshot photos.
However, the ACLU's results also show that the Amazon system was 99.9997% accurate and yet still generated those 28 false matches. While that may seem paradoxical, let us examine how that works.
The ACLU test required Amazon Rekognition to conduct more than 13 million facial comparisons, as it compared photos of the 535 members of Congress against a 25,000-image mugshot database. Amazon needs to check every combination to determine whether any of them are a 'match', as our graphic below shows:
Out of those 13+ million comparisons, 28 of them were wrong, with Congresspeople being falsely identified as having been arrested [insert joke about Congresspeople being criminals, etc.]. The reality though is that in 99.9997% of the comparisons, Amazon made the correct decision, as the graphic below illustrates:
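The arithmetic behind the 99.9997% figure is straightforward: 535 Congresspeople times 25,000 mugshots gives 13,375,000 pairwise comparisons, of which only 28 were wrong. A minimal sketch of the calculation:

```python
# Reproduce the accuracy math from the ACLU test figures.
congresspeople = 535
mugshots = 25_000
comparisons = congresspeople * mugshots   # 13,375,000 pairwise comparisons
false_matches = 28

false_match_rate = false_matches / comparisons
accuracy = 1 - false_match_rate

print(f"Comparisons: {comparisons:,}")
print(f"False match rate: {false_match_rate:.7f}")
print(f"Accuracy: {accuracy:.5%}")
```

Note that the exact accuracy is ~99.99979%, which truncates to the 99.9997% figure cited above.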
This is not a defense of Amazon; rather, it is to emphasize the fundamental challenges of doing facial recognition at a large scale. AI expert Andrew Ng [link no longer available] explains this in the video embedded below:
Could Be Worse, Actually
Indeed, the results of the ACLU test would have been worse if (1) the arrested people were of similar ages to the Congresspeople (the average Congressperson is ~60, while the average arrested person is significantly younger) and (2) surveillance video images were used (all images in this test were cooperative, portrait-style photos taken at direct angles, which are much easier to compare).
Amazon's Defense Ignores False Negatives
Amazon was quick to defend their facial recognition software, telling the Verge:
[Amazon] recommends at least a 95 percent threshold for law enforcement applications where a false ID might have more significant consequences. [emphasis added]
However, a major problem with raising the threshold from the default 80% (which the ACLU test used) to 95% is that it radically increases the number of false negatives, that is, true matches that are not returned because the system is set to such a high threshold.
It is easy to demonstrate this with Amazon's own Rekognition demo; the following examples we ran show true matches that would incorrectly be rejected at 95%:
At 95%, all of these valid true matches would be rejected by the system because their scores would be too low.
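The trade-off can be sketched with a simple threshold filter. The similarity scores below are hypothetical, purely for illustration (they are not actual Rekognition demo outputs), but they show how moving the cutoff from 80% to 95% converts true matches into false negatives:

```python
# Hypothetical similarity scores for pairs that are genuinely the same person
# (illustrative values only, not real Rekognition output).
true_match_scores = [0.83, 0.88, 0.91, 0.96, 0.99]

def matches(scores, threshold):
    """Return only the scores that meet or exceed the similarity threshold."""
    return [s for s in scores if s >= threshold]

at_80 = matches(true_match_scores, 0.80)   # default threshold: all 5 returned
at_95 = matches(true_match_scores, 0.95)   # recommended threshold: only 2 returned

print(f"True matches returned at 80%: {len(at_80)}")
print(f"True matches returned at 95%: {len(at_95)}")
print(f"False negatives introduced: {len(at_80) - len(at_95)}")
```

Under these assumed scores, the 95% setting rejects three of the five valid matches, which is exactly the false-negative cost Amazon's defense leaves out.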
Fundamental Challenges Remain
At scale, false positives are going to be a significant risk, both for operationally managing them and for the risks to those people falsely matched. And eliminating those false positives, by raising the similarity threshold, will ensure many true positives are missed, i.e., false negatives, undermining the utility of the system. These challenges will remain and are at the core of increasing public debates about how this technology should be used.