Konstantin, thanks for sharing.
You say:
"And I wanted to show that it works now/today and it is not a question of years."
We need to be more precise about what works. You showed an extremely limited set of matches against your own face, in a 3 foot field of view, while looking directly at the camera.
To quote myself from the Avigilon debate, I argued:
"environmental conditions undermining it and sheer number of people to watch make it nowhere near practical"
You have not shown anything close to that working (i.e., poor quality images from cameras in the wild and the sheer number of people).
In demos, you can try matching one to one or one to a small group, but you have yet to begin to make a credible case for doing this at scale. I really want to emphasize at scale because that is the key engineering challenge.
Let's say we want to implement real time facial surveillance outside the Notre Dame Cathedral in Paris. First, consider how many people and how many faces go past there every day. It is easily tens of thousands of faces, especially since the same person will likely be picked up on multiple cameras. Every face that goes past a camera gets compared to every face on the watchlist. I am sure there are quite a number of suspected terrorists and other enemies of the state that the French would like to know about.
Now combine tens of thousands of faces a day with hundreds (at least) of people on a watchlist. How many times is the top match for a person walking by going to be a false match? 1 a day, 10 a day, 100 a day? This starts to add up pretty quickly. How many suspects are going to be missed? Even if you get a hit on the watchlist (you can never be certain, so you'll need a human operator to try to confirm), how long will it take you to verify that it is a match? Will you be able to track / find the person, etc.?
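To put rough numbers on how quickly this adds up, here is a back-of-envelope sketch in Python. Everything in it is an assumption for illustration only: the 30,000 faces a day, the 500-person watchlist, and the per-comparison false match rates are made-up inputs, not measurements from any real system, and the function name is hypothetical. It also assumes each face-to-watchlist comparison is independent, which real cameras will not strictly satisfy.

```python
def false_alerts_per_day(faces_per_day, watchlist_size, false_match_rate):
    """Estimate how many passing faces trigger at least one false alert per day.

    Assumes (hypothetically) that each face is compared against every
    watchlist entry and that each comparison independently produces a
    false match with probability `false_match_rate`.
    """
    # Probability that one passing face falsely matches at least one entry.
    p_face = 1.0 - (1.0 - false_match_rate) ** watchlist_size
    # Expected number of such faces over a day.
    return faces_per_day * p_face


if __name__ == "__main__":
    faces = 30_000      # assumed: faces seen per day across all cameras
    watchlist = 500     # assumed: people on the watchlist
    # Assumed per-comparison false match rates, from optimistic to poor.
    for fmr in (1e-6, 1e-5, 1e-4):
        alerts = false_alerts_per_day(faces, watchlist, fmr)
        print(f"false match rate {fmr:g}: about {alerts:.0f} false alerts a day")
```

The point of the sketch is the multiplication, not the specific outputs: the alert count scales with both the crowd size and the watchlist size, so a per-comparison error rate that looks tiny in a one-to-one demo can still mean a steady stream of false alerts for an operator to chase down.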
There is no doubting a person can sit in front of a camera and run matches like you did, but that does not begin to prove that large scale real time surveillance works. What do you have that proves this at large scale, as it would be in production?