Google’s Super Resolution Enhancement Examined

By: Brian Karas, Published on Feb 10, 2017

"Zoom in and enhance, I think there is a clear shot of his face in the reflection" has been the kind of statement CCTV users have wanted to make a reality for years.

Hollywood has imagined this technology, but now engineers at Google have released a whitepaper on "Pixel Recursive Super Resolution", showing real-world examples of upsampling a highly pixelated image into one with details and recognizable features.

In this report we examine Google's super resolution whitepaper, and how this technology could impact the security industry.




Comments (21)

Just wondering how this might benefit LPR tech...even if it gave me a better guess to work off of, it could be very beneficial to me/the police.

This is somewhat different than LPR tech, though a lot of the core learning frameworks are the same.

In LPR, you are trying to find an exact match for a set of characters; this makes parts of the problem easier because you are generally limited to a very small set of possibilities (A-Z, 0-9, etc.). Also, if you can figure out (or guess) the state, you may be able to limit it further if the state uses particular sequence formats.

For this Google application, they are really attempting to "paint" an image that looks convincing to a human. It is less about precise accuracy, and more about filling in details that make the objects recognizable. For this reason, it would not be ideal for LPR: the algorithm might be able to take a blob of pixels and turn it into an image that looks very much like what you would expect a license plate to look like, but it would probably be less concerned with differentiating between an "O" and a "Q" (for example). Or, the training data could cause it to make other errors: if none of the input plates had "Q"s, but lots of them had "O"s, it might take a pixel blob of a plate that was "QQQ" and paint it as "OOO". (This is my interpretation from reading the whitepaper; there might be more to it.)


Counterpoint on your last bit --

It is very simple to write a quick filtering algorithm that would replace any 0, O, or Q with a wildcard that would search across all 3. So, if you're telling me that it could take a blob of pixels and potentially output to something that would give me a range of possible numbers, that would be immensely helpful in a real, serious investigation. If you're able to hand the police a list of even as few as 500 license plate numbers for them to search across, and something like the color of the car, they can narrow a list down incredibly quickly.
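That wildcard filtering can be sketched in a few lines of Python. The ambiguity groupings below are my own illustrative assumptions about which characters a super-resolution "guess" could plausibly confuse, not anything from the whitepaper:

```python
from itertools import product

# Hypothetical groups of easily-confused plate characters.
AMBIGUOUS = {
    "O": "O0Q", "0": "O0Q", "Q": "O0Q",
    "B": "B8", "8": "B8",
    "I": "I1", "1": "I1",
}

def expand_plate(guess):
    """Expand a guessed plate string into every plausible candidate."""
    choices = [AMBIGUOUS.get(ch, ch) for ch in guess]
    return ["".join(combo) for combo in product(*choices)]

candidates = expand_plate("QQQ123")
# Each "Q" expands 3 ways and the "1" expands 2 ways,
# so "QQQ123" yields 3 * 3 * 3 * 2 = 54 candidates, including "OOO123".
```

A list like this, plus the vehicle color, is exactly the kind of narrowed search range described above.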

The ability to create a reasonable guess out of something totally unrecognizable would be game-changing for both the security and investigations industry and law enforcement and police work overall. The ability of law enforcement to take one tiny piece of evidence -- even an educated guess -- and start to narrow the field is extremely impressive. And, to your point, you could very easily input millions of pieces of reference data to compare against, so it could definitely "learn" from a database of plate photos from every state, etc.

I said it in another thread a while back, but by far the most interesting thing happening in the electronic security industry right now is the utilization of "big data" or data analytics to provide either relevant historical or actionable information to end-users. There are some exciting things happening in that regard already -- I think the next few years are going to see an explosion of it, specifically as the 1 or 2 companies that are currently doing it start to grow it more and more.

Very interesting... but nuts to court-admissibility.

Cool stuff, but sounds like a court admissibility nightmare...

How often does video evidence get used in trial, I wonder?

As opposed to forcing a plea...

Also, if the video evidence can just identify the perps to police, then a case can often be made without video evidence.


Although a good defense attorney would ask, "Why did you focus on my client in the first place? Lead me through the steps you took, please."

Why did you focus on my client in the first place? Lead me through the steps you took, please.

"We took the grainy, poor quality video evidence and ran it thru the Google enhancer.  This enhancer uses various assumptions about various facial characteristics to construct an image which may or may not help in identifying the subject.  In this case, the artificially enhanced image prompted several employees to suggest that the recreation looked like the suspect.   The suspect was investigated and their alibi checked.  The suspect was also found to be in possession of the stolen merchandise."

Considering Apple's facial recognition capabilities in iPhone 6 and up, were a guy to integrate into their databases (fat chance), parse their catalogs and compare users' iCloud photos with video/images captured via surveillance, you'd be on to something.

I mean... who's to say we haven't all already agreed to this in Apple's 56-page TOS agreement we're so quick to scroll through and click 'Agree'?


Matt -

I thought about the social media integration angle. Most pics on Facebook/etc. tend to be high resolution, so they could make a good source library for input images.

However, there can be challenges for an algorithm like this that is trying to "paint", not trying to "match". You could end up with it creating an output image that is a compilation of the facial features of several people. For the purposes of creating a "painting", the image would be highly representative of the reality (a person with a round face, short hair and a small nose, or a person with a square face, long hair and big ears), but it might not be accurate in terms of identifying a specific person.

If you are using a smaller dataset, say just the employees from a building, or just the students on a campus, you will likely have outputs that better match the specific individual than if you use a dataset of "all people".

They do attempt to tackle part of these problems; it comes down to figuring out what parameters can be "normalized" and which ones make up unique qualities. In the whitepaper they use an example of sampling cars: if you average the paint colors of all cars together, you would wind up with a very drab "average" color that would likely not be applicable to any vehicle. But you could average minor differences among reds or blues or greys to produce output images that were still representative.
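The car-color point can be shown with a toy calculation. The RGB values here are made up purely for illustration:

```python
# A few representative car paint colors (RGB), purely illustrative.
reds  = [(200, 30, 40), (180, 20, 30), (220, 50, 60)]
blues = [(30, 40, 200), (20, 30, 180)]

def mean_color(colors):
    """Component-wise integer average of a list of RGB tuples."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) // n for i in range(3))

# Averaging across ALL cars yields a muddy color no real car has:
drab = mean_color(reds + blues)          # (130, 34, 102)

# Averaging only within a cluster stays representative of that cluster:
typical_red = mean_color(reds)           # (200, 33, 43)
```

The within-cluster average is still recognizably "red", while the global average is a drab purple-grey, which is the normalization problem in miniature.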

Taken to an extreme, this technology could be useful to create a next generation smart CODEC. If you can compress the background of a scene down to "there is a red truck, a blue roadster and a brown delivery vehicle", that could be less data to store than recording the actual pixels of those objects. When playing back video, the algorithm would create representations of those objects on the fly. Or you could store a few reference frames of the vehicles, along with movement paths, and then dynamically draw the objects in along the way, using learned data about appearances to create frames that realistically represent how the vehicles would look as they moved or turned, even if you did not have that actual data recorded.
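As a purely hypothetical sketch of the storage math behind such a codec (the region size, descriptor fields and byte costs are all my assumptions, not anything from the whitepaper):

```python
import json

# Raw pixels for a 200x100 vehicle region at 3 bytes/pixel, per frame:
raw_bytes_per_frame = 200 * 100 * 3  # 60,000 bytes

# A hypothetical per-frame semantic descriptor instead of pixels:
descriptor = {"class": "truck", "color": "red", "x": 412, "y": 220}
descriptor_bytes = len(json.dumps(descriptor).encode())  # tens of bytes

# Even allowing a few reference frames stored once up front,
# the per-frame cost collapses by orders of magnitude:
savings_ratio = raw_bytes_per_frame / descriptor_bytes
```

The open question, of course, is whether the playback-side reconstruction can run cheaply enough to be worth the storage savings.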

Further, if the recreation can be done fast enough, and with an "average" processor, it could be done client-side, allowing the server to record and stream very low bitrate data, while still allowing a client to piece together a more detailed image.

I do not think we will see applications of this in the near-term for surveillance, but opening up this level of image processing may lead to some amazing capabilities.

Don't rule it out so quickly. Maybe not in terms of enhancement. But be on the lookout for SM integration with surveillance, AI & analytics in the very near future. 

Do you think it will be possible to pass the footage from any camera to this algorithm? Because if that is the case you could use post analytics on all your cameras on site and cross compare the "guesses" from each camera to get real close to having the actual plate. 



Do you think it will be possible to pass the footage from any camera to this algorithm?

In theory, yes; the question is at what computational cost. The whitepaper references using 8 GPUs to handle the input from a series of static images. I did not see an exact count of input images, but even 5 cameras running at 5 fps would generate 25 images per second, potentially requiring a LOT of GPU horsepower to process.
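Some rough arithmetic on that, using a purely assumed per-GPU throughput since the whitepaper gives no figure:

```python
import math

cameras = 5
fps = 5
images_per_second = cameras * fps  # 25 images/sec hitting the pipeline

# Assumption: one GPU can super-resolve 2 small images per second.
# (The whitepaper does not state throughput; this is a placeholder.)
per_gpu_rate = 2
gpus_needed = math.ceil(images_per_second / per_gpu_rate)  # 13 GPUs
```

Even with that generous placeholder rate, a small 5-camera site would need more GPUs than the whitepaper's 8-GPU research setup, which is the cost problem in a nutshell.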

From what I have seen, and from conversations I have had with other image-processing experts, I think we are at the stage in technology where things are very possible when it comes to advanced image processing, but still very expensive in terms of compute requirements (or even in terms of hardware cost). We will likely need another couple of generations of GPU advancements before this becomes the kind of thing that is just part of an average to high-end security system.



This is interesting  ...

For the most part we think in terms of local processing power for whatever we do... local processing power seems to be becoming less relevant as the cloud becomes more pervasive. If this is important, a search could be performed by a swarm of servers in the cloud. The results could later be spit out to a lowly smartphone if need be...

Speculation? Perhaps. I believe we're getting there. 


Yes to all but.. the way that DHS and other federal agencies are throwing $$ around it might come faster than estimated. Who but big brother has pockets deep enough to get this done and that in itself, at least for me, is the scary part. When evidence can be "created" anyone can be made guilty.

This is real scary stuff.

A captured image has to be proven to be unaltered for it to be acceptable as legal evidence, yet we should trust Google to decide what the final extrapolated image really is, based on what a computer says it should be? If they can do this, then they can also make that image appear to be anything they want it to be, and it does not meet the evidence requirement. Changing the evidentiary requirement would be a huge can of worms and dangerous to anyone Google or the government doesn't like. Think of the movie "Enemy of the State" or similar. No thanks, Google: design better cameras and lenses.

I don't know that Google's software is really changing anything here.

Whether you use a completely automated algorithm (from Google or from anyone else), a semi-automated algorithm (in which a person clicks a button to apply an enhancement but doesn't necessarily know what the button does), or a completely manual alteration, you still have the legal need to show the original image and show how you arrived at the modified image.

The only problem would occur if Google (or anyone else) was able to create an altered image that could NOT be identified as altered.

I am not an image expert, but it appears that this will become more and more challenging.

...if they can do this then they can also make that image appear to be anything they want it to be and it does not meet the evidence requirement...

Neither does a sketch from a police artist.  Even so, both may be useful in identification.


Interesting discussion, but I am skeptical. More research, and especially better training, needs to be done. Now, I do love technology, and I especially like AI neural networks and cognitive processing...

The easiest part for the algorithm to process should be those parts of the image with the greatest contrast. For example, the dark pupil and iris set against each other, and the iris set against the white portions of the eye, should be much more accurate. When I look at the middle image, the algorithm did not get the direction of the eyes correct, and when I look at the bottom image, the makeup the woman is wearing appears to fool the algorithm again. In looking at the contrast between the red lips and white teeth in the top and bottom pictures, this seems to be a little better: the shapes of the lips and mouth are more consistent between the algorithm and the actual image. What I expect to see in all three pictures (or at least the top two) is better definition between the cheeks and the dark background beyond. I do not see this, especially in the top image, where the algorithm missed the jaw line shape badly. Missing this badly changes the perception of the face substantially: the top image looks like two different people, with the actual image giving the impression of being much younger in age than the algorithm's guess. All three people in the pictures appear to be looking in different directions in the algorithm's output than in the actual images, although, as mentioned, the lower image is much less so than the others. I would like to know more about how this algorithm works; I can guess, but...

Another issue is that the results of the algorithm can be greatly affected by the initial scanning resolution of the picture (as opposed to the initial resolution of a low resolution camera). If color assignment is done inaccurately during the quantization (A to D conversion) process, it seems to me that would throw the algorithm off as well.

From experience, I know that most FR algorithms face difficulty with skin tones as well as with age (difficulty with datasets of younger-aged individuals). It would certainly be interesting to run this Google Super Resolution algorithm against faces of African origin and analyze the results; as per Tim's comment, the results would probably not be very encouraging. Having said that, this is still very interesting and positive research in FR and in the right direction. I can see a lot of potential, but just like everyone else has said, it will take a while to get there.

To quote from the conclusion:

As in many image transformation tasks, the central problem of super resolution is in hallucinating sharp details by choosing a mode of the output distribution.
It is unlikely that hallucinations would be accepted as evidence!
These systems (both this approach and previous attempts) ingest a database of hi-resolution reference images which are then used to generate hi-resolution images "matching" low resolution input. For example it might be that the system has been fed a khollection of Khardasian images (all varieties of that species).
Now suppose such systems are presented with a low-res image that is consistent with Khardasian-ness:
  • In previous approaches, the likely result is a sort of average of the various Khardasians, not matching any one in particular (a Khardasian Khimera)
  • The novelty of this approach is that it will make a definitive choice to match one of the people in its database, rather than a blend / average (so it will be Kim or Kourtney or Khloe and will not be a Khimeric blend of the three). It does not seem to be highly probable that it will pick the correct Khardasian, however: it will make a definitive choice, but perhaps the wrong choice.

Hence there is no evidentiary application, but there may well be intelligence or investigation applications, since a trivial modification could output all plausible candidates for a given low-res image. For example if an investigator somehow knows that the people in an area at a given time are members of a small set of known individuals he may be able to use this to guide further investigation (via different modalities) as to which of that small set is likely in a given low-res image.


