Google’s Super Resolution Enhancement Examined

By: Brian Karas, Published on Feb 10, 2017

"Zoom in and enhance, I think there is a clear shot of his face in the reflection" has been the kind of statement CCTV users have wanted to make a reality for years.

Hollywood has imagined this technology, but now engineers at Google have released a whitepaper on "Pixel Recursive Super Resolution", showing real-world examples of upsampling a highly pixelated image into one with details and recognizable features.
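Conventional upscaling cannot recover detail that a low-resolution image no longer contains. The Python sketch below illustrates why. It is not Google's method, and the data is hypothetical: random values stand in for a 32x32 face crop. The crop is average-pooled to 8x8 and then upscaled back with nearest-neighbor interpolation. A second 32x32 image with extra fine detail produces exactly the same 8x8 input.

```python
import numpy as np

def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a 2-D grayscale image by `factor` along each axis."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upscaling: every pixel becomes a factor x factor block."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
original = rng.random((32, 32))          # stand-in for a 32x32 face crop (hypothetical data)
low_res = downsample(original, 4)        # 8x8 version, like a distant subject on CCTV
restored = upsample_nearest(low_res, 4)  # back to 32x32, but blocky: no detail recovered

# Add fine detail (a +/-1 checkerboard) that averages to zero inside every 4x4 block.
checker = (np.indices((32, 32)).sum(axis=0) % 2) * 2 - 1
other = original + 0.1 * checker

print(np.allclose(downsample(other, 4), low_res))  # True: same 8x8 input, different 32x32 source
print(np.abs(restored - original).mean())          # error that interpolation alone cannot remove
```

Any upscaler sees only `low_res`, so it cannot tell `original` from `other`. A super resolution model therefore has to supply plausible detail of its own rather than recover the true detail.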

In this report we examine Google's super resolution whitepaper, and how this technology could impact the security industry.
Comments (21)

Just wondering how this might benefit LPR tech... Even if it gave me a better guess to work off of, it could be very beneficial to me/the police.

This is somewhat different than LPR tech, though a lot of the core learning frameworks are the same.

In LPR, you are trying to find an exact match for a set of characters. This makes parts of the problem easier, because you are generally limited to a very small set of possibilities (A-Z, 0-9, etc.). Also, if you can figure out (or guess) the state, you may be able to limit it further if the state uses particular sequence formats.
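The narrowing described above amounts to filtering candidate reads against a closed alphabet and a state format. The Python sketch below assumes a hypothetical state whose plates are three letters followed by four digits. Real formats vary by state, and the pattern name is illustrative only.

```python
import re

# Hypothetical state format: three letters then four digits (e.g. "ABC1234").
HYPOTHETICAL_STATE_FORMAT = re.compile(r"[A-Z]{3}[0-9]{4}")

def plausible_plates(candidates, pattern=HYPOTHETICAL_STATE_FORMAT):
    """Keep only reads that use the A-Z/0-9 alphabet in the expected positions."""
    return [c for c in candidates if pattern.fullmatch(c)]

reads = ["ABC1234", "A8C1234", "ABC12345", "abc1234", "XYZ0O01"]
print(plausible_plates(reads))  # ['ABC1234']
```

Positional rules like this also hint at corrections. If position two must be a letter, the "8" in "A8C1234" is likely a misread "B".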

For this Google application, they are really attempting to "paint" an image that looks convincing to a human. It is less about precise accuracy and more about filling in details that make the objects recognizable. For this reason, it would not be ideal for LPR. The algorithm might be able to take a blob of pixels and turn it into an image that looks very much like what you would expect a license plate to look like, but it would probably be less concerned with differentiating between an "O" and a "Q" (for example). Or the training data could cause it to make other errors: if none of the input plates had "Q"s but lots of them had "O"s, it might take a pixel blob of a plate that was "QQQ" and paint it as "OOO". (This is my interpretation from reading the whitepaper; there might be more to it.)
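A toy model can reproduce the "QQQ" to "OOO" failure. When a blob is ambiguous, it picks whichever compatible character was most common in its training data. This Python sketch is a deliberately simplified stand-in, not how the whitepaper's network works. The training plates are hypothetical and contain no "Q".

```python
from collections import Counter

def pick_character(compatible, training_text):
    """Resolve an ambiguous blob by choosing the compatible character seen most often in training."""
    counts = Counter(training_text)
    # Characters never seen in training get a count of 0, so they can never win.
    return max(compatible, key=lambda ch: counts[ch])

training_plates = "ABO1234 OOK5521 LOO9087 BOB4410"  # hypothetical: no 'Q' anywhere
blob_options = ["Q", "O", "0"]                        # one blurry character could be any of these

# A plate whose true characters are "QQQ", each read as the same ambiguous blob.
print("".join(pick_character(blob_options, training_plates) for _ in range(3)))  # OOO
```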

 

Counterpoint on your last bit:

It is very simple to write a quick filtering algorithm that would replace any 0, O, or Q with a wildcard that would search across all three. So, if you're telling me that it could take a blob of pixels and potentially output something that would give me a range of possible numbers, that would be immensely helpful in a real, serious investigation. If you're able to hand the police a list of even as few as 500 license plate numbers for them to search across, and something like the color of the car, they can narrow a list down incredibly quickly.
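The wildcard filter described above can be sketched in a few lines of Python. The guess is turned into a regular expression in which any 0, O, or Q matches all three. The result can then be narrowed by vehicle color. The registration records are hypothetical.

```python
import re

# Characters that are easily confused at low resolution, treated as interchangeable.
CONFUSABLE = {"0": "[0OQ]", "O": "[0OQ]", "Q": "[0OQ]"}

def guess_to_pattern(guess: str) -> re.Pattern:
    """Turn a plate guess into a regex where every 0/O/Q matches any of the three."""
    return re.compile("".join(CONFUSABLE.get(ch, re.escape(ch)) for ch in guess))

def narrow(guess, records, color=None):
    """Return plates matching the wildcard guess, optionally filtered by vehicle color."""
    pattern = guess_to_pattern(guess)
    return [plate for plate, car_color in records
            if pattern.fullmatch(plate) and (color is None or car_color == color)]

# Hypothetical registration records: (plate, color)
records = [("OQK5521", "red"), ("0QK5521", "blue"), ("QOK5521", "red"), ("ABC5521", "red")]
print(narrow("OOK5521", records))               # ['OQK5521', '0QK5521', 'QOK5521']
print(narrow("OOK5521", records, color="red"))  # ['OQK5521', 'QOK5521']
```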

The ability to create a reasonable guess out of something totally unrecognizable would be game-changing for the security and investigations industry, and for law enforcement and police work overall. The ability of law enforcement to take one tiny piece of evidence, even an educated guess, and start to narrow the field is extremely impressive. And, to your point, you could very easily input millions of pieces of reference data to compare against, so it could definitely "learn" from a database of plate photos from every state, etc.

I said it in another thread a while back, but by far the most interesting thing happening in the electronic security industry right now is the use of "big data" or data analytics to provide either relevant historical or actionable information to end users. There are some exciting things happening in that regard already. I think the next few years are going to see an explosion of it, specifically as the one or two companies that are currently doing it start to grow it more and more.

Very interesting... but nuts to court-admissibility.

Cool stuff, but sounds like a court admissibility nightmare...

How often does video evidence get used in trial, I wonder?

As opposed to forcing a plea...

Also, if the video evidence can just identify the perps to police, then a case can often be made without video evidence.

 

Although a good defense attorney would ask, "Why did you focus on my client in the first place? Lead me through the steps you took, please."

Why did you focus on my client in the first place? Lead me through the steps you took, please.

"We took the grainy, poor-quality video evidence and ran it through the Google enhancer. This enhancer uses various assumptions about various facial characteristics to construct an image which may or may not help in identifying the subject. In this case, the artificially enhanced image prompted several employees to suggest that the recreation looked like the suspect. The suspect was investigated and their alibi checked. The suspect was also found to be in possession of the stolen merchandise."

Considering Apple's facial recognition capabilities in iPhone 6 and up, were a guy able to integrate with their databases (fat chance), parse their catalogs, and compare users' iCloud photos with video/images captured via surveillance, you'd be on to something.

I mean... who's to say we haven't all already agreed to this in Apple's 56-page TOS agreement we're so quick to scroll through and click 'Agree'?

 

Matt -

I thought about the social media integration angle. Most pics on Facebook, etc. tend to be high resolution, so they could make a good source library for input images.

However, there can be challenges for an algorithm like this that is trying to "paint", not trying to "match". You could end up with it creating an output image that is a compilation of the facial features of several people. For the purposes of creating a "painting", the image would be highly representative of the reality (a person with a round face, short hair and small nose, or a person with a square face, long hair and big ears), but it might not be accurate in terms of identifying a specific person.

If you are using a smaller dataset, say just the employees from a building, or just the students on a campus, you will likely have outputs that better match the specific individual than if you use a dataset of "all people".

They do attempt to tackle part of this problem; it comes down to figuring out which parameters can be "normalized" and which ones make up unique qualities. In the whitepaper, they use an example of sampling cars. If you average the paint colors of all cars together, you wind up with a very drab "average" color that would likely not match any vehicle. But you could average minor differences among reds, blues, or greys to produce output images that are still representative.
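The car-paint point can be checked numerically. The Python sketch below uses hypothetical RGB samples, not data from the whitepaper. It measures HSV saturation with Python's standard `colorsys` module. Averaging across all colors gives a washed-out, nearly grey result, while averaging within each color family stays vivid.

```python
import colorsys
import numpy as np

def saturation(rgb) -> float:
    """HSV saturation of an RGB triple in 0-255: 0 is grey, 1 is fully vivid."""
    r, g, b = (c / 255 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[1]

# Hypothetical paint samples, RGB 0-255.
paint = {
    "red":   np.array([[200, 30, 30], [180, 20, 25], [220, 40, 35]]),
    "green": np.array([[30, 160, 40], [20, 150, 30], [40, 170, 50]]),
    "blue":  np.array([[30, 40, 200], [20, 30, 180], [40, 50, 220]]),
}

# Average of every car regardless of color: the drab "average" paint.
everything = np.vstack(list(paint.values())).mean(axis=0)
print("all cars:", everything.round(1), "saturation", round(saturation(everything), 2))

# Averages within each color family remain representative of that family.
for name, samples in paint.items():
    mean = samples.mean(axis=0)
    print(f"{name:5s}:", mean.round(1), "saturation", round(saturation(mean), 2))
```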

Taken to an extreme, this technology could be useful for creating a next-generation smart CODEC. If you can compress the background of a scene down to "there is a red truck, a blue roadster and a brown delivery vehicle", that could be less data to store than recording the actual pixels of those objects. When playing back video, the algorithm would create representations of those objects on the fly. Or you could store a few reference frames of the vehicles, along with their movement paths, and then dynamically draw the objects in along the way. The algorithm would use learned data about appearances to create frames that realistically represent how the vehicles would look as they moved or turned, even if that actual data was never recorded.
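For a rough sense of the potential savings, the Python sketch below compares an uncompressed 1080p RGB frame with a short JSON description of the three vehicles. The description format and bounding boxes are hypothetical. Real codecs such as H.264 already compress far below raw size, so this ratio overstates the practical gain.

```python
import json

# Uncompressed 1080p frame: width x height x 3 bytes (8-bit RGB).
raw_frame_bytes = 1920 * 1080 * 3

# Hypothetical semantic description of the same background; boxes are [x1, y1, x2, y2] in pixels.
scene = [
    {"object": "truck", "color": "red", "box": [120, 400, 520, 760]},
    {"object": "roadster", "color": "blue", "box": [700, 450, 980, 640]},
    {"object": "delivery vehicle", "color": "brown", "box": [1100, 380, 1600, 800]},
]
description_bytes = len(json.dumps(scene).encode("utf-8"))

print(f"raw frame: {raw_frame_bytes:,} bytes")  # raw frame: 6,220,800 bytes
print(f"description: {description_bytes} bytes, about {raw_frame_bytes // description_bytes:,}x smaller")
```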

Further, if the recreation can be done fast enough, and with an "average" processor, it could be done client-side, allowing the server to record and stream very low bitrate data, while still allowing a client to piece together a more detailed image.

I do not think we will see applications of this in the near term for surveillance, but opening up this level of image processing may lead to some amazing capabilities.

Don't rule it out so quickly. Maybe not in terms of enhancement, but be on the lookout for social media integration with surveillance, AI & analytics in the very near future.

Do you think it will be possible to pass the footage from any camera to this algorithm? If that is the case, you could run post-analytics on all your cameras on site and cross-compare the "guesses" from each camera to get really close to the actual plate.
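In its simplest form, cross-comparing guesses is a per-position majority vote. The Python sketch below assumes every camera returns a guess of the same length, and the reads are hypothetical. A real system would probably also weight each camera by its confidence, which this sketch ignores.

```python
from collections import Counter

def fuse_reads(reads):
    """Per-position majority vote across equal-length plate guesses from several cameras."""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this simple vote assumes every camera read the same number of characters")
    # zip(*reads) groups the i-th character from every camera together.
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*reads))

# Hypothetical guesses for the same plate from three cameras.
camera_reads = ["ABC1Z34", "A8C1234", "ABCI234"]
print(fuse_reads(camera_reads))  # ABC1234
```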

 

 

Do you think it will be possible to pass the footage from any camera to this algorithm?

In theory, yes; the question is at what computational cost. The whitepaper references using 8 GPUs to handle the input from a series of static images. I did not see an exact count of input images, but even 5 cameras running at 5 fps would generate 25 images per second, potentially requiring a LOT of GPU horsepower to process.
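The throughput arithmetic can be made explicit in a few lines of Python. The per-image processing times below are hypothetical placeholders, not figures from the whitepaper. The sketch also assumes each GPU handles one image at a time, with no batching.

```python
import math

def gpus_needed(cameras: int, fps: float, seconds_per_image: float) -> int:
    """GPUs needed to keep up with live video, assuming one image per GPU at a time (no batching)."""
    images_per_second = cameras * fps
    # Each GPU finishes 1 / seconds_per_image images per second.
    return math.ceil(images_per_second * seconds_per_image)

# 5 cameras at 5 fps = 25 images per second.
for seconds in (0.01, 0.1, 1.0):  # hypothetical per-image processing times
    print(f"{seconds:>5} s/image -> {gpus_needed(5, 5, seconds)} GPU(s)")
```

Because the count scales linearly with per-image time, a model that needs a full second per image would need one GPU for every image per second of input.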

From what I have seen, and from conversations I have had with other image-processing experts, I think we are at the stage in technology where things are very possible when it comes to advanced image processing, but still very expensive in terms of compute requirements (or even in terms of hardware cost). We will likely need another couple of generations of GPU advancements before this becomes the kind of thing that is just part of an average to high-end security system.

 

Hi

This is interesting  ...

For the most part, we think in terms of local processing power for whatever we do... but local processing power seems to be becoming less relevant as the cloud becomes more pervasive. If this is important, a search could be performed by a swarm of servers in the cloud. The results could later be spit out to a lowly smartphone if need be...

Speculation? Perhaps. I believe we're getting there. 

 

Yes to all, but... the way DHS and other federal agencies are throwing $$ around, it might come faster than estimated. Who but Big Brother has pockets deep enough to get this done? That in itself, at least for me, is the scary part. When evidence can be "created", anyone can be made guilty.

This is real scary stuff.

A captured image has to be proven unaltered for it to be acceptable as legal evidence, yet we should trust Google to decide what the final extrapolated image really is, based on what computers say it should be... if they can do this then they can also make that image appear to be anything they want it to be and it does not meet the evidence requirement. Changing the evidentiary requirement would be a huge can of worms and dangerous to those Google or the government doesn't like. Think of the movie "Enemy of the State" or similar. No thanks, Google. Design better cameras and lenses.

I don't know that Google's software is really changing anything here.

Whether you use a completely automated algorithm (from Google or from anyone else), a semi-automated algorithm (in which a person clicks a button to apply an enhancement but doesn't necessarily know what the button does), or a completely manual alteration, you still have the legal need to show the original image and show how you arrived at the modified image.
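One common way to support that showing is to record a cryptographic hash of the original and of each derived image, along with the settings used. The Python sketch below does this with the standard `hashlib` module. It is a sketch, not a legal chain-of-custody standard, and the step name and parameters are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest; any later change to the bytes changes this value."""
    return hashlib.sha256(data).hexdigest()

def record_step(log: list, step: str, input_bytes: bytes, output_bytes: bytes, params: dict) -> list:
    """Append one processing step, tying the output to the exact input it came from."""
    log.append({
        "step": step,
        "input_sha256": sha256_hex(input_bytes),
        "output_sha256": sha256_hex(output_bytes),
        "params": params,
        "utc": datetime.now(timezone.utc).isoformat(),
    })
    return log

# Placeholder bytes standing in for an exported original frame and its enhanced version.
original = b"original frame bytes (placeholder)"
enhanced = b"enhanced frame bytes (placeholder)"

# Hypothetical tool name and settings; record whatever the real tool actually used.
log = record_step([], "super_resolution", original, enhanced, {"tool": "hypothetical-enhancer", "scale": 4})
print(json.dumps(log, indent=2))
```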

The only problem would occur if Google (or anyone else) was able to create an altered image that could NOT be identified as altered.

I am not an image expert, but it appears that this will become more and more challenging.

...if they can do this then they can also make that image appear to be anything they want it to be and it does not meet the evidence requirement...

Neither does a sketch from a police artist.  Even so, both may be useful in identification.

All

Interesting discussion, but I am skeptical. More research, and especially better training, needs to be done. Now, I do love technology, and I especially like AI neural networks and cognitive processing...

The easiest part for the algorithm to process should be those parts of the image with the greatest contrast. For example, the dark pupil and iris set against each other, and the iris set against the white portions of the eye, should be much more accurate. When I look at the middle image, the algorithm did not get the direction of the eyes correct, and when I look at the bottom image, the makeup the woman is wearing appears to fool the algorithm again. In looking at the contrast between the red lips and white teeth in the top and bottom pictures, this seems to be a little better. The shapes of the lips and mouth are more consistent between the algorithm and the actual image. What I expect to see in all three pictures (or at least the top two) is better definition between the cheeks and the dark background beyond. I do not see this, especially in the top image, where the algorithm missed the jawline shape badly. Missing this badly changes the perception of the face substantially... The top image looks like two different people, with the actual image giving the impression of being much younger than the algorithm's guess. All three people appear to be looking in different directions in the algorithm's output than in the actual images, although, as mentioned, the lower image is much less so than the others... I would like to know more about how this algorithm works; I can guess, but...

Another issue is that the results of the algorithm can be greatly affected by the initial scanning resolution of the picture (as opposed to the initial resolution of a low-resolution camera). If color assignment is done inaccurately during the quantization (A-to-D conversion) process, it seems to me that would throw the algorithm off as well.

From experience, I know that most FR algorithms have difficulty with skin tones as well as with age (difficulty with datasets of younger individuals). It would certainly be interesting to run this Google super resolution algorithm against faces of African origin and analyze the results; as per Tim's comment, the results would probably not be very encouraging. Having said that, this is still very interesting and positive research in FR, and in the right direction. I can see a lot of potential, but, just like everyone else has said, it will take a while to get there.

To quote from the conclusion:

As in many image transformation tasks, the central problem of super resolution is in hallucinating sharp details by choosing a mode of the output distribution.
 
It is unlikely that hallucinations would be accepted as evidence!
 
These systems (both this approach and previous attempts) ingest a database of high-resolution reference images, which are then used to generate high-resolution images "matching" low-resolution input. For example, it might be that the system has been fed a khollection of Khardasian images (all varieties of that species).
 
Now suppose such systems are presented with a low-res image that is consistent with Khardasian-ness:
  • In previous approaches, the likely result is a sort of average of the various Khardasians, not matching any one in particular (a Khardasian Khimera)
  • The novelty of this approach is that it will make a definitive choice to match one of the people in its database, rather than a blend / average (so it will be Kim or Kourtney or Khloe and will not be a Khimeric blend of the three). It does not seem to be highly probable that it will pick the correct Khardasian, however: it will make a definitive choice, but perhaps the wrong choice.
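The difference between the two bullets can be sketched in Python with toy numbers. This is not the whitepaper's model. It only contrasts averaging over candidates with committing to the most probable one. The appearance vectors and probabilities are hypothetical.

```python
import numpy as np

# Hypothetical 2-D "appearance" vectors for three people in the reference database.
people = {
    "Kim": np.array([1.0, 0.0]),
    "Kourtney": np.array([0.0, 1.0]),
    "Khloe": np.array([-1.0, 0.0]),
}
# Hypothetical probabilities a model assigns to each person for one low-res image.
probs = {"Kim": 0.36, "Kourtney": 0.33, "Khloe": 0.31}

# Averaging approach: a probability-weighted blend of everyone (the "Khimera").
blend = sum(p * people[name] for name, p in probs.items())
# Mode-seeking approach: commit to the single most probable person.
mode = max(probs, key=probs.get)

nearest_distance = min(np.linalg.norm(blend - v) for v in people.values())
print("blend:", blend, "distance to nearest real person:", round(float(nearest_distance), 2))
print("mode:", mode, "probability that this choice is wrong:", round(1 - probs[mode], 2))
```

The blend lands well away from every real person. The mode is a real person, but with these numbers it is wrong about two times in three.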

Hence there is no evidentiary application, but there may well be intelligence or investigation applications, since a trivial modification could output all plausible candidates for a given low-res image. For example, if an investigator somehow knows that the people in an area at a given time are members of a small set of known individuals, he may be able to use this to guide further investigation (via different modalities) into which of that small set is likely in a given low-res image.
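The "all plausible candidates" modification, combined with a small known set, might look like the Python sketch below. The candidate scores, the threshold, and the badge-log set are hypothetical. The scores are assumed to come from some model, which is not shown here.

```python
def plausible_candidates(scores, threshold):
    """All candidates whose (hypothetical) model score clears the threshold, best first."""
    return sorted((c for c, s in scores.items() if s >= threshold), key=scores.get, reverse=True)

def restrict_to_known(candidates, known_present):
    """Keep only candidates known to have been in the area at the time."""
    return [c for c in candidates if c in known_present]

scores = {"person_17": 0.22, "person_04": 0.19, "person_88": 0.15, "person_31": 0.02}  # hypothetical
known_present = {"person_04", "person_31", "person_52"}  # e.g. from badge logs (hypothetical)

shortlist = restrict_to_known(plausible_candidates(scores, threshold=0.1), known_present)
print(shortlist)  # ['person_04']
```

The output is a lead for further investigation by other means, not an identification.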

 

 
 
 