Natural Language Scene Processing: The Future Of Analytics?

Take a look at this new artificially intelligent software from Google and Stanford that describes images in natural language:

It was only recently that computer systems became smart enough to identify unknown objects in photographs. Even then, it has generally been limited to individual objects. Now, two separate teams of researchers at Google and Stanford University have created software able to describe entire scenes. This could lead to much better and more intelligent algorithms in the future.

One example:

"A group of young people playing a game of frisbee."

In the near term, computer vision systems that can discern the story in a picture will enable people to search photo or video archives and find highly specific images. Eventually, these advances will lead to robotic systems able to navigate unknown situations. Driverless cars would also be made safer. However, it also raises the prospect of even greater levels of government surveillance.

What's the guess for "near term"? 1 year or 10?

I am curious how much hardware / processing power / time it takes to do this. I briefly scanned the Stanford paper but may have missed that.

My concern would be that this takes a massive amount of resources to process a single image...

Far closer to 10 than 1 IMO.

Qualcomm claims they can do it realtime:

Is Avigilon going to allow this type of high-level metadata creation to go on without even a challenge?

Nice concept... As with anything new, it will take some time before it is effective and doable with a reasonable amount of resources (processing power, time, money, etc.). There would be many benefits (and possible setbacks) if it ever becomes available to video surveillance, especially when monitoring large crowds...