Hikvision's ImageNet Win Analyzed

Published Nov 28, 2016 6:30 PM

Hikvision is pushing hard to move up market and win at video analytics.

One notable sign is Hikvision touting their #1 ranking for an ImageNet challenge.

We hired a computer vision academic, expert at ImageNet, to review the results and assess Hikvision's performance. Inside this note, we explain how ImageNet works, how Hikvision did, the role of GPUs, and what this means for Hikvision's pursuit of the video analytics market.

Executive Summary

Key findings based on the academic's review:

  • Hikvision's key differentiator was its superior access to GPUs, the critical factor in such competitions.
  • Hikvision's algorithms and technical approach were solid and credible but within ordinary academic computer vision techniques.
  • Hikvision's win is in an area with lower practical application to video surveillance, and Hikvision did not enter the one challenge that actually used video.
  • Overall, this was a solid marketing move for Hikvision but the results, by themselves, do not prove much about their technological capabilities.

#1 In Scene Classification

Hikvision received the top score in the scene classification challenge. The most practical application is Google style image search, i.e., enter in a term like 'gift shop' or 'rainforest' and get images of those scenes.

The challenge used millions of images from the Places2 Database, examples of which are shown below:

IPVM Image

Hikvision's score of 0.0901 in scene classification is an error rate: it roughly translates to the correct scene label appearing among the system's first 5 results about 91% of the time. This could signal potential future applications in video surveillance search, but in video surveillance the scene is generally unchanging for a given fixed camera, so detecting scenes is less practically useful.
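
To make the arithmetic concrete, here is a minimal sketch of how a 'first 5 results' error rate can be computed, assuming the score counts an image as a miss only when the true scene label is absent from the system's 5 highest-ranked guesses. The function name and the toy predictions and labels are hypothetical, made up for illustration; they are not taken from the challenge data.

```python
def top_k_error(ranked_predictions, true_labels, k=5):
    """Fraction of images whose true label is NOT among the first k guesses."""
    misses = sum(
        1
        for guesses, truth in zip(ranked_predictions, true_labels)
        if truth not in guesses[:k]
    )
    return misses / len(true_labels)


# Hypothetical toy data: each inner list is one image's guesses, best first.
predictions = [
    ["gift shop", "bookstore", "toy shop", "pharmacy", "bakery"],
    ["rainforest", "jungle", "swamp", "forest path", "creek"],
    ["office", "cubicle", "classroom", "library", "lobby"],
    ["beach", "coast", "harbor", "pier", "lagoon"],
]
labels = ["toy shop", "rainforest", "kitchen", "beach"]

toy_error = top_k_error(predictions, labels)  # 1 miss out of 4 images

# Converting the reported error rate into an accuracy figure.
reported_error = 0.0901
reported_accuracy = 1 - reported_error  # roughly 0.91
```

Under this reading, a lower score is better, which is why a figure as small as 0.0901 corresponds to a high accuracy.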

#2 In Object Detection

Hikvision also came in second for object detection across "200 basic-level categories" with a score of 0.65. The expert explained that this meant an average 65% probability of detecting the object correctly (CMU was first at 66%). However, per-object scores vary significantly: some objects are very hard for any system to detect accurately (e.g., knives and guns, given their small size and hard surfaces), while others, like gorillas (with larger size and more distinct texture), are much easier.
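
The sketch below illustrates how a single overall score can hide that per-object spread, assuming the overall figure is a plain average of per-category scores. The category values are hypothetical, chosen only so that they average to about 0.65; they are not Hikvision's actual per-category results.

```python
from statistics import mean

# Hypothetical per-category detection scores, for illustration only.
per_category_score = {
    "gorilla": 0.90,  # large, distinctly textured object: easier
    "car": 0.75,
    "knife": 0.30,    # small, hard-surfaced object: harder
}


def overall_score(scores_by_category):
    """Average the per-category scores into one headline number."""
    return mean(scores_by_category.values())


headline = overall_score(per_category_score)  # about 0.65

# The gap between the best and worst category that the headline hides.
spread = max(per_category_score.values()) - min(per_category_score.values())
```

Two entrants with nearly identical headline scores can therefore differ considerably on the specific objects that matter for a given application.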

No Entry For Video Test

Of the 5 challenges, only 1 used video clips; the others used just images. Unfortunately, Hikvision did not enter the 'object detection from video' challenge. The expert noted that most organizations limit entries due to resource constraints. However, the video challenge was certainly the most relevant to video surveillance and also contained harder / lower resolution clips.

GPUs Key

The expert emphasized that GPU availability was the key driver. For example, they mentioned that graduate students compete vigorously for access to more GPUs. The more GPUs they have, the more training that can be done, the quicker the training can be finished, allowing for even greater retraining.

Relatedly, Hikvision has touted its access to Nvidia GPUs and being the first in China, and one of the first globally, to obtain Nvidia's DGX-1, which Nvidia markets as '250 servers in a box'. For this test, Hikvision mentions using 'newly-built M40-equipped GPU clusters'. By contrast, the expert noted that even the top research universities participating in such challenges face limitations on accessing GPUs.

More broadly, the expert noted that the main difference in performance between 'poor' entrants and 'top' entrants is driven by GPU availability.

Positives of Hikvision's Approach

At the same time, the expert emphasized that Hikvision's description (found in the results) showed a credible, modern approach to tuning and training their network. Also, here is Hikvision's PowerPoint deck on the ImageNet results.

Questionable Applicability to Video Surveillance Networks

The expert emphasized that all participants work strenuously to optimize their results to the test data sets since the goal is to win the competition. The expert felt that this led to broad 'over-optimizing', which could lead to significantly poorer results in production.

As for video surveillance, the expert mentioned both difficulties and simplifications relative to ImageNet. On the positive side, with video surveillance, assumptions can be made about the size and orientation of objects (like people and cars) that would make detection and categorization easier. On the negative side, the ImageNet images had far higher resolution and better dynamic range, without the low light problems or IR washout common in video surveillance. Moreover, the competition placed no constraints on pricing or dedicated resources, as there would be in a commercial video surveillance deployment.

Good Brand Building / Some Experience

For a company that is best known internationally for low cost cameras and basic recorders, being ranked among and, in some cases, above top research universities and global companies is definitely good for enhancing its brand. Moreover, the experience gained from doing the challenges will certainly help towards developing production analytics.

Not Direct But Could Be Signal

On the other hand, the tests have limited direct applicability to production video surveillance, given their much more ideal images and unconstrained GPU usage. However, Hikvision certainly has the resources and organization plus the stated desire to dominate video analytics so their progress should be watched closely.

Comments (4)
Undisclosed #1
Nov 28, 2016
IPVMU Certified

Good Analysis.

Do you think that Dahua's recent LFW win, is more pertinent then to surveillance video? Maybe your ImageNet expert is qualified to assess that test as well?

Dahua Technology Sets New Record For LFW Facial Recognition

John Honovich
Nov 28, 2016
IPVM

I think it's less pertinent because (1) Hikvision has made much clearer direction of releasing intelligent video globally and (2) intelligent video has much greater overall demand than facial recognition.

In terms of an expert, we would want a different expert who focuses on faces in the wild as the two tests / fields are in the same overall space but different specializations.

Quite bluntly, Dahua struggles to sell $80 HD cameras so I am not bullish on Dahua's chances of delivering a production ready ground breaking facial surveillance system. Hikvision is much more organized so along with 1 and 2 above makes it more of a focus for coverage.

Undisclosed Manufacturer #2
Nov 29, 2016

Amended:

Hikvision received the top score in the scene classification challenge. The most practical application is Google style image search, i.e., enter in a term like 'Chinese Government Target #1' or 'Chinese Government Target #2' and get images of those scenes.

Christian Laforte
Jan 04, 2017

Very interesting article! Hiring a computer vision academic for this article was a brilliant idea.
