Hikvision Nvidia Supercomputing Partnership

Author: John Honovich, Published on Jul 18, 2016

Hikvision is gearing up its supercomputing efforts.

Partnering with Nvidia, one of the US's largest tech companies with $5 billion in annual revenue, Hikvision is getting the first of Nvidia's new supercomputers inside China.

In this report, we examine what Hikvision is doing, what Nvidia is offering and what this means for the future of video surveillance.



Comments (37)

...the new DGX-1 is specified at 170 Teraflops/s, though still not cheap coming in at $129,000 price and also not coming with video decoding.

The Nvidia computer doesn't come with a video card? In case you want to use an AMD?

I added a note to qualify it as 'large scale'. I am not sure what video card it comes with or not, but it does not appear to be built for handling decoding of video like the Tegra does.

Wondering how they got around the supposed U.S. supercomputer export restrictions you always hear about.

Soon they might be restricting exports to us though:

I guess this means we will be seeing a Hikvision clone of the Nvidia box in 6 months or so.

By contrast, the new DGX-1 is specified at 170 Teraflops/s, though still not cheap coming in at $129,000 price and also not coming with video decoding.

My understanding is, yes, a DGX-1 probably could be used for video decoding, because GPUs are rather generic, but you probably wouldn't use it for that. Its purpose is to save time training neural networks: while the rest of us have access to hardware that means waiting 3 weeks for a network to train, those with a DGX-1 may have to wait only 1 day (for example). I think it is meant for accelerating the R&D side of deep learning. At a price of 129,000 USD, you'd want to keep it running 24/7 doing the most processor-intensive work in order to pay for itself, and you wouldn't bog it down with decoding video if there was another way.

I am also thinking, if it was a western company, e.g. a major VMS or camera manufacturer, that purchased a DGX-1, then they'd probably keep it quiet.

I am also thinking, if it was a western company, e.g. a major VMS or camera manufacturer, that purchased a DGX-1, then they'd probably keep it quiet.

Agreed. In fairness, I am pretty sure Hikvision wants to keep this quiet as well. The announcement was not on Hikvision's website (not even the Chinese-only version). However, it is on their government parent company's site.

And Nvidia declined comment.

Humm... soon Person of Interest show will not be science fiction anymore...

I don't know about that :-), but I think the main outcome of this type of technology, at least in the immediate future, is better tools for searching archived video in VMS/NVRs.

But how will you add this in to VMS / NVRs without a lot more cost / new equipment? Since it requires a lot more processing power, this does not fit into the typical model of either open VMS on COTS Intel servers or low cost low power NVRs. Yes/no?

My understanding is that it requires a huge amount of processing power to train a neural network, but not as much to apply an input and get an output once it is trained. Now, there are always going to be exotic solutions that train and update themselves in real time, but let's keep things simple. The deep learning frameworks out there seem to automatically scale with hardware, so the more GPUs you have installed on the machine, the better they perform, and if there are no GPUs then they just use the CPUs, in which case it is a lot slower. At what point the lack of processing power causes everything to become so slow as to be useless, I am not sure. But maybe that means you can only process one channel and every second I-frame - still useful in some circumstances. Keep in mind GPUs are still getting faster and cheaper, and their power requirements are decreasing.
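To illustrate the "use GPUs if present, otherwise fall back to the CPU" behaviour described above, here is a minimal inference sketch in PyTorch (chosen only for illustration; nothing in this thread says which framework anyone actually uses, and the model is a generic untrained ResNet, not anything Hikvision ships):

    # Minimal sketch, assuming PyTorch/torchvision are installed.
    import torch
    import torchvision.models as models

    # Frameworks pick up whatever hardware is available; no GPU just means slower.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Generic classifier as a stand-in; a real deployment would load trained weights.
    net = models.resnet18().to(device).eval()

    frames = torch.randn(8, 3, 224, 224, device=device)  # stand-in for 8 decoded frames

    # Applying a trained network: no backward pass, no weight updates,
    # which is why inference is far cheaper than the weeks-long training runs.
    with torch.no_grad():
        labels = net(frames).argmax(dim=1)
    print(labels.tolist())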

My understanding is that it requires a huge amount of processing power to train a neural network, but not as much to apply an input and get an output once it is trained.

Hikvision VP says 2TFlops for surveillance video:

I am sure it takes far more for initial training and it must vary depending on what type of DNN one has developed but, from everything I have seen so far, it is still quite high compared to traditional analytics.

Hikvision VP says 2TFlops for surveillance video

But to put that into perspective, Nvidia's GTX 1080 cards are around 9 Teraflops, and they are meant for gamers. The Tesla M4 server GPU is around 2.2 teraflops.

NVIDIA Tesla M4

"7x more power-efficient processing than CPUs for deep learning at 20 images/sec/watt"

So it does look to me that the potential to improve VMS analytics is at least in the realm of possibility. If it proves too expensive to add a GPU to every VMS server, or too processor-intensive to analyse more than a single camera stream (say), then perhaps there is still the potential to upload archived data from the VMS and do searches on the client. That could still be useful in some situations.

As another comparison, the Tegras being used by Hikvision have a max of ~1 teraflop.

I do not know why Hikvision chose the Tegras vs the GTX 1080, but that's surely due to my lack of understanding of the details.

So I agree with you that the trends (mid to long term) are favorable, but right now it looks like adding hundreds of dollars (minimally) in pure hardware costs to do this.

Tegras are GPUs for mobile devices aren't they? Therefore lower power than a desktop GPU like the GTX 1080 and better suited for server hardware. GTX 1080s are power hungry desktop GPUs and I wouldn't put one in a VMS server. I was thinking however, that a traditional gaming/desktop GPU could still be useful if you do offline analysis on the client machine. e.g. upload 10 hours of video from the VMS server to the client for 2 cameras after an incident, and do a search for all people in green shirts. Wouldn't be as useful as realtime analysis on every channel, but still useful and not really any additional cost.

Yes, this is a very interesting scenario. When you have thousands of cameras, it may be too expensive to analyze them all in real time. But when you are investigating a case, you know which cameras and which periods you want to analyze, so the ability of the VMS to transfer video from the recording server and generate metadata during this process is unique and valuable, I believe.

There is an efficient way of using a neural network in real time: analyze not the raw high-res 25fps stream, but only the captured objects. I believe that this technology will help to dramatically reduce false alerts. After getting an object from a conventional motion detector, we can try to classify it with the neural network. So, it is one small piece of a frame every few seconds... not 25 full frames every second.
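A minimal sketch of that approach, assuming OpenCV's background subtractor as the "conventional motion detector" and a placeholder classify_crop() standing in for whatever neural network is used:

    import cv2

    backsub = cv2.createBackgroundSubtractorMOG2()     # conventional motion detector

    def classify_crop(crop):
        # Hypothetical stand-in for the neural network call,
        # e.g. returning 'person', 'vehicle' or 'other'.
        return "other"

    cap = cv2.VideoCapture("camera_stream.mp4")        # placeholder video source
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = backsub.apply(frame)
        # OpenCV 4.x return signature; 3.x returns an extra value first.
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) < 500:               # skip small noise blobs
                continue
            x, y, w, h = cv2.boundingRect(c)
            label = classify_crop(frame[y:y + h, x:x + w])  # one small crop, not 25 full frames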

The biggest problem here is learning: not only processor power consumption but also the process itself. We've tested AlexNet and it doesn't work well for our applications. 1000 classes of objects is too much for our case. And maybe the quality of the training photos was too good compared with typical surveillance pictures. Training the network is the biggest challenge here.

That is what I understand also: the image is broken down into bounding boxes, and ideally each bounding box contains only one object to be categorized by the deep learning network (DLN) - but there could be several per image. Presumably, therefore, the load on the DLN would depend on what the camera is viewing. A camera on a busy city street with lots of people continuously walking would probably have a high processor load. The opposite would be detecting the presence of a car overnight in a car-park that would normally be empty - perhaps in this case the server may not need a GPU at all...?

This is an interesting photo I took during IFSEC:

HikVision 1U server with 8 GPUs on board!

Interesting. The question I ask is, are these off-the-shelf Nvidia GPU boards that I am seeing here? ... or is Hikvision taking the Tegra processor directly and building their own circuit boards around it.

U2 - That's a 1RU rack case with 16x Tegra chips running. That's what's under all those heat sinks. It's a really innovative way to put ~14 Teraflops of processing power in a very low power consumption appliance: ~300 W.

The picture is displaying sideways. Download it and rotate it clockwise 90° and it gives better context.

Rack in 40 of these with fast switches laced throughout the rack and you've got a half petaflop of distributable neural net trained AI-driven video analytics for a couple hundred thousand bucks and the power and cooling bills of a studio apartment.

Stuff like this is going to make for an interesting new generation of surveillance video.
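A quick back-of-envelope check of those rack figures, using only the rough numbers quoted in this thread:

    tflops_per_box = 14      # ~16 Tegra TX1s in one 1RU box, per the photo above
    watts_per_box = 300
    boxes = 40

    print(boxes * tflops_per_box / 1000, "PFLOPS")   # 0.56, i.e. roughly half a petaflop
    print(boxes * watts_per_box / 1000, "kW")        # 12 kW for the full rack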

Some more info here, from NVIDIA GTC China

AI City, 1 Billion cameras by 2020

From the video:

"with the enormous volume of data they (Hikvision) have collected..."

It is occurring to me that we might have been missing the point slightly when talking about hardware. The main thing you need to develop a deep learning product like what we have been discussing here, is not really the hardware, but the enormous amount of data you need to train the DLNs. This must surely be one area where Hikvision's relationship with the Chinese government gives them a significant advantage. In contrast, most VMS/camera companies, especially in the west, have no access to the data their products record, as this is obviously not owned by them.

2, thanks for sharing, very interesting video.

I've embedded the relevant section below:

To your point:

The main thing you need to develop a deep learning product like what we have been discussing here, is not really the hardware, but the enormous amount of data you need to train the DLNs.

I do agree that the data is critical but even once you've trained the DLN, it still takes a ton of processing to do.

That video confirms you need 2 TX1s per video channel and ~8 per 1RU box consuming 300 watts.

I certainly believe that is far less than traditional approaches but with TX1 pricing ~$500 per unit and the cost of the box and power, it is still a lot of money per channel.
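As a rough worked example of the per-channel math, using the TX1 price and TX1s-per-channel figures quoted above (channels per box is an assumption based on the 16-TX1 photo earlier in the thread):

    tx1_price = 500              # USD, approximate module price quoted above
    tx1_per_channel = 2          # per the GTC China video
    channels_per_box = 8         # assumed: ~16 TX1s per 1RU box at 2 per channel

    per_channel = tx1_price * tx1_per_channel    # ~$1,000 in TX1 modules per channel
    per_box = per_channel * channels_per_box     # ~$8,000 before chassis, power, cooling
    print(per_channel, per_box)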

At what resolution and frame rate is each stream? 1080p, 25fps? What if you just apply it to a second stream at roughly 1/3 the width and height and only 5 fps? Right there you have reduced the hardware requirements to (1/3) * (1/3) * (1/5) = 1/45. Presumably there would be a compromise in quality, but I bet you could still get good results.
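A quick pixel-rate check of that 1/45 figure, assuming the full stream is 1920x1080 at 25 fps and the second stream is roughly a third of the width and height at 5 fps:

    full = 1920 * 1080 * 25      # pixels per second on the main stream
    reduced = 640 * 360 * 5      # pixels per second on the smaller, slower stream
    print(full / reduced)        # 45.0, i.e. ~1/45 of the pixel throughput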

Even if you do that, you still need a tonne of data. If you want 60,000 images of people as seen through the perspective of a security camera, where do you get them? You can use generic images of people, cars etc. from the internet, but ideally you'd use images taken from security cameras themselves, which tend to be higher up looking down, and, well, everything just looks different through a security camera... Where can you get data like that from?

Anywhere on the internet that contains 1000s of hours of stock video surveillance footage, from 1000s of different cameras, indoors and outdoors?

At what resolution and frame rate is each stream? 1080p, 25fps? What if you just apply it to a second stream at roughly 1/3 the width and height and only 5 fps? Right there you have reduced the hardware requirements to (1/3) * (1/3) * (1/5) = 1/45. Presumably there would be a compromise in quality, but I bet you could still get good results.

That's just speculation.

(1) The slide only lists resolution. You added in a frame rate.

(2) Since this is a sales presentation, it is more reasonable to assume they are using optimistic numbers. You are assuming they are overstated by a factor of 45x. If they thought they could do it at 1/45 of the combined pixel count and frame rate, they likely would have used that to make the channel count per TX1 / server far higher.

My point is still valid, which is you can always step down the resolution and frame rate, and therefore HW requirements, and still get useful results. Just assume they are talking 5 fps, then I would say step down to 1 fps and my maths would still work out the same. As Murat said above, even more optimization if you combine it with VMD.

Here at the office I have four outdoor cameras. All I want to know is if an image contains a person or a vehicle. Most of the time nothing is happening, and I would probably only need to process an image every 1 minute on average if combined with VMD. I wouldn't even need a GPU for that. But I would still need lots of data to train the network, and that, I find, is the stumbling block.

No doubt, to get the full benefits of DL, e.g. recognizing a fight breaking out on a street, then you probably do need the sort of hardware they are talking about, and even more data...

Just assume they are talking 5 fps, then I would say step down to 1 fps and my maths would still work out the same.

But as you know and acknowledge, 1fps would significantly reduce the performance.

Basically, you are assuming that Nvidia's marketing people are hurting their marketing claims by using unrealistically high stream requirements. I am simply saying that is an imprudent assumption to make.

Basically, you are assuming that Nvidia's marketing people are hurting their marketing claims by using unrealistically high stream requirements. I am simply saying that is an imprudent assumption to make.

Not really, they would know that developers are aware there are many different ways DL can be applied to video analysis, or indeed any task. Simple analysis to detect if 1 in 10 images contains a vehicle or person (my example above), is not the same thing as "recognise that someone has had an accident, recognise that a pet or child has been lost..." . The first is easy (at least conceptually) but I wouldn't have a clue how to do the second.

The Pascal architecture (Nvidia 10xx) is too new for them to have used, but the mobile chips are now available, so in a Moore's-law-like fashion they just got a massive boost in capability at the same or lower price & power.

If you build a 'server farm' that puts the processing architecture in the cloud and makes it a demand-based service available to any VMS installation, then we start to have a model that could be practical. No hardware to buy and you only pay for what you use, but when you use it, you have more capability available than would be practical to build into a box.

If I were HIK, that's what I'd do. Then I'd build the capability into my VMS client to interact with it, so customers can do things like search video, set criteria and push video to that service.

But hey, that's just me.
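As a sketch of that "push video, set criteria, get results back" flow; the endpoint and request format here are purely hypothetical, not any real Hikvision or cloud API:

    import requests

    ANALYTICS_URL = "https://analytics.example.com/v1/search"   # hypothetical service

    def search_clip(clip_path, criteria):
        # Upload an exported clip plus a search criterion and return the matches.
        with open(clip_path, "rb") as clip:
            resp = requests.post(
                ANALYTICS_URL,
                files={"video": clip},
                data={"criteria": criteria},      # e.g. "person in green shirt"
                timeout=300,
            )
        resp.raise_for_status()
        return resp.json()                        # e.g. a list of timestamped matches

    # matches = search_clip("cam02_incident.mp4", "person in green shirt")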

If you build a 'server farm' that puts the processing architecture in the cloud and makes it a demand-based service available to any VMS installation, then we start to have a model that could be practical.

If I were HIK, that's what I'd do

So customers have to send their video to a cloud system run by the Chinese government?

I understand what you mean from a technical perspective, just strikes me as debatable from a security perspective for many customers.

And if someone in the U.S. licensed the technology, ran it in the U.S. and guaranteed that the data never left American soil (digital soil, that is...) and that HIK or the Chinese government never had access?

So customers have to send their video to a cloud system run by the Chinese government?

ezviz cloud?

Probably comes down to cost. I don't know much about what's out there regarding online analytics services, but I know that the Microsoft one is pretty impressive:

Microsoft Cognitive API (be sure to upload your images to their demo)

The problem is the price: they charge something like $1.50 per 1,000 images. So for a large city site of 1,000 cameras at 1 fps, you'd be paying $1.50 per second! It adds up very quickly. So I assume that the Hikvision product exists because in many cases it is cheaper to buy your own 'server farm'.
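Roughly worked out, using the quoted price and an illustrative 1,000-camera, 1 fps deployment:

    cameras = 1000
    fps = 1
    price_per_image = 1.50 / 1000     # the quoted ~$1.50 per 1,000 images

    per_second = cameras * fps * price_per_image   # $1.50 every second
    per_day = per_second * 86400                   # ~$129,600 per day
    per_month = per_day * 30                       # ~$3.9M per month
    print(per_second, per_day, per_month)

At that rate, a single day of continuous use costs roughly the list price of a DGX-1, which is presumably why owning the hardware wins for continuous workloads.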

It certainly would if you were fully utilising it, in particular if you were using it for real time analysis. But, if the use case is forensic/event based and you only need the capability occasionally, then the scales would start to tip in the on-demand direction.

I like the business model though, make it demand based pricing, and easily accessible through the tools a customer uses all day...

The 2016 results for "ImageNet" (one of the most important computer vision competitions) came out fairly recently; HikVision entered for the first time, and did well:

  • http://image-net.org/challenges/LSVRC/2016/index

My (unaffiliated) summary of HikVision's results is:

  • Second in “Object Localisation” - detect, classify (from 1000 defined categories) and find bounding boxes for the five most "prominent" objects in each of a collection of photographs. I.e. what are the five most important things in the scene.
  • Second in “Object Detection” - detect, classify (from 200 defined categories) and place bounding boxes around all of the objects (in the 200 categories) in each of a collection of photographs. Useful for search and perhaps alarms.
  • Won “Scene Classification” - this is assigning the scene viewed by the camera to one of 365 categories (city street, shopping mall etc), so the camera can know which type of scene it's “looking” at and can apply appropriate analytics accordingly.
  • Middle ranking results in “Scene Parsing” - here the task is to label each pixel in an image with the category of object (from 150 object categories) which it is part of. This is (perhaps) the most difficult challenge but also has the most useful applications: it supports all the applications of the others and then some. Note that this task is new in 2016, in contrast to the others.

These are all on stills (photographs); ImageNet also has an “Object Detection in Video” task which is similar to “Object Detection” but uses 30 categories (simpler). HikVision do not seem to have entered this.

Many teams approach ImageNet as a numbers game: lots of powerful GPUs, lots of PhD students and/or postdocs trying all sorts of variations in a (one hopes) intelligent fashion. The description on HikVision's submission seems to indicate this strategy. Many of the entries involve "ensembles": a collection of networks processing the same image in parallel then taking the average (or some other combination) of the results. This improves the results by a few percent but increases the (test time) processing cost (and the training cost) by the number of networks in the ensemble.
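For readers unfamiliar with the term, here is a minimal illustration of that ensembling idea; the "networks" are placeholder random logits, not anything HikVision actually submitted:

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    # Pretend outputs from three different trained networks for one image, 5 classes.
    rng = np.random.default_rng(0)
    member_logits = [rng.normal(size=5) for _ in range(3)]

    # Average the class probabilities across ensemble members.
    probs = np.mean([softmax(l) for l in member_logits], axis=0)
    print(probs.argmax())    # ensemble prediction, at ~3x the single-network compute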

Hik's (affiliated) summary of the competition's results here.
