Hikvision Nvidia Supercomputing Partnership

Author: John Honovich, Published on Jul 18, 2016

Hikvision is gearing up its supercomputing efforts.

Partnering with Nvidia, one of the largest US tech companies with roughly $5 billion in annual revenue, Hikvision is getting the first of Nvidia's new supercomputers inside China.

In this report, we examine what Hikvision is doing, what Nvidia is offering and what this means for the future of video surveillance.

Comments (37)

...the new DGX-1 is specified at 170 Teraflops/s, though still not cheap coming in at $129,000 price and also not coming with video decoding.

The Nvidia computer doesn't come with a video card? In case you want to use an AMD?

I added a note to qualify it as 'large scale'. I am not sure what video card it comes with or does not, but it does not appear to be built for handling video decoding the way the Tegra does.

Wondering how they got around the supposed U.S. supercomputer export restrictions you always hear about.

Soon they might be restricting exports to us though:

I guess this means we will be seeing a Hikvision clone of the Nvidia box in 6 months or so.

By contrast, the new DGX-1 is specified at 170 Teraflops/s, though still not cheap coming in at $129,000 price and also not coming with video decoding.

My understanding is, yes, a DGX-1 probably could be used for video decoding, because GPUs are rather generic, but you probably wouldn't use it for that. Its purpose is to save time training neural networks: while the rest of us have access to hardware that means waiting 3 weeks for a network to train, those with a DGX-1 may only have to wait 1 day (for example). I think it is meant for accelerating the R&D side of deep learning. At a price of 129,000 USD, you'd want to keep it running 24/7 doing the most processor-intensive work in order to pay for itself, and you wouldn't bog it down with decoding video if there was another way.

I am also thinking that if it was a western company, e.g. a major VMS or camera manufacturer, that purchased a DGX-1, then they'd probably keep it quiet.

I am also thinking that if it was a western company, e.g. a major VMS or camera manufacturer, that purchased a DGX-1, then they'd probably keep it quiet.

Agreed. In fairness, I am pretty sure Hikvision wants to keep this quiet as well. The announcement was not on Hikvision's website (not even the Chinese-only version). However, it is on their government parent company's site.

And Nvidia declined comment.

Hmm... soon the show Person of Interest will not be science fiction anymore...

I don't know about that :-), but I think the main outcome of this type of technology, at least in the immediate future, is better tools for searching archived video in VMSes/NVRs.

But how will you add this into VMSes / NVRs without a lot more cost / new equipment? Since it requires a lot more processing power, this does not fit into the typical model of either open VMS on COTS Intel servers or low-cost, low-power NVRs. Yes/no?

My understanding is that it requires a huge amount of processing power to train a neural network, but not as much to apply an input and get an output once it is trained. Now, there are always going to be exotic solutions that train and update themselves in real time, but let's keep things simple. The deep learning frameworks out there seem to automatically scale with hardware: the more GPUs you have installed on the machine, the better they perform, and if there are no GPUs then they just use the CPUs, in which case it is a lot slower. At what point the lack of processing power causes everything to become so slow as to be useless, I am not sure. But maybe that means you can only process one channel and every second I-frame - still useful in some circumstances. Keep in mind GPUs are still getting faster and cheaper, and their power requirements are decreasing.
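To make the training-versus-inference point concrete, here is a minimal sketch of applying an already-trained network to a single frame, assuming PyTorch and its torchvision model zoo purely as stand-ins (the thread does not say which framework Hikvision uses). The GPU is used if one is present; otherwise it falls back to the much slower CPU:

```python
import torch
import torchvision.models as models

# Use the GPU if one is present; otherwise run (much more slowly) on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A pretrained classifier stands in for an already-trained network: training it
# took a lot of GPU time, but applying it to one frame is comparatively cheap.
model = models.resnet18(pretrained=True).to(device).eval()

# One 224x224 RGB frame (random data here, purely for illustration).
frame = torch.rand(1, 3, 224, 224, device=device)

with torch.no_grad():              # inference only, no gradients needed
    scores = model(frame)          # forward pass: one image in, one score vector out
    top_class = scores.argmax(dim=1).item()

print("Predicted ImageNet class index:", top_class)
```

The expensive part, fitting the weights, has already been paid for before this point; the forward pass scales with whatever hardware happens to be available.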

My understanding is that it requires a huge amount of processing power to train a neural network, but not as much to apply an input and get an output once it is trained.

Hikvision VP says 2TFlops for surveillance video:

I am sure it takes far more for initial training, and it must vary depending on what type of DNN one has developed, but from everything I have seen so far, it is still quite high compared to traditional analytics.

Hikvision VP says 2TFlops for surveillance video

But to put that into perspective, Nvidia's GTX 1080 cards are around 9 Teraflops, and they are meant for gamers. The Tesla M4 server GPU is around 2.2 teraflops.

NVIDIA Tesla M4

"7x more power-efficient processing than CPUs for deep learning at 20 images/sec/watt"

So it does look to me that the potential to improve VMS analytics is at least in the realm of possibility. If it proves too expensive to add a GPU to every VMS server, or too processor-intensive to analyse more than a single camera stream (say), then perhaps there is still the potential to upload archived data from the VMS and do searches on the client. That could still be useful in some situations.

As for another comparison, the Tegras being used by Hikvision have a max of ~1 teraflop.

I do not know why Hikvision chose the Tegras vs. the GTX 1080, but that's surely due to my lack of understanding of the details.

So I agree with you that the trends (mid to long term) are favorable, but right now it looks like adding hundreds of dollars (minimally) in pure hardware costs to do this.

Tegras are GPUs for mobile devices, aren't they? Therefore lower power than a desktop GPU like the GTX 1080 and better suited for server hardware. GTX 1080s are power-hungry desktop GPUs and I wouldn't put one in a VMS server. I was thinking, however, that a traditional gaming/desktop GPU could still be useful if you do offline analysis on the client machine, e.g. upload 10 hours of video from the VMS server to the client for 2 cameras after an incident, and do a search for all people in green shirts. It wouldn't be as useful as realtime analysis on every channel, but still useful and not really any additional cost.
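As a rough illustration of that offline, after-the-incident workflow, here is a hypothetical sketch assuming OpenCV for decoding, with a placeholder classify_frame() standing in for whatever trained model is used; nothing here reflects an actual Hikvision or VMS API:

```python
import cv2

def classify_frame(frame):
    """Hypothetical: return True if the trained network finds a person in the frame."""
    raise NotImplementedError("plug in your own trained model here")

def search_archive(path, sample_fps=1.0):
    """Scan an exported clip at ~sample_fps and return timestamps (seconds) of hits."""
    cap = cv2.VideoCapture(path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if metadata is missing
    step = max(int(native_fps / sample_fps), 1)      # e.g. analyse every 25th frame
    hits, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0 and classify_frame(frame):
            hits.append(index / native_fps)          # timestamp of the hit, in seconds
        index += 1
    cap.release()
    return hits

# Usage (hypothetical clip exported from the VMS):
# print(search_archive("camera02_incident.mp4"))
```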

Yes, this is a very interesting scenario. When you have thousands of cameras, it may be too expensive to analyze them all in real time. But when you are investigating a case, you know which cameras and which time periods you want to analyze, so the ability of the VMS to transfer video from the recording server and generate metadata during that process is unique and valuable, I believe.

There is an efficient way of using a neural network in real time: analyze not the raw 25fps high-res stream, but only captured objects. I believe this technology will help dramatically reduce false alerts. After getting an object from a conventional motion detector, we can try to classify it with the neural network. So it is one piece of a frame every few seconds, not 25 full frames per second.
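A rough sketch of that idea, assuming OpenCV (4.x API): a cheap background-subtraction motion detector gates the expensive neural network, so the classifier only ever sees cropped moving objects instead of 25 full frames per second. The classify_crop() function is a placeholder for whatever trained network is plugged in, and the stream URL is invented:

```python
import cv2

def classify_crop(crop):
    """Hypothetical: run the trained network on one cropped object."""
    raise NotImplementedError("plug in your own classifier here")

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=32)
cap = cv2.VideoCapture("rtsp://camera.example/stream")    # hypothetical stream URL
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                         # cheap motion detection
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 500:                       # ignore tiny blobs
            continue
        x, y, w, h = cv2.boundingRect(c)
        label = classify_crop(frame[y:y + h, x:x + w])     # the NN only sees the crop
        # ...raise or suppress an alert based on the label...

cap.release()
```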

The biggest problem here is training. Not only processor power consumption but also the process itself. We've tested AlexNet and it doesn't work well for our applications. 1000 classes of objects is too many for our case. And maybe the quality of the training photos was too good compared with typical surveillance pictures. Training the network is the biggest challenge here.

That is what I understand also: the image is broken down into bounding boxes, and ideally each bounding box contains only one object to be categorized by the deep learning network (DLN) - but there could be several per image. Presumably, therefore, the load on the DLN would depend on what the camera is viewing. A camera on a busy city street with lots of people continuously walking would probably have a high processor load. The opposite would be detecting the presence of a car overnight in a car park that would normally be empty - perhaps in this case the server may not need a GPU at all...?

This is an interesting photo I took during IFSEC:

HikVision 1U server with 8 GPUs on board!

Interesting. The question I ask is: are these off-the-shelf Nvidia GPU boards that I am seeing here, or is Hikvision taking the Tegra processor directly and building their own circuit boards around it?

U2 - That's a 1RU rack case with 16x Tegra chips running. That's what's under all those heat sinks. It's a really innovative way to put over 14 teraflops of processing power in a very low power consumption appliance: ~300W.

The picture is displaying sideways. Download it and rotate it clockwise 90° and it gives better context.

Rack in 40 of these, with fast switches laced throughout the rack, and you've got half a petaflop of distributable, neural-net-trained, AI-driven video analytics for a couple hundred thousand bucks and the power and cooling bills of a studio apartment.

Stuff like this is going to make for an interesting new generation of surveillance video.

Some more info here, from NVIDIA GTC China

AI City, 1 Billion cameras by 2020

From the video:

"with the enormous volume of data they (Hikvision) have collected..."

It is occurring to me that we might have been missing the point slightly when talking about hardware. The main thing you need to develop a deep learning product like what we have been discussing here is not really the hardware, but the enormous amount of data you need to train the DLNs. This must surely be one area where Hikvision's relationship with the Chinese government gives them a significant advantage. In contrast, most VMS/camera companies, especially in the west, have no access to the data their products record, as this is obviously not owned by them.

2, thanks for sharing, very interesting video.

I've embedded the relevant section below:

To your point:

The main thing you need to develop a deep learning product like what we have been discussing here is not really the hardware, but the enormous amount of data you need to train the DLNs.

I do agree that the data is critical, but even once you've trained the DLN, it still takes a ton of processing to run it.

That video confirms you need 2 TX1s per video channel and ~8 per 1RU box consuming 300 watts.

I certainly believe that is far less than traditional approaches but with TX1 pricing ~$500 per unit and the cost of the box and power, it is still a lot of money per channel.

At what resolution and frame rate is each stream? 1080p, 25fps? What if you just apply it to a second stream at roughly 1/3 the width and height and only 5 fps? Right there you have reduced the hardware requirements to (1/3) * (1/3) * (1/5) = 1/45. Presumably there would be a compromise in quality, but I bet you could still get good results.
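For what it's worth, the arithmetic checks out; a tiny helper makes the scaling explicit (the assumption, as in the comment above, is that compute load scales roughly with pixels processed per second):

```python
# Relative compute load after scaling width, height and frame rate of the analysed stream.
def load_factor(width_scale, height_scale, fps_scale):
    return width_scale * height_scale * fps_scale

print(load_factor(1/3, 1/3, 5/25))   # 0.0222... i.e. roughly 1/45 of the original load
```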

Even if you do that, you still need a tonne of data. If you want 60,000 images of people as seen through the perspective of a security camera, where do you get it? You can use generic images of people, cars, etc. from the internet, but ideally you'd use images taken from security cameras themselves, which tend to be higher up, looking down, and, well, everything just looks different through a security camera... Where can you get that data from?

Anywhere on the internet that contains 1000s of hours of stock video surveillance footage, from 1000s of different cameras, indoors and outdoors?

At what resolution and frame rate is each stream? 1080p, 25fps? What if you just apply it to a second stream at roughly 1/3 the width and height and only 5 fps? Right there you have reduced the hardware requirements to (1/3) * (1/3) * (1/5) = 1/45. Presumably there would be a compromise in quality, but I bet you could still get good results.

That's just speculation.

(1) the slide only lists resolution. You added in the frame rate.

(2) since this is a sales presentation, it is more reasonable to assume they are using optimistic numbers. You are assuming they are overstated by a factor of 45x. If they thought they could do it at 1/45th of the combined pixel count and frame rate, they likely would have used that to make the channel count per TX1 / server far higher.

My point is still valid, which is that you can always step down the resolution and frame rate, and therefore the HW requirements, and still get useful results. Just assume they are talking 5 fps; then I would say step down to 1 fps and my maths would still work out the same. As Murat said above, there is even more optimization if you combine it with VMD.

Here at the office I have four outdoor cameras. All I want to know is if an image contains a person or a vehicle. Most of the time nothing is happening, and it would probably only need to process an image every minute on average if combined with VMD. I wouldn't even need a GPU for that. But I would still need lots of data to train the network; that, I find, is the stumbling block.

No doubt, to get the full benefits of DL, e.g. recognizing a fight breaking out on a street, you probably do need the sort of hardware they are talking about, and even more data...

Just assume they are talking 5 fps; then I would say step down to 1 fps and my maths would still work out the same.

But as you know and acknowledge, 1fps would significantly reduce the performance.

Basically, you are assuming that Nvidia's marketing people are hurting their marketing claims by using unrealistically high stream requirements. I am simply saying that is an imprudent assumption to make.

Basically, you are assuming that Nvidia's marketing people are hurting their marketing claims by using unrealistically high stream requirements. I am simply saying that is an imprudent assumption to make.

Not really. They would know that developers are aware there are many different ways DL can be applied to video analysis, or indeed any task. Simple analysis to detect whether 1 in 10 images contains a vehicle or person (my example above) is not the same thing as "recognise that someone has had an accident, recognise that a pet or child has been lost...". The first is easy (at least conceptually), but I wouldn't have a clue how to do the second.

The Pascal architecture (Nvidia 10xx) is too new for them to have used, but the mobile chips are now available, so in a Moore's-law-like fashion they just got a massive boost in capability at the same or lower price and power.

If you build a 'server farm' that puts the processing architecture in the cloud and makes it a demand-based service available to any VMS installation, then we start to have a model that could be practical. No hardware to buy and you only pay for what you use, but when you use it, you have more capability available than would be practical to build into a box.

If I were HIK, that's what I'd do, and then I'd build the capability into my VMS client to interact with it, so customers can do things like search video, set criteria and push video to that service (see the sketch below).

But hey, that's just me.
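For illustration only, here is a sketch of what that demand-based, client-to-cloud interaction could look like; the endpoint URL, parameters and response format are all invented for this example and do not describe any real Hikvision or VMS service:

```python
import requests

def analyze_clip(clip_path, criteria):
    """Upload an exported clip and ask the (hypothetical) cloud service to search it."""
    with open(clip_path, "rb") as f:
        response = requests.post(
            "https://analytics.example.com/v1/search",    # invented endpoint
            files={"video": f},
            data={"criteria": criteria},                   # e.g. "person, green shirt"
            timeout=300,
        )
    response.raise_for_status()
    return response.json()   # e.g. a list of {"timestamp": ..., "confidence": ...}

# Usage (hypothetical):
# hits = analyze_clip("loadingdock_cam3.mp4", "person")
```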

If you build a 'server farm' that puts the processing architecture in the cloud and makes it a demand based service available to any VMS installation and we start to have a model that could be practical.

If I were HIK, that's what I'd do

So customers have to send their video to a cloud system run by the Chinese government?

I understand what you mean from a technical perspective; it just strikes me as debatable from a security perspective for many customers.

And what if someone in the U.S. licensed the technology, ran it in the U.S. and guaranteed that the data never left American soil (digital soil, that is...) and that HIK or the Chinese government never had access?

So customers have to send their video to a cloud system run by the Chinese government?

ezviz cloud?

Probably comes down to cost. I don't know much about what's out there regarding online analytics services, but I know that the Microsoft one is pretty impressive:

Microsoft Cognitive API (be sure to upload your images to their demo)

The problem is the price: they charge something like $1.50 per 1,000 images. So for a large city site of 1,000 cameras at 1 fps, you'd be paying $1.50 per second! It adds up very quickly. So I assume the Hikvision product exists because in many cases it is cheaper to buy your own 'server farm'.
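A quick back-of-the-envelope calculation at that quoted rate shows why continuous, per-frame cloud analysis gets expensive so fast (the rate and camera count are taken from the comment above; everything else is simple arithmetic):

```python
# Daily cost of a hosted image-analysis API applied to continuous camera streams.
def daily_cost(cameras, fps, price_per_1000_images=1.50):
    images_per_day = cameras * fps * 86_400          # seconds in a day
    return images_per_day / 1000 * price_per_1000_images

print(daily_cost(1000, 1))   # 1000 cameras at 1 fps -> 129600.0, i.e. ~$130k per day
```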

It certainly would if you were fully utilising it, in particular if you were using it for real-time analysis. But if the use case is forensic/event-based and you only need the capability occasionally, then the scales would start to tip in the on-demand direction.

I like the business model though: make it demand-based pricing, and make it easily accessible through the tools a customer uses all day...

The 2016 results for "ImageNet" (one of the most important computer vision competitions) came out fairly recently; HikVision entered for the first time, and did well:

  • http://image-net.org/challenges/LSVRC/2016/index

My (unaffiliated) summary of HikVision's results is:

  • Second in “Object Localisation” - detect, classify (from 1000 defined categories) and find bounding boxes for the five most "prominent" objects in each of a collection of photographs. I.e., what are the five most important things in the scene?
  • Second in “Object Detection” - detect, classify (from 200 defined categories) and place bounding boxes around all of the objects (in the 200 categories) in each of a collection of photographs. Useful for search and perhaps alarms.
  • Won “Scene Classification” - this is assigning the scene viewed by the camera to one of 365 categories (city street, shopping mall etc), so the camera can know which type of scene it's “looking” at and can apply appropriate analytics accordingly.
  • Middle-ranking results in “Scene Parsing” - here the task is to label each pixel in an image with the category of object (from 150 object categories) which it is part of. This is (perhaps) the most difficult challenge but also has the most useful applications: it supports all the applications of the others and then some. Note that this task is new in 2016, in contrast to the others.

These are all on stills (photographs); ImageNet also has an “Object Detection in Video” task which is similar to “Object Detection” but uses 30 categories (simpler). HikVision do not seem to have entered this.

Many teams approach ImageNet as a numbers game: lots of powerful GPUs, lots of PhD students and/or postdocs trying all sorts of variations in a (one hopes) intelligent fashion. The description of HikVision's submission seems to indicate this strategy. Many of the entries involve "ensembles": a collection of networks processing the same image in parallel, then taking the average (or some other combination) of the results. This improves the results by a few percent but increases the (test-time) processing cost (and the training cost) by the number of networks in the ensemble.
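For readers unfamiliar with the term, here is a toy sketch of the ensemble idea, assuming NumPy; each "model" is just a stand-in callable returning a probability vector over the same classes:

```python
import numpy as np

def ensemble_predict(models, image):
    """models: callables, each returning a probability vector over the same classes."""
    probs = np.stack([m(image) for m in models])   # shape: (n_models, n_classes)
    return probs.mean(axis=0).argmax()             # average, then pick the best class

# Stand-in "models": each just returns a fixed probability vector for any image.
fake_models = [lambda img, p=p: np.array(p) for p in
               ([0.6, 0.3, 0.1], [0.4, 0.5, 0.1], [0.5, 0.4, 0.1])]
print(ensemble_predict(fake_models, image=None))   # averaged mass favours class 0
```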

Hik's (affiliated) summary of the competition's results here.
