Specifying An NVR - Do I Need More RAM And A Purpose Built Video Server?

I'm reaching out for guidance on architectural best practices when building NVRs. A little background: we are currently equipped with Dell 33 TB SATA RAID 5 / 4 GB RAM / Intel Xeon E5620 4 Core @ 2.4 GHz, and each NVR hosts between 30 and 60 IP cameras. The camera settings average about 7 FPS, 800 x 600, H.264, and a 768 bit rate. The vendor has moved away from requiring us to use their Dell NVRs, and they have released a hardware-agnostic version. We are beginning to build our own HP equivalents, and our enterprise standard for memory is 32 GB with 48 TB drives. One of the points of contention with the vendor is that they claim the amount of RAM on the box will not impact performance; however, we have crashed several of our recorders due to what appear to be memory issues. Secondly, I'm concerned that the Dell and/or our HP equivalent are not truly video-grade servers -- just standard application servers. Our server team did indicate that the vendor-provided Dells are nothing more than off-the-shelf machines, which could be sufficient. I did find an article that references 6G SAS drives as a more appropriate hard drive choice for monitoring/recording/etc. video, but I don't have much more than that in terms of best practices. Any assistance is greatly appreciated!


"One of the points of contention with the vendor is that they claim the amount of RAM on the box will not impact performance."

In general, our testing and experience concur with that: RAM is not a primary factor in VMS performance (see: VMS Server Load Fundamentals Tested).

"however, we have crashed several of our recorders due to what appears to be memory issues."

Can you elaborate on this? Did you check the statistics and see that RAM was maxed out? If so, was the CPU pegged as well? And how much RAM were you using?

"I'm concerned that the Dell and/or our HP equivalent are not truly video-grade servers -- just standard application servers."

A lot of the surveillance storage vendors hype this and claim to have 'secret sauce', but most of us are skeptical and, yes, most surveillance vendors are just reselling COTS machines from Dell, HP, Super Micro, etc.

Our earlier generation NVRs did fine with that number of cameras using SATA drives, Pentium Core Duo processors and 2 GB of RAM on Windows XP. When we're just talking recording, you might be overthinking it. You're not doing video "rendering", and unless you're doing transcoding (for example, converting MJPEG streams from the cameras into H.264 on the server side) or higher-end video analytics, it's not that resource intensive.

You should probably ask your VMS provider what they recommend. Some VMS vendors will base performance on total pixels processed per second. So for instance, a VMS company may say they process 1000 megapixels per second recording, provided you meet a minimum CPU. So you take the total pixels of all the cameras multiplied by the frame rate and see if the result falls into that range.

E.g., 30 cameras x 2 megapixels per camera at 10 FPS = 600 megapixels per second.

You also have to get from them their estimated record rate in either megabytes or megabits per second. In the above example, that's around 115 Mb/s (megabits), or roughly 14.4 MB/s (megabytes). So that's well under what a gigabit network, internal SATA storage or an iSCSI storage system can handle, but is the VMS software capable of processing that rate in recordings?
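
To make the arithmetic above concrete, here is a minimal Python sketch of the two checks. The function names are made up for illustration, and the 1000 megapixels per second limit is just the hypothetical vendor figure from the example, not a real specification.

```python
def pixel_rate_mp_per_s(cameras: int, megapixels: float, fps: float) -> float:
    """Total megapixels per second the recorder has to ingest."""
    return cameras * megapixels * fps


def megabits_to_megabytes(mbit_per_s: float) -> float:
    """Convert a rate in megabits per second to megabytes per second."""
    return mbit_per_s / 8


# Figures from the example: 30 cameras, 2 megapixels each, 10 FPS.
load = pixel_rate_mp_per_s(30, 2.0, 10)

# Hypothetical vendor limit taken from the example, not a real spec.
vendor_limit_mp_per_s = 1000.0

print(f"pixel load: {load:.0f} MP/s, within limit: {load <= vendor_limit_mp_per_s}")
print(f"record rate: {megabits_to_megabytes(115):.1f} MB/s")
```

Swap in your own camera counts and the numbers your VMS vendor actually publishes; the point is only to have the same calculation on paper that the vendor will use.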

If you consult with your VMS provider, not only do you get a better chance of properly estimating your requirements (and you can always go over a little for a margin of safety), you put the onus on them if something doesn't work right. If you "guesstimate" on your own, you leave yourself open to being accused of not following vendor specifications.

We have been developing this answer in my lab for several years now. I will mention a few things to be aware of without divulging our 'secret sauce' ;-)

The answer to this starts with the VMS itself. Different ones handle the data their own way...and will give you a different performance result. Those that have a two-stage storage mechanism where the 1st stage is a high RPM drive (which acts as a cache) will be able to achieve an overall higher throughput. Of course you can use high RPM drives for everything and get great storage performance, but that would not be cost effective.

The second influence is the combo of 'how many' cameras, their CODEC and the bitrate. This creates the overall payload the system must handle. Where things like Motion Detection are performed (camera or server) will affect the memory usage. As you get higher into the throughput numbers...the bottleneck changes from a CPU/Memory-centric one to a NIC/Storage-centric one. The testing report mentioned by John is the low end of the throughput chain.

The usage of the NVR with respect to Clients must also be factored in. A system that is 'recording only' will have a much better performance than one where the Client function is running.

Let Task manager and Perfmon (assuming Windows) be your friends as you try to answer this for yourself.

Mike,

I'm glad you brought up PerfMon.... it is an excellent tool that I use in classes to demonstrate how easily observable changes occur (CPU/Memory/Disk/Network) when you change bit rate, compression, motion on server vs camera, frame rate, etc...

As a server dude, can you enlighten us on some of the obvious things (like CPU, memory) to look for when testing VMS's on a box, and even better, some of the not-so-obvious things that can be seen in PerfMon (threads, disk read/write, etc) that can be of benefit to system designers and/or troubleshooters to be aware of?

Marty,

Perfmon is needed to look at and 'record' a long run.... I like to grab many hours of runtime with the tool....overnight being the best. You can see odd spikes to remind you that you did not turn OFF things like disk defrag and search indexing.

Look at the NIC received and sent data streams separately, and ONLY for the actual NIC hardware...not the 'Teredo' and other odd-named interfaces you see there.

If you are running in Always Record mode...you can compare the NIC input to the Write HDD values...the averages will track each other.
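
A rough Python sketch of that comparison, assuming you have exported the NIC receive and disk write counters (in bytes per second) from a Perfmon log into two lists. The sample values and the 15% tolerance are invented for illustration; your VMS may write noticeably more or less than it receives because of indexes and metadata.

```python
from statistics import mean


def tracking_ratio(nic_rx_bytes_per_s, disk_write_bytes_per_s):
    """Average disk write rate divided by average NIC receive rate."""
    return mean(disk_write_bytes_per_s) / mean(nic_rx_bytes_per_s)


def tracks(nic_rx_bytes_per_s, disk_write_bytes_per_s, tolerance=0.15):
    """True if the two averages are within `tolerance` of each other.

    The 15% default is an arbitrary assumption, not a vendor figure.
    """
    ratio = tracking_ratio(nic_rx_bytes_per_s, disk_write_bytes_per_s)
    return abs(ratio - 1.0) <= tolerance


# Made-up samples in bytes per second, for illustration only.
nic_rx = [14.2e6, 14.5e6, 14.1e6, 14.6e6]
disk_ok = [13.9e6, 14.8e6, 14.0e6, 14.7e6]
disk_behind = [9.0e6, 9.5e6, 9.2e6, 9.1e6]

print("healthy run tracks:", tracks(nic_rx, disk_ok))
print("lagging run tracks:", tracks(nic_rx, disk_behind))
```

If the write average sits well below the receive average in Always Record mode, that is a hint that data is being dropped somewhere between the NIC and the disk.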

The queue length metrics for Processor, Logical Disk and Physical Disk are good ones to keep an eye on. You will learn how high the average can be before the system starts losing data. It is a different answer for different VMSs. Hopefully the VMS you are testing with indicates lost data in a log or on screen so that you can match the Perfmon value with a bad result.

Watch the available Mbytes of memory because this will help identify memory leaks.
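
One way to turn that into a number is to fit a straight line through the available-memory samples from a long run and look at the slope. This is only a sketch: the function name, the sample values and the 30-minute interval are all made up, and a downward slope can also come from caching or normal warm-up, so treat it as a prompt to dig further rather than proof of a leak.

```python
def slope_mb_per_hour(available_mb, interval_s):
    """Least-squares slope of available memory (MB) against time (hours)."""
    n = len(available_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    hours = [i * interval_s / 3600 for i in range(n)]
    mean_x = sum(hours) / n
    mean_y = sum(available_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, available_mb))
    den = sum((x - mean_x) ** 2 for x in hours)
    return num / den


# Made-up overnight samples taken every 30 minutes, for illustration only.
leaky = [2000, 1950, 1900, 1850, 1800]
steady = [2000, 2010, 1995, 2005, 2000]

print(f"leaky run:  {slope_mb_per_hour(leaky, 1800):.1f} MB/hour")
print(f"steady run: {slope_mb_per_hour(steady, 1800):.1f} MB/hour")
```

A steadily negative slope over many hours, with no change in camera load, is the pattern worth chasing.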

Search out the processes associated with the VMS and watch the metrics that those provide and try to learn what they mean.

Sorry for the delayed reply, John. Thanks so much guys! This is REALLY helpful! John: We are using 4 GB of RAM on all of our servers, and yes, it was maxing out the RAM but the CPU was only at 30%. Our vendor continues to be adamant and decidedly obstinate that no more than 4 GB should be required.

One other major issue we are experiencing is artifacting and frozen frames on our IP cameras when viewing any video through the VMS client -- this is what sparked my interest in the SAS drives. When we view the stream directly from any IP camera (for example, an Axis Q6034) via a browser, the image is perfect (we're viewing it as MJPEG). However, when we view the same stream and/or playback of the same stream via the VMS client, the artifacting is terrible (VMS is recording H.264). My understanding from the vendor (and this may be true for many VMSs) is that camera streams hit the server HDD, and it is actually from the HDD that the client is pulling the "live" video vs a direct connection to the camera.

I apologize in advance if my above description is confusing or difficult to follow -- no sleep last night. :)

David, both things you are describing are out of the ordinary. So let's dig in a little more:

"We are using 4 GB of RAM on all of our servers, and yes, it was maxing out the RAM but the CPU was only at 30%."

This is atypical. It could be that this specific VMS uses more RAM than other VMSs, or it could be something specific to your machine's setup. Have you verified that there are no other services consuming significant RAM? Can you share who the VMS is?

Other aspect:

"artifacting and frozen frames on our IP cameras when viewing any video through the VMS client"

There's one key difference between these two scenarios: direct from the camera using MJPEG vs through the VMS using H.264. What are the bit rate, resolution, frame rate and compression of the H.264 stream? It could be that the H.264 stream is misconfigured or set to too low a bit rate.

The other theory I have is that this may just relate to your server RAM being maxed out and the server being unable to handle more requests (like sending out video to clients).

John,

I wish I could share the VMS with you, but I was told that we shouldn't based on some NDAs in place -- sorry for the inconvenience -- I know it would help.

Regarding the utilization during that specific outage:

* One VMS-specific service was utilizing about 1.2 GB and another was at about 800 MB (UVidDrv.exe*32 / CommMgr.exe*32)

* We also have a command center monitoring tool/service that was running -- Splunkd.exe utilizing 500 MB

* Other services were running between 10 MB - 100 MB (I'll have to look into what they were).

* No memory leaks have been detected, although in earlier versions of the software, that was in fact an issue.

Q6034 (as configured in our VMS) -- I verified that the settings on the camera itself (i.e. through the browser) are reflected in the VMS configs for the corresponding camera.

* Res: 1280x720

* Max Frame Rate: 7

* H.264

* Bit Rate: 1024 (CBR)

* Frame Interval/GOV: 16

* Total Recording

* The only "analytic" system-wide that we have running is "Signal Loss" notifications -- all VMD is turned off

Let me know if you need any other parameters.

Just received some additional info from one of our server support teams:

* RAM was running high, but the page file was set low -- he thinks they updated the page file size to 4x the RAM. The server team is going to get me some additional logs generated during the failure, but I may not get those until next week

Thanks!

Dave

David, if the VMS side was consuming 2 GB of RAM, that seems roughly reasonable for VMS usage. As you note at the end, it could be a server configuration issue, and you might just want to be safe and go to 8 GB of RAM (cheaper than lots more troubleshooting). But I definitely don't think you need 32 GB of RAM for a VMS server.

As for the camera, it's an Axis PTZ set to CBR. 1 Mb/s is kind of low for 720p/7fps, especially if the PTZ is moving or the scene is complex. I'd try increasing the bit rate to 1.5 Mb/s or 2 Mb/s and see what that does.
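
A quick way to compare these settings is bits per pixel per frame, sketched below in Python. The helper name is made up, the bit rates are assumed to be in kilobits per second (1 kb = 1000 bits), and the "768" figure from the first post is assumed to be kb/s as well since no unit was given. There is no universal "good" value here, since it depends on the scene and the encoder; the sketch only shows how the candidate settings compare to each other.

```python
def bits_per_pixel(bitrate_kbps: float, width: int, height: int, fps: float) -> float:
    """Average encoded bits available per pixel per frame."""
    return bitrate_kbps * 1000 / (width * height * fps)


# Q6034 as configured (1280x720, 7 FPS) at the current and suggested rates.
for kbps in (1024, 1536, 2048):
    bpp = bits_per_pixel(kbps, 1280, 720, 7)
    print(f"1280x720 @ 7 FPS, {kbps} kb/s: {bpp:.3f} bits/pixel")

# Site average from the first post, ASSUMING its "768" means kb/s.
site_avg = bits_per_pixel(768, 800, 600, 7)
print(f"800x600 @ 7 FPS, 768 kb/s: {site_avg:.3f} bits/pixel")
```

Under those assumptions the 720p PTZ gets fewer bits per pixel than the site's average 800 x 600 camera, which is consistent with it being the one that shows artifacts first.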

David,

Based on what you are saying, I have to assume you are doing Client Activity on your Recording Server. Client activity will stress nearly everything in a system, depending on the VMS. This is why I mentioned it in my post.

Yours appears to be memory stressed. This can happen if you are changing the viewing parameters to be different from the recording parameters...and your camera does not support dual streaming. This forces the CPU to transcode the data and use up memory to do this.

You could verify this effect by running the Client on a different system WITHOUT changing the stream parameters.

Another thing that you could be experiencing is a poor video subsystem for a VMS that wants a good one. Based on my lab testing, the video subsystem requirements range from 'nothing special...use the CPU' to 'let the video card do some work'.

GPU-Z is your friend here.

Thanks guys. I'll try your suggestion, John.

Mike,

Just to make sure I understand, when you refer to "client", I assume you are referring to running the VMS client on the server (i.e. watching and/or playing back video). We only run our clients on dedicated workstations in our operations center; however, we do have an additional component installed on these NVRs to allow our server team to monitor the health of the machine -- are you familiar with iDRAC cards? It may be of no consequence, but I thought it worth mentioning.

Good choice for the Clients... MANY users fall into the trap of thinking the Server is behaving badly when in fact they are causing the issue by running the Client or one of its 'transcoding' tasks on the recording server..thus making it run poorly as a result.

I use BMC based mobos and the resource hit is minimal...so I can reasonably assume a RAC card is also a minimal hit.