PROBLEM: Missing Video - Gaps In Recording

An end user member said they are finding gaps in their recording where video is missing, anywhere from a few seconds to minutes or even an hour.

This leads to 3 important questions:

  • What is causing these problems?
  • What can be done to identify these gaps?
  • What can be done to eliminate them?

Bonus question: how often is this a practical problem in your deployments?

I'll throw out some initial thoughts:

  • Lots of things can cause this, from momentary network outages to recorders becoming overloaded temporarily to cameras rebooting, etc.
  • Finding gaps tends to be hard. Some systems allow setting up alerts on loss of camera connection, but this may not trigger for every problem that causes recording gaps.
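For continuous recording, one way to identify gaps is to compare the start and end times of the recorded segments. Here is a minimal sketch, assuming you can export a list of (start, end) times per camera from the recorder or from file timestamps. The segment data and the `find_gaps` helper are hypothetical, not any VMS's API.

```python
from datetime import datetime, timedelta

def find_gaps(segments, min_gap=timedelta(seconds=1)):
    """Return (gap_start, gap_end) pairs between recorded segments.

    segments: iterable of (start, end) datetime pairs, in any order.
    """
    gaps = []
    covered_until = None
    for start, end in sorted(segments):
        # A gap exists when the next segment starts after everything seen so far ends.
        if covered_until is not None and start - covered_until >= min_gap:
            gaps.append((covered_until, start))
        if covered_until is None or end > covered_until:
            covered_until = end
    return gaps

# Hypothetical sample data for one camera.
day = datetime(2013, 3, 28)
segments = [
    (day, day + timedelta(minutes=10)),
    (day + timedelta(minutes=10), day + timedelta(minutes=20)),
    (day + timedelta(minutes=25), day + timedelta(minutes=30)),
]
for gap_start, gap_end in find_gaps(segments):
    print(gap_start, "->", gap_end, "missing", gap_end - gap_start)
```

This only works for continuous recording. With motion-based recording, a missing segment may simply mean no motion was detected.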

We faced this same issue (amongst numerous other bugs) some months ago. Due to the total lack of support by the manufacturer we had to troubleshoot this on our own.

The first thing you find out is that the overall system behaviour provides only inconsistent information about the specific problem you are trying to identify. The bigger the system, the more inconsistency you get. So we designed a plan to collect consistent information on events leading to, causing, or otherwise related to the loss of video.

Is this happening on specific devices? Is it periodic on specific devices? What is the minimum and the maximum video loss window? Can you catch this live and figure out whether manual operations (be they on the system itself or the network infrastructure) might restart the recording?

In our case we hired an IT company that deployed network monitoring tools along with SNMP traps on all video sources, NVRs, workstations, servers, etc. Over a period with quite a few video recording loss events, we did not manage to identify any network issues. Bear in mind also that we always use dedicated networks for surveillance, built with the manufacturer's recommended hardware (which is never cheap).

Unfortunately, although there were no network related issues, there were tons of system related issues, like device disconnections, failed client-server connections and more, that we never managed to identify in greater detail. Our attempts to mine consistent data gave us a very important clue. We figured out that there was a certain maximum time window of lost video recording, and not a single second more.

As John mentions, finding gaps is really hard. Indeed, we never got any camera disconnection alert tied to the loss of video recording.

Further investigation led to the discovery (the manufacturer failed or didn't want to let us know that such a service was present) that the VMS itself was periodically running a specific integrity check service, at exactly the aforementioned time window. During this check, any parts of the system found not to function as expected were forced back to the "expected" operation. This affected the recording as well and ended any ongoing loss of video recording. So, every T minutes this service ran automatically. Any video recording loss starting within this period would last anywhere from a maximum of T minutes down to a minimum of 1 second (should it start 1 second before the next check). The cause of the problem was identified by the manufacturer as a bug that was never actually solved, just worked around with unacceptable short term solutions and promises that were never fulfilled. On the contrary, those workarounds led to even more related (or even unrelated) problems.
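If a periodic service like this is what ends the gaps, then every gap should end at the same offset within the period T, and no gap should last longer than T. Here is a rough sketch of that check. It assumes you have gap end times in seconds (since any fixed reference) and a candidate period to test. The function name and tolerance are my own illustration, not how we actually found it.

```python
def ends_align_with_period(gap_ends, period_s, tolerance_s=2.0):
    """True if every gap end time sits at the same offset within the period.

    gap_ends: gap end times in seconds since a fixed reference.
    period_s: candidate period of the suspected periodic service, in seconds.
    """
    if not gap_ends:
        return False
    phases = [t % period_s for t in gap_ends]
    reference = phases[0]
    for phase in phases:
        # Circular distance, so offsets just below period_s and just above 0 count as close.
        distance = abs(phase - reference)
        distance = min(distance, period_s - distance)
        if distance > tolerance_s:
            return False
    return True

# Hypothetical gap end times tested against a 10 minute (600 s) period.
print(ends_align_with_period([1203.0, 4804.0, 9002.5], 600))  # same offset each time
print(ends_align_with_period([1203.0, 4950.0], 600))          # offsets differ
```

With only a handful of gaps, an alignment like this could be coincidence, so treat it as a hint to go looking for a scheduled task rather than as proof.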

To me, the key is consistency, through which you can figure out a pattern that will show you the general cause of the problem. From that point you can more easily figure out your next steps, since you no longer have to find a needle in a haystack.

It took us about a month to install the system and almost 6 months to identify the problem...

If anyone sees this problem on the same kind of system, send me a personal message; I would be more than happy to assist any way I can.

My first suggestion would be a managed switch and MRTG.

Start looking at packet flows over time. Is the data moving across the network, but being dropped at the recorder, or is the data not even moving at all?

Once you've sorted this out, you can keep using MRTG as an early warning metric to see if bandwidth is spiking or dropping abnormally.
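MRTG builds its graphs by polling the switch's interface octet counters over SNMP and turning the difference between two polls into a rate. Here is a minimal sketch of that arithmetic plus a crude abnormal-rate check. It assumes 32-bit counters (such as ifInOctets) and that the counter values were already collected; the thresholds and function names are hypothetical.

```python
from statistics import median

COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap back to 0 at this value

def rate_bps(prev_octets, curr_octets, interval_s, modulus=COUNTER32_MODULUS):
    """Bits per second between two counter samples, allowing for one counter wrap.

    Note: a device reboot resets the counter and looks like a wrap here.
    """
    delta = (curr_octets - prev_octets) % modulus
    return delta * 8 / interval_s

def flag_abnormal(rates, low_ratio=0.5, high_ratio=1.5):
    """Indexes of samples far below or above the median rate."""
    baseline = median(rates)
    return [i for i, r in enumerate(rates)
            if r < baseline * low_ratio or r > baseline * high_ratio]

# Hypothetical per-poll rates for one camera port, in bits per second.
rates = [40e6, 41e6, 39e6, 5e6, 40e6, 90e6]
print(flag_abnormal(rates))
```

A camera stream at a fixed bit rate should give a fairly flat line, so a sample that drops well below the baseline at the same time as a recording gap suggests the data never reached the recorder.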

Nicholas, great details, thanks!

Brian, I agree that network monitoring can help but that will only detect gaps in packet flows. Issues inside the VMS/recorder (like Nicolas's example of an integrity check service) won't be detected. These types of issues can be a lot more dangerous as it's often hard to figure out what's going on inside the VMS/recorder.

We use Nagios for network monitoring, alerting on outages of more than 15 minutes.

I've found that one of our VMSs has a lot of issues with missing video. I believe that the problem is that either the VMS software is buggy or the hardware appliance that they forced upon us is underpowered.

Our network is all managed switches, and most run at 20-80% of capacity. If a camera, or other network device, gives the local switch too many errors, then a report is generated. I've found that one brand of camera seems to have far more issues than some of the other brands.

Since we have more than one VMS, I can have a camera recorded on both. I can see stuttering and video loss on the appliance, and not on the other system, hence my belief that it is the software or the appliance.

My Axis, Bosch, and Sony cameras/encoders on Video Insight with high-end Dell servers/VMs work quite well.

I'm an end user so I have fairly tight controls over my environment. I use SolarWinds Orion to monitor all my managed Cisco switches. Any network issues are fairly easy to find. Orion will email me, or I can go back thru the logs/graphs to see the issue. Any managed switch will keep a short log and stats to help identify network issues. Orion provides an overview and keeps logs/stats for several months.

I did have an issue with Milestone XProtect for a while. It would run out of RAM and stop recording some cameras. A service/server restart would fix the problem for a week or so. Milestone tried to help, but was unable to resolve it. I did eventually upgrade to version 8 and the problem seems to have been resolved.

I've also had individual cameras that have locked up. They were PoE, so I just connected to the Cisco switch, dropped the PoE, turned the PoE back on and the camera restarted. For cameras that have this issue more than once I will upgrade the firmware, which so far has fixed any related issues.

Just a thought: If you think the problem might be your VMS, then set up a secondary VMS pointing to the same camera. If one is losing video and the other isn't, then it is a VMS issue. Most VMS vendors offer a free trial, which should be long enough to see any issues. You might like the trial version better than your current VMS. ;)
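Once both systems have recorded the same camera for a while, the comparison comes down to checking which gaps on one recorder overlap a gap on the other. Here is a rough sketch, assuming you have already extracted gap lists as (start, end) times in seconds for each recorder. The function and the sample data are hypothetical.

```python
def classify_gaps(gaps_a, gaps_b):
    """Split recorder A's gaps by whether recorder B has an overlapping gap.

    Returns (shared, only_a). Shared gaps point at the camera or network,
    gaps seen only on A point at recorder A.
    """
    shared, only_a = [], []
    for start, end in gaps_a:
        # Two intervals overlap when each one starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in gaps_b):
            shared.append((start, end))
        else:
            only_a.append((start, end))
    return shared, only_a

# Hypothetical gaps for the same camera on two recorders, in seconds.
gaps_a = [(100, 160), (500, 530)]
gaps_b = [(110, 150)]
shared, only_a = classify_gaps(gaps_a, gaps_b)
print("seen on both:", shared)
print("only on recorder A:", only_a)
```

Keep in mind that the second stream adds load on the camera, so gaps that only show up after the second VMS was connected may be caused by the test itself.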

Aaron, great feedback, thanks!

This makes me nervous: "If you think the problem might be your VMS, then setup a secondary VMS pointing to the same camera"

With a second VMS running, you now have cameras running (at least) two streams which could cause issues in itself. The cameras could become overloaded, frame rate might drop, connection might fall out. Some cameras should handle this with no issue but others might create new problems.

Yep, new problems, but also just a lot of work to track down what could be a smaller issue. However, for the record, with multicast it's very possible to have two VMS running a single camera. Just don't expect configuration or camera management to work 100% as each VMS fights the other.

As for troubleshooting this... Start with some basic questions:

  • Is one camera affected, or all?
  • If all cameras, are they the same brand, make, model, and firmware version?
  • If some cameras, what's common, the switch? The network? The subnet?
  • Good managed switches will give you statistics on dropped packets and other issues. I would check those first before setting up Nagios or MRTG from scratch to troubleshoot. Besides, those packages require the switches to have that basic functionality anyway (sFlow or SNMP), so look at the switch first!
  • Network cabling good?
  • Using TCP or UDP for your RTSP stream? UDP does not retransmit lost packets, so any packet dropped along the way means lost frames.
  • If you rule out the network, then look at your storage and your hosts.
  • Windows has Performance Monitor. Set up metrics on CPU, RAM, page faults, network statistics and I/O, disk stats and I/O, disk queue depth, etc. Monitor all statistics against known good numbers.
  • Windows Event Log. Enough said.
  • Storage logs. Controller and disk failures can cause I/O to go crazy on those paths as rewrite operations are sent out. I've seen this personally bring down entire production critical systems on top-tier storage arrays.
  • File system integrity. How does SCANDISK come back? Bad records and bad allocation tables can do bad things.
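The first few questions in that list are about finding what the affected cameras have in common. With a simple inventory you can rank the shared attributes mechanically. This is a minimal sketch. The inventory layout, attribute names and the `rank_suspects` helper are all hypothetical.

```python
from collections import defaultdict

def rank_suspects(cameras, affected_ids):
    """Rank attribute values by the share of their cameras that show gaps.

    cameras: dict of camera id -> dict of attributes (switch, model, ...).
    Returns rows of (attribute, value, affected_count, total_count).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cam_id, attrs in cameras.items():
        for key, value in attrs.items():
            totals[(key, value)] += 1
            if cam_id in affected_ids:
                hits[(key, value)] += 1
    rows = [(key, value, hits[(key, value)], total)
            for (key, value), total in totals.items()
            if hits.get((key, value))]
    # Highest affected share first, larger groups breaking ties.
    rows.sort(key=lambda r: (r[2] / r[3], r[2]), reverse=True)
    return rows

# Hypothetical inventory: both affected cameras hang off the same switch.
cameras = {
    "cam1": {"switch": "sw-a", "model": "X", "firmware": "1.0"},
    "cam2": {"switch": "sw-a", "model": "Y", "firmware": "2.0"},
    "cam3": {"switch": "sw-b", "model": "X", "firmware": "1.0"},
    "cam4": {"switch": "sw-b", "model": "Y", "firmware": "2.0"},
}
for row in rank_suspects(cameras, {"cam1", "cam2"}):
    print(row)
```

In this made-up example the switch comes out on top because every camera on it is affected, while each model and firmware version is only half affected.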

Seth, excellent list, thank you!

The one thing that still concerns me is: How do you find out if your VMS/recorder is the culprit? If it's the one dropping video, it might be hard to track down. Does anyone have any ideas on this?

Check the VMS logs? But then you need to know what to look for, which can be difficult as it depends on how or what the VMS logs/reports.

I hate to say it, but you're relying almost completely on the VMS vendor to give you enough debugging data to solve the issue. If the VMS doesn't have decent logging or other 'event tracking' facilities, you are going to be fighting a very uphill battle.

In which case perhaps demoing another VMS is a good idea, as a troubleshooting step.

Seriously, I would rather hold a gun to my head and pull the trigger before I become responsible for a system without sufficient troubleshooting tools!

Great suggestions Seth.

I'm a VMS developer...

Here are just some random thoughts:

1) In any network, even one that is only 60% loaded, packet loss at the IP level happens.

2) This is why a VMS normally works with RTP over TCP, and it's hard to blame the VMS for lost video :-) Normally a VMS uses the very well tested TCP/IP stack of the OS (Linux, Windows).

3) Some cameras/switches handle packet loss better than others.

4) A lot of packet loss happens due to cheap embedded NICs, so for 60+ Mbps use a discrete NIC.

Last point is very important. My experience tells me it makes a lot of difference.
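For streams that do run RTP over UDP, packet loss can be measured rather than guessed at, because every RTP packet carries a 16-bit sequence number that increases by one per packet. Here is a minimal sketch that counts missing packets. It assumes the sequence numbers were already pulled out of a packet capture for one stream; the parsing is not shown and the function name is hypothetical.

```python
RTP_SEQ_MODULUS = 65536  # RTP sequence numbers are 16 bits and wrap to 0

def count_lost_packets(seq_numbers):
    """Estimate lost packets from RTP sequence numbers in arrival order.

    Late or duplicate packets are ignored and not credited back, so with
    heavy reordering this overestimates the loss.
    """
    lost = 0
    prev = None
    for seq in seq_numbers:
        if prev is None:
            prev = seq
            continue
        step = (seq - prev) % RTP_SEQ_MODULUS
        if step == 0 or step >= RTP_SEQ_MODULUS // 2:
            continue  # duplicate or late packet, keep the newest sequence seen
        lost += step - 1
        prev = seq
    return lost

# Hypothetical capture: the counter wraps, then packets 2 and 3 never arrive.
print(count_lost_packets([65533, 65534, 65535, 0, 1, 4]))
```

Running this on captures taken at the camera side and at the recorder side of the network would show whether packets disappear in transit or arrive and are lost later.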

1) In any network, even one that is only 60% loaded, packet loss at the IP level happens.

I have to take issue with this- a properly designed IP network with good equipment configured to best practices, and actively maintained and monitored... will not lose IP or UDP packets. At all.

However, I will agree with the premise that not all networks are built the same, and thus the lowest common denominator must be factored into statements like this.

2) This is why a VMS normally works with RTP over TCP, and it's hard to blame the VMS for lost video :-) Normally a VMS uses the very well tested TCP/IP stack of the OS (Linux, Windows).

Agreed, since all current O/S TCP/IP stacks are rooted in Unix code... TCP across the board is stable. UDP requires a well-designed network. Will you get 80% efficiency out of the network? No.

4) A lot of packet loss happens due to cheap embedded NICs, so for 60+ Mbps use a discrete NIC. Last point is very important. My experience tells me it makes a lot of difference.

This. Right. Here.

I had to reply to your post to point out this -truth-. Embedded NICs like the Realteks of the world are a real problem in many network situations. Why DVR manufacturers relied on desktop-class components like this still confounds me to this day. Even if it's an embedded Intel or nVidia or AMD NIC, the OEM may just take the controller and pair it with a horrible PHY, creating nightmares.

Discrete NIC: Does a Realtek PCI-E Card like this count?: http://www.tp-link.com.au/products/details/?categoryid=235&model=TG-3468

Or are you guys referring to these:

http://www.amazon.com/Intel-Gigabit-Network-Adapter-EXPI9301CTBLK/dp/B001CY0P7G/ref=sr_1_1?ie=UTF8&qid=1364475609&sr=8-1&keywords=intel+pci-e+nic

http://www.amazon.com/Intel-PRO-1000-Desktop-Adapter/dp/B000BMZHW8/ref=sr_1_36?ie=UTF8&qid=1364475808&sr=8-36&keywords=intel+server+pci-e+nic

Or would you only consider a "server class" NIC, like this:

http://www.amazon.com/Intel-1000-Dual-Server-Adapter/dp/B000BMZHX2/ref=sr_1_1?ie=UTF8&qid=1364475691&sr=8-1&keywords=intel+server+pci-e+nic

http://www.amazon.com/HP-Profile-GigaBit-1000Base-T-383738B21/dp/B0007UC9DO/ref=sr_1_14?ie=UTF8&qid=1364475719&sr=8-14&keywords=intel+server+pci-e+nic

You could also try the baseline CMS/Multiviewer/VMS that came with your cameras. They are usually horrible to use but stable for viewing.

For example, Dahua has PSS, a 64 channel viewer/CMS/VMS that is always stable with their own cameras. Having a reference such as this is very helpful for eliminating hardware and non-VMS problems.

Vivotek also has similar, as does Geovision.

Another option is something like Network Optix HD Witness - with the free client you can load up as many cameras as you like for live streaming - the 1.5 release will also feature notifications of packet loss and camera disconnects (tells you which camera disconnected/dropped packets) - so you don't even need to monitor it manually - the notifications can be by email.

Video Insight offers a 30 day trial and supports like 1,600 camera models, if you are looking for something to test stability of cameras.

As has been mentioned, when testing with an additional system, make sure that it isn't overloaded and that it doesn't overload the network or cameras. Don't let the testing disturb the result.

Another reason there appear to be "recording gaps" can be motion settings. These should be thoroughly tested during implementation. We've seen gaps due to motion settings, bad hardware, memory leaks, bad drive sectors, bad software (VMS) release, bad firmware releases, software updates, schedule changes, system reboots and many, many more reasons. You don't know why until digging a little deeper. Start with overall analysis as best you can and look for the greatest common denominator and work your way from there.

Joshua, agreed. For purposes of this discussion, I was assuming that recording was set to continuous mode. If it is motion based, it gets far more complicated to understand why video is missing.

Use Genetec.

Record constantly at one frame every second, or every 2, 3, 4 seconds, whatever you like. Burst on motion to 7 fps (or the frame rate of your choice). Use video trickling: record the video on the SD card of the camera in case of a network failure and automatically dump the video to the server when the network is restored. You can even fail over the archiver should the main recorder go down.

There is no reason to miss video if you use the right tools.

Remus Tomici

Remus, let me generalize this because, while Genetec is a leader here, they are not alone. Using redundant edge based recording should help avert this problem. Milestone Corporate has similar capabilities.

Also, even if a VMS supports it, make sure they support it for your brand/model of camera.

Seth is very accurate, if the network is built and CONFIGURED properly it eliminates or substantially reduces packet loss! We have found in most cases it is not the VMS issue but the network.

Second is the server performance. Exceeding 80% capacity will affect performance.

The easiest way is to use proper network monitoring tools. The data is either present or not, so read the detailed report!

Michael, I thought you are not supposed to pitch a specific provider when it is not the topic of discussion? John?

Steve, to the best of my knowledge, (1) Remus is not promoting his own company and (2) he mentioned a functionality that is not widespread (i.e., he didn't just say use Genetec VMS, he said, use this advanced Genetec feature to eliminate this specific problem).

John - he said Michael not Remus.

Josh, thanks, I missed that. Michael's an end user without any financial interest and also has demonstrated himself as an active technical commenter on IPVM. Btw, the background to this is that Steve recently recommended his own company, did not disclose his affiliation, was off topic so I deleted the comment. That's a lot different than the examples here.

One cause I've seen is an intermittent drop in the connection between the VMS server and the iSCSI storage array. Check for connection issues there.

Or it can be a problem with an internal disk drive. Run standard disk diagnostics (CHKDSK, the HDD manufacturer's diagnostics, etc.)

It can also be a bad switch or bad switch ports. Does the VMS have the capability to log dropped camera stream events?

The weirdest problem I've ever seen turned out to be not enough system memory to index the video for searches. The video was actually there if you went to a point near the apparent "gap" and either rewound or fast forwarded into the gap area, but if you searched within the affected indexed area it would look like the video wasn't there and would just skip forward to the next properly indexed area. Upgrading the system memory fixed that problem.

I've also seen where incorrectly set motion activated recording caused gaps, but John says for this purpose that would not be the case.

Intermittent loss is painful but if it is regular, it may be worthwhile to determine the size of the gap. My VMS developer's engineers worked hard to find that particular cameras had an I-frame setting which was the cause of a minute drop at a crucial site. In another instance our anti-virus software was the culprit.