Random Video Recording Gaps - Why?

A member commented, "I have seen in more than one project where only IP cameras are used and that too recording 24*7 at around 20fps and 2MP or 3MP resolution - lot of recording gaps occurring. These are random, and vary from camera to camera , no particular pattern is observed. These gaps range from few seconds to few minutes (say 2 minutes or 3)."

He then asked what might cause this.

The most common causes that I have seen include (1) camera disconnects - the camera might reboot, the network connection could go down, or the PoE switch could exceed its power budget - or (2) server issues, e.g., the recording server could be overloaded.

What have you seen? Please share.


With ONSSI, classic at least, and possibly some versions of Milestone, the server does database administration once a day at a scheduled time. During this time no recording occurs.

On my system (small), this can last a couple of minutes before everything is back to normal. Worse, if you exceed the per-camera storage limit during the day, it will force the camera offline for 1-2 minutes or so while it frees up space in the active DB. If this is happening every day on several cameras, you will find all sorts of 2-3 minute gaps all over the place on different cameras.

Yes, that's the one. I think the forced intraday cleanup when you go over a single camera limit thing is just a NetDVR thing, I hope at least.

It can make Swiss cheese out of your database. Guessing SeeTec engine doesn't have it, but don't know.

"Milestone - it can make Swiss cheese out of your database"....

Related: Milestone VMS Live / Archive Databases

The key to making this work well is to get the Archiving properly scheduled to match the input bitrate and to get the old data expired and deleted on a schedule. It is truly an 'AND' function.

If either one is not properly defined in the system...it will suffer.

The VMS was not mentioned in the query. Which one was it based on or was it a question about several VMSs?

All the VMSs I have tested in my lab have some sort of housekeeping chore where the VMS goes out and looks to see how much data it has and initiates some sort of cleanup operation to make way for the new data.

John's hit list is a good one. In addition....here are a couple things I watch.

If you use perfmon....you can watch the NIC received data discarded counter....zero is the best answer here. A nonzero value means the NIC had data that the system did not retrieve from its buffers in time.
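One way to act on this is to log the counter to CSV and flag the samples where it grows, then compare those timestamps against the recording gaps. The sketch below assumes the counter in question is perfmon's "Packets Received Discarded" under Network Interface, and that the log is in the CSV layout that typeperf writes (timestamp in the first column). The host name, NIC name, and values in SAMPLE are made up for illustration only.

```python
import csv
import io

def discard_increases(csv_text, column_hint="Packets Received Discarded"):
    """Return (timestamp, delta) pairs for samples where the discard counter grew."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    # Pick the first column whose counter path contains the hint.
    col = next(i for i, name in enumerate(header) if column_hint in name)
    increases = []
    previous = None
    for row in samples:
        try:
            value = float(row[col])
        except (ValueError, IndexError):
            continue  # skip blank or malformed samples
        if previous is not None and value > previous:
            increases.append((row[0], value - previous))
        previous = value
    return increases

# Illustrative log only: hypothetical host "NVR01" and NIC "nic0".
SAMPLE = r'''"(PDH-CSV 4.0)","\\NVR01\Network Interface(nic0)\Packets Received Discarded"
"05/12/2015 02:14:00.000","0"
"05/12/2015 02:14:05.000","0"
"05/12/2015 02:14:10.000","37"
"05/12/2015 02:14:15.000","37"
"05/12/2015 02:14:20.000","52"
'''

for timestamp, delta in discard_increases(SAMPLE):
    print(f"{timestamp}: {delta:.0f} packets discarded since previous sample")
```

The counter is cumulative, so it is the increase between samples that matters, not the absolute value.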

Perfmon also helps out when you watch how much data is being sent over to the storage subsystem. This MS article is a good read on the subject... Storage Performance Counters article

Lower quality switches that have limited internal max bandwidth will also play a role in missing data. You truly get what you pay for in this area.

One thing that I have noticed from experience is that the motion detection sensitivity needs to be set higher on certain regions.

Most decent VMS allow for different motion regions on the same camera.

Bandwidth spikes caused by high motion activities resulted in the total simultaneous disc writes from all cameras exceeding the capability of the NVR.
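To check whether this is happening, you can add up the per-camera bitrates second by second and look for the moments where the total exceeds what the NVR can sustain. This is only a rough sketch: the camera names, the bitrate samples, and the 25 Mbit/s write limit are hypothetical, and in practice the limit would come from measuring your own storage.

```python
def overload_seconds(per_camera_mbps, nvr_write_limit_mbps):
    """Return (second, total_mbps) for each sample where the summed bitrate exceeds the limit.

    per_camera_mbps maps a camera name to a list of per-second bitrates in Mbit/s;
    all lists are assumed to cover the same time span.
    """
    totals = [sum(sample) for sample in zip(*per_camera_mbps.values())]
    return [(second, total) for second, total in enumerate(totals)
            if total > nvr_write_limit_mbps]

# Hypothetical samples: all three cameras see motion during second 2.
bitrates = {
    "cam1": [4, 4, 12, 4],
    "cam2": [3, 9, 11, 3],
    "cam3": [5, 5, 10, 5],
}
print(overload_seconds(bitrates, nvr_write_limit_mbps=25))
```

Each camera on its own stays well under the limit here; it is the simultaneous spike that pushes the total over.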

In the absence of data, it could be all or none of these problems. What makes these intermittent problems so unnerving is that if you haven't proven to yourself that you've identified the real issue, you are throwing darts in the dark, hoping that you've addressed it. Then you find yourself dreading the next phone call from the customer, yelling at you about not having fixed the problem.

What you need to have is some continuous monitoring of your video surveillance system so you can see whether the gaps in video are correlated with dropped packets, queue depth to storage, POE events or something else. In our data, a majority (>70%) of this kind of problem is configuration related.
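As a rough illustration of that kind of correlation, the sketch below finds the gaps in one camera's recorded segments and lists the logged events that fall near each gap. The segment times, event labels, and the 30-second window are all assumptions for the example; real data would come from the VMS export and from your switch or server logs.

```python
from datetime import datetime, timedelta

def find_gaps(segments, min_gap=timedelta(seconds=2)):
    """segments: (start, end) recorded intervals for one camera, sorted by start."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        if next_start - prev_end >= min_gap:
            gaps.append((prev_end, next_start))
    return gaps

def correlate(gaps, events, window=timedelta(seconds=30)):
    """Map each gap to the labels of events within `window` of the gap."""
    return {
        (gap_start, gap_end): [
            label for ts, label in events
            if gap_start - window <= ts <= gap_end + window
        ]
        for gap_start, gap_end in gaps
    }

def at(hour, minute, second):
    # Helper for the made-up example data below (arbitrary date).
    return datetime(2015, 5, 12, hour, minute, second)

segments = [(at(2, 0, 0), at(2, 10, 0)),
            (at(2, 12, 30), at(2, 40, 0)),
            (at(2, 40, 1), at(3, 0, 0))]
events = [(at(2, 9, 50), "switch port flap"),
          (at(2, 30, 0), "archive job start")]

for gap, nearby in correlate(find_gaps(segments), events).items():
    print(gap[0].time(), "to", gap[1].time(), "->", nearby)
```

A gap with no nearby event is also useful information: it tells you that you are not yet logging whatever is causing it.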

Given that your cameras are configured to generate a significant amount of data even with compression, you need to check for load issues in your switches and servers. H.264 is a good codec but can be misleading if you've spec'ed the configuration assuming a level of motion that doesn't match actual experience.

Milestone (ONSSI) configurations have a particular problem in their Live-to-Archive migration regime if not configured properly. You need to measure how much video data is actually downloaded to the recorder and whether it is exceeding the capacity of the Live partition before it is migrated to the Archive partition. You may have it configured where it works most of the time but if you get more motion on some days, the amount of data may exceed the total capacity of the partition before migration. If that happens, the system will delete the video files from the Live partition prior to migration and you'll see these kinds of gaps.
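The arithmetic behind that check is simple enough to sketch. The numbers below (16 cameras, 4 Mbit/s on a typical day, 6 Mbit/s on a busy day, a 500 GB Live partition, archiving every 12 hours) are hypothetical, and the functions are not part of any VMS; they only show how a configuration can pass on most days and still overflow on a busy one.

```python
def gb_per_hour(camera_mbps):
    """Total data written per hour in decimal GB, given per-camera bitrates in Mbit/s."""
    total_mbps = sum(camera_mbps)
    return total_mbps / 8 * 3600 / 1000  # Mbit/s -> MB/s -> MB/h -> GB/h

def hours_until_live_full(live_capacity_gb, camera_mbps):
    """How long the Live partition lasts at the given sustained bitrates."""
    return live_capacity_gb / gb_per_hour(camera_mbps)

def overflows_before_archive(live_capacity_gb, camera_mbps, hours_between_archives):
    """True if the Live partition would fill before the next archive run."""
    return hours_until_live_full(live_capacity_gb, camera_mbps) < hours_between_archives

typical_day = [4] * 16  # hypothetical: 16 cameras at 4 Mbit/s
busy_day = [6] * 16     # hypothetical: same cameras with more motion

for name, load in (("typical", typical_day), ("busy", busy_day)):
    hours = hours_until_live_full(500, load)
    overflow = overflows_before_archive(500, load, hours_between_archives=12)
    print(f"{name}: Live partition lasts {hours:.1f} h, overflow before archive: {overflow}")
```

Feeding in measured bitrates rather than the values from the design calculator is the point of the exercise.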

Other times, we've seen situations where users are surprised to discover additional streams being pulled from a camera, recording to multiple servers unintentionally. Creating and managing additional streams can strain the camera's capacity.

Issues with fragmentation affecting storage performance develop over time. Look at write queue depth and write latency for clues to those issues. If the gaps start pretty soon after initial install, fragmentation is less likely to be the problem.

If you are looking for some technology that can help you monitor your installations, let me know.

I have seen cameras get stupid when network error rates get too high. Axis M1054's used to do this. It appeared that network glitches would eventually cause a camera to stop recording, and sometimes it would "get better" and restart. The thing that ended up being useful was to compare network stats from "good" and "bad" cameras. Look at the camera, look at the switch, etc. Also vendors seem to not actually confess when they have network issues so this can also be one of those "a firmware update might help, no the vendor didn't give a reason" situations.

I assume this is a closed network with no other network traffic and so there isn't something else to correlate with. If this were an enterprise network I would suggest you identify a specific time and date of a "glitch" and take that to the IT team and ask if they were doing scans at that time. And of course I assume you weren't messing with the network at the time (ooh! look! this cable has both ends plugged into this switch. I wonder if that could do anything?)

...this cable has both ends plugged into this switch.

That's called a poor man's cable tester...

and it creates routing loops if you're using a cheesy unmanaged switch instead of a real switch with spanning tree. Also, if you're lucky, IT will shut down your entire switch quickly so you'll learn of your error before you leave the job site. If you're unlucky, IT will make sure your badge is cancelled so that when you go for the next service call you'll find Guido the IT security dude in the front lobby, ready to break your kneecaps.

...and it creates routing loops if you're using a cheesy unmanaged switch instead of a real switch with spanning tree.

False.

You are confusing broadcast loops with routing loops. Broadcast loops are layer2, frame based and cause duplicate frames which flood the switch with traffic.

Routing loops are layer 3, packet based and typically involve 3 or more routers.

Plugging both ends of a cable into a switch will result in a broadcast storm, not a routing loop.
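The difference can be shown with a toy simulation. This is a simplified model, not a real switch or router: the 4-port switch, the hosts on ports 1 and 2, and the cable looping port 3 to port 4 are assumptions for the example. A flooded layer 2 frame has no hop limit, so it keeps circulating over the looped cable; a layer 3 packet carries a TTL that each router decrements, so a routing loop eventually drops it.

```python
def flood(ingress_port, ports):
    """A switch floods a broadcast out of every port except the one it arrived on."""
    return [p for p in ports if p != ingress_port]

def simulate_l2_loop(steps, ports=(1, 2, 3, 4), loop=(3, 4), first_ingress=1):
    """Return (copies delivered to host ports, frames still circulating) after `steps`."""
    peer = {loop[0]: loop[1], loop[1]: loop[0]}  # the cable joins these two ports
    arriving = [first_ingress]
    delivered_to_hosts = 0
    for _ in range(steps):
        next_arriving = []
        for ingress in arriving:
            for egress in flood(ingress, ports):
                if egress in peer:
                    # Frame leaves one end of the cable and re-enters at the other.
                    next_arriving.append(peer[egress])
                else:
                    delivered_to_hosts += 1
        arriving = next_arriving
    return delivered_to_hosts, len(arriving)

def simulate_l3_loop(ttl):
    """Return how many router hops a looping packet survives before TTL reaches 0."""
    hops = 0
    while ttl > 0:
        ttl -= 1   # each router decrements TTL before forwarding
        hops += 1
    return hops

print(simulate_l2_loop(steps=100))  # one broadcast, still circulating after 100 rounds
print(simulate_l3_loop(ttl=64))     # the packet is dropped after a bounded number of hops
```

In this model a single broadcast never dies out, and every further broadcast adds more circulating copies, which is why the switch ends up flooded with traffic.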

...you'll find Guido the IT security dude in the front lobby, ready to break your kneecaps.

In that case Guido would find his TTL shortened to 0.

Any resources that are shared are potential causes. Here is an example for storage.

One of our clients has ~150 cameras recording to a server with an 80TB volume. The random recording gaps appear after the volume is full. In other words, the file system fragmentation and VMS file creation/deletion come into play.

After close examination, we found out the recording gaps happened when the particular VMS is doing the house cleaning. During this period, the VMS scans through all the recorded directories, reading its particular index files, and updating the Windows NTFS Master File Table (MFT).

This problem becomes more challenging with a large volume (e.g., 80TB), as the MFT gets bigger (more reading and updating) and the recording directories are spread out across the volume. The default MFT zone is 12.5% of the volume.

Visibility is important for finding the cause.

Sean, excellent example! What is the VMS doing about this? Shouldn't the VMS be designed to handle this before it allows the volume to become full?

We can only observe what different VMSs do. The sophisticated ones try to optimize around the Microsoft MFT algorithm to reduce the access. Also, a full volume is not an issue in itself. It is just that the storage access pattern changes afterward.

During this period, the VMS scans through all the recorded directories, reading its particular index files, and updating the Windows NTFS Master File Table (MFT).

The VMS is updating the Windows MFT directly? For performance reasons? How do they ensure atomic transactions?

NTFS 3.0 lets you enable a USN Journal, which logs the actual commands sent to the file system (but not the data, like the regular journal). Have you used this to see what is going on?

Integration issues when using a different manufacturer for the VMS than for the cameras. Had some Pelco cameras recently that were randomly losing connection to a Milestone Professional server; they needed a firmware upgrade to stay online reliably.

Mike,

I'm unaware of any cleanup processes with Avigilon ACC 4 or 5, nor have we ever had random recording gaps that were 100% related to the VMS or the cameras.

For the benefit of other members (FYI only), Avigilon does not use a separate database for motion, alerts, bookmarks, etc. Avigilon Pre-allocates all available space within a volume, and writes directly to the files, thus bypassing the need to request available block information from the OS. All of the database functionality to tag video for motion, camera names, dates, etc., is slipstreamed into the video streams as they are written to the files. This allows one to export a segment of video and a player, and have full functionality to search for all events, motion, analytics, bookmarks, and/or re-export from the export...

As far as I can tell, the only maintenance process is the BACKUP video functionality, and we've never seen that impact video. (That is perhaps the weakest link in the platform.)

We sold Video Insight up until we drank the Avigilon Kool-Aid in 09, and that whole SQL light database stuff was the Achilles heel in that platform, and perhaps others that require it.

From our IT support experience, a small business goes out and buys a business software package that uses SQL Light, and they back up a folder, but the SQL tables are never closed, so one day they lose it all, thinking they were backing up. It's an ugly thing.

Avigilon Pre-allocates all available space within a volume, and writes directly to the files, thus bypassing the need to request available block information from the OS.

True, that is a far more efficient way of handling storage on a purpose-built server.

FWIW, if you have to use SQL-lite, you can insert and then delete a few thousand empty gigabyte blobs in just a few statements, which will create the extents ahead of time. Turn off the database free space scavenger thing first.
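A minimal sketch of that trick, assuming "SQL-lite" here means SQLite and that the "free space scavenger" is its auto_vacuum setting (both are my reading, not something confirmed above). The table name `filler` and the small blob sizes are placeholders; a real run would use much larger blobs, up to SQLite's per-blob size limit.

```python
import os
import sqlite3
import tempfile

def preallocate(db_path, blob_bytes, blob_count):
    """Grow the database file with zero-filled blobs, then delete them.

    With auto_vacuum off, the freed pages stay in the file on SQLite's
    freelist, so later inserts reuse them instead of growing the file.
    Returns (file size in bytes, number of free pages).
    """
    conn = sqlite3.connect(db_path)
    try:
        # Must be set before the first table is created in a new database.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute("CREATE TABLE IF NOT EXISTS filler (payload BLOB)")
        conn.executemany("INSERT INTO filler VALUES (zeroblob(?))",
                         [(blob_bytes,)] * blob_count)
        conn.commit()
        conn.execute("DELETE FROM filler")
        conn.commit()
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    finally:
        conn.close()
    return os.path.getsize(db_path), free_pages

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "video_index.db")  # hypothetical file name
    size, free_pages = preallocate(path, blob_bytes=64 * 1024, blob_count=8)
    print(f"file size: {size} bytes, free pages available for reuse: {free_pages}")
```

Note that this reserves space inside the database file; whether the file system lays that space out contiguously on disk is a separate question.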

There is not enough information provided. It could be the system being overwhelmed at various points in terms of bandwidth, I/O capacity, network switch backplanes being overwhelmed, aggregate POE power being insufficient, or any combination thereof. It could even be some overlooked Windows service locking needed files (had this one recently). Depending upon the VMS or NVR being used perhaps there is a memory leak prompting the application/service to crash and reboot. If the person reporting this is willing to provide more details such as the VMS used, the camera makes/models, count of cameras, and network equipment/layout we could dig a bit deeper. There are plenty of experts on various systems that subscribe to IPVM who would be glad to assist.