How to Handle Hard Drive Failures

Published Feb 13, 2012 05:00 AM

No matter how well maintained a surveillance system is, hard drives eventually fail. Indeed, a recent member poll showed that most find hard drive failures to be 'a common and significant problem'. What happens when a hard drive fails varies significantly depending on the system type used, the priority of the system and the concerns of the end user. In this note, we examine the most common real-world issues and provide guidance on how best to deal with hard drive failures.

Impact of Hard Drive Failure

How much of an impact a failed hard drive has on a system depends on the architecture of its storage. There are three typical scenarios:

  • Single drive for OS and archived video: In systems which use the same drive for the server or DVR/NVR OS as well as video storage, failures are most critical. Loss of the drive takes the entire system down, leaving users unable to view live or archived video. The unit is essentially dead until it is replaced in its entirety, or until a new drive is installed and the OS reloaded.
  • OS and archives on separate drives: In many DVRs, the operating system is kept separate from video storage, either in flash or on a separate hard drive. This prevents video from being lost should the OS drive fail. Conversely, it prevents complete failure of the unit if the video drive fails. The system may still be used for live viewing until the drive is replaced and archiving can resume.
  • RAID: When using RAID for storage, whether internal or external, the system, including archiving, may continue to run when a drive fails. Depending on the RAID level, one or more drives may fail without loss of data. When a new drive is inserted, the RAID array begins rebuilding the lost data. During this process, performance (mainly throughput handling) may be reduced. Some high-end enterprise systems allow for full performance during rebuilds, though lower-end units do not.
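
To make "depending on the RAID level" concrete, the following is a minimal sketch of the guaranteed number of simultaneous drive failures the standard RAID levels tolerate. The function names are illustrative and not taken from any product. RAID 1 is assumed to be an n-way mirror and RAID 10 is assumed to be built from two-drive mirrored pairs; vendor-specific or nested levels may behave differently.

```python
# Guaranteed (worst-case) number of simultaneous drive failures a standard
# RAID level survives without data loss.
# Assumptions: RAID 1 is an n-way mirror; RAID 10 uses two-drive mirrored pairs.
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def tolerated_failures(level: str, drives: int) -> int:
    """Return how many drives may fail at once with no data loss."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "10" and drives % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives")

    if level == "0":
        return 0            # striping only, no redundancy
    if level == "1":
        return drives - 1   # every drive holds a full copy
    if level == "5":
        return 1            # single parity
    if level == "6":
        return 2            # double parity
    return 1                # RAID 10: a second failure may hit the same pair


def array_survives(level: str, drives: int, failed: int) -> bool:
    """True if the array is guaranteed to keep its data with `failed` drives down."""
    return failed <= tolerated_failures(level, drives)
```

For example, array_survives("5", 4, 2) is False, while array_survives("6", 8, 2) is True, which is why RAID 6 is often chosen where a second failure during a rebuild is a concern.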

One key consideration is how and when users will know that archiving has stopped. For small systems especially, the recorder is often located in an unstaffed location, such as a wiring closet, and may go for extended periods without anyone checking whether it is working. As a result, the first time a user realizes the recorder is not working is often after an incident has occurred. Some DVRs and VMS platforms are able to send notifications if archiving stops for any reason, but we suspect these are often not set up.
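
Where the recorder writes video as files to a directory that a script can read, a simple external check can catch stalled archiving even if the built-in notifications were never configured. The sketch below is one possible approach, not a feature of any particular DVR or VMS: the function names, the 10 minute threshold, and the assumption that recordings appear as regularly updated files are all hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def newest_write_time(archive_dir: Path) -> Optional[float]:
    """Return the latest modification time of any file under archive_dir,
    or None if the directory is missing or contains no files."""
    if not archive_dir.is_dir():
        return None
    times = [p.stat().st_mtime for p in archive_dir.rglob("*") if p.is_file()]
    return max(times) if times else None


def archiving_stalled(archive_dir: Path,
                      max_age_seconds: float = 600,
                      now: Optional[float] = None) -> bool:
    """True if nothing under archive_dir was written within max_age_seconds.

    A missing or empty directory counts as stalled, since that is what a
    failed or unmounted video drive typically looks like to a script.
    """
    current = time.time() if now is None else now
    newest = newest_write_time(archive_dir)
    return newest is None or (current - newest) > max_age_seconds
```

Run on a schedule (for example every few minutes), a True result from archiving_stalled would trigger whatever alert the integrator and end user have agreed on, such as an email or a ticket. How the alert is delivered is left out here because it depends entirely on the site.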

Client Response to HDD Failure

Response to an HDD failure differs depending on the client and how critical they consider their surveillance system. For some, going without their system while it is repaired is acceptable, as they rarely have incidents which need review. For others, any downtime is unacceptable, and immediate response and repair of the hard drive is required.

Integrators may keep spares on hand for end users with stringent downtime requirements, as part of their service contract. Other users may simply receive a loaner unit, to provide basic recording functionality while the production unit is sent out for repair.

Written Procedures

No matter which steps are taken, due to the potentially negative reaction some users may have when their system is down, it is essential that response procedures are agreed upon, and put in writing as part of the warranty or service agreement. While users will still often direct their frustration at the integrator, this reduces the potential for contention or blame.

Recovery

While it is possible for data recovery services to read some, or even most, data from a failed hard drive, this is often a costly prospect. On average, recovery of a single hard drive can cost well over $1,000. Complex physical problems (caused by severe crashes) may cost upwards of $2,000 as more manual work is required. This makes data recovery an expensive process, reserved only for severe cases.

Potential Liability

No matter who the client is, the potential for liability issues is present when a surveillance system is in place, but not recording. Incidents, no matter how infrequent, may occur at any time. Video of a critical incident may be key evidence in litigation (such as accidents, vandalism, slip and fall, etc.), making hard drive failure a real risk to users. 

For users whose video surveillance is governed by regulations, hard drive failure is potentially more of a risk. For those in government or gaming verticals, who are required to store video for certain durations, an unnoticed hard drive failure could place them in violation of these regulations. For this reason, these users nearly always use RAID storage, sometimes RAID 6 and beyond, to guard against multiple hard drive failures. 

Recommendations

For users of small surveillance systems, the best choice is to use DVRs with separate drives for OS and video. For redundancy, inexpensive NAS units which offer RAID 1 (disk mirroring) are available for under $300 USD. Not all DVRs are capable of using NAS storage, however, so users should confirm compatibility before purchasing.

For users of enterprise-level surveillance systems, RAID is very common. If a drive fails, no data is lost, so the main concern is the array's performance during a rebuild. Low-performance arrays can practically take down a system if read/write capability dips too low during a rebuild, which may take 6-10 hours or more. Users should be aware of this, and verify with vendors that performance will remain high during rebuilds.
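
As a rough way to reason about rebuild duration, the time is approximately the capacity of the replaced drive divided by the sustained rebuild rate. The sketch below is back-of-the-envelope arithmetic only: the function name is illustrative, the capacity and rate figures are assumptions rather than measurements, and real rebuilds are usually slower because the array is still recording video at the same time.

```python
def rebuild_hours(drive_capacity_tb: float, rebuild_rate_mb_s: float) -> float:
    """Estimate rebuild time in hours for one replaced drive.

    drive_capacity_tb: capacity of the replaced drive in decimal terabytes.
    rebuild_rate_mb_s: assumed sustained rebuild rate in megabytes per second.
    """
    if drive_capacity_tb <= 0 or rebuild_rate_mb_s <= 0:
        raise ValueError("capacity and rebuild rate must be positive")
    capacity_mb = drive_capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal)
    seconds = capacity_mb / rebuild_rate_mb_s
    return seconds / 3600
```

Under these assumptions, a 2 TB drive rebuilt at a sustained 70 MB/s takes roughly 8 hours, and the same drive at 50 MB/s takes about 11 hours. Asking a vendor what rebuild rate the array sustains under full recording load gives a more realistic input than the drive's rated speed.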

For all users, if video is critical, selecting a recording platform which will notify them of hard drive failure is always recommended. Immediate notification is the best way to protect against lost video.