How Costly are Hard Drive Failures?

Published Apr 29, 2009 04:00 AM

Storage tends to be one of the more costly and problematic parts of video surveillance systems. Most video surveillance systems, even today, do not employ storage redundancy. When drives break, systems can fail and evidence can be lost.

This fosters an ongoing debate about the cost of hard drive failures and the value of doing something about them.
 
In this special premium report, we break down the issue into 5 critical questions:
  • How often do hard drives fail?
  • How do IP video systems impact hard drive failures?
  • What is the service cost of hard drive failures?
  • What is the cost of losing evidence from hard drive failures?
  • What is the business case for storage redundancy?

How often do hard drives fail?

To determine how costly hard drive failures are, we first need to estimate how likely drives are to fail. Industry statistics from manufacturers and integrators range from 2% to 20% of drives failing per year. That is a very wide range, but it reflects legitimate operational differences. These include:

  • Environment - in video surveillance, recorders are often placed in hostile environments. Examples of deployments I have seen: in closets wedged between office supplies, on the floor, underneath a guard's desk, used as a footstool by the guard, or covered in dirt. In many of these situations there is no air conditioning, and daytime ambient temperatures exceed 30 C / 90 F. Environmentally controlled data centers are not the norm.
  • Enclosures - the quality of the enclosure used for the hard drives can vary dramatically, both in the case itself and in the type of cooling used. Video recorders range from low end PCs to industrial class servers.
  • Age of hard drives - the older the hard drive, the more likely it is to fail.
  • The type and manufacturer of hard drives - certain hard drives are rated to be more reliable than others.
 
The first two are specific to video surveillance. The last two are generic issues with hard drives. For more general information on hard drive failures, see the well-known Google report on hard drive failures and a Carnegie Mellon study.
 
What number should you pick? Predictably, DVR vendors choose lower numbers and storage vendors choose higher ones to make their cases. In my experience, hard drive failure rates tend to be quite low - under 5% per year - with a few catastrophic exceptions:
  • Manufacturer-specific problems: For instance, when I was an integrator, we had a dozen hard drive failures in 2 months, only to find out later that the cause was a bad batch of units from our manufacturer.
  • Adverse environments: When recorders were placed in really adverse environments, hard drive failures increased substantially.
 
As such, I can believe a wide range of statistics when it comes to failures. However, I usually use 5% as an estimate.
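
To get a feel for what the 2% to 20% range means in practice, here is a minimal Python sketch. It assumes a constant annual failure rate and independent drive failures, which is a simplification, and it uses a hypothetical recorder with 3 drives observed over 5 years. The function names are illustrative only.

```python
def expected_failures(num_drives: int, annual_failure_rate: float, years: int) -> float:
    """Expected number of drive failures, assuming a constant annual rate."""
    return num_drives * annual_failure_rate * years


def prob_at_least_one_failure(num_drives: int, annual_failure_rate: float, years: int) -> float:
    """Chance of one or more failures, assuming drives fail independently."""
    drive_years = num_drives * years
    return 1.0 - (1.0 - annual_failure_rate) ** drive_years


# Hypothetical recorder: 3 drives over 5 years (assumed values for illustration).
for rate in (0.02, 0.05, 0.20):
    failures = expected_failures(3, rate, 5)
    chance = prob_at_least_one_failure(3, rate, 5)
    print(f"rate {rate:.0%}: expected failures {failures:.2f}, "
          f"chance of at least one {chance:.0%}")
```

Under these assumptions, even the 5% estimate implies that a multi-drive recorder has a better than even chance of at least one drive failure over 5 years, which is why the cost per failure matters.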

How do IP video systems impact hard drive failures?

IP Video should reduce the impact of hard drive failures in two ways:

  • End the placement of headends in the security guard's office: The environmental problems I described above are often caused by the need to place the headend of analog video systems next to the guard (especially because, years ago, systems were not well networked). With the rise of IP, the need to terminate coax in a single location next to the guards should decrease. This, in turn, should increase reliability simply by moving the hard drives into data centers or at least into clean, climate controlled environments.
  • Enable video storage to be aggregated in fewer, larger pools: Providing storage redundancy requires an additional fixed cost (for the additional hard drives and controller/software to perform the redundancy). If you only have a few cameras in a location, the overhead cost for redundancy can be very high. IP video is making it easier to send video to a central location to reduce this overhead. This is not a cure-all as sending surveillance video across wide area networks remains financially infeasible for most. However, in many cases this should help make it easier to justify the use of storage redundancy.

What is the service cost of hard drive failures?

The direct cost of hard drive failures is generally the product of the failure rate and the labor cost to service a failure. The drives themselves add little, since hard drives usually have 3 - 5 year warranties and are relatively inexpensive anyway.

The labor cost to service a failure can vary dramatically, from nearly free to $300 USD or more. This depends on the physical proximity of service technicians to the recorder. In centralized sites (like schools), this distance is trivial. In decentralized sites (like convenience stores), this distance can be hundreds of kilometers. Unfortunately, decentralized sites generally have fewer cameras, so it is harder to justify the use of redundancy there (discussed in the last section).

To approximate, let's say a recorder has 3 hard drives, each hard drive has a 5% annual failure probability and each failure costs $250 to service. Over a 5 year period, the total direct cost of servicing hard drive failures for this recorder is about $200 - a fairly insignificant amount relative to the cost of the recorder.
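
The arithmetic behind this estimate can be sketched in a few lines of Python. The function and parameter names are illustrative, and the sketch assumes the 5% figure is an annual rate that stays constant over the period.

```python
def service_cost(num_drives: int, annual_failure_rate: float,
                 cost_per_service_call: float, years: int) -> float:
    """Expected direct service cost: expected failures times cost per failure."""
    expected_failures = num_drives * annual_failure_rate * years
    return expected_failures * cost_per_service_call


# Inputs from the example above: 3 drives, 5% per year, $250 per call, 5 years.
cost = service_cost(num_drives=3, annual_failure_rate=0.05,
                    cost_per_service_call=250.0, years=5)
print(f"Expected 5 year service cost: ${cost:.2f}")
```

With these inputs the result is $187.50, which rounds to the roughly $200 quoted above. Changing the service call cost to reflect travel distance is the simplest way to adapt the sketch to a decentralized site.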

There are 2 major exceptions that can significantly increase cost:

  • A few vendors run the OS and video management application on a hard drive without redundancy. If that drive fails, the recorder can no longer record until the failure is resolved. Customers often require a temporary recorder to be deployed and then swapped out when the original unit is fixed. This can triple the cost of servicing.
  • A few vendors require DVRs or NVRs to be sent back to the factory to have hard drives replaced. This creates the same problem and a similar increase in cost.

What is the cost of losing evidence from hard drive failures?

This cost combines the likelihood of losing evidence with the adverse impact of losing it. For evidence to be lost, two events must occur:

  • First, a hard drive needs to fail.
  • Second, the video stored on that hard drive must contain needed evidence that has not already been exported.

It is the second factor that is critical to understanding the cost of hard drive failures. For most users, the probability of losing critical evidence from a hard drive failure is quite low. In my experience working with publicly traded corporations, I would say that only 10% of all hard drive failures resulted in any evidence being lost, and only a fraction of those were serious cases where the loss exposure was significant.

This is the opposite of other data stores, like e-mail. If you lost a hard drive with 500GB of corporate e-mail, the likelihood that this creates a major problem is close to 100% - simply because the information value is high and 500GB contains an immense number of e-mails. In video surveillance, 500GB (or even 1 TB) probably represents a few weeks of video from a dozen cameras - almost all of which is irrelevant to any investigation.

How do you project or estimate the value of evidence lost? I recommend multiplying the hard drive failure rate, the number of hard drives, the probability of losing evidence on a given hard drive and the estimated value of the evidence lost. For a recorder with 3 hard drives, assuming 5% annual hard drive failure rate, 10% probability that a hard drive has valuable evidence and $1,000 value of that evidence, in a 5 year period, the loss exposure is $150.
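
The recommended multiplication can be written as a short Python sketch. The function and parameter names are illustrative, and the sketch assumes a constant annual failure rate. Note that multiplying the stated inputs literally gives about $75 over 5 years. The $150 quoted above would correspond to doubling one of the inputs (for example, a 20% probability of evidence or a $2,000 evidence value), so treat every input as adjustable.

```python
def evidence_loss_exposure(num_drives: int, annual_failure_rate: float,
                           prob_evidence_on_drive: float,
                           evidence_value: float, years: int) -> float:
    """Expected value of evidence lost to drive failures over the period."""
    expected_failures = num_drives * annual_failure_rate * years
    return expected_failures * prob_evidence_on_drive * evidence_value


# Inputs as stated in the text: 3 drives, 5% per year, 10%, $1,000, 5 years.
as_stated = evidence_loss_exposure(3, 0.05, 0.10, 1000.0, 5)

# Assumed variation: doubling the evidence value doubles the exposure.
doubled_value = evidence_loss_exposure(3, 0.05, 0.10, 2000.0, 5)

print(f"Exposure with stated inputs: ${as_stated:.2f}")
print(f"Exposure with $2,000 evidence value: ${doubled_value:.2f}")
```

Either way, the exposure is small relative to the cost of a recorder, which is the point of the estimate.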

While you can change the numbers, they do reflect a common operational reality: most security managers are willing to absorb the risk of hard drives failing because the probability of losing valuable evidence is fairly low.

There are 3 important exceptions to this:

  • Some organizations have internal or governmental requirements about retaining video. Losing a hard drive can result in fines or other penalties.
  • Some organizations have very high risk profiles (prisons, airports, etc.) where the value of evidence is high. This can make the loss of even a small amount of evidence very painful.
  • Some organizations have a low tolerance for risk, especially when the loss of evidence can cause operational problems and potential job loss for the security manager whose video was lost.

What is the business case for storage redundancy?

I estimate that, on average, the loss from hard drive failures for a 16-channel recorder over a 5 year period is $350 ($200 from service, $150 from evidence lost).

These numbers reflect the operational reality of video surveillance users: losing data is frustrating, but the data lost is usually not terribly valuable. However, there is a cost, and to the extent that the premium for storage redundancy is less than this cost, rational users should adopt it.
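
The decision rule in the previous paragraph can be sketched as a simple comparison in Python. The $350 expected loss is the estimate given above. The two redundancy premiums are hypothetical values chosen only to show both outcomes, and the function name is illustrative.

```python
def redundancy_is_justified(expected_loss: float, redundancy_premium: float) -> bool:
    """Redundancy pays off when its premium is below the expected loss it avoids."""
    return redundancy_premium < expected_loss


EXPECTED_LOSS = 350.0  # 5 year estimate from the text: $200 service + $150 evidence

# Hypothetical premiums, for illustration only.
for premium in (250.0, 1000.0):
    verdict = "adopt" if redundancy_is_justified(EXPECTED_LOSS, premium) else "skip"
    print(f"premium ${premium:,.0f}: {verdict} redundancy")
```

This sketch treats redundancy as eliminating the entire expected loss, which is optimistic. Organizations covered by the exceptions listed earlier would use a much higher expected loss and reach a different answer.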

Over 4 TB (more specifically, 4 hard drives), using RAID5 becomes very cost effective. This is because most storage arrays provide RAID5 as a standard feature. The value and utility only increase as hard drive counts increase (see the article on RAID6 advantages for large-scale systems).
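
One way to see why the value grows with drive count is to look at the share of raw capacity that RAID5 gives up to parity. This minimal Python sketch assumes the standard RAID5 layout, where one drive's worth of capacity holds parity, and equal-size drives. The 1 TB drive size is an assumption, and the sketch ignores controller and software costs.

```python
def raid5_usable_tb(num_drives: int, drive_size_tb: float) -> float:
    """Usable capacity of a RAID5 set of equal-size drives."""
    if num_drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (num_drives - 1) * drive_size_tb


def raid5_overhead_fraction(num_drives: int) -> float:
    """Fraction of raw capacity consumed by parity."""
    if num_drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return 1 / num_drives


# Assumed 1 TB drives, for illustration only.
for n in (3, 4, 8, 16):
    print(f"{n} drives: {raid5_usable_tb(n, 1.0):.0f} TB usable, "
          f"{raid5_overhead_fraction(n):.1%} of raw capacity used for parity")
```

The parity share falls from a third of raw capacity with 3 drives to a quarter with 4 and keeps shrinking from there, so the storage premium for redundancy becomes proportionally smaller on larger arrays.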

For large camera counts and storage sizes (10s of terabytes or more), IP SANs are making redundancy easier to adopt (see our review of IP SANs). IP SANs provide RAID as a standard feature and, for large deployments, are a cost effective way of providing storage.

The challenge remains for deployments with smaller camera counts (whether they are single site locations like liquor stores or large chains like fast food restaurants). In these deployments, storage internal to the recorder is generally used. This storage does not usually offer RAID as a standard feature. Adding an IP SAN is a very costly addition for such small sites. As an alternative, some users are adding consumer / small business NAS arrays that cost under $1,000. The drawbacks are potential performance issues and the need to set up an external device.

Conclusion

Storage redundancy adoption has been slow in video surveillance, primarily because the losses that hard drive failures create are small. The migration to IP is helping to reduce the costs and problems of hard drive failures. However, the use of redundancy in deployments with smaller camera counts is likely to remain limited due to the cost premium.