What Are The Alternatives If RAID5 Is Dead And RAID6 Is Dying?
In articles for ZDNet, author Robin Harris claimed in 2007 that RAID5 would be dead in 2009 and now claims RAID6 will be dead by 2019 (see "Why RAID 6 stops working in 2019"). There are numerous other articles about the death of RAID5 (including my own here in 2009), but I have to wonder if purchasers of large-scale video surveillance systems shouldn't start considering alternatives even now (RAID1? RAID10? Triple striping alternatives?).
What alternatives are being considered and what alternatives are being deployed?
Carl, it seems that big storage providers are now focusing on non-RAID solutions. For instance, Isilon and DDN. Is that correct?
You know, I don't think DDN ever gave me a straight answer to that question. The sales guy gave a spiel that was probably aimed at IT gurus because, although he did talk about data safety, it was not in English, as far as I could tell.
Now I know how my wife feels when I start talking technical.
RAID5 is not dead and RAID6 is nowhere near dead. Just because someone proclaims it so doesn't mean it is the truth. I see both of these being around for a long time until someone comes up with a cheap open alternative. Anyone that tells you different right now is trying to sell you something.
Sales Engineer at Exacq
Can we clarify scale here? A RAID5 array with 3 or 4 drives seems to be in fairly safe shape even today, even with the logic of that article, no? Obviously, that's a small system. I would assume a similar situation for RAID6 with 7 or 8 drives? Still small but enough for the lower 90% of users not to worry.
As an end user, what do you think I'm trying to sell? Sure you can use RAID5 if your storage requirements are small or your data isn't critical. Heck, you can use JBOD, if you're so inclined. But in my vertical (Casino Surveillance), data integrity is not only mission-critical, its loss can cost a bunch in both direct and indirect losses.
Are you refuting my experiences with RAID5 data loss?
By the way John, I think drive size also comes into play here. With the advent of 4TB drives, even 3-drive RAIDs hit the 200 million sector point (actually, that should be about 23 billion sectors = 12TB). 5TB and larger drives will break that barrier with fewer than 3 drives.
Now, you could argue that the author's logic is somewhat flawed in that a 12TB RAID5 set will not always fail or maybe not even fail often, but I still contend that critical applications demand better than RAID5 and with the advent of ever-larger hard drives we will start to see more failures in RAID6.
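To put a rough number on "will not always fail", here is a minimal sketch of the arithmetic behind the 12TB argument. It assumes the commonly quoted consumer-drive spec of one unrecoverable read error (URE) per 10^14 bits read and treats bit errors as independent; both are assumptions for illustration, not figures from this thread, and the function name is made up.

```python
import math

def p_ure_during_read(data_read_tb, bits_per_ure=1e14):
    """Chance of at least one URE while reading data_read_tb terabytes.

    Assumes independent bit errors at a rate of one per bits_per_ure bits
    (1e14 is an assumed spec-sheet value, not a measured one).
    """
    bits_read = data_read_tb * 1e12 * 8  # decimal TB -> bits
    # 1 - (1 - 1/rate)^bits, computed via log1p/expm1 for numerical stability
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_ure))

if __name__ == "__main__":
    for tb in (4, 8, 12, 24):
        print(f"{tb:>3} TB read: {p_ure_during_read(tb):.1%} chance of a URE")
```

Under those assumptions, reading 12TB comes out to a bit over 60%, so a rebuild is more likely than not to hit a URE but is far from certain to. Passing `bits_per_ure=1e15` (a spec often quoted for enterprise drives, again an assumption) drops the same read to under 10%.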
It seems to me that likely as not, few of the storage systems installed today will be relevant in 2019. We will all be putting in new and different systems. The storage industry has done a reasonable job of producing storage system options as the needs arise. They are not always timed perfectly but usually reasonable IMHO anyway.
If there are problems and money to be made, someone will invent that better mousetrap and the market will follow. We will most likely have several more options between now and then to choose from.
We can already use a RAID of RAID volumes (1,4,5,6 or 7). Those would all extend the reliability several more years. The technology is not new and some current controllers/SAN systems could do it with only a firmware upgrade. You can do it now by combining software and hardware RAID. A RAID1 of RAID6 volumes doubles your reliability with very little software performance penalty, but at some point we ought to discuss the costs. How much do we save by using 4TB drives, building HUGE volumes that are less reliable than 1-2TB drives? Especially when we end up with double the number of disks to add extra redundancy to match smaller disk volume systems?
For example, IBM has been shipping their SAN Volume Controller for about 10 years now. It is designed to do just that and sit between your storage arrays to virtualize/mirror/copy/etc., all while the data remains accessible (see "IBM SAN Volume Controller" on Wikipedia). It isn't cheap, though.
This got really long in the first draft but in summary, the storage that exists today already works for 90%+ of the video market for the near future it seems to me. Trying to guess a future tech storage winner x years down the road seems kind of a risky purchase today...
I absolutely agree with Robert Ansell's post. RAID5 and RAID6 are ABSOLUTELY not dead; they are more common than ever in serious video deployments and they are reasonably secure, offering a high level of uptime.
You make a general claim saying "RAID5 is dead and RAID6 is dying" based on a very limited view of the video storage world (the Casino surveillance vertical). I think that your use case is the exception and definitely not the rule.
My company produces at least 5+ RAID5 or RAID6 servers every month (for the last 2+ years) and we have yet to hear of a single instance where data has been lost on one of these servers. So we're talking about over 100 separate RAID servers and not a single byte of data has been lost. That's not to say that drives haven't failed-- they have. But in all cases, the replacement drive has been installed and rebuilt before a second (in the case of RAID5) or a third (in the case of RAID6) drive failed. To me, this is serious guaranteed uptime and will refute any claim to RAID5 or RAID6 being dead.
And Robert Ansell is particularly well placed to comment on this: I would guess that Exacq also produces many dozens of RAID5 or RAID6 servers every month. I would trust his comments implicitly.
I didn't say RAID6 is dead, I said RAID5 is (or at least it should be). If you and Robert believe RAID5 has "serious guaranteed uptime", why do you even sell RAID6? And are they larger-scale systems or smaller systems with only a few drives (3-4)?
My "limited" view is based on experience with 65 RAID systems of multiple manufacturers, each populated with 16-48 drives in a 24/7 environment over nearly 10 years (not all at once - hardware was replaced and system expanded in 2006).
From your comments, it appears you're selling RAID5 systems with no hot spare? If so, I'm surprised none of your customers has lost any data. The longer a system sits degraded without performing a rebuild, the more likely it will encounter UREs during the rebuild process.
I agree that RAID 5 & 6 are not dead. But in my short (VMS) history I've been burned by RAID 5 in recent months. For small volumes that are also backed up to tape I use RAID 5. But for my video volumes I now use RAID 6. Not too long ago I lost 2 drives at the same time during a server reboot. I lost 12TB of video footage and had some very unhappy "customers". The problem with other RAID types, 1 and 10, is the efficiency and cost. There is so much "waste". My path, until there is a better option, is to use RAID 6 with a hot spare. It's also important to use enterprise-level drives and not consumer drives. The MTBF is much better. This is in no way perfect and still doesn't address rebuild time and degraded performance issues. It also only allows for 2 drive failures at once, with a sizable rebuild time. Just my $.02.
RAID as a concept will die. When is the question, but as drive sizes are getting larger, it will be up to the storage vendors to come up with technologies that potentially replace this...
... or is it really up to storage vendors?
One of my favorite examples is BackBlaze. These guys are a provider of unlimited cloud backup for your desktop. They're talking about server backup later, but here's what's interesting. They talk about their hardware platform quite a bit, basically it's a Linux server running tons of common hard disks in a custom chassis.
Take a look at their solution here.
What makes their pod so unique? It has NO RAID. Zero. Instead, they have developed an application framework for their business that builds data redundancy and replication into the mix at a higher level than the hardware. Their hardware is unintelligent and provides no failure protection. If a pod dies, they don't care. Reason? They replicate the data to other pods. When their software detects a hardware failure, it can start creating a new copy of the data on that pod elsewhere.
Google does something very similar. They distribute their storage over thousands of COTS hardware nodes, and build fault-tolerance at the software layer.
Let's look at a futuristic file system, ZFS. ZFS has RAID capability built into it. In fact, if you want to use ZFS instead of an NTFS filesystem, you don't use a RAID controller for smaller disk sets, and for larger sets you may stripe a few RAID arrays across into a RAID-Z. You want to present the disks as JBOD or RAID0. ZFS handles the failure, handles the data integrity, and does a better job of it.
Unfortunately, there's still a rebuild process that takes a large amount of time... 20 hours for 50TB or so, hardware-dependent. Current solutions don't address the time it takes to rebuild from failure.
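For a sense of scale, the "20 hours for 50TB" figure can be turned into an implied sustained rebuild rate, and the same arithmetic run the other way for other sizes. This is only a back-of-the-envelope sketch: it assumes decimal terabytes and a constant throughput with no competing recording load, and the function names and the 100 MB/s example rate are hypothetical.

```python
def implied_throughput_mb_s(capacity_tb, hours):
    """Sustained rate (MB/s) needed to rebuild capacity_tb in the given hours."""
    return capacity_tb * 1e12 / (hours * 3600) / 1e6

def rebuild_hours(capacity_tb, throughput_mb_s):
    """Hours to rebuild capacity_tb at a constant throughput_mb_s."""
    return capacity_tb * 1e12 / (throughput_mb_s * 1e6) / 3600

if __name__ == "__main__":
    rate = implied_throughput_mb_s(50, 20)
    print(f"50 TB in 20 h implies about {rate:.0f} MB/s sustained")
    # Hypothetical: one 4 TB drive rebuilt at an assumed 100 MB/s
    print(f"4 TB at 100 MB/s takes about {rebuild_hours(4, 100):.1f} h")
```

A system that keeps recording video during the rebuild will see a lower effective rate, so real rebuild times are likely to be longer than this estimate.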
So the future will be about pushing fault-tolerance away from the hardware. Abstract the hardware in a virtual solution, and manage the data from that level. Then you can replicate, apply parity, calculate checksums, do whatever it takes to keep the data available. Corey mentioned the IBM SAN Volume Controller, and that's a step in the right direction. Abstraction.
But how about now? How do we achieve resiliency without worrying about hardware vendor technology and without resorting to relying completely on RAID?
As an IT solutions provider, I've had my share of disaster recovery situations. Those that end up the best are those where there was a replicant of the data that was lost.
That's why I don't prefer RAID5/RAID6 alone.
"Their software detects a hardware failure, it can start creating a new copy of the data on that pod elsewhere"
- The question I would have regarding that, and other failure-predictive solutions is that they tend to rely on data reads in some way. SMART and "Drive Check Period" are two related items that both have problems with streaming video recording. In my experience and from explanations given by a number of storage vendors and drive manufacturers, surveillance video recording systems that write continuously don't give the storage system the opportunity to detect and correct basic problems like bad block relocation and error recovery. This is totally the opposite of traditional IT storage requirements.
For one thing, verify-after-write is not possible, and for another, SMART often requires more time to detect and correct many errors than the 7 seconds storage systems typically allow. The combination of those two factors makes error recovery difficult, if not impossible.
Seth, while I agree that the future is NO RAID, the question is what do surveillance deployments use? Cloud backup is not practical given the immense storage requirements of surveillance. Building a custom solution, like Google, is probably not practical as surveillance is big, but not that big.
There seem to be a few specialist storage providers (like DDN, Isilon, Veracity Coldstore) with NO RAID offerings and the digital tape disk guys. Others worth considering?
Apparently, DDN does use RAID6, but with their own flavor of error detection and recovery. See the detail tab for the S2A6620 Here.
"Redundant, hot swappable components, automated failover features, and RAID 6 deliver end-to-end data protection and availability."
Seth, I am not saying the methods are not applicable. Rather, my point is how does one deliver this in surveillance? Google has teams of engineers and developers to optimize this. Even most big surveillance deployments have nothing close to this.
Btw, could you share who your VMS vendor/partner is who has taken steps this way?
Seth, very interesting. So, basically you are mirroring across 2 low cost appliances vs a single more complex, expensive storage system?
I have no problem with this approach and we use it for some things, but the price estimate for the 10TB system is too high. While you can certainly build systems that cost that much for video storage, it would be quite wasteful... They would normally be in the $5K to $10K range depending on options and the vendor's fiscal quarter... ;-)
Complexity is not free. By building a custom system like this you are doing exactly what you are complaining the vendors do with a lock-in design. Even within my organization, my greenest employee can swap an HD, PS or even a controller on a real SAN/NAS storage system with little or no help from a senior engineer. With this more complex system, with external NAS/SAN and iSCSI or other clustering/HA-style network issues, special software and a lot of CLI work to be done, that is no longer true and the seasoned folks must get intimately involved. What is my time worth, and how exactly does this make the system more reliable or cost-effective for me or the customer over the 5+ years the system is deployed?
So it is somehow OK for the security/VMS supplier (us) to lock a customer into something very few people can support well, but not for the vendor to do the same thing to you with an all-in-one? (Seth, I am not trying to pick on your approach; as I said, I even design systems like this myself.) I am just trying to make the point that there are lots of complaints here about vendor lock-in and yet custom one-off systems are effectively the same thing from the customer's perspective. Kind of the pot calling the kettle black... ;-)
Isn't that the point of being a VAR/integrator after all? To add unique value to the sale of HW/SW so that the customer chooses your solution over others? Are not the storage folks doing exactly the same things we are, but on a more detailed and rapidly replicated level?
(I am sure I am going to catch heat for this one... ;-) )
Not sure I would call that "low cost". Since we will need approximately 800TB, $15k for 10TB equals $1.2M for the storage alone, not including the rest of the required equipment.
Dedicated SAN systems can run ~$300-$400/TB for starter systems of up to about 240TB of RAW storage. The more storage you buy, generally the lower the costs/TB... I just ran a *very* rough web price quote (no taxes or shipping or installation) and 1000TB of RAW SAS/SAN storage ran about 24U and ~$280K and that brings the price down to nearly $280/TB. Can you really build a custom COTS-based system and implement all the software while not giving up the redundancy and reliability inherent in dedicated SAN/SAS/NAS units for a lower cost?
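The numbers in the last two posts can be lined up with a few lines of arithmetic. This is just a sketch using the figures quoted above ($15K per 10TB for the mirrored-appliance estimate, ~$280K per 1000TB raw for the web-quoted SAN); it ignores taxes, shipping, installation, discounts, and any raw-versus-usable capacity difference, and the function names are made up for illustration.

```python
def cost_per_tb(total_cost_usd, capacity_tb):
    """Simple cost per terabyte for a quoted system."""
    return total_cost_usd / capacity_tb

def scaled_cost(total_cost_usd, capacity_tb, target_tb):
    """Scale a quote linearly to target_tb (assumes no volume discount)."""
    return cost_per_tb(total_cost_usd, capacity_tb) * target_tb

if __name__ == "__main__":
    target_tb = 800  # requirement mentioned in the thread
    quotes = {
        "mirrored appliances ($15K / 10TB)": (15_000, 10),
        "dedicated SAN ($280K / 1000TB raw)": (280_000, 1000),
    }
    for name, (cost, tb) in quotes.items():
        print(f"{name}: ${cost_per_tb(cost, tb):,.0f}/TB, "
              f"${scaled_cost(cost, tb, target_tb):,.0f} for {target_tb}TB")
```

Linear scaling is the crude part here: as noted above, cost per TB generally falls as the purchase gets larger, so the per-TB figures are only comparable at similar capacities.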
Oh man, I just read the BackBlaze Pod post I mentioned in another discussion, and was floored when I found the $60/TB number they're throwing around.
Not something to use in surveillance, at all. Yet, wow.
So the prices are basically a wash COTS vs. dedicated SAN... $280/TB web price before discounts should be at roughly price parity with your $250 example for 800TB. I am a fan of open source and COTS, but not regardless of anything else. Being open source and COTS is not a solution, it is only one tool in the toolbox, one possible piece of the puzzle.
I agree that things are moving along and the standards will progress, but I would argue that SANs have become commoditized on the low-medium end already, and so the spread between the TCO for COTS/open source and these SANs sized for a VMS has shrunk for small to medium systems <~1PB. 5-8 years ago that was not the case, as most SANs demanded a huge premium, but today... I think it is not so clear a choice.
So what then? Use the cheapest RAID system? Use name-brand storage like Dell or HP? Use something more expensive like DDN or EMC Isilon?
What are the tradeoffs besides support quality vs. cost per TB? What do companies like EMC and DDN offer in terms of reliability and support that can't be obtained from selecting more midrange or even lower end storage and adding other layers of redundancy?
Is it even worth the bother to manage a complicated system for a (relatively) untrained end user or would they be better off choosing the highest quality storage they can afford because it has simple built-in management tools and is built for high reliability?
For at least some of you, setting up, managing, troubleshooting and repairing storage systems is simple. For the rest of us, it can be a nightmare - RAID sets, logical volumes, LUN mapping, load balancing and the like are difficult to learn and it doesn't help when each manufacturer (and I've dealt with four) does things differently.
I'm the only one in our organization who can set one of our RAIDs up from scratch and even I have to scratch my head and think about it each time because it's not something I do every day. And forget getting the Integrator back each time a system fails. That's way too much down time.
For the size system you are talking about I would pick something like the LSI/Engenio SAS family (NetApp now owns it; IBM and Dell rebrand it), EMC's low-end SAS family or maybe HP's P2000 G3 series of SAS storage. The LSI user interface hasn't changed much in 10 years. I mostly use IBM and Dell OEM versions of this system but there are other brands. The current versions can handle sustained sequential limits of 3GBytes/sec reads and 1GByte/sec RAID writes. Throughput is not really an issue for systems like this. You just need to work out the VMS storage parameters and document the heck out of it so you can cookie-cutter them...
I pick SAS when the distances are close enough and the number of hosts is small because it always just works. Dual 6Gbps HBAs are <$200 each and I can't touch the efficiency and low latency of SAS with anything except FC. (FC massively bumps the cost and complexity.)
FC is better performance wise than iSCSI all other things being equal, but iSCSI is slightly easier to support for network savvy types... Slightly...
If you pick iSCSI, just use 10G and keep things simple. Dual Intel 10G NICs are ~$850-ish, use Cat6 cabling and are rock solid. For a pair of hosts and one dual-controller SAN, you do not need a 10G switch. Just directly attach one 10G port to each SAN controller. For two SANs and one host, same thing: one port for each controller on a dedicated cable... If you grow larger get a dedicated 10G switch (blade or rack), but don't pay more than ~$10K for 24 10G copper ports... You only need Layer-2 network switching. No fancy bells and whistles do you any good for video storage.
The high-end systems get you support where the SAN phones home and dispatches the tech to repair your stuff automatically. They automigrate your hot spots to SSD banks to make sure the disks are not causing you great bottlenecks... They can also provide a lot of Flash-Copy/Snapshot things that VMS storage just doesn't use normally...
You will need to figure out what the VMS you choose needs for the IOPS/Camera/viewing loads you need. That is the homework someone must do before spec'ing the system...
Does this help?
Somewhat. Our current system is FC right now (don't get me started on parallel SCSI). Data transport has been extremely reliable. In 6-1/2 years of continuous operation on 33 DAS servers/storage, we lost one HBA and had one bad interconnect. After we replaced the failed HBA, the server randomly rebooted for months. Eventually I discovered that HP didn't "play nice" with storport drivers - thanks HP level 1 for your lack of support!
I'm bringing in both EMC and DDN. I'm not certain our Integrators will like it but the storage companies they've brought to the table so far don't give me real warm, fuzzy feelings.
SAN (or even ET) phoning home won't happen. Due to the nature of our business and our regulations, there can be no connection to the outside world; at least not unsupervised.
You might consider having the contract offers stipulate that you go straight to level 2 for all service issues. Most of the storage/server folks offer that option for a small extra fee anyway. You just have to request it and make it part of the deal...
Sometimes level one support folks are just fine and do a great job, and sometimes they are just an obstacle in the service path... Hard to predict, but maybe better to avoid... Knowing how things are going to go up front might be better all around.