Stanley, a lot of big storage 'manufacturers' use Supermicro hardware, re-brand it and add their 'secret sauce' on top. In general, Supermicro is a safe choice, even if it isn't a well-known brand in itself.
Why did it fail? How old is it? Any performance specs on the current storage hardware?
I have had the same problem with Intransa storage and have started using BCD Video, who use HP products.
I chose Supermicro because of its long, 3-year warranty. They have honored the warranty and have been very responsive.
IPVMU Certified | 08/22/13 03:12pm
I would definitely hope you’re not looking to try and put 72TB in the cloud.
We’ve been using AC&NC Jetstor for a while now and are pretty satisfied with their ease of use, performance and service. The largest system we’ve deployed is a 2 x 16-bay system (one primary unit and one expansion unit) with 64TB of raw storage; it is approaching the 3-year mark with no problems so far.
Stanley, my two cents... If you have someone who is honoring a 3-year warranty and has been very responsive, I would stick with them. As for the corruption issue, I assume you're talking about the hard drives. If I could, and the server didn't have a problem with a different brand, I might consider a different drive brand. I also have to assume these are enterprise-rated drives; if not, I would really be looking at that as an option. Security systems beat hard drives up. I have started looking at surveillance-rated drives that have a higher write-cycle MTBF. I passed on Intransa due to its very high cost versus off-the-shelf hardware. As a side note, in the consumer and commercial industry we use software to detect drive failure. It's a crapshoot at best, but I have seen such software detect a failing drive days in advance on a regular system with normal read and write cycles. I only mention this because it seems to me that in security we don't focus on utility software as much. Security pros tend to strip out everything but the essentials to get the best performance, but that comes at a cost. Unfortunately I can't offer any first-hand experience with a system that large as a whole. All of my servers and systems run Seagate hard drives, with only one failure among the roughly 20 I have been running for the last five years. Nothing special about them; the ones you buy from any store work for my little systems. The one that did fail was in a DVR whose fan seized up on a 90-degree day inside an enclosure (read: oven).
I've used the Dell MD3200i before with great success. It's not the cheapest, but it is extremely reliable and Dell has great support. Don't buy into the Seagate hard drives: storing video is like storing any other data where the drives are used 24/7, you just have heavy writing with random reads. Good server/enterprise-class drives will do. I've run into both Intransa (now gone) and Supermicro in the past and had performance problems and higher-than-normal drive failures. IMO you should save yourself the trouble and go for something like Dell or IBM, even with the higher price tag, as your total ROI will be higher than with Supermicro or the like.
Stanley, are your servers running RAID too? Do you know which RAID vendor? There are problems with Seagate drive timeouts with certain combinations of drive firmware and RAID vendor. Just interested in what you're using.
Drives do fail, and it doesn’t matter whether they are from HP or from Intransa. Look at Google’s study on disk failure rates based on 100,000 drives (http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/disk_failures.pdf).
A couple of key points:
- The failure rate is far higher than what disk manufacturers specify.
- The failure rate rises with how heavily the drives are used (video surveillance is one of the worst cases).
- Disk failure rates go up after the first 3 years.
What does this mean for video surveillance applications?
On the planning side, use RAID 6; you can easily run into trouble with RAID 5. With large-capacity drives, the rebuild itself can corrupt your data because of the disks' UER (uncorrectable error rate). With RAID 6, you don't need to worry about UER, and the system can survive 2 simultaneous drive failures. How you carve up the storage system is also important: you want to spread the risk so that you don’t lose ALL your data if there are two drive failures in the system.
You need to keep an eye on your system; it is not a set-and-forget situation. Any time a drive fails, you need to replace it ASAP. This is especially important if you have an aging system: be prepared for an annual disk failure rate of 5% or more.
Improve your environment. We have a customer with bad power problems, where the storage system went through unexpected shutdowns/startups more than 10 times in two days. This is very bad for drives. Not only may you lose user data; in some situations, you may completely corrupt the storage system.
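To put rough numbers on the UER point, here is a minimal Python sketch. It assumes the spec-sheet model in which every bit read has an independent chance of being unreadable: 1 in 10^14 bits is a common spec for desktop-class drives and 1 in 10^15 for many enterprise drives. The 8 x 3TB RAID 5 layout and the function name `rebuild_success_probability` are hypothetical, chosen only for illustration; real errors are not independent per bit, so treat the output as a rough estimate, not a prediction.

```python
import math


def rebuild_success_probability(surviving_drives: int, drive_tb: float, uer_bits: float) -> float:
    """Probability of reading every surviving drive in full without hitting an
    unrecoverable read error, assuming independent errors at the spec-sheet
    rate of 1 error per `uer_bits` bits read (hypothetical model)."""
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # decimal TB -> bits
    per_bit_error = 1.0 / uer_bits
    # (1 - p) ** n computed in log space so tiny p and huge n stay accurate
    return math.exp(bits_read * math.log1p(-per_bit_error))


# Hypothetical 8-drive RAID 5 of 3TB drives with one drive failed:
# the rebuild has to read the 7 surviving drives end to end.
for label, uer in [("1 in 10^14", 1e14), ("1 in 10^15", 1e15)]:
    p = rebuild_success_probability(surviving_drives=7, drive_tb=3.0, uer_bits=uer)
    print(f"UER {label}: {p:.1%} chance of a clean rebuild")
```

A RAID 5 rebuild must read every surviving drive completely, so one unreadable sector leaves that stripe with nothing to reconstruct it from. With RAID 6 and only one failed drive, the second parity can still rebuild that sector, which is the reasoning behind the advice above.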
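As a back-of-the-envelope sketch of why the replacement delay matters, assume a hypothetical 24-drive array (for example 24 x 3TB, or 72TB raw), the 5% annual failure rate mentioned above, and independent failures at a constant rate. In RAID 5, a second failure during the degraded window means lost data. Both function names are made up for this example.

```python
def expected_annual_failures(drive_count: int, afr: float) -> float:
    """Average number of drives expected to fail per year at the given AFR."""
    return drive_count * afr


def second_failure_risk(drive_count: int, afr: float, days_degraded: float) -> float:
    """Chance that at least one of the remaining drives also fails while the
    array is degraded, assuming independent failures at a constant annual rate."""
    remaining = drive_count - 1
    # Probability that a single drive survives the degraded window
    survive_one = (1.0 - afr) ** (days_degraded / 365.0)
    # Every remaining drive must survive for the array to stay intact
    return 1.0 - survive_one ** remaining


drives, afr = 24, 0.05  # hypothetical array size and the 5% AFR from the post
print(f"Expected failures per year: {expected_annual_failures(drives, afr):.1f}")
for days in (1, 7, 14, 30):
    risk = second_failure_risk(drives, afr, days)
    print(f"Degraded {days:>2} days: {risk:.2%} risk of a second failure")
```

Under these assumptions the risk grows almost linearly with the number of days the array stays degraded. Correlated failures, such as a bad drive batch, would make the real risk higher than this model suggests.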
I tend to disagree with the 3-year drive failure statement. In my 10 years of experience with RAID-for-video, I've found that drive failure rate predictions are often wrong. I've seen enterprise drive failure rates ramp up in as little as a year in some systems while other systems are still having very low failure rates after 7 years of continuous use.
I believe there are too many factors that affect hard drive failures to allow one to quantify and project actual failure likelihood. These factors range from drive design and manufacturing processes through RAID system design and setup. In fact, I've seen more premature drive "reported failures" caused by improper RAID system setup than by actual failed drives.
One of the main factors in this failure mode is how the drives and system handle slow drive responses to SMART and controller "drive checks". We've found that systems recording video behave in ways that storage manufacturers, who are used to providing IT storage, don't expect. Constant writing is exactly the opposite of normal IT usage, and many RAID systems typically allow a drive 7 seconds to respond to health queries. In surveillance applications, drives that are constantly writing will often take longer to respond, causing the controller to fail them. Suggested cures range from removing and re-inserting the failed drives to turning off the "Drive Check Period" setting.
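The drive-check failure mode can be illustrated with a toy model. This is not how any particular RAID controller is implemented: the `Drive` class, the latency figures and `check_drives` are all hypothetical. It only shows how a fixed 7-second health-check deadline that idle drives easily meet can be missed once the query has to wait behind queued video writes, and how raising or disabling the check changes the outcome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Drive:
    name: str
    idle_response_s: float      # time to answer a health query when idle
    write_queue_delay_s: float  # extra wait when the query sits behind queued writes


def check_drives(drives: list[Drive], recording: bool,
                 timeout_s: Optional[float] = 7.0) -> list[str]:
    """Return the names of drives a controller would mark as failed because
    their health-check response exceeded the timeout (None = check disabled)."""
    if timeout_s is None:
        return []
    failed = []
    for d in drives:
        response_s = d.idle_response_s + (d.write_queue_delay_s if recording else 0.0)
        if response_s > timeout_s:
            failed.append(d.name)
    return failed


# Hypothetical latencies; none of these drives is actually broken.
array = [
    Drive("disk0", 0.05, 2.0),
    Drive("disk1", 0.08, 8.5),
    Drive("disk2", 0.05, 6.5),
]
print("Idle:", check_drives(array, recording=False))
print("Recording, 7 s limit:", check_drives(array, recording=True))
print("Recording, check disabled:", check_drives(array, recording=True, timeout_s=None))
```

In this model, the only drive that gets "failed" is a healthy one that was simply busy writing, which matches the premature "reported failures" described above.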
One fairly neat workaround is the method used by DDN (DataDirect Networks). Their systems are capable of automatically cycling power to a drive that exhibits failure symptoms, running tests after its reboot and initiating a partial rebuild of just that drive's lost data, which can be completed in minutes, rather than the hours or days a total rebuild would take.
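A quick arithmetic sketch of why a partial rebuild finishes so much faster: rebuild time is roughly the amount of data to reconstruct divided by the rebuild rate. The 3TB drive, the 20GB of data written while the drive was offline and the 50 MB/s rebuild rate (kept low because the array keeps recording) are all assumed values, not DDN figures.

```python
def rebuild_hours(data_gb: float, rebuild_mb_s: float) -> float:
    """Hours needed to reconstruct `data_gb` gigabytes at `rebuild_mb_s` MB/s
    (decimal units: 1 GB = 1000 MB)."""
    return data_gb * 1000.0 / rebuild_mb_s / 3600.0


RATE_MB_S = 50.0  # assumed rebuild rate while the array is still recording
full = rebuild_hours(3000.0, RATE_MB_S)   # whole 3TB drive
partial = rebuild_hours(20.0, RATE_MB_S)  # only data written while it was offline
print(f"Full rebuild:    {full:.1f} hours")
print(f"Partial rebuild: {partial * 60:.1f} minutes")
```

Even at this modest assumed rate, the full rebuild takes most of a day, and that is also the window in which a second failure or a read error can strike.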
In another case, we experienced rapidly increasing enterprise drive failures after a bit over a year of operation. The drive manufacturer diagnosed the problem as faulty head bonding (the r/w heads became detached from the actuator). This affected only one batch of drives, which accounted for perhaps 20% of the total drives in the system. This demonstrates a rule every purveyor of large systems should consider: drives should never all be from the same batch. Interestingly, drives from other batches that were installed at the same time still pass media scans after 7 years of continuous use.
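One way to act on the mixed-batch rule when populating several arrays is a greedy placement that keeps each batch spread out. This is a hypothetical helper, not something described in the post: `spread_batches`, the serial numbers and the batch labels are all made up, and it assumes you know each drive's batch (for example from its purchase lot).

```python
from collections import Counter


def spread_batches(drives: list[tuple[str, str]], array_count: int,
                   drives_per_array: int) -> list[list[str]]:
    """Greedy placement: each (serial, batch) drive goes to the non-full array
    holding the fewest drives from the same batch (ties: fewest drives, lowest index)."""
    if len(drives) > array_count * drives_per_array:
        raise ValueError("not enough slots for all drives")
    arrays: list[list[str]] = [[] for _ in range(array_count)]
    batch_counts = [Counter() for _ in range(array_count)]
    for serial, batch in drives:
        candidates = [i for i in range(array_count) if len(arrays[i]) < drives_per_array]
        target = min(candidates, key=lambda i: (batch_counts[i][batch], len(arrays[i]), i))
        arrays[target].append(serial)
        batch_counts[target][batch] += 1
    return arrays


# Hypothetical: 12 drives from three batches, unpacked box by box.
drives = [(f"{batch}{n}", batch) for batch in "ABC" for n in range(1, 5)]
for i, members in enumerate(spread_batches(drives, array_count=2, drives_per_array=6)):
    print(f"Array {i}: {members}")
```

Each array ends up with two drives from each batch, so no single array holds more than its share of any one batch.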
"Anyone using Supermicros as a storage server? Any problems?"
We have had good experience with Supermicro storage servers, but our usage differs substantially.
We've been using two Supermicro servers, procured in late 2008, without trouble. One was procured from Padova, and the other was bought on eBay "slightly used."
We have nowhere near the number of cameras you are using: we are storing fewer than 10 simultaneous MPEG-4-encoded HD video streams, together with other ancillary sensor data. Our storage is significantly smaller than yours (just under 8 TB). Beyond issues of scale, our servers are in use only during particular events, which, together with setup and testing intervals, probably amount to about 3 months out of the year. Finally, we install a new software build on them every 18 months or so.
Based on our positive experience, we would continue using Supermicro servers.
Stanley, good to see you received a lot of responses. I would consider RAID 6 if RAID 5 isn't safe enough for your application... I'm assuming it's not, as 72TB is an awful lot of video to lose. Make sure that your power supplies are connected to two different power sources to avoid disruptions. I'll assume that you have a UPS for the equipment as well. Hope this helps.
Thanks for the information. Great site and worthwhile reading, especially the intro to IP video.
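For a sense of what the extra safety costs in capacity, here is a small sketch assuming hypothetical 3TB drives: usable space is (drives per array - parity drives) x drive size x number of arrays, with one parity drive per array for RAID 5 and two for RAID 6. The layouts are examples only; the thread doesn't say how Stanley's 72TB is built.

```python
def usable_tb(drive_tb: float, drives_per_array: int, array_count: int,
              parity_drives: int) -> float:
    """Usable capacity when each array gives up `parity_drives` drives to parity."""
    return (drives_per_array - parity_drives) * drive_tb * array_count


# Hypothetical layouts built from 24 x 3TB drives (72TB raw)
layouts = [
    ("1 x 24-drive RAID 5", 24, 1, 1),
    ("1 x 24-drive RAID 6", 24, 1, 2),
    ("2 x 12-drive RAID 6", 12, 2, 2),
    ("3 x 8-drive RAID 6", 8, 3, 2),
]
for label, per_array, arrays, parity in layouts:
    print(f"{label}: {usable_tb(3.0, per_array, arrays, parity):.0f} TB usable")
```

Splitting into smaller RAID 6 arrays costs more raw capacity but, as suggested earlier in the thread, limits how much video any single array failure can take with it.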
Stanley, what is the model number of the Seagate drives you are using? Seagate's "surveillance" drives (the "SV35" family) are not Enterprise-class drives; those are their "Constellation ES" line (recently renamed "Enterprise Capacity"). If you are using the SV35 family in a large system (my rule of thumb is 8 drives or more in one chassis), you could have rotational vibration issues, since tolerance to vibration is not built into the SV35, or RAID "compatibility" issues, because the SV35 is not tested with the various RAID cards the way the Enterprise drives are.
Guys, no disrespect to the previous poster at all, but rotational vibration issues may not be the problem. I remember a deployment years ago when a RAID vendor came to our customer site with a spirit level and said the servers were not racked correctly. There was a 20-second silent pause while everyone had a strange look on their face.
Also these are my personal thoughts, right or wrong, just my perspective ;)
From what I've seen in the systems we've deployed, RAID systems fail due to communication issues between the RAID card firmware and the hard drives (and their firmware) it is managing. Also factor in the end user's lack of urgency when the first drive fails: they don't replace it quickly enough (read: days or weeks).
The RAID systems are getting constantly hammered 24/7 with streams of data: there is no let-up, no pausing, just incoming data that needs to be saved now, and hurry up because there's more data coming after that. RAID is managing its hard drives, and hopefully they're all cooperating nicely. Obviously drives time out every now and again, hiccup, reduce their write speeds, whatever. My thinking is that the RAID system does not handle this cleanly. If we were deploying a RAID-based document storage system, the RAID card could probably handle this nicely, in that it can retry a few times and eventually write its data to all the drives. But surveillance systems are not like that. Write this data NOW, because there's more stacking up behind it.
So RAID cannot handle the drive issues correctly: data gets partially written, or drives get dropped from the RAID set. Users don't respond quickly enough in, say, a RAID-5 deployment, and the end result is corrupted file systems or dead RAID sets.
We've seen good RAID->drive combinations that seem to behave well, and we've seen problematic RAID->drive combinations where timeouts are common and RAID issues therefore appear.
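The "more stacking up behind it" point is easy to quantify: whatever the cameras send while writes are stalled has to sit in a buffer. This sketch assumes a hypothetical 100 cameras at 4 Mbit/s each (400 Mbit/s total), a hypothetical 512MB write cache, and compares one 7-second stall with three back-to-back 7-second retries; it ignores any writing that continues to other drives, and the function names are made up.

```python
def backlog_mb(ingest_mbit_s: float, stall_s: float) -> float:
    """Megabytes of video that pile up while writes are stalled."""
    return ingest_mbit_s / 8.0 * stall_s


def cache_overflows(ingest_mbit_s: float, stall_s: float, cache_mb: float) -> bool:
    """True if the backlog from one stall no longer fits in the write cache."""
    return backlog_mb(ingest_mbit_s, stall_s) > cache_mb


INGEST_MBIT_S = 100 * 4.0  # hypothetical: 100 cameras at 4 Mbit/s
CACHE_MB = 512.0           # hypothetical write cache size
for stall_s in (7.0, 21.0):
    backlog = backlog_mb(INGEST_MBIT_S, stall_s)
    status = "overflows" if cache_overflows(INGEST_MBIT_S, stall_s, CACHE_MB) else "fits in"
    print(f"{stall_s:.0f} s stall: {backlog:.0f} MB backlog, {status} the cache")
```

A single stall fits comfortably, but a few retries in a row are enough to exhaust the buffer, which is where a document-storage-style retry policy breaks down for continuous video.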
Again, just my observations / comments.