Recently, I’ve started to have some issues with the HP MSA20 units attached to my SAN server at my office. These MSA20 units stored all my Virtual Machines inside of a VMFS filesystem which was presented to my vSphere cluster hosts over iSCSI using Lio-Target. Lately, the logical drives have been randomly disappearing, causing my 16+ virtual machines to just halt. Each time, I have to shut off the physical hosts, shut off the SAN server, shut off the MSA20s, and bring everything all the way back up. This causes huge amounts of downtime, and it’s just a pain in the butt…
I decided it was time for me to re-do my storage system. Preferably, I would have purchased a couple of HP MSA60s and P800 controllers to hook them up to my SAN server, but unfortunately right now it’s not in the budget.
A few years ago, I started using software RAID. In the past I was absolutely scared of it, thought it was complete crap, and would never have touched it, but my opinion drastically changed after playing with it, and regularly using it. While I still recommend that businesses use hardware-based RAID systems, especially for mission critical applications, I felt I could try out software RAID for the above situation since it’s more of a “hobby” setup.
I read that most storage enthusiasts use either the Super Micro AOC-SASLP-MV8, or the LSI SAS 9211-8i. Both are based off different chipsets (both of which are widely used in other well known cards), and both have their own pros and cons.
During my research, I noticed a lot of people who run Windows Home Server were utilizing the AOC Super Micro Card. And while using WHS, most reported no issues whatsoever, however it was a different story when reading posts/blog articles from people using Linux. I don’t know how accurate this was, but apparently a lot of people had issues with this card under heavy load, and some just couldn’t get it running inside of Linux.
Then there is the LSI 9211-8i (which is the same as the extremely popular IBM M1015). This bad boy supports basic RAID operations (1, 0, 10), but most people use it with JBOD and simply use Linux MD Software RAID. There were numerous complaints from users whose systems wouldn’t even detect the card, and other users reported issues caused by the BIOS of this card (too much memory for the system to boot). When people did get this card working though, I read of mostly NO issues under Linux. I spent a few days confirming what I had already read and finally decided to make the purchase.
Both cards support SAS/SATA, however the LSI card supports 6Gb/sec SAS/SATA. Both also have 2 internal SFF8087 Mini-SAS connectors to hook up a total of 8 drives directly, or more using an SAS expander. The LSI card uses a PCIe (V.2) 8x slot, vs the AOC-SASLP which uses PCIe (V.1) 4x slot.
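To put the slot difference in rough numbers, here is a small sketch of the arithmetic. It uses the usual per-lane figures of about 250 MB/s for PCIe 1.x and 500 MB/s for PCIe 2.0; the function name is made up for illustration, and the last line assumes a PCIe 1.x slot that still provides all 8 lanes.

```shell
#!/bin/sh
# Approximate usable PCIe bandwidth in MB/s for a generation and lane count.
# Per-lane figures after 8b/10b encoding: gen 1 = 250 MB/s, gen 2 = 500 MB/s.
# pcie_bandwidth_mb is a hypothetical helper name, not part of any tool.
pcie_bandwidth_mb() {
    gen=$1
    lanes=$2
    case "$gen" in
        1) per_lane=250 ;;
        2) per_lane=500 ;;
        *) echo "unsupported generation: $gen" >&2; return 1 ;;
    esac
    echo $((per_lane * lanes))
}

echo "LSI 9211-8i (PCIe 2.0 x8): $(pcie_bandwidth_mb 2 8) MB/s"
echo "AOC-SASLP-MV8 (PCIe 1.x x4): $(pcie_bandwidth_mb 1 4) MB/s"
# Assumption: the older slot still wires up 8 lanes.
echo "LSI 9211-8i in a PCIe 1.x x8 slot: $(pcie_bandwidth_mb 1 8) MB/s"
```

Even in an older PCIe 1.x slot, the theoretical ceiling should sit well above what a handful of 7200 RPM drives can push sequentially.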
I went to NCIX.com and ordered the LSI 9211-8i along with 2 breakout cables (Card Part#: LSI00194, Cable Part#: CBL-SFF8087OCF-06M). This would allow me to hook up a total of 8 drives (even though I only plan to use 5). I already have an old computer, currently connected over eSATA to a Sans Digital SATA Expander for NFS, etc…, that I plan on installing the card in to. I also have an old Startech SATABAY5BK enclosure which will hold the drives and connect to the controller. Finished case:
(At this point I have the enclosure installed along with 5 X 1TB Seagate 7200.12 Barracuda drives)
Finally the controller showed up from NCIX:
I popped this card in the computer (which unfortunately only had PCIe V1), and connected the cables! This is when I ran in to a few issues…
-If no drives were connected, the system would boot and I could successfully boot to CentOS 6.
-If I pressed CTRL+C at all to get in to the card’s interface, the system would freeze during BIOS POST.
-If any drives were connected and detected by the card’s BIOS, the system would freeze during BIOS POST.
I went ahead and booted in to CentOS 6, downloaded the updated firmware and BIOS, and flashed the card. The flashing manual was insane, but I had to read it all to make sure I didn’t break anything. First I updated both the firmware and BIOS (which went ok), however I couldn’t convert the card from IR firmware to IT firmware due to errors. I Googled this and came up with a bunch of articles, but this one: http://brycv.com/blog/2012/flashing-it-firmware-to-lsi-sas9211-8i/ was the only one that helped and pointed me in the right direction. Essentially it states that you have to use the DOS flasher, erase the card (MAKING SURE NOT TO REBOOT OR YOU’D BRICK IT), and then flash the IT firmware. This worked for me, check out his post! Thanks Bryan!
Anyways, after updating the card and converting it to the IT firmware, I still had the BIOS issue. I tried the card in another system, and still had a bunch of issues. I finally removed 1 of 2 video cards and put the card in a video card slot, and I finally could get in to the BIOS. First I enabled staggered spin-up (to make sure I don’t blow the PSU on the computer with a bunch of drives starting up at once), changed some other settings to optimize, and finally disabled the boot BIOS, and changed the option for the adapter to be disabled for boot, and only available to the OS. After I removed the card and put it in the target computer, this worked. I also noticed that the staggered spin-up started during the Linux kernel startup when initializing the card. Here’s a copy of the kernel log:
mpt2sas version 08.101.00.00 loaded
mpt2sas 0000:06:00.0: PCI INT A -> Link[LNKB] -> GSI 18 (level, low) -> IRQ 18
mpt2sas 0000:06:00.0: setting latency timer to 64
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3925416 kB)
mpt2sas 0000:06:00.0: irq 24 for MSI/MSI-X
mpt2sas0: PCI-MSI-X enabled: IRQ 24
mpt2sas0: iomem(0x00000000dfffc000), mapped(0xffffc900110f0000), size(16384)
mpt2sas0: ioport(0x000000000000e000), size(256)
mpt2sas0: sending message unit reset !!
mpt2sas0: message unit reset: SUCCESS
mpt2sas0: Allocated physical memory: size(7441 kB)
mpt2sas0: Current Controller Queue Depth(3305), Max Controller Queue Depth(3432)
mpt2sas0: Scatter Gather Elements per IO(128)
mpt2sas0: LSISAS2008: FWVersion(13.00.57.00), ChipRevision(0x03), BiosVersion(07.25.00.00)
mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
mpt2sas0: sending port enable !!
mpt2sas0: host_add: handle(0x0001), sas_addr(0x5000000080000000), phys(8)
mpt2sas0: port enable: SUCCESS
SUCCESS! Lots of SUCCESS! Just the way I like it! Haha, card initialized, had access to drives, etc…
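If you want to confirm which firmware and BIOS the card ended up with after flashing, the versions can be pulled out of that same log. This is only a sketch: the function name is made up, and it assumes the driver prints the line in the same format as the log above.

```shell
#!/bin/sh
# Pull the firmware and BIOS versions out of an mpt2sas kernel log line.
# mpt2sas_versions is a hypothetical helper; it assumes the line format
# shown in the kernel log above.
mpt2sas_versions() {
    sed -n 's/.*FWVersion(\([0-9.]*\)).*BiosVersion(\([0-9.]*\)).*/firmware=\1 bios=\2/p'
}

# Sample line copied from the kernel log above.
sample='mpt2sas0: LSISAS2008: FWVersion(13.00.57.00), ChipRevision(0x03), BiosVersion(07.25.00.00)'
printf '%s\n' "$sample" | mpt2sas_versions
```

On a live system the same function can be fed from the kernel log, for example with `dmesg | mpt2sas_versions`.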
I configured the RAID 5 array using a 256KB chunk size. I also changed the “stripe_cache_size” to 2048 (the system has 4GB of RAM) to increase the RAID 5 performance.
cd /sys/block/md0/md/
echo 2048 > stripe_cache_size
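For anyone wondering what that setting costs in RAM: the kernel md documentation gives the stripe cache memory as page size × number of disks × stripe_cache_size. Below is a rough sketch of that arithmetic, assuming 4 KB pages and the 5 drives used here. The function name is made up, and the commented-out mdadm line is only a guess at how an array like this could be created (the device names are hypothetical).

```shell
#!/bin/sh
# Assumed creation command, left commented out on purpose; adjust the
# device names to your own system before even thinking about running it:
#   mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=256 /dev/sd[b-f]

# Estimate RAM used by the md stripe cache, in MiB:
#   memory = page_size * number_of_disks * stripe_cache_size
# stripe_cache_mem_mib is a hypothetical helper; page size defaults to 4096.
stripe_cache_mem_mib() {
    entries=$1
    disks=$2
    page_size=${3:-4096}
    echo $(( entries * disks * page_size / 1024 / 1024 ))
}

echo "stripe_cache_size=2048 on 5 disks: $(stripe_cache_mem_mib 2048 5) MiB"
```

So on a box with 4GB of RAM, 2048 is a fairly modest value. Keep in mind that values written under /sys do not survive a reboot, so the echo has to be reapplied at boot (for example from /etc/rc.local).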
At this point I simply formatted the drive using EXT4. I configured some folders and NFS exports, and then used Storage vMotion to migrate the Virtual Machines from the iSCSI target to the new RAID5 array (currently using NFS). The main priority right now was to get the VMs off the MSA20 so I could at least create a backup after they have been moved. Next step, I’ll be re-doing the RAID5 array, configuring the md0 device as an iSCSI target using Lio-Target, and formatting it with VMFS. The performance of this Software RAID5 array is already blowing the MSA20 out of the water!
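The post doesn’t say whether any RAID-specific options were passed when formatting, but EXT4 can be told about the array geometry through the mke2fs stride and stripe-width extended options. Here is a sketch of how those numbers fall out of the 256KB chunk size and 5 drives above, assuming the default 4KB filesystem block size. The function name is made up, and the script only prints a command, it doesn’t run it.

```shell
#!/bin/sh
# Work out ext4 RAID geometry hints for a RAID 5 array.
#   stride       = chunk size / filesystem block size
#   stripe width = stride * number of data disks (RAID 5: disks - 1)
# ext4_raid_geometry is a hypothetical helper; block size defaults to 4 KB.
ext4_raid_geometry() {
    chunk_kb=$1
    disks=$2
    block_kb=${3:-4}
    stride=$(( chunk_kb / block_kb ))
    stripe_width=$(( stride * (disks - 1) ))
    echo "stride=$stride,stripe-width=$stripe_width"
}

# Print (not run) a format command for a 5-disk array with 256KB chunks.
echo "mkfs.ext4 -E $(ext4_raid_geometry 256 5) /dev/md0"
```

Treat the printed command as a starting point only, and double check the option names against the mke2fs man page for your version before formatting anything.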
So there you have it! Feel free to post a comment if you have any questions or need any specifics. This setup is rocking away now under high I/O with absolutely no problems whatsoever. I think I may go purchase another 1-2 of these cards!
Hi,
how exactly did you enable staggered spinup? I found no option in the card’s BIOS to enable it (just group timings). Maybe I just overlooked. As it is now all the disks just spin up at start.
My card is flashed in IT mode, latest firmware (with BIOS)
Thanks
Dave
Hi Dave,
Take another look… There was definitely a spot in the BIOS for staggered spin ups. Let me know how you make out, as I use the same firmware.
Stephen
Hi Stephen,
I’ve rechecked and all I can find that may affect it was (name :[ my current setting]) ;
1) Boot support: [Enabled OS Only]
2) Direct Attached Spinup Delay (secs): [5]
3) Direct Attached max Targets to Spinup: [1]
4) Report Device missing Delay: [4]
5) IO Device Missing delay: [4]
But staggered spinup is not working like that. Directly after I press the power button all drives
spin up and eat 114W where they should not pass 67W approx.
Staggered spinup on the sata connectors on the motherboard works.
I have 4x WD30EFRS Red drives connected to the 9211. I’ve tried jumpers on pin 2-3 to enable PM2 (power management – should enable Power-up in Standby according to WD docs), but that makes no difference.
Setting Power-up in Standby as a drive firmware setting through hdparm -s1 causes the controller to not detect the drive, after which the OS doesn’t see it either.
Bios version 7.27.00.00
Firmware version 14.00.00.00
Does anything ring a bell?
Perhaps you could take a look at your settings.
Thanks for the help,
Dave
Hi Dave,
To be honest, that second option (Direct Attached Spinup Delay), is actually what caused mine to spin up… What’s even more interesting, is that in my case the spin up occurs when the Linux kernel loads the driver module, and calls out to the drives… When the drives are called, that’s when my spin up occurs…
I’ll see if I can do some research which might help. Unfortunately in my case, I’m one of those people that had issues with the BIOS… For me to get in to the BIOS, I have to take this card out, and put it in a different computer (I’m sure you have read about these problems)…
Just curious, I see you have Boot support enabled… Do you need to boot off this card? Try disabling boot support all together if possible. Only do this if you’re not booting off the array. Let me know if this makes a change…
I remember that at the beginning I did have issues with all the drives EATING power when they all started together, but after going in to the BIOS, configuring it (in my case, no boot, staggered spinup, nothing fancy), everything actually came together.
Stephen
Hi Stephen,
Yes, I know about the BIOS issues. To reach my controller’s BIOS I need to unplug the boot drive and restart or select the controller as boot drive. I am running an Asus E45M1-M Pro motherboard. It’s a UEFI one so maybe there is a difference there.
I was not entirely clear on what the Boot support option was; the description was a bit vague, so I followed your lead. I don’t need to boot from them so I’ll try that and get back to you. Would be great if it could let linux handle the spinups.
The array itself has yet to be made; still in the testing phase so nothing much can go wrong.
By the way, have you had problems with spinning up disks in Linux yet? From kernel 3.1.10 on there’s a bug that prevents spinup through the 9211. You can test by bringing one down with ‘hdparm -y /dev/sdx’ and it should come up again with ‘hdparm -t /dev/sdx’. Should be safe but you may want to unmount your array first.
Thanks,
Dave
Come to think of it, which version of Linux / kernel do you run?
Hi Dave,
Let me know how it works out… I turned off everything to do with boot, and I agree the BIOS is somewhat vague (I think a lot of it was translated)… I just turned off everything I thought I wouldn’t need. My case is same as yours, I just needed linux to see the disks as is…
As far as my setup. I’m running CentOS 6… And I use whatever the current kernel is (I always keep the box up to date). I’ve actually never had any issues, and am currently on: 2.6.32-220.17.1.el6.x86_64 (keep in mind the redhat based enterprise distros are still running 2.6)…
To be honest, I’ve been using this setup for a very LONG LONG time, and I absolutely friggin LOVE it. I got scared by doing my own thing (versus purchasing an enterprise storage unit), and I have to say I’m very impressed. The speeds are amazing. I actually had 1 drive fail, and the controller handled it beautifully. Even when populating a new disk you could see through the kernel messages that it was all handled perfectly (fully supported hot swapping).
Hi Stephen,
Yes, I’ve been in love with soft RAID for a few years now. Love the control and stability of it, and you learn a thing or two on the side.
For this I’m aiming to build a ZFS server, for the integrated filesystem checksums. I’m also aiming to keep it as low power as possible, hence this quest 😉
Your kernel version should not have issues (yet). Maybe I should have chosen CentOS too, but in any case it gives me a pointer. I’ll try with a pre-3.1.10 kernel.
Did you enable Power-up in Standby on your disks (via hdparm or a jumper)? I can hardly imagine you didn’t if your disks aren’t spinning up right away.
Hi Dave,
Keep this on the hush hush, but I actually use all this stuff for mission critical services (storage for a virtualization cluster)… I have absolutely NO power management stuff turned on. Everything is set up to use as much juice as possible, when it wants, etc… The only stuff I actually touched throughout my entire setup was:
1) Staggered Spinups (as we previously discussed)
2) Software RAID settings (for performance, and configuration due to array size)
3) iSCSI and NFS stuff to provide the array on the network
Hi Stephen,
Still no luck with the staggered spinup. I’m going to go for a slightly larger PSU and give it some time.
In a while, if I figure out what the specifics are of turning it on, I’ll get back to you.
Thanks for the help and good luck with your projects.
Dave
Hi Dave,
That would be great if you could keep us updated with your progress. If I get a chance, I’ll double check my config and stuff, however it’s extremely hard as once that box was setup, it just stays on, untouched, and has for almost a year now… It’s one of those “boxes” if you know what I mean.
Thanks,
Stephen
Installed the SAS 9211-8i under CentOS 6.5 as an HBA in IT mode.
Configured the following parameters:
Boot Support: Enabled OS Only
Advanced Adapter Properties:
Adapter Timing Properties:
Direct Attached Spinup Delay (secs): 2
Direct Attached Max Targets to Spinup: 1
Since I tend to run with the smallest power supply that works (in this case, 300W for six drives), I’m interested in anything that can help. Don’t know about staggered spin up (the drives are pretty quiet so it is hard to hear them start) but it didn’t slug the power supply with five drives attached to the HBA and one on the mobo.
The system booted fine and the drives just appeared with no monkeying around, after a fresh install (i.e. no device driver antics). I do take issue with the bogus drive name order assigned by the system at startup (i.e. bears no resemblance to the order in which the drives are plugged in) but I guess we’ve got to learn to love it. That’s why they allow mount by UUID in fstab, I suppose. Seems to be the only way to go, at this point.
Glad to hear it worked for you. I’m still REALLY REALLY impressed with this controller.