Recently I decided it was time to beef up my storage link between my demonstration vSphere environment and my storage system. My existing setup included a single HP DL360p Gen8, connected to a Synology DS1813+ via NFS.
I went out and purchased the appropriate (and compatible) HP quad-port 1Gb server NIC (Broadcom based) and connected the Synology device directly to it, using all 4 ports. I then configured an iSCSI target using a File LUN with ALUA (Advanced LUN features), configured the NICs on both the vSphere side and the Synology side, and enabled jumbo frames of 9000 bytes.
I connected to the iSCSI LUN and created a VMFS volume. I then configured Round Robin MPIO on the vSphere side of things (as always, I made sure to enable “Multiple iSCSI initiators” on the Synology side).
I started to migrate some VMs over to the iSCSI LUN. At first I noticed it was going extremely slow. I confirmed that traffic was being passed across all NICs (and also verified that all paths were active). After the migration completed, I decided to shut down the VMs and restart them to compare boot times. Booting from the iSCSI LUN was absolutely horrible; the VMs took forever to boot up. Keep in mind I’m very familiar with vSphere (my company is a VMware partner), so I know how to properly configure Round Robin, iSCSI, and MPIO.
I then decided to tweak some settings on the ESXi side of things. I configured the Round Robin policy to IOPS=1, which helped a bit. I then changed the RR policy to bytes=8800, which, after numerous other tweaks, I determined achieved the highest performance to the storage system over iSCSI.
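In case it helps anyone trying the same tweaks, the commands look roughly like this from the ESXi shell (the naa identifier below is just a placeholder for your own device ID, and syntax can vary slightly between ESXi versions):

# Set the path selection policy for the LUN to Round Robin
esxcli storage nmp device set --device=naa.XXXXXXXXXXXXXXXX --psp=VMW_PSP_RR
# Rotate to the next path after every single I/O
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.XXXXXXXXXXXXXXXX --type=iops --iops=1
# ...or rotate after 8800 bytes instead (the setting I eventually settled on)
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.XXXXXXXXXXXXXXXX --type=bytes --bytes=8800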
This config was used for a couple of weeks, but ultimately I was very unsatisfied with the performance. I know it’s not very accurate, but looking at the Synology resource monitor, each gigabit link over iSCSI was only achieving 10-15MB/sec under high load (single contiguous copies) that should have resulted in 100MB/sec and higher per link. The combined LAN throughput as reported by the Synology device across all 4 gigabit links never exceeded 80MB/sec. File transfers inside of the virtual machines couldn’t get higher than 20MB/sec.
I have a VMware vDP (VMware Data Protection) test VM configured, which includes a performance analyzer inside of the configuration interface. I decided to use this to run some quick tests (I’m too lazy to configure a real I/O and throughput benchmark since I know I won’t be continuing to use iSCSI on the Synology with the horrible performance I’m getting). The performance analyzer tests run for 30-60 minutes, and measure reads and writes in MB/sec and seeks per second. I tested 3 different datastores.
Synology DS1813+ NFS over 1 X Gigabit link (1500MTU):
- Read 81.2MB/sec, Write 79.8MB/sec, 961.6 Seeks/sec
Synology DS1813+ iSCSI over 4 x Gigabit links configured in MPIO Round Robin BYTES=8800 (9000MTU):
- Read 36.9MB/sec, Write 41.1MB/sec, 399.0 Seeks/sec
Custom built 8 year old computer running Linux MD Raid 5 running NFS with 1 X Gigabit NIC (1500MTU):
- Read 94.2MB/sec, Write 97.9MB/sec, 1431.7 Seeks/sec
Can someone say WTF?!?!?!?! As you can see, it appears there is a major performance hit with the DS1813+ using 4 x Gigabit MPIO iSCSI with Round Robin. It’s half the speed of a single-link 1 x Gigabit NFS connection. Keep in mind I purchased the extra memory module for my DS1813+, so it has 4GB of memory.
I’m kind of choked I spent the money on the extra server NIC (it was over $500.00), and I’m also surprised that my custom-built NFS server from 8 years ago (with 4-year-old drives) and only 5 drives is performing better than my 8-drive DS1813+. All drives used in both the Synology and the custom-built NFS box are Seagate Barracuda 7200RPM drives (the custom box has 5 x 1TB drives configured in RAID 5, the Synology has 8 x 3TB drives configured in RAID 5).
I won’t be using iSCSI or iSCSI MPIO again with the DS1813+ and actually plan on retiring it as my main datastore for vSphere. I’ve finally decided to bite the bullet and purchase an HP MSA 2040 (dual controller with 4 x 10Gb SFP+ ports) to provide storage for my vSphere test/demo environment. I’ll keep the Synology DS1813+ online as an NFS vDP backup datastore.
Feel free to comment and let me know how your experience with the Synology devices using iSCSI MPIO is/was. I’m curious to see if others are experiencing the same results.
UPDATE – June 6th, 2014
The other day, I finally had time to play around and do some testing. I created a new FileIO iSCSI target, connected it to my vSphere test environment, and configured round robin. Doing some tests on the newly created datastore, the iSCSI connections kept dropping. It got to the point where it wasn’t usable.
I scratched that, and tried something else.
I deleted the existing RAID volume, created a new RAID 5 volume, and dedicated it to a Block I/O iSCSI target. I connected it to my vSphere test environment and configured Round Robin MPIO.
At first all was going smoothly, until connection drops started occurring again. Logging in to the DSM, absolutely no errors were being reported and everything was fine. Yet I was at a point where all connections were down to the ESXi host.
I shut down the ESXi host, and then shut down and restarted the DS1813+. I waited for it to come back up, however it wouldn’t. I let it sit there and waited 2 hours for the IP to finally become pingable. I tried to connect to the web interface, however it would only load portions of the page over extended amounts of time (it took 4 hours to load the interface). Once inside, it was EXTREMELY slow. However, it was reporting that all was fine, everything was up, and the disks were fine as well.
I booted the ESXi host and tried to connect to it, however it couldn’t make the connection to the iSCSI targets. Finally the Synology unit became unresponsive.
Since I only had a few test VMs loaded on the Synology device, I decided to just go ahead and do a factory reset on the unit (I noticed new firmware was available as of that day). I downloaded the firmware, and started the factory reset (which again, took forever since the web interface was crawling along).
After restarting the unit, it was not responsive. I waited a couple hours and again, the web interface finally responded but was extremely slow. It took a couple hours to get through the setup page, and a couple more hours for the unit to boot.
Something was wrong, so I restarted the unit yet again, and again, and again.
This time, the alarm light was illuminated on the unit, and one of the drive lights wouldn’t come on. Again, extreme unresponsiveness. I finally got access to the web interface and it was reporting the temperature of one of the drives as critical, but it said the drive was still functioning and all drives were OK. I shut off the unit, removed the drive, and restarted it again; all of a sudden it was extremely responsive.
I removed the drive, hooked it up to another computer, and confirmed that it had indeed failed.
I replaced the drive with a new one (same model) and did three tests: one with NFS, one with FileIO iSCSI, and one with BlockIO iSCSI. All of a sudden the unit was working fine, and there were absolutely NO iSCSI connection drops. I tested the iSCSI targets under load for some time, and noticed considerable performance increases with iSCSI, and no connection drops.
Here are some thoughts:
- Two possible things fixed the connection drops: either the drive was acting up all along, or the new version of DSM fixed the iSCSI connection drops.
- While FileIO performance has increased to around ~120-160MB/sec from ~50MB/sec, I’m still not even close to maxing out the 4 x 1Gb interfaces.
- I also noticed a significant performance increase with NFS, so I’m leaning towards the drive having been acting up since day one (seeks per second increased threefold after replacing the drive and testing NFS). I/O wait has been significantly reduced.
- Why did the Synology unit just freeze up once this drive really started dying? The drive should have been marked as failed instead of causing the entire Synology unit to stop functioning.
- Why didn’t the drive get marked as failed at all? I regularly performed SMART tests and checked drive health, and there were absolutely no errors. Even when the unit was at a standstill, it still reported the drive as working fine.
Either way, the iSCSI connection drops aren’t occurring anymore, and performance with iSCSI is significantly better. However, I wish I could hit 200MB+/sec.
At this point it is usable for iSCSI using FileIO; however, I was disappointed with the BlockIO performance (BlockIO should be faster, and I have no idea why it isn’t).
For now, I have an NFS datastore configured (using this for vDP backup), although I will be creating another FileIO iSCSI target and will do some more testing.
Update – August 16, 2019: Please see these additional posts regarding performance and optimization:
Hey,
Congrats on your new purchase. A couple of things you may or may not know already: make sure to upgrade to the latest DSM 5.0 from Synology’s support site. They have made huge improvements to the OS for iSCSI (they are claiming something like 6x) from what I have read. I just bought my DS1813+ a few weeks back as well and it came with a really old version of the OS (I think 4.2). I immediately upgraded after reading how much better DSM 5.0 handles iSCSI. IMHO, DSM 5 seems very stable and has a number of added features that might be of interest to you. How many drives do you have and what RAID level/config did you go with? I am using RAID-10 with 8x WD RE4 2TB drives and it’s fast. I feel I am getting great speeds. I am also using iSCSI MPIO-RR with ESXi 5.5 Update 1. I didn’t play with the IOPS setting you mentioned.
Here are some results from DSM’s Resource Monitor while running CrystalDiskMark v3.0.3 x64:
Volume/iSCSI:
– Utilization: 58
– Transfer Rate: 128MB/s
– Peak Read IOPS: Doesn’t Register for some reason
– Peak Write IOPS: 10204
Network:
– Peak Sent: 139MB/s
– Peak Received: 112MB/s
If you haven’t already, look into creating what Synology calls a “Regular Files” iSCSI LUN and enable the “Advanced LUN Features” option. When you enable this feature, the NAS supports VAAI capabilities. VAAI speeds up many of your storage operations by offloading them to the NAS unit: instead of the hypervisor having to read the data and write it back to the NAS, the NAS copies the blocks within itself, so the data never travels the network.
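If you want to double-check that VAAI is actually being picked up once that option is enabled, something like this from the ESXi shell should show it (the device ID below is just a placeholder):

# Show which VAAI primitives the host sees as supported on the LUN
esxcli storage core device vaai status get --device=naa.XXXXXXXXXXXXXXXX
# "Clone Status: supported" means the copy offload (XCOPY) is available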
I love my Synology DS1813+
Good Luck!
I should have probably mentioned I am only using 2 of the 4 1GbE ports on the Synology NAS and 2 1GbE uplinks on ESXi with Jumbo Frames enabled on an isolated VLAN.
Regards,
Ash
Hi Ash,
Thanks for your comments. I’ve actually had this running for over a year. Before trying out the iSCSI feature, I was using NFS which provided the highest amount of throughput in my scenario. When I upgraded to DSM 5 a week ago, I noticed no major increase in speed with the iSCSI Target.
To answer your questions: As I mentioned in the blog post, I was using the Advanced LUN features options. Also, I mentioned that I’m using 8 X 3TB Seagate 7200 drives in RAID 5.
Please note that the information provided from the Resource Monitor isn’t exactly accurate as this is information provided by the Linux kernel running on the Synology device. Real world speeds are actually slower. And to see more accurate speeds (which still will be inaccurate), it’s better to look at the “Disk” instead of the “Volume/iSCSI” inside of the resource manager.
To properly benchmark, you’ll need to configure a Virtual Machine and run benchmarks inside of it against the storage, which will actually measure the iSCSI LUN throughput and IOPS. The measurements I provided in the blog post above reflect this.
Also one other thing I wanted to mention in the blog post which I forgot: under extremely heavy load some of the paths would actually be lost and go offline to the iSCSI target. At one point, 3 of 4 of my paths went offline, I was really nervous I was going to lose access to the LUN.
Don’t get me wrong, I absolutely love the Synology products. I just won’t be using them for MPIO iSCSI anytime soon.
Stephen
Stephen,
Sorry, to be honest, I read your blog early on and didn’t read over it again once I went to post. I was only trying to be helpful.
You can measure performance from the VM level as well like you mentioned, and you can also measure IOPS and network throughput with the ESXTOP utility on the hypervisor.
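If it helps, esxtop’s batch mode is an easy way to capture those numbers for later analysis instead of watching the live screen (the interval and iteration count below are just examples):

# Interactive: press 'u' for per-device disk stats, 'd' for adapter stats, 'n' for network
esxtop
# Batch mode: sample every 5 seconds, 120 samples, dump everything to CSV
esxtop -b -d 5 -n 120 > /tmp/esxtop-run.csv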
I am experiencing the problem you described with connections being lost (completely including Web GUI, SSH) when high utilization occurs. I noticed only one Synology NAS is listed on the compatibility support matrix for ESXi 5.5. I started a ticket with Synology recently to follow up on the problem.
Regards,
Ash
No worries at all! 🙂
Just so you know, in my case, I could still access SMB, SSH, NFS, and the Web GUI. It was only the iSCSI connections to the ESXi hosts that were dropping.
Have there been any updates from Synology yet that improve this situation at all?
Ash – did you get anywhere with Synology on your issues? I see that DSM 5.0-4482 says “Improved the stability of iSCSI target service”. Has that worked for you?
Hi,
Just read your post regarding your Link Aggregated performance on a Synology NAS and iSCSI.
One thing that has become apparent is a mix of link aggregation methods. Your ESXi host is set to use a Round Robin policy for sending information; however, this method is not supported on a Synology NAS. I have checked on my NAS and can see there is either a Failover option or an LACP option; the latter is the IEEE 802.3ad spec and uses all the links at the same time.
Attempting to use the conflicting methods will cause several issues as the Aggregation type doesn’t match.
Additionally (from unfortunate experience), the type of drive in use will cause performance issues. Desktop and NAS grade drives are not recommended for installations over 4 drives, as they have no vibration tolerance and their IOPS are affected by sympathetic vibrations.
Based on your blog, the first issue is, I think, your primary concern, as mixing types of link aggregation is a BAD idea.
Hi Paul,
Unfortunately there haven’t been any updates. Since installing the latest versions of DSM, there’s been no change in performance, and the links still go down.
Hi Tim,
Actually, the unit does support MPIO round robin, and I am not using link aggregation. But you are correct that you cannot use both.
Link aggregation does not create any performance increase with a single ESXi host, since neither iSCSI nor NFS creates multiple connections, and (by design) multiple connections are required to see a performance increase with link aggregation. This is why link aggregation is seldom used for storage in virtualized environments, and why iSCSI MPIO using round robin is the choice for most people who virtualize.
While link aggregation does increase fault tolerance, people still regularly choose iSCSI MPIO for performance reasons. Technically, link aggregation would increase performance in a multiple-host environment, however you can gain even more performance by using MPIO round robin instead (since it creates multiple connections from each individual host).
Stephen
Hi Stephen,
I spoke to Synology tech support about your issue as I’m keen to find out if the DS1813+ will work for me. They said categorically that round robin isn’t supported on their devices. I couldn’t see anything in their specs that mentioned it either?
Cheers,
Paul.
Hi Paul,
I think you may just be talking to someone who doesn’t know. It’s definitely supported, and numerous people use it. Synology even has documents set up to explain how to configure it, and they advertise the device as VMware Ready, which is because it supports multiple iSCSI connections and handles them properly.
Here are two documents they have for MPIO with Windows:
https://www.synology.com/en-us/support/tutorials/552
http://forum.synology.com/wiki/index.php/How_to_use_iSCSI_Targets_on_Windows_computers_with_Multipath_I/O
Here are two documents they have for MPIO with VMware ESXi:
http://www.synology.com/en-global/support/tutorials/551
http://forum.synology.com/wiki/index.php/How_to_use_iSCSI_Targets_on_VMware_ESXi_with_Multipath_I/O
These are all support documents they have on their site, and wiki available for customers to configure the units. It’s all supported.
Hope this helps,
Stephen
Hi Stephen,
Thanks for that. Yes I’ve just been reading more on it and agree. What I’m not up to speed on yet is round robin – is that another form of MPIO or just another term for MPIO?
Thanks.
Hi Paul,
Essentially MPIO is just Multipath I/O (Multiple Path Input/Output). I guess you could say this defines multiple iSCSI connections from the initiator to the target (the target being the iSCSI server). When MPIO is enabled, a host will have multiple connections to the iSCSI target. Keep in mind that this is just a “state”, you could call it, of having the multiple connections active.
Underneath MPIO (as part of MPIO), there are numerous different path selection policies. I can’t remember the exact names off the top of my head, but there’s one (“Most Recently Used”) which will use the last available/used path; there’s “Fixed” (which is statically configured by the user, I believe); and then there is “Round Robin”, which actually alternates between all available paths. It will send a chunk of data on connection 1, then another chunk on connection 2, and so on, circulating through them. Ultimately it provides increased speed, and also provides failover in case one of the connections goes down.
Now, going further into MPIO and Round Robin: by default, RR has a pre-configured static number of IOPS it will send before it jumps to the next connection. This can be changed using the VMware CLI to whatever IOPS value you’d like (most people change this value to 1), and you can also change the limit from IOPS to bytes. Some people do this and set the byte value to the maximum jumbo frame size they have configured on the NICs and SAN network.
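If you want to see what a given LUN is currently set to, you can check from the ESXi CLI with something along these lines (the naa ID is a placeholder, and the output format differs a bit between versions):

# Show the current path selection policy and all paths for the device
esxcli storage nmp device list --device=naa.XXXXXXXXXXXXXXXX
esxcli storage core path list --device=naa.XXXXXXXXXXXXXXXX
# Show the current Round Robin limit (IOPS vs bytes)
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.XXXXXXXXXXXXXXXX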
Hi Stephen
I am a member of the UK Support Team at Synology UK.
We would like to help with this and see if we can look into what’s happening for you. Please contact us directly at [email protected] and we will be able to help resolve any performance issues.
Thank you
Thanks for reaching out, Synology support! I appreciate it.
Since I ran into these issues, I moved everything back to NFS (which has been working great). Since I have this unit in production, I can’t do anything with it for a couple of weeks.
In two weeks I’m implementing a SAN and will be able to take the DS-1813+ offline and experiment. Once I do this, I’ll be able to contact support and test/play with the device.
I actually want to change over from FileIO to BlockIO on the iSCSI target, removing all data from the disks and trying again with the entire RAID volume dedicated as an iSCSI block target (whereas the issue mentioned above occurred with a file-based iSCSI target), to see if I have any better results.
Thanks again,
Stephen
Definitely keen to keep following your updates, Stephen. I’ve been looking at getting a pair of these units with 8x 4TB WD Reds as a “DR” datastore for my Veeam replicas (one in each DC).
With the performance issues mentioned, I may need to reconsider, which is a shame; I have 2x of the smaller units at home (DS413j, DS214se) and I think they are marvellous.
Michael
Hi Michael,
I don’t want you to get the wrong idea from this post. I actually LOVE this device. NFS works AMAZING! And you can definitely max out the 1Gb connection via NFS.
For the cost/performance, it’s totally worth it. I’d highly recommend the Synology devices (and I do to clients/friends that require something like this)!
The issue mentioned in my post I’ve only observed when using iSCSI with multiple connections (MPIO). I’m actually thinking that once I wipe the RAID array and dedicate the entire array to a BlockIO iSCSI target, it may resolve the issue (by removing the host filesystem I/O overhead).
I’d definitely say that this device would be the perfect DR target. And to be honest, once I get my new SAN configured, my DS1813+ is what I’ll be using for my vDp backup datastore.
Unfortunately I can’t play with anything until the SAN shows up.
(And just a side note, the only reason why I’ve upgraded to a full enterprise SAN is due to work I’m doing with test environments that requires an enterprise SAN. It’s not due to performance limitations of the Synology device.)
PS.
Michael, I’ve got my new HP MSA 2040 coming in tomorrow; possibly, if it clears customs, I can get it all configured by the end of the weekend.
If this is the case, I’ll be wiping the DS1813+, and setting up the device as BlockIO. If I have time I’ll try to get the iSCSI MPIO configured with BlockIO and do some tests and post some results.
Hey Gang,
Synology support was eventually able to resolve my issue (after 3 long weeks) where the unit would lock up while cloning two or more VMs simultaneously, specifically to different iSCSI datastores. They said the fix will be included in the next patch release (after 4482). I don’t know exactly what they changed, but they had me replace some binaries in the iSCSI modules folder. I waited a while to report back, but the problem is definitely resolved, as I have been pushing the unit really hard ever since. Originally I also had a problem with the LUN Backup feature not working, but that has since been fixed by the 4482 patch.
In case you are curious, my setup is the following:
Synology DS1813+
1x 1Gbps Management Interface
1x 1Gbps NFS Interface (VLAN/Subnet isolated with Jumbo Frames)
2x 1Gbps Interfaces dedicated for iSCSI (VLAN/Subnet isolated with Jumbo Frames)
2x iSCSI (Regular Files) Volumes [Thin Provisioning: YES / Advanced LUN Features: YES]
8x WD RE 2TB Hard Drives in RAID-10
ESXi 5.5.0 #1746018
2x 1Gbps Interfaces dedicated for iSCSI and NFS traffic
2x iSCSI VMFS-5 Datastores in a Datastore Cluster (VAAI Enabled)
2x iSCSI VMKernel Interfaces (Port Binding)
Round Robin Multipathing Policy Load Balanced by 8800-bytes (Jumbo Frames)
[esxcli storage nmp psp roundrobin deviceconfig set --device=naa.############ --type=bytes --bytes=8800]
I did some performance tests using SQLIO and IOMeter, here are my results:
IOMeter:
64KB 100% Sequential READS = 108 MB/s or 1739 IOPS (almost 1Gbps line rate)
64KB 100% Sequential WRITES = 111 MB/s or 1778 IOPS (almost 1Gbps line rate)
SQLIO:
64KB 100% Sequential READS = 117 MB/s or 1880 IOPS (almost 1Gbps line rate)
64KB 100% Sequential WRITES = 104 MB/s or 1664 IOPS (almost 1Gbps line rate)
ESXTOP showed the workloads were balanced evenly and reflected the numbers the guest reported.
Again, I am pretty happy with this NAS, as the features are nice and the speed tolerable. The speed used to be really bad prior to DSM 5.0, as I understand. I started out with DSM 5 so I wouldn’t know, but I hear it’s much better.
Regards,
Ash
Hi Ash,
Thanks for chiming in and posting the update from Synology…
Do you know exactly what was changed inside of the updated kernel modules?
And one other quick question I have. Before Synology replaced the modules, what exact types of issues were you having with the Synology device? Did you have any of the multiple links go down in an MPIO configuration under load?
I recently got my new SAN set up, so I’m finally going to have some time to troubleshoot mine when I get back home next week. I’m looking forward to getting these issues resolved, as I have plans to set the DS1813+ up as a VMware vDP datastore.
Just on a side note, to brag: I mentioned in this blog post that I was planning on purchasing a new HP MSA 2040. Just an update, the unit came in about 3 days ago and I set it up. Over multiple 10Gb links I’m actually getting up to 1 GB/sec on the new SAN. It’s actually pretty cool.
Will be posting info soon.
Stephen
Stephen,
No, I don’t know what was changed, but the developers found the bug. My issue was that when cloning multiple VMs simultaneously to different iSCSI datastores, the unit would drop all connections (SSH, web, CIFS, NFS, iSCSI, syslog, etc.) on all interfaces. I disabled jumbo frames, disabled MPIO and used Fixed paths, and switched from multiple iSCSI vmk interfaces to a single vmk. The problem seemed to happen when cloning from one iSCSI datastore to another. Does that sound like your problem?
Congrats on the new MSA. How much did that run you?
Hi Ash,
It kinda sounds like my problem. I was experiencing two problems:
1) Horrible speeds (Synology was reporting MASSIVE I/O wait)
2) MPIO iSCSI links were going offline.
I actually started playing around with the DS1813+ today. I deleted the RAID volume, created a new RAID 5 volume, and used Block I/O to create an iSCSI target. Surprisingly enough, I’ve been rock solid since switching to Block I/O (no connection drops). So that’s interesting!
As for your question re: the MSA: too much! lol
Stephen
Hey Stephen,
Glad to hear you have narrowed the scope of your problem down to the Regular Files iSCSI volumes. I never tried a block LUN because I wanted to take advantage of the VAAI and Thin provisioning capabilities.
Do you happen to have numbers on the performance you are getting? I know there are a lot of variables that determine the numbers, but maybe you could run some tests like the ones I mentioned above to see how your performance compares?
I hope Synology releases the fix for the iSCSI issues soon for the rest of the users trying to make good use of their investment. I still think that once the problem is resolved, the Synology NAS/SAN feature set is amazing, especially for the money.
Cheers Stephen!
Regards,
Ash
Ash,
Scratch that, the issue is still occurring. Overnight a link dropped a few times in the MPIO config with the DS1813+.
I’ll be reaching out to support either today or tomorrow to see if I can get my hands on that fix.
Thanks,
Hi there, nice reading and really useful info.
I’m struggling with the same performance issue; today I installed the new release, DSM 5.0-4493.
No disconnections on my iSCSI LUNs, but performance is still slow.
My Syno shows maximum I/O overload (100%) on the Volume/iSCSI chart while the disk remains at almost 50%.
Hope to share some good news.
Regards,
Domenico
I was just reading the release notes on the new DSM update. Here is the note about the iSCSI connection drops I was experiencing:
4.) Enhanced the stability of iSCSI target service.
As far as performance goes, what kind of performance are you talking about specifically? Can you provide us with more information? I am curious as to what speeds you guys are getting.
Hi Ash,
Inside a simple CentOS VM, I just ran dd against the vmdk disk. Write speed is almost 7MB/s, and inside the web interface I have charts showing Disk Activity at 40% and Volume/iSCSI Activity at 100%, with I/O latency at 3000 ms (on the ESXi host, per esxtop).
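For reference, the kind of dd test I mean looks roughly like this (the path and size are only examples; oflag=direct/iflag=direct bypass the guest page cache so you measure the datastore rather than RAM):

# Write 1 GB straight to the vmdk-backed filesystem, bypassing the page cache
dd if=/dev/zero of=/root/ddtest.bin bs=1M count=1024 oflag=direct
# Read it back, again bypassing the cache
dd if=/root/ddtest.bin of=/dev/null bs=1M iflag=direct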
If I mount an iSCSI LUN on my Windows workstation I get 100 MB/s.
It seems to me to be a compatibility issue between the ESXi iSCSI software adapter and the Syno, but I’m still not able to figure out how to fix it. I’m sticking with NFS shares (which give me 80 MB/s write speed with 120 ms latency).
I have no disconnection issues btw.
bye
Well, here is a little update to my situation: I’ve inserted the full write-up into the original blog post above (see the June 6th, 2014 update).
Stephen
Hi Stephen,
Interesting about the drive issues! Glad you’re getting better speeds 🙂
Well, I’ve got the 3x DS1813+’s loaded with 8x 4TB WD Reds apiece. Due to the network architecture on the production servers, two of them will be NFS over the four GbE ports LAGged together. The third one (dev), however, is running over iSCSI.
Been benching the iSCSI all weekend (in addition to a new 10GbE PowerVault array), specifically around MPIO.
The benchmark setup, in relation to a single server and the Synology, is the server with 2x 10GbE connections (with iSCSI offload) back to a switch on separate VLANs, and 2x 1Gb connections into the same VLANs from the Synology.
The Synology is configured with 6 of the drives in Synology Hybrid RAID with two-disk fault tolerance. A single 14TB (or so) LUN is created and exported at the block level.
All networking components configured to a 9000 MTU. Benchmark performed inside a Win2008R2 VM, with a 100GB VDisk on the Synology for testing, the OS drive is on the local server storage.
Benching using CrystalDiskMark to begin with –
Single Gig Link – (both connected but VMware set to Fixed path selection)
86.99 MB/sec sequential read
108 MB/sec sequential write
Dual Gig link – round robin at 8800 bytes
64.74 MB/sec sequential read —- 20MB slower than single!!
164MB/sec sequential write — this is MPIO’ing quite well, I imagine this is the cache coming into play too.
Dual Gig link – round robin at 1 IO
65.74 MB/sec sequential read
164MB/sec sequential write
So MPIO takes a performance hit on the reads, but shows an improvement on the writes. The round robin policy setting doesn’t seem to come into play.
I’m not unhappy with the performance, as I said it’ll just be used for Veeam backups, but the more I can squeeze out of it the better.
I will see if I can give the NFS option a try – it’d work better in my environment for presenting a single 14TB datastore (I’m limited to VMFS3 because I have one ESXi 4.1 host that can’t be upgraded :-\ ). However, I’ll need to pop into the office to do that – can’t rejig network cables from home!
Will keep you posted.
M.
Hi Michael,
I actually noticed the exact same thing. Finally after that weird drive issue, I noticed that the writes were amazing, but the reads were taking a hit using MPIO. It was bizarre.
Just as a side note, in case this helps: a few months back I was doing some testing with round robin to the Synology. I tried IOPS=1, and then decided to change the policy from IOPS to bytes. I noticed I got the best sustained speed by changing the policy to bytes and setting it at 8800 (leaving enough room for iSCSI overhead). But then again, this was back before the drive wigged out (I think I was running in limp mode since day 1); I’m still surprised the drive never got marked as failed. Might be worthwhile giving it a shot?
One thing I wish I had done is set up a RAID 10 array and test MPIO against it. It’s too bad it never crossed my mind.
I’m not so much concerned about the amount of storage (I’ve got access to over 60TB lmao), but aiming more for speed and IOPS.
Let me know how NFS turns out for you! The other day I decided to just go back to NFS, and set it up with VMWare Data Protection. It’s been working great! Wouldn’t mind a little bit more speed though…
Cheers
Hi Stephen,
Given the lack of speed difference between the round-robin policies, I think I’ll just leave it at the default.
On the PowerVault and 10GBe, however, I noticed that IOPS=1 significantly improved performance over Bytes=8800. Can’t remember the exact numbers, but Bytes=8800 was giving me around 650MB/sec reads, and IOPS=1 pushed that to around 820MB/sec! 30% more 🙂
I’ll give the RAID 10 a go, just to see if it’s some overhead in relation to the SHR; I’m in a position where I can test it. I will have to stick to SHR in the long term, however, as I need the size for the backups.
Interestingly, the two spare slots in the Syno have another 2x 4TB Reds in a RAID-1, exported out over SMB via the first NIC to the LAN, as a place for file-level backups of our SVN etc. to go to. I can max out the 1Gb link in both directions over SMB, with room to spare!
FWIW, I have also updated to the latest DSM that was released the other day too.
Thanks for sharing the info everyone!
Hi Stephen, have you had any further success with iSCSI?
I’m having the same performance issues with iSCSI. It’s as if Synology has “capped” the speed.
Please see my post regarding it here http://forum.synology.com/enu/viewtopic.php?f=147&t=87990&p=331344
Screenshot of the issue here: http://i.imgur.com/3YllXl6.png
Hi Zuldan,
I experimented with it a bit and noticed I did get some performance increase using an iSCSI Block LUN vs. a File LUN, however ultimately I gave up and just utilized NFS. The increases weren’t what I was expecting.
I’m now using the DS1813+ as a backup replication store, so I chose to go with NFS for the performance, and also since vSphere handles NFS volume disconnects a little more gracefully than iSCSI.
Hi Stephen
We have weird behaviour and are unable to trace the issue. We have 2 DS1813+ boxes set up in HA.
We have 2 volumes, both exported as NFS shares and mounted on our ESXi servers.
The issue is when we copy, via SCP, a big file (a VMDK for example) of anything more than 20 GB in size from external storage or from a different datastore located on any other NAS/server:
we start seeing all ESXi hosts lose connectivity to the DS1813+ for 10 to 30 seconds. This is a production environment,
and we are out of options.
Synology has been unable to solve the issue for the last 7 weeks now.
We are thinking that we saturate the network interfaces when doing the copy.
A quick overview of the design:
2x DS1813+ in an HA setup
Both DS1813+ units updated to the latest release
6 ESXi hosts, each connected with a 1Gb link to the NAS switch
DS1813+:
LAN 1 standalone – static IP
LAN 2 standalone – static IP
LAN 3 standalone – static IP
LAN 4 direct connection for the HA setup with the other DS1813+
Things we tried:
1- Copying using different protocols: SFTP, SCP, CP, FTP
2- Changing the NAS switch
3- Mounting the NFS export on the ESXi hosts through a different path / different network card
All report the same behaviour: once a big file is copied and reaches about 20 GB (of its original size of 100 GB, for example),
NFS times out.
4- Disabling async in the NFS settings only slowed performance badly, and the issue still happens.
Any ideas?
Regards
Amr
Hi Amr,
To be honest, I’ve never seen behaviour like that, either with the Synology device or with any other NFS host for that matter.
One thing to note is that when using NFS, it’s extremely difficult to max out the Synology device. I don’t think this is a saturation issue.
Have you tried copying anything over with HA disabled on the Synology device? If you were able to do this, we could rule out that HA is having an effect.
Also, have you confirmed that the health of all your disks is good? I had an issue where one drive was failing, but it wasn’t reporting as failing or failed. Performance came to a grinding halt, and all access to the Synology unit was lost. After restarting the unit a number of times, all my data was lost. I replaced the disk once it was finally marked as failed, and all my issues went away. Thankfully, in my case, all my VMs were backed up.
Also, have you tried copying a vmdk via the vSphere client datastore browser? This may be an idea, to see if it also happens that way.
And one last question I have to ask: when you’re copying, I’m assuming the VM has been removed from inventory and powered off? Just to make sure there aren’t any locks on the files, or anything like that.
Stephen
Hi Stephen
Thank you for your quick response. I’ve been going in circles, as this is the first time I’ve seen such behaviour.
The only thing I didn’t try is disabling the HA; since it’s a production environment, it’s a very sensitive subject.
To answer your questions:
All disks are running fine and are audited weekly.
We have 2 volumes:
4x Samsung SSD Pro 512 GB disks configured in RAID 5
4x WD 4 TB disks configured in RAID 10
In daily operation everything runs smoothly and we get good IOPS from both volumes.
I notice that the network never reaches full utilization, but since performance is good on the VMs themselves we didn’t worry about it.
Another note: I did try uploading directly to the datastore from the vSphere client, and it’s the same issue. With a big VMDK file, or any file more than 30 GB, once the transfer reaches 20 GB we start seeing ESXi lose the connection.
Another thing: if we have a Windows 2008 VM, for example, already on the SSD volume, and it generates a backup onto another disk located within the same folder on the same volume,
the same issue happens.
Regarding the VM lock question: confirmed, of course there are no locks on the files and no errors on write.
One more thing I did test:
copying the file away from NFS via SCP.
The idea was to SCP from the ESXi host directly to the Synology box via SSH, so we avoid NFS.
Same thing.
Regards
Hi Amr,
This is definitely an interesting issue. I know you said you can’t do it, but I can’t comment until you disable HA, or at least take one of the HA nodes offline.
I’m thinking this may be causing the issue somehow. Maybe when it hits 20GB it starts replicating, or maybe it’s live replicating, but something is causing an issue when it hits that 20GB mark.
Just curious, have you SSH’ed into the Synology device and checked out the kernel messages when the units freeze up?
That may shed some insight!
Stephen
I have not tried to look for kernel messages; going by the system logs generated by Synology, they said there isn’t anything.
BTW, the units do not freeze; the ESXi hosts lose NFS connectivity and restore it automatically, with the window of lost connectivity anywhere from 5-10 seconds up to 90 seconds.
But I will check out the kernel messages. In /var/log?
My thoughts are going in the same direction, that it might be a replication/HA issue.
I will try to schedule some downtime to test.
Amr,
Next time you try this, SSH into the Synology device. Wait for the transfer to freeze and the connectivity to be lost, and then type in “dmesg” (without quotations). This will output the kernel logs on the Synology device. There may be some helpful information in there.
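If it helps, a couple of other commands worth running while you’re in there (log file locations can differ between DSM versions, so treat these paths as a starting point):

# Dump the kernel ring buffer and keep the most recent entries
dmesg | tail -n 100
# Look for iSCSI, NFS or disk related complaints in the system log
grep -iE 'iscsi|nfs|ata|error' /var/log/messages | tail -n 100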
But again, I’m leaning towards HA.
Incredible discussion. Thanks to all of you. We were excited about using a Synology as an iSCSI SAN on a new vSphere deployment in the next few weeks… We are now leaning towards an HP SAN instead.
Hi Stephen,
try modifying the First Burst Length and Max Receive Data Segment Length according to
https://forums.freenas.org/index.php?threads/iscsi-multipathing-slower-then-single-path-for-writes.22475
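On the ESXi side those show up as software iSCSI adapter parameters; something like the following is the general idea (the vmhba number and value are placeholders, and exact parameter names vary, so check what the get command lists on your host first):

# List the current parameters (and which are settable) on the software iSCSI adapter
esxcli iscsi adapter param get --adapter=vmhba33
# Example of changing one of them (key and value here are only illustrative)
esxcli iscsi adapter param set --adapter=vmhba33 --key=FirstBurstLength --value=262144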
Good thread, Stephen. I’ve seen the iSCSI connectivity issues too under high load, though on a lower-end model, still with MPIO, four NICs, etc. Diagnosing my logs with VMware, they highlighted SCSI resets (SENSE code H:0x8) which they felt were likely to be the storage array (it wasn’t the hosts), though I’m still waiting to see what Synology has to say. I’ll post an update if I get anything concrete.
I can’t believe how helpful this thread has been!
I purchased an RS814+ with 4x 1Gbps ports, and have struggled for months with iSCSI performance.
DSM is now at version 5.2, and these issues persist… Based on the entire thread discussion, I guess I’ll try migrating my iSCSI LUNs to NFS.
It’s so sad that our NASes cannot achieve the full 4Gbps out of the box…
Hello all.
I know this thread is old, so please don’t flame me for that. It seems to me that the problem still exists.
My environment is set up strictly for testing different configurations. It does not have any form of production connected to the system.
My Synology environment is :
DS1815+ ( 8x WD RED 1TB 7200 RPM )
DX513 2x ( ( 4x WD Green 1TB 5400 RPM ) + ( 1x 256 GB SSD ) )
DSM 5.2-5644 Update 2
RAM 6144 MB
Configuration
Diskgroup 1 ( 8x WD RED 1 TB 7200 RPM ) (DS1815+) RAID 0
Diskgroup 2 ( 4x WD Green 1TB 5400 RPM ) (DX513_1) RAID 0
Diskgroup 3 ( 4x WD Green 1TB 5400 RPM ) (DX513_2) RAID 0
No services are running
No volumes have been created
iSCSI Target
LUN 1 ( Diskgroup 1) block
Server environment
2 x ESXi with 6 separate 1Gbit networks ( 1 management, 1 client, 4 backbone storage network )
Guest Windows 2012 Server ( 6 CPU 8GB RAM )
When testing the storage, I set up MPIO with 4x 1Gbit networking, which with CrystalDiskMark 5.1.0 (x64) gave me 406 MB/s read and 350.5 MB/s write,
and an iSCSI LUN utilization for this specific LUN of 75-80%.
However with IOMeter the numbers are quite different.
IOPS
8192 75% Read 0% random 6606
4096 75% Read 0% random 6525
512 75% Read 0% random 3400
The numbers are OK, and with this test the LUN utilization went up a few percent, to 80-85%.
But when introducing 25% random reads, the numbers fell quite hard.
IOPS
8192 75% Read 25% random 763
4096 75% Read 25% random 863
512 75% Read 25% random 911
At the same time network traffic was reduced and LUN utilization went up to 99-100%.
During the testing, the utilization per disk never goes beyond 35%.
To me this seems to indicate that the sequential read performance is due to a read-ahead caching mechanism on the RAID controller.
When the per-disk utilization remains at or below 35% during the test while the LUN utilization caps out at 100%, this to me indicates that the problem really is
located with the Synology RAID controller, which seems to be unable to control the disk devices during random access.
Kjell Erik Furnes
Hello everyone!
I know I am posting here late, but I am hoping to get a small tip or advice to vent my frustration with the Synology DS1815+. It has 4 of those Gig ports, and I put in HGST drives that advertise a sustained transfer rate of 227MB/s. I have put 4 of those drives in RAID 10, so with mirroring I was hoping to see at least 250MB/s, but unfortunately I am stuck at 80MB/s split across the 3 NICs that I am running iSCSI over to my Dell R730xd server running ESXi 6. Each NIC is reporting around 25-28MB/s. I have tried to bind them together on the NAS under one IP, but the change is insignificant (~90MB/s). I am using one big (File Level) LUN to be able to take advantage of VAAI, and vSphere shows HW Acceleration as ‘supported’.
Should I already give up on iSCSI and think about NFS, even though we are a small setup? Any help that would get me to 150MB/s would be awesome.
Thank you for your time.
Hi Ashan,
First and foremost, do not use link aggregation to combine the ports… it will not provide any speed increase. Link aggregation should never be used with iSCSI; instead, MPIO (multipathing) should be used.
Are you using multiple subnets on each link to the server? I’m assuming the server has 4 NICs? Also, have you tried a different RAID level on the Synology unit?
Ultimately, I don’t think you are going to get the performance you were expecting. This is one of the reasons why I moved away from my DS-1813+ and purchased a SAN (HPe MSA 2040).
Stephen
Hi Stephen,
Thank you for getting back on this so quick.
I am using MPIO with round robin. I just wanted to test by binding the Ethernet ports together, but since it didn’t make any significant difference I reverted back to the original 3-port config. I followed all the steps from a VMware KB, so I am pretty sure I have set it up correctly.
Yes, the server has 4 ports. I am using 3 ports for iSCSI on both the server and the NAS. All server NICs and NAS NICs are on the same (isolated) broadcast domain/subnet going to a Gbps switch (10.0.100.X/24).
So far I have tried only RAID 10, since it is going to be a big array (4x 4TB raw). I thought RAID 5 wouldn’t be a good idea in case of a failure/rebuild.
Our budget is limited, so going to a full-scale SAN won’t get approval. This datastore is intended to be part of the storage for one of the virtual machines, which would actually reside on the internal storage of the ESXi host; the VM won’t run from it. It would simply be part of a dynamic storage pool, say an extension of the D drive.
You mentioned moving to NFS did improve the performance. Do you recall how much were you able to get while testing the same NAS over NFS?
Lastly, this DS1815+ is already fully updated with DSM 6. How helpful have you found Synology tech support, just wondering.
Thanks again!
Hi Ahsan,
If they are on the same subnet, I just want to confirm: did you configure iSCSI port binding?
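For reference, with everything in one subnet the iSCSI vmkernel ports need to be bound to the software iSCSI adapter, roughly like this from the CLI (the vmhba/vmk names are placeholders, and each iSCSI vmk port group should be tied to a single active uplink first):

# Bind each iSCSI vmkernel interface to the software iSCSI adapter
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
# Confirm the bindings and rescan for paths
esxcli iscsi networkportal list --adapter=vmhba33
esxcli storage core adapter rescan --adapter=vmhba33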
To be honest, I never contacted Synology support. I just read numerous forums and threads stating issues people were having…
When I tested my unit with iSCSI MPIO (4 links), I could never get the performance to match, or beat NFS with 1 link. So I just stayed with NFS.
Ultimately, I stopped using the Synology device for other reasons. While speed was a factor, after a few failed drives resulting in massive issues, I decided I didn’t want to store any critical/production data on it.
I still use my unit, but I re-purposed it as an NFS datastore, for vSphere Data Protection. It’s currently configured as a replication partner.
Hey Stephen,
I just wanted to let you know that this article helped me a lot in troubleshooting my own issues with the Synology and with MPIO. I’m running a DS1515+ with four 2TB Reds, and one SSD being used as a read-only cache.
After doing further troubleshooting and contacting Synology support, I was able to get my unit to do ~275MB/s read and ~200MB/s write.
It’s all in the Synology and ESXi configuration. Thanks again for your detailed article!
Hi Ricardo,
Would you be able to post what you did to increase the performance? Provide us with the information?
I’m sure this will help others in the exact same situation!
Cheers, and thank you!
I had the same problems on a RS10613xs+
I had a Windows 2012 R2 server with the Synology connected via iSCSI through two dedicated ports running MPIO. The server was running Backup Exec 2016 and using the Synology as a disk repository. Backups were very slow and I kept getting \Harddisk1\DR2 errors in the Windows logs.
Copies to anything on the Synology were slow. The copy would also pause randomly if you were watching it through the Windows Server 2012 task manager.
Upgrading to the latest version of DSM (6.0.2-8451) did not help.
Deleting the volume and recreating it with the settings below fixed the issue.
The old volume was :
Ext4
The new volume was
Ext4
The old LUN was :
ISCSI File Level
thin provisioned
Advanced File LUN
The new LUN was
ISCSI File Level
thick provisioned
Regular File LUN
So the only differences were the thick provisioning and the choice of a regular LUN type. I’m not sure if the upgrade to the DSM actually did anything to fix the issue when I recreated the volume, so in the end you may require that as well.
Hope this helps someone spend less time in a dark room and more time having fun
Stephen,
I had the same problem at first. I was able to fix it by creating a LAG on my switch (it must support 802.3ad). After I created the LAG, I created a bond in the network settings of the DS1515+. I’m now getting around 400MB/s transfer speed to the iSCSI LUN. I’ll check the exact settings when I get home and post another comment. Transfers over the network are only around 100MB/s, but transfers to the iSCSI LUN are hitting 400MB/s. Hope this helps.
Tim
Wow, just wow!
I had about the same situation: trying to configure two “clustered” Synology RS815+ units as iSCSI storage for my Hyper-V cluster.
They contain 3 regular 3.5-inch SATA disks, configured as RAID 5.
The iSCSI performance (block-based LUN) was just terribly bad: 60 MB/s read & 15-20 MB/s write…
I spent DAYS trying about every optimization I could find for iSCSI… nothing really worked out – until I found your post:
I’ve now created a file-based LUN on the RAID and ran another test:
Boom! 230.8 MB/s read and 147.6 MB/s write!!
That’s about “perfect” for the given hardware! (2x 1Gbit NICs limiting the read + RAID 5 limiting the write)
(I found a lot of other guys suffering from bad iSCSI performance with Synology NAS units – I’ll spread the word!)
Just wanted to add my 2 cents.
I, too, was having performance issues. Someone may have explained this further up in the thread, but what finally fixed it for me was Multipath I/O in the Microsoft iSCSI initiator.
- tried LACP link aggregation through the Synology (found out that only affects reliability, not throughput)
- tried an SMB repository in Veeam versus iSCSI; no difference
- we were having tons of errors with a thin-provisioned LUN on the Synology with the Microsoft iSCSI initiator
As Clive pointed out above, a thick-provisioned LUN fixes those errors.
What fixed it, and our setup now:
- all four 1Gb NICs in the Synology with their own IP addresses, no bond
- the guest Windows VM on ESXi has four virtual NICs, each with its own IP address
- Microsoft iSCSI initiator configured for MPIO (not MCS) with source IP and destination IP specified in the 4 sessions (it was a little confusing to set up)
- the Synology shows 4 connections to the iSCSI LUN from the 4 IP addresses of the VM
Now we get 4x the throughput, essentially 4Gbps minus overhead (somewhere around 330MB/s versus 50MB/s with no MPIO).
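For anyone trying to reproduce this, the PowerShell below is roughly the shape of the setup (the IPs and target IQN are placeholders, and you repeat the portal/connect pair once per NIC pair; treat it as a sketch rather than exactly what we ran):

# Let the Microsoft DSM claim iSCSI disks for MPIO (requires the MPIO feature to be installed)
Enable-MSDSMAutomaticClaim -BusType iSCSI
# One session per initiator/target IP pair, with multipathing enabled (repeat for each NIC)
New-IscsiTargetPortal -TargetPortalAddress 10.0.0.10 -InitiatorPortalAddress 10.0.0.1
Connect-IscsiTarget -NodeAddress "iqn.2000-01.com.synology:example-target" `
    -TargetPortalAddress 10.0.0.10 -InitiatorPortalAddress 10.0.0.1 `
    -IsMultipathEnabled $true -IsPersistent $true
# Verify the paths the Microsoft DSM sees
mpclaim -s -d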