Aug 18 2018
 
CentOS Logo

Let’s say that you’re hosting someone’s equipment and they start to abuse their connection speed. Or let’s say your bandwidth is limited and you want to control your own usage to make sure you don’t max out your internet connection. You can take care of both of these problems by building your own traffic shaping network control device using CentOS and the “tc” Linux command.

In this post I’m going to explain what traffic shaping is, why you’d want to use traffic shaping, and how to build a very basic traffic shaping device to control bandwidth on your network.

What is traffic shaping

Traffic shaping is controlling a connection on your network to prioritize, limit, or otherwise shape the traffic passing over it. It can be applied to bandwidth or to individual packets. In this example we’re using it to control bandwidth, specifically upload and download speeds.

Why traffic shaping

For service providers hosting customers’ equipment, a customer may abuse their connection or even max it out legitimately. This can bring the internet connection to a halt if you share it with them, or cause bigger issues if it’s shared with other customers. In this case you would implement traffic shaping to allot the customer only a certain amount of bandwidth, so they can’t bring the internet connection or network to a halt.

For normal people (or a single business), as fast as the internet is today, it’s still very easy to max out your connection. When this happens you can experience packet loss, slow speeds, and interruption of services. If you host your own servers, the interruption of those services is an even bigger issue. You may want to limit your own bandwidth to make sure you don’t bring your internet to a halt, and save some for other devices and/or users.

Another reason is simply to implement basic QoS (Quality of Service) across your network, to keep usage and services in harmony and stop any single one from hogging the network connection.

How to build your own basic traffic shaping device with CentOS and tc

In this post we will build a very simple traffic shaping device that limits and throttles an internet connection to a defined upload and download speed that we set.

You can do this with a computer that has multiple NICs (preferably one NIC for management, one NIC for the internet, and one NIC for the network and/or hosts to be throttled). If you want to get creative, there are also a number of x86-based physical network/firewall appliances that you can install Linux on. These are very handy as they come with many NICs.

When I set this up, I used an old decommissioned Sophos UTM 220 that I’ve had sitting around doing nothing for a couple of years (pic below). The UTM 220 provides 8 NICs and is very easy to install Linux onto.

Sophos UTM 220 Running CentOS Linux

Please Note: The Sophos UTM 220 is just a fancy computer in a 1U rack mounted case with 8 NICs. All I did was install CentOS on it like a normal computer.

Essentially, all we’ll be doing is installing CentOS Linux, installing “tc”, configuring the network adapters, and then configuring a startup script. In my example my ISP provides me 174Mbps download, and 15Mbps upload. My target is to throttle the connection to 70Mbps download, and 8Mbps upload. I will allow the connection to burst to 80Mbps down, and 10Mbps up.

To get started:

  1. Install CentOS on the computer or device. The specifics of this are beyond the scope of this document, however you’ll want to perform a minimal install. This device is strictly acting as a network device, so no packages are required other than the minimal install option.
  2. During the CentOS install, only configure your main management NIC. This is the NIC you will use to SSH to, control the device, and update the device. No other traffic will pass through this NIC.
  3. After the install is complete, run the following command to enable ssh on boot:
    chkconfig sshd on
  4. Install the tools we’ll need. “tc” is provided by the iproute package (usually already present on CentOS 7), while the brctl and ifconfig commands used later come from bridge-utils and net-tools, which are not included in the minimal install:
    yum install iproute bridge-utils net-tools
  5. Next, we’ll need to locate the NIC startup scripts for the 2 adapters that will perform the traffic shaping. These adapters are the internet NIC and the NIC for the throttled network/hosts. Below is an example of one of the network startup scripts. Your NIC device names will probably be different.
    /etc/sysconfig/network-scripts/ifcfg-enp2s0
  6. Now you’ll need to open the file using your favorite text editor and locate and set ONBOOT to no as shown below. You can ignore all the other variables. You’ll need to repeat this for the 2nd NIC as well.
    TYPE=Ethernet
    PROXY_METHOD=none
    BROWSER_ONLY=no
    BOOTPROTO=dhcp
    DEFROUTE=yes
    IPV4_FAILURE_FATAL=no
    IPV6INIT=yes
    IPV6_AUTOCONF=yes
    IPV6_DEFROUTE=yes
    IPV6_FAILURE_FATAL=no
    IPV6_ADDR_GEN_MODE=stable-privacy
    NAME=enp2s0
    UUID=xxxxxxxx-xxxx-xxx-xxxx-xxxxxxxxxxxx
    DEVICE=enp2s0
    ONBOOT=no
  7. Now we can configure the Linux startup script to create a network bridge between the two NICs above, and then configure the traffic shaping rules with tc. Note that on CentOS 7 the rc.local script only runs at boot if it is executable, so make sure to run “chmod +x /etc/rc.d/rc.local”. Locate and open the following file for editing:
    /etc/rc.d/rc.local
  8. Append the following text to the rc.local file:
    # Lets make that bridge
    brctl addbr bridge0
    
    # Lets add those NICs to the bridge
    brctl addif bridge0 enp5s0
    brctl addif bridge0 enp2s0
    
    # Confirm no IP set to NICs that are shaping
    ifconfig enp5s0 0.0.0.0
    ifconfig enp2s0 0.0.0.0
    
    # Bring the bridge online
    ifconfig bridge0 up
    
    # Clear out any existing tc policies (these may report an error on first boot if none exist yet, which is harmless)
    tc qdisc del dev enp2s0 root
    tc qdisc del dev enp5s0 root
    
    # Configure new traffic shaping policies on the NICs
    # Note: with tbf the "burst" value is the bucket (buffer) size, not a second rate cap
    # Limit the upload to 8Mbps with a 10mbit bucket
    tc qdisc add dev enp2s0 root tbf rate 8mbit burst 10mbit latency 50ms
    # Limit the download to 70Mbps with an 80mbit bucket
    tc qdisc add dev enp5s0 root tbf rate 70mbit burst 80mbit latency 50ms
    
  9. Restart the Linux box:
    shutdown -r now
  10. You now have a traffic shaping network device! Once it’s back up, you can verify the bridge and shaping policies as shown below.
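
If you want to confirm everything came up correctly after the reboot, the following commands show the bridge membership and the active tc policies with their statistics (the interface names enp2s0 and enp5s0 are from the example above and will likely differ on your hardware):

# Show the bridge and its member NICs
brctl show bridge0

# Show the active qdisc and statistics on each shaping NIC (names from the example above)
tc -s qdisc show dev enp2s0
tc -s qdisc show dev enp5s0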

Final Thoughts

Please note that normally you would not place the script in the rc.local file, however we wanted something quick and simple. The script may not survive in the rc.local file when updates/upgrades are applied to the Linux install, so keep this in mind. If you want something a little more robust, a small systemd unit can run the same commands at boot, as sketched below. You’ll also need to test to make sure that you are throttling in the correct direction with the 2 NICs. Make sure you test this setup and allow time to confirm it’s working before putting it in a production network.
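
For example, here’s a minimal sketch of a systemd unit that could run the same commands at boot, assuming you move the bridge/tc commands above into a script such as /usr/local/sbin/traffic-shape.sh (a hypothetical path) and make it executable. Save the unit as /etc/systemd/system/traffic-shape.service and enable it with “systemctl enable traffic-shape.service”:

# /etc/systemd/system/traffic-shape.service (hypothetical name - adjust the script path to your own)
[Unit]
Description=Network bridge and tc traffic shaping
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/traffic-shape.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target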

May 06 2018
 
DUO

I’m a big fan of MFA, specifically Duo Security’s product (I did a corporate blog post here). I’ve been using this product for some time and use it for an extra level of protection on my workstations, servers, and customer sites. I liked it so much that my company (Digitally Accurate Inc.) became a partner and now resells the services.

Here’s a demo of DUO MFA being used with CentOS Linux:

Today I want to write about a couple of issues I had when deploying the pam_duo module on CentOS Linux 7. The original Duo guide can be found at https://duo.com/docs/duounix, and while it did work for the most part, I noticed there were some issues with the pam configuration files, especially if you want to use Duo MFA with usernames and passwords rather than keys for authentication.

A symptom of the issue: I noticed that when following the instructions on the website for deployment, after entering the username it would skip the password prompt and go right to Duo authentication, completely bypassing the password altogether. I’m assuming this is because the guide was written for key authentication, but I figured I’d do a quick crash-course post on the topic and create a simple guide. I also noticed that sometimes, even if an incorrect password was typed in, it would allow authentication as long as Duo reported success.

Ultimately I decided to learn about PAM, understand what it was doing, and finally configure it properly. Using the guide below I can confirm the password and MFA authentication operate correctly.

To configure Duo MFA on CentOS 7 for use with usernames and passwords

First and foremost, you must log in to your Duo account, go to Applications, click “Protect an Application”, and select “Unix Application”. Configure the application and document/log your ikey, secret key, and API hostname.

Now we want to create a yum repo so we can install the pam_duo module and keep it up to date. Create the file /etc/yum.repos.d/duosec.repo and populate it with the following:

[duosecurity]
name=Duo Security Repository
baseurl=http://pkg.duosecurity.com/CentOS/$releasever/$basearch
enabled=1
gpgcheck=1

We’ll need to install the signing key that the repo uses, and then install the duo_unix package. By using yum, we’ll be able to keep this package up to date whenever we update the server. Run the following commands:

rpm --import https://duo.com/RPM-GPG-KEY-DUO
yum install duo_unix
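
If you want to confirm the repository was picked up and the package installed before moving on, these standard yum/rpm checks will do it:

# Confirm the Duo repo is enabled and the duo_unix package is installed
yum repolist enabled | grep -i duosecurity
rpm -q duo_unix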

Configure the pam_duo module by editing the /etc/duo/pam_duo.conf file. You’ll need to populate the lines with the ikey, secret key, and API hostname that you documented above. We use “failmode = safe” so that in the event of an internet disconnection we can still log in to the server without Duo. Enabling this fail-safe is reasonable here, since the main purpose of the MFA is to protect against access from the internet. Please see below:

[duo]
; Duo integration key
ikey = XXXXXXXXXXXXXXXXXXXX
; Duo secret key
skey = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
; Duo API host
host = XXXXXXXXX.duosecurity.com
; Send command for Duo Push authentication
pushinfo = yes
; failmode: safe allows login if Duo can't be reached, secure locks it up
failmode = safe

Configure sshd to allow challenge-response authentication by editing /etc/ssh/sshd_config: locate “ChallengeResponseAuthentication” and change it to yes. The line should already be there; simply comment out the old line and add (or uncomment) the line shown below:

ChallengeResponseAuthentication yes
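
Before you restart sshd later on, it’s worth validating the sshd configuration syntax; sshd has a built-in test mode that reports any errors in the config file:

# Test the sshd configuration for syntax errors (no output means it's OK)
/usr/sbin/sshd -t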

And now it gets tricky… We need to edit the PAM authentication files to incorporate the Duo MFA service into the authentication process. I highly recommend that throughout this you open (and leave open) an additional SSH session, so that if you make a change in error and lock yourself out, you can use the extra SSH session to revert the changes and re-allow access. It’s always best to make a backup copy of these files so you can easily revert if needed.

DISCLAIMER: I am not responsible if you lock yourself out of your system. Please make sure that you have an extra SSH session open so that you can revert changes. It is assumed you are aware of the seriousness of the changes you are making and that you are taking all precautions (including a backup) to protect yourself from any errors.

Essentially two files are used for authentication that we need to modify. One file is for SSH logins, and the other is for console logins. In my case, I wanted to protect both methods. You can do either, or both. If you are doing both, it may be a good idea to test with SSH, before making modifications to your console login, to make sure your settings are correct. Please see below for the modifications to enable pam_duo:

/etc/pam.d/password-auth (this file is used for SSH authentication)

#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        required      pam_faildelay.so delay=2000000
#auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_unix.so nullok try_first_pass
auth        sufficient    pam_duo.so
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 1000 quiet
account     required      pam_permit.so

password    requisite     pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type=
password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
-session     optional      pam_systemd.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so

/etc/pam.d/system-auth (this file is used for console authentication)

auth        required      pam_env.so
auth        sufficient    pam_fprintd.so
#auth        sufficient    pam_unix.so nullok try_first_pass
# Next two lines are for DUO mod
auth        requisite     pam_unix.so nullok try_first_pass
auth        sufficient    pam_duo.so
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 1000 quiet
account     required      pam_permit.so

password    requisite     pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type= ucredit=-1 lcredit=-1 dcredit=-1 ocredit=-1
password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok remember=5
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
-session     optional      pam_systemd.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so

Now we must restart sshd for the changes to take effect. Please make sure you have your extra SSH session open in the event you need to roll back your /etc/pam.d/ files. Restart the sshd service using the following command:

service sshd restart

Attempt to open a new SSH session to your server. It should now ask for a username, password, and then prompt for Duo authentication. And you’re done!

More information on Duo Multi Factor Authentication (MFA) can be found here.

May 01 2018
 
Fedora Logo

When attempting to upgrade from Fedora 27 to Fedora 28, the upgrade may fail on the nss-pem package.

I spent some time trying to find the solution for this, and came across numerous posts on the “Red Hat Bugzilla”, particularly this post.

Unfortunately no fix was found.

See below for an example of the failed upgrade output:

[root@SYSTEMZ01 ~]# dnf system-upgrade download --releasever=28
Before you continue ensure that your system is fully upgraded by running "dnf --refresh upgrade". Do you want to continue [y/N]: y
Fedora 28 - x86_64 - Updates
Fedora 28 - x86_64
google-chrome
RPM Fusion for Fedora 28 - Free - Updates
RPM Fusion for Fedora 28 - Free
RPM Fusion for Fedora 28 - Nonfree - Updates
RPM Fusion for Fedora 28 - Nonfree
skype (stable)
Last metadata expiration check: 0:00:00 ago on Tue 01 May 2018 04:28:04 PM MDT.
Error:
 Problem: nss-pem-1.0.3-6.fc27.i686 has inferior architecture
  - nss-pem-1.0.3-6.fc27.x86_64 does not belong to a distupgrade repository
  - problem with installed package nss-pem-1.0.3-6.fc27.i686

To resolve this, manually install the nss-pem package from FC28 prior to the upgrade using the following command:

dnf install nss-pem-1.0.3-9.fc28 --releasever=28

After doing so, re-attempt the upgrade (as shown below) and it should now proceed.
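
For reference, re-attempting the upgrade uses the same dnf system-upgrade workflow as before:

# Re-download the upgrade packages, then reboot into the upgrade
dnf system-upgrade download --releasever=28
dnf system-upgrade reboot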

Jan 18 2018
 

The Problem

I run a Sophos UTM firewall appliance in my VMware vSphere environment and noticed the other day that I was getting warnings on the space used on the ESXi host for the thin-provisioned vmdk file for the guest VM. I thought “Hey, this is weird”, so I enabled SSH and logged in to check my volumes. Everything looked fine and my disk usage was great! So what gives?

After spending some more time troubleshooting and not finding much, I thought to myself, “What if it’s not unmapping unused blocks from the vmdk to the host ESXi machine?” What is unmapping, you ask? When files get deleted in a guest VM, in some cases the freed blocks aren’t automatically “unmapped” and released back to the host hypervisor.

Two things need to happen:

  1. The guest VM has to release these blocks (notify the hypervisor that it’s not using them, making the vmdk smaller)
  2. The host has to reclaim these and issue the unmap command to the storage (freeing up the space on the SAN/storage itself)

On a side note: in ESXi 6.5, when using VMFS version 6 (VMFS6), datastores can be configured for automatic unmapping. You can still kick it off manually, but many administrators prefer it to happen automatically in the background at low priority (low I/O).
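
If you’re curious whether automatic unmap is enabled on a VMFS6 datastore, esxcli can show and change the setting. The following is a sketch based on the ESXi 6.5 esxcli namespace (replace DATASTORENAME with your datastore name and verify the options against your build):

# Check the current space reclamation (unmap) settings for the datastore
esxcli storage vmfs reclaim config get --volume-label=DATASTORENAME

# Set automatic space reclamation to low priority
esxcli storage vmfs reclaim config set --volume-label=DATASTORENAME --reclaim-priority=low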

Most of my guest VMs automatically do the first step (releasing the blocks back to the host). On Windows this occurs with the defrag utility, which issues trim commands and “trims” the volumes. On Linux this occurs with the fstrim command. All my guest VMs do this automatically, with the exception being the Sophos UTM appliance.

The fix

First, a warning: enable SSH on the Sophos UTM at your own risk. You need to know what you are doing; this may also pose a security risk and should be disabled once you are finished. You’ll need to “su” to root once you log in with the “loginuser” account.

This fix not only applies to the Sophos UTM, but to most other Linux-based guest virtual machines.

Now, to fix the issue, I used the “df” command, which provides a list of the filesystems, their mount points, and the free storage for those filesystems. I’ve included an example below (this isn’t the full list):

hostname:/root # df
Filesystem                       1K-blocks     Used Available Use% Mounted on
/dev/sda6                          5412452  2832960   2281512  56% /
udev                               3059712       72   3059640   1% /dev
tmpfs                              3059712      100   3059612   1% /dev/shm
/dev/sda1                           338875    15755    301104   5% /boot
/dev/sda5                         98447760 13659464  79513880  15% /var/storage
/dev/sda7                        129002700  4624468 117474220   4% /var/log
/dev/sda8                          5284460   274992   4717988   6% /tmp
/dev                               3059712       72   3059640   1% /var/storage/chroot-clientlessvpn/dev


You’ll need to run the fstrim command on every mountpoint for the “/dev/sdaX” filesystems (X meaning you’ll be doing this for multiple mountpoints). In the example above, you’d run it on “/”, “/boot”, “/var/storage”, “/var/log”, “/tmp”, and any other mountpoints that use “/dev/sdaX” filesystems.

Two examples:

fstrim / -v

fstrim /var/storage -v

Again, you’ll repeat this for all mount points on your /dev/sdaX storage (X is replaced with the volume number). The command above only works with mountpoints, not the actual device mappings. A simple loop to do them all is shown below.
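
If you have a number of mountpoints to walk through, a simple shell loop saves some typing. The mountpoint list here is taken from the example df output above; adjust it to match your own system:

# Trim each mountpoint in turn (mountpoints taken from the example df output above)
for mp in / /boot /var/storage /var/log /tmp; do fstrim -v "$mp"; done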

Time to release the unused blocks to the SAN:

The above completes the first step of releasing the storage back to the host. Now you can either let the automatic unmap occur slowly over time if you’re using VMFS6, or you can kick it off manually. I decided to manually kick it off using the steps I have listed at: https://www.stephenwagner.com/2017/02/07/vmfs-unmap-command-on-vsphere-6-5-with-vmfs-6-runs-repeatedly/

You’ll need to use esxcli to do this. I simply enabled SSH on my ESXi hosts temporarily.

Please note: using the unmap command on ESXi hosts is very storage I/O intensive. Do this during a maintenance window, or at a time of low I/O, as this will generate MAJOR I/O on your hosts…

Issue the command (replace “DATASTORENAME” with the name of your datastore):

esxcli storage vmfs unmap --volume-label=DATASTORENAME --reclaim-unit=8

This could run for hours, possibly days, depending on your “reclaim-unit” size (this is the block size of the unit you’re trying to reclaim from the VMFS filesystem). In this example I chose 8, but most people use something larger like 100 or 200 to reduce the load and the time for the command to complete (lower values look for smaller chunks of free space, so the command takes longer to execute).

I let this run for 2 hours on a 10TB datastore, however it may take way longer (possibly 6+ hours, to days).

Finally, not only are we left with a smaller vmdk file, but we’ve released the space back to the SAN as well!

Apr 12 2014
 

Recently I decided it was time to beef up my storage link between my demonstration vSphere environment and my storage system. My existing setup included a single HP DL360p Gen8, connected to a Synology DS1813+ via NFS.

I went out and purchased the appropriate (and compatible) HP 4 x 1Gb server NIC (Broadcom based, 4 ports) and connected the Synology device directly to the new NIC (all 4 ports). I went ahead and configured an iSCSI target using a File LUN with ALUA (Advanced LUN features), configured the NICs on both the vSphere side and the Synology side, and enabled jumbo frames of 9000 bytes.

I connected to the iSCSI LUN and created a VMFS volume. I then configured Round Robin MPIO on the vSphere side of things (as always, I made sure to enable “Multiple iSCSI initiators” on the Synology side).

I started to migrate some VMs over to the iSCSI LUN. At first I noticed it was going extremely slow. I confirmed that traffic was being passed across all NICs (and also verified that all paths were active). After the migration completed, I decided to shut down the VMs and restart them to compare boot times. Booting from the iSCSI LUN was absolutely horrible; the VMs took forever to boot up. Keep in mind I’m very familiar with vSphere (my company is a VMware partner), so I know how to properly configure Round Robin, iSCSI, and MPIO.

I then decided to tweak some settings on the ESXi side of things. I configured the Round Robin policy to IOPS=1, which helped a bit. I then changed the RR policy to bytes=8800, which, after numerous other tweaks, I determined achieved the highest performance to the storage system over iSCSI (the esxcli commands for these tweaks are shown below for reference).
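
For reference, these Round Robin path selection tweaks are made per device with esxcli on the ESXi host; something along these lines, where the naa identifier is a placeholder for your own iSCSI device:

# Switch paths after every single IO (naa.xxxx... is a placeholder for your device identifier)
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxxxxxxxxxx --type=iops --iops=1

# Or switch paths after every 8800 bytes
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxxxxxxxxxx --type=bytes --bytes=8800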

This config was used for a couple of weeks, but ultimately I was very unsatisfied with the performance. I know it’s not very accurate, but looking at the Synology resource monitor, each gigabit link over iSCSI was only achieving 10-15MB/sec under high load (single contiguous copies), when it should have been 100MB/sec and higher per link. The combined LAN throughput as reported by the Synology device across all 4 gigabit links never exceeded 80MB/sec. File transfers inside of the virtual machines couldn’t get higher than 20MB/sec.

I have a VMware vDP (VMware Data Protection) test VM configured, which includes a performance analyzer inside of the configuration interface. I decided to use this to test some specs (I’m too lazy to configure a real I/O and throughput test since I know I won’t be continuing to use iSCSI on the Synology with the horrible performance I’m getting). The performance analyzer tests run for 30-60 minutes, and measure reads and writes in MB/sec, and seeks per second. I tested 3 different datastores.

Synology DS1813+ NFS over 1 X Gigabit link (1500MTU):

  • Read 81.2MB/sec, Write 79.8MB/sec, 961.6 Seeks/sec

Synology DS1813+ iSCSI over 4 x Gigabit links configured in MPIO Round Robin BYTES=8800 (9000MTU):

  • Read 36.9MB/sec, Write 41.1MB/sec, 399.0 Seeks/sec

Custom-built 8-year-old computer running Linux MD RAID 5, serving NFS over 1 X Gigabit NIC (1500MTU):

  • Read 94.2MB/sec, Write 97.9MB/sec, 1431.7 Seeks/sec

Can someone say WTF?!?!?!?! As you can see, it appears there is a major performance hit with the DS1813+ using 4 Gigabit MPIO iSCSI with Round Robin. It’s half the speed of a single link 1 X Gigabit NFS connection. Keep in mind I purchased the extra memory module for my DS1813+ so it has 4GB of memory.

I’m kind of choked I spent the money on the extra server NIC (it was over $500.00), and I’m also surprised that my custom-built NFS server from 8 years ago (drives are 4 years old) with 5 drives is performing better than my 8-drive DS1813+. All drives used in both the Synology and the custom-built NFS box are Seagate Barracuda 7200RPM drives (the custom box has 5 x 1TB drives configured in RAID 5; the Synology has 8 x 3TB drives configured in RAID 5).

I won’t be using iSCSI or iSCSI MPIO again with the DS1813+ and actually plan on retiring it as my main datastore for vSphere. I’ve finally decided to bite the bullet and purchase an HP MSA2024 (dual controller with 4 x 10Gb SFP+ ports) to provide storage for my vSphere test/demo environment. I’ll keep the Synology DS1813+ online as an NFS vDP backup datastore.

Feel free to comment and let me know how your experience with the Synology devices using iSCSI MPIO is/was. I’m curious to see if others are experiencing the same results.

UPDATE – June 6th, 2014

The other day, I finally had time to play around and do some testing. I created a new FileIO iSCSI Target, I connected it to my vSphere test environment and configured round robin. Doing some tests on the newly created datastore, the iSCSI connections kept disconnecting. It got to the point where it wasn’t usable.

I scratched that, and tried something else.

I deleted the existing RAID volume and I created a new RAID 5 volume and dedicated it to Block I/O iSCSI target. I connected it to my vSphere test environment and configured round robin MPIO.

At first all was going smoothly, until again, connection drops were occurring. Logging in to the DSM, absolutely no errors were being reported and everything was fine. Yet, I was at a point where all connections were down to the ESXi host.

I shut down the ESXi host, and then shut down and restarted the DS1813+. I waited for it to come back up, however it wouldn’t. I let it sit there and waited 2 hours for the IP to finally become pingable. I tried to connect to the web interface, however it would only load portions of the page over extended amounts of time (it took 4 hours to load the interface). Once inside, it was EXTREMELY slow. However, it was reporting that all was fine, everything was up, and the disks were fine as well.

I booted the ESXi host and tried to connect to it, however it couldn’t make the connection to the iSCSI targets. Finally the Synology unit became un-responsive.

Since I only had a few test VMs loaded on the Synology device, I decided to just go ahead and do a factory reset on the unit (I noticed new firmware was available as of that day). I downloaded the firmware, and started the factory reset (which again, took forever since the web interface was crawling along).

After restarting the unit, it was not responsive. I waited a couple hours and again, the web interface finally responded but was extremely slow. It took a couple hours to get through the setup page, and a couple more hours for the unit to boot.

Something was wrong, so I restarted the unit yet again, and again, and again.

This time, the alarm light was illuminated on the unit, and one of the drive lights wouldn’t come on. Again, extreme unresponsiveness. I finally got access to the web interface and it was reporting the temperature of one of the drives as critical, but said the drive was still functioning and all drives were OK. I shut off the unit, removed the drive, and restarted it; all of a sudden it was extremely responsive.

I removed the drive, hooked it up to another computer, and confirmed that it had indeed failed.

I replaced the drive with a new one (same model), and did three tests. One with NFS, one with FileIO iSCSI, and one with BlockIO iSCSI. All of a sudden the unit was working fine, and there was absolutely NO iSCSI connections dropping. I tested the iSCSI targets under load for some time, and noticed considerable performance increases with iSCSI, and no connection drops.

Here are some thoughts:

  • Two possible things fixed the connection drops, either the drive was acting up all along, or the new version of DSM fixed the iSCSI connection drops.
  • While performance has increased with FileIO to around ~120-160MB/sec from ~50MB/sec, I’m still not even close to maxing out the 4 X 1Gb interfaces.
  • I also noticed a significant performance increase with NFS, so I’m leaning towards the fact that the drive had been acting up since day one (seeks per second increased threefold after replacing the drive and testing NFS). I/O wait has been significantly reduced.
  • Why did the Synology unit just freeze up once this drive really started dying? It should have been marked as failed instead of causing the entire Synology unit not to function.
  • Why didn’t the drive get marked as failed at all? I regularly performed SMART tests and checked drive health, and there were absolutely no errors. Even when the unit was at a standstill, it still reported the drive as working fine.

Either way, the iSCSI connection drops aren’t occurring anymore, and performance with iSCSI is significantly better. However, I wish I could hit 200MB+/sec.

At this point it is usable for iSCSI using FileIO, however I was disappointed with BlockIO performance (BlockIO should be faster, no idea why it isn’t).

For now, I have an NFS datastore configured (using this for vDP backup), although I will be creating another FileIO iSCSI target and will do some more testing.

Update – August 16, 2019: Please see these additional posts regarding performance and optimization: