Feb 07 2017
 

With vSphere 6.5 came VMFS 6, and with VMFS 6 came the auto unmap feature. This is a great feature, and very handy for those of you using thin provisioning on your datastores hosted on storage that supports VAAI. However, you still have the ability to perform a manual UNMAP at high priority, even with VMware vSphere 7 and vSphere 8.

A while back, I noticed something interesting when running the manual unmap command for the first time. It isn’t well documented, but I thought I’d share for those of you who are doing a manual LUN unmap for the first time. This document will also provide you with the command to perform a manual unmap on a VMFS datastore.

Reason:

Automatic unmap (auto space reclamation) is on, but you want to speed it up, or you have a large chunk of blocks you want unmapped immediately and don’t want to wait for the auto feature.

Problem:

I didn’t notice any unmaps occurring automatically and I wanted to free up some space on the SAN, so I decided to run the old command to force an unmap:

esxcli storage vmfs unmap --volume-label=DATASTORENAME --reclaim-unit=200

(The above command runs a manual unmap on a datastore)

After kicking it off, I noticed it wasn’t completing as fast as I thought it should. I enabled SSH on the host and took a look at the /var/log/hostd.log file. To my surprise, it wasn’t stopping after a 200 block reclaim; it kept cycling, repeatedly unmapping 200 blocks at a time:

2017-02-07T14:12:37.365Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:37.978Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:38.585Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:39.191Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:39.808Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:40.426Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:41.050Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:41.659Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:42.275Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-9XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX
2017-02-07T14:12:42.886Z info hostd[XXXXXXXX] [Originator@XXXX sub=Libs opID=esxcli-fb-XXXX user=root] Unmap: Async Unmapped 200 blocks from volume XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXX

That’s just a small segment of the logs, but essentially it just kept repeating the unmap/reclaim over and over in 200 block segments. I waited hours and tried to issue a “CTRL+C” to stop it, but it kept running.
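To keep an eye on progress without reading every log line, something like the following can total up the unmap entries. This is a minimal sketch: the function name unmap_summary is my own, and it assumes the log lines keep the "Unmap: Async Unmapped N blocks" wording shown in the excerpt above.

```shell
# Summarize manual-unmap progress from hostd log lines read on stdin.
# Prints "<calls> calls, <blocks> blocks unmapped".
# Assumes the "Unmap: Async Unmapped N blocks" wording from the log excerpt.
unmap_summary() {
    awk '/Unmap: Async Unmapped [0-9]+ blocks/ {
             # Find the token "Unmapped" and add up the number that follows it
             for (i = 1; i <= NF; i++)
                 if ($i == "Unmapped") { blocks += $(i + 1); calls++ }
         }
         END { printf "%d calls, %d blocks unmapped\n", calls, blocks }'
}

# Usage on the host (log path taken from the post):
#   grep "Unmap:" /var/log/hostd.log | unmap_summary
```

Running it twice a few minutes apart and comparing the totals gives a rough rate of unmaps for your own environment.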

I left it to run overnight and it did eventually finish while I was sleeping. I’m assuming it attempted to unmap everything it could across the entire datastore. Initially I thought this command would only unmap the specified number of blocks.

When running this command, it will continue to cycle in chunks of the specified block count (the reclaim unit) until it goes through the entire LUN. Be aware of this when you’re planning on running the command.

Essentially, I would advise not to manually run the unmap command unless you’re prepared to unmap and reclaim ALL your unused allocated space on your VMFS 6 datastore. In my case I did this because I had 4TB of deleted data that I wanted to unmap immediately, and didn’t want to wait for the automatic unmap.

I thought this may have been occurring because the automatic unmap function was on, so I tried it again after disabling auto unmap. The behavior was the same and it just kept running.
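For reference, the automatic unmap setting of a VMFS 6 datastore can be checked and changed from the same SSH session. The sketch below wraps the esxcli reclaim config commands in two helper functions; the function names are my own, and DATASTORENAME is a placeholder as in the command above.

```shell
# Show the automatic space reclamation settings of a VMFS 6 datastore.
# $1 = datastore label
show_auto_unmap() {
    esxcli storage vmfs reclaim config get --volume-label="$1"
}

# Set the automatic reclamation priority.
# $1 = datastore label, $2 = priority ("none" disables auto unmap, "low" is the default)
set_auto_unmap_priority() {
    esxcli storage vmfs reclaim config set --volume-label="$1" --reclaim-priority="$2"
}

# Usage on the host:
#   show_auto_unmap DATASTORENAME
#   set_auto_unmap_priority DATASTORENAME none
#   set_auto_unmap_priority DATASTORENAME low
```

If you disable auto unmap for testing, remember to set the priority back afterwards.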

If you are tempted to run the unmap function, keep in mind it will continue to scan the entire volume (regardless of the block count you set). With that being said, if you are firm on running this, choose a larger block count (200 or higher), since smaller values will take forever (I tested with a reclaim unit of 1, and after analyzing the logs and the rate of unmaps, it would have taken over 3 months to complete on a 9TB array).
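The arithmetic behind that kind of estimate is simple enough to script. Here is a rough sketch, assuming the datastore uses 1 MB VMFS blocks and that you have measured the time per unmap call from your own logs; the function name and the timing values in the usage note are illustrative, not measurements from this post (apart from the roughly 0.61 second spacing visible in the log excerpt above).

```shell
# Estimate how many days a full manual unmap sweep might take.
# $1 = total VMFS blocks on the datastore (assumes 1 MB blocks)
# $2 = reclaim unit (blocks per unmap call)
# $3 = observed seconds per unmap call (measure this from your own logs)
estimate_unmap_days() {
    awk -v total="$1" -v unit="$2" -v secs="$3" 'BEGIN {
        calls = int((total + unit - 1) / unit)   # round up to whole calls
        printf "%.1f\n", calls * secs / 86400    # 86400 seconds per day
    }'
}
```

A 9TB volume is 9437184 blocks at 1 MB each. With a reclaim unit of 1 and a hypothetical 0.9 seconds per call, `estimate_unmap_days 9437184 1 0.9` prints 98.3 (days). With a reclaim unit of 200 and the 0.61 second spacing from the log excerpt, `estimate_unmap_days 9437184 200 0.61` prints 0.3, which lines up with a sweep finishing overnight.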

Update May 11th 2018: When running the manual unmap command with smaller “reclaim-unit” values (such as 1), your host may become unresponsive due to a memory overflow. vMotions will cease to function, and your ESXi host may need a restart to become fully functional. I’ve experienced this behavior twice. I highly suggest that if you perform this command, you do so while the host is in maintenance mode, and that you restart the host after a successful unmap sweep.
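One way to make that habit harder to skip is a small guard that refuses to start the unmap unless the host reports maintenance mode. This is only a sketch: the function name is my own, and it assumes "esxcli system maintenanceMode get" prints Enabled or Disabled.

```shell
# Run a manual unmap only when the host reports maintenance mode.
# $1 = datastore label, $2 = reclaim unit (VMFS blocks per iteration)
unmap_in_maintenance_mode() {
    state=$(esxcli system maintenanceMode get)
    if [ "$state" != "Enabled" ]; then
        echo "Host is not in maintenance mode; enable it first:" >&2
        echo "  esxcli system maintenanceMode set --enable true" >&2
        return 1
    fi
    esxcli storage vmfs unmap --volume-label="$1" --reclaim-unit="$2"
}

# Usage on the host:
#   unmap_in_maintenance_mode DATASTORENAME 200
# After the sweep completes, restart the host, for example:
#   esxcli system shutdown reboot --reason "post-unmap restart"
```

Keep in mind the host only enters maintenance mode once its virtual machines have been migrated off or powered down.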

Dec 082016
 

So you just completed your migration from an earlier version of vSphere up to vSphere 6.5 (particularly the vCenter 6.5 Virtual Appliance). When trying to log in to the vSphere web client, you receive numerous messages stating “The VMware enhanced authentication plugin has updated it’s SSL certificate in Firefox. Please restart Firefox.” You’ll usually see 2 of these messages in a row on each page load.

You’ll also note that the “Enhanced Authentication Plugin” doesn’t function after the install (it won’t pull your Active Directory authentication information).

To resolve this:

Uninstall all vSphere plugins from your workstation. I went ahead and uninstalled all vSphere related software on my workstation, including the deprecated vSphere C# client application, all authentication plugins, etc. These are all old.

Open up your web browser and point to your vCenter server (https://vCENTERSERVERNAME), and download the “Trusted root CA certificates” from VMCA (VMware certificate authority).

Download and extract the ZIP file. Navigate through the extracted contents to the Windows certs. These root CA certificates need to be installed to the “Trusted Root Certification Authorities” store on your system. Make sure you skip the “Certificate Revocation List” file, which ends in “.r0”.

To install them, right click, choose “Install Certificate”, choose “Local Machine”, yes to UAC prompt, then choose “Place all certificates in the following store”, browse, and select “Trusted Root Certification Authorities”, and finally finish. Repeat for each of the certificates. Your workstation will now “trust” all certificates issued by your VMware Certificate Authority (VMCA).

You can now re-open your web browser, download the “Enhanced Authentication Plugin” from your vCenter instance, and install. After restarting your computer, the plugin should function and the messages will no longer appear.

Leave a comment!

Dec 07 2016
 

Well, I started writing this post minutes after completing my first vSphere 6.0 upgrade to vSphere 6.5, and as always with VMware products it went extremely smoothly (although with any upgrade there are minor hiccups).

Thankfully, with the evolution of virtualization technology, an upgrade such as the move to vSphere 6.5 is a massive change to your infrastructure, yet the process is extremely simplified, can be easily rolled out, and in the event of problems has very simple, clear paths to revert and re-attempt. Failed upgrades usually aren’t catastrophic, and usually don’t even affect production environments.

Whenever I do these vSphere upgrades, I find it funny how you’re making such massive changes to your infrastructure with each click and step, yet the thought process and understanding behind it is so simple and easy to follow. Essentially, after one of these upgrades you look back and think: “Wow, for the little amount of work I did, I sure did accomplish a lot”. It’s just one of the beauties of virtualization, especially holding true with VMware products.

To top it all off you can complete the entire upgrade/migration without even powering off any of your virtual machines. You could do this live, during business hours, in a production environment… How cool is that!

Just to provide some insights in to my environment, here’s a list of the hardware and configuration:

-2 X HPE Proliant DL360p Gen8 Servers (each with dual processors, and each with 128GB RAM, no local storage)

-1 X HPE MSA2040 Dual Controller SAN (each host has multiple connections to the SAN via 10Gb DAC iSCSI, 1 connection to each of the dual controllers)

-VMware vCenter Server 6.0 running on a Windows Virtual Machine (Windows Server 2008 R2)

-VMware Update Manager (Running on the same server as the vCenter Server)

-VMware Data Protection (2 x VMware vDP Appliances, one as a backup server, one as a replication target)

-VMware ESXi 6.0 installed on to SD-cards in the servers (using HPE Customized ESXi installation)

One of the main reasons why I was so quick to adopt and migrate to vSphere 6.5 was that I was extremely interested in the prospect of migrating a Windows based vCenter instance to the new vCenter 6.5 appliance. This is handy as it simplifies the environment, reduces licensing costs and requirements, and reduces time/effort on server administration and maintenance.

First and foremost, following the recommended upgrade path (you have to do the upgrades and migrations for all the separate modules/systems in a certain order), I had to upgrade my vDP appliances first. For vDP to support vCenter 6.5, you must upgrade your vDP appliances to 6.1.3. As with all vDP upgrades, you must shut down the appliance, mark all the data disks as dependent, take a snapshot, mount the upgrade ISO, and then boot and initiate the upgrade from the appliance web interface. After you complete the upgrade and confirm the appliance is functioning, you shut down the appliance, remove the snapshot, mark the data disks as independent again (virtual disk 2 and up only; the first virtual disk stays dependent), and your upgrade is done.

A note on a problem I dealt with during the upgrade process for vDP to version 6.1.3 (appliance does not detect mounted ISO image) can be found here: http://www.stephenwagner.com/?p=1107

Moving on to vCenter! VMware did a great job with this. You load up the VMware Migration Assistant tool on your source vCenter server, load up the migration/installation application on a separate computer (the workstation you’re using), and it does the rest. After prepping the destination vCenter appliance, it exports the data from the source server, copies it to the destination server, shuts down the source VM, and then imports the data to the destination appliance and takes over the role. It’s the coolest thing ever watching this happen live. Upon restart, you’ve completed your vCenter Server migration.

A note on a problem I dealt with during the migration process (which involved exporting VMware Update Manager from the source server) can be found here: http://www.stephenwagner.com/?p=1115

And as for the final step, it’s now time to upgrade your ESXi hosts to version 6.5. As always, this is an easy task with VMware Update Manager, and can be easily and quickly rolled out to multiple ESXi hosts (thanks to vMotion and DRS). After downloading your ESXi installation ISO (in my case I use the HPE customized image), you upload it in to your new VMware Update Manager instance, add it to an upgrade baseline, and then attach the baseline to your hosts. To push this upgrade out, simply select the cluster or specific host (depending on whether you want to roll out to a single host, or multiple at once), and remediate! After a couple of restarts the upgrade is done.
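Once a host is back up, a quick way to confirm what it is running is to ask it over SSH. A small sketch follows, assuming the output of "esxcli system version get" contains a line of the form "Version: x.y.z"; the function name is my own.

```shell
# Print the ESXi version string reported by the host.
# Assumes "esxcli system version get" prints a line like "   Version: 6.5.0".
esxi_version() {
    esxcli system version get | awk -F': ' '/Version:/ { print $2 }'
}

# Usage on the host:
#   esxi_version
```

This is handy when remediating several hosts and you want to double check which ones have already been upgraded.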

A note on a problem I dealt with during ESXi 6.5 upgrade (conflicting VIBs marking image as incompatible when deploying HPE customized image) can be found here: http://www.stephenwagner.com/?p=1120

After all of the above, the entire environment is now running on vSphere 6.5! Don’t forget to take a backup before and after the upgrade, upgrade your VM hardware versions (VM compatibility) to the ESXi 6.5 level (hardware version 13), and upgrade VMware Tools on all your VMs.

Make sure to visit https://YOURVCENTERSERVER to download the VMware Certificate Authority (VMCA) root certificates, and add them to the “Trusted Root Certification Authorities” on your workstation so you can validate all the SSL certs that vCenter uses. Also, note that the vSphere C# client (the Windows application) has been deprecated, and you now must use the vSphere Web Client, or the new HTML5 web client.

Happy Virtualizing! Leave a comment!

Dec 07 2016
 

When upgrading VMware vSphere and your ESXi hosts to version 6.5, 6.7, or 7.0, you may experience an error similar to:

"The upgrade contains the following set of conflicting VIBs: Mellanox_bootbank_net.XXXXversionnumbersXXXX. Remove the conflicting VIBs or use Image Builder to create a custom ISO."

This is due to conflicting VIBs on your ESXi host. This post will go in to detail as to what causes it, and how to resolve it.

The issue

After successfully completing the migration from vCenter 6.0 (on Windows) to the vCenter 6.5 Appliance, all I had remaining was to upgrade my ESXi hosts to ESXi 6.5.

In my test environment, I run 2 x HPE Proliant DL360p Gen8 servers. I also have always used the HPE customized ESXi image for installs and upgrades.

It was easy enough to download the customized HPE installation image from VMware’s website, I then loaded it in to VMware Update Manager on the vCenter appliance, created a baseline, and was prepared to upgrade the hosts.

I successfully upgraded one of my hosts without any issues, however after scanning on my second host, it reported the upgrade as incompatible and stated: “The upgrade contains the following set of conflicting VIBs: Mellanox_bootbank_net.XXXXversionnumbersXXXX. Remove the conflicting VIBs or use Image Builder to create a custom ISO.”

The fix

I checked the host to see if I was even using the Mellanox drivers, and thankfully I wasn’t and could safely remove them. If you are using the drivers that are causing the conflict, DO NOT REMOVE them as it could disconnect all network interfaces from your host. In my case, since they were not being used, uninstalling them would not affect the system.

I SSH’ed in to the host and ran the following commands:

esxcli software vib list | grep Mell

(This command above shows the VIB package that the Mellanox driver is inside of. In my case, it returned “net-mst”)

esxcli network nic list

(This command above verifies which drivers you are using on your network interfaces on the host)

esxcli software vib remove -n net-mst

(This command above removes the VIB that contains the problematic driver)

After doing this, I restarted the host, scanned for upgrades, and successfully applied the new ESXi 6.5 customized HPE image.
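The check-before-remove steps above can be combined so the VIB is only removed when no network interface is bound to a matching driver. Treat this as a sketch under a few assumptions: the function names are my own, the driver pattern you pass in (for example "mlx") is a guess at how the driver is named on your host, and it assumes the Driver column is the third column of "esxcli network nic list" after two header lines.

```shell
# Succeed (exit 0) if any physical NIC is bound to a driver matching $1.
# Assumes two header lines and the driver name in the third column.
nic_uses_driver() {
    esxcli network nic list |
        awk -v pat="$1" 'NR > 2 && $3 ~ pat { found = 1 } END { exit !found }'
}

# Remove a VIB only when no NIC uses a matching driver.
# $1 = driver name pattern, $2 = VIB name
remove_vib_if_unused() {
    if nic_uses_driver "$1"; then
        echo "A NIC is using a driver matching '$1'; not removing $2." >&2
        return 1
    fi
    esxcli software vib remove -n "$2"
}

# Usage on the host (net-mst is the VIB name from my case):
#   remove_vib_if_unused mlx net-mst
# To preview the removal without changing anything:
#   esxcli software vib remove -n net-mst --dry-run
```

Always look over the output of "esxcli network nic list" yourself as well, since a pattern match is no substitute for knowing which drivers your host depends on.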

Hope this helps! Leave a comment!

Dec 07 2016
 

During my first migration from VMware vCenter 6.0 to the VMware vCenter 6.5 Virtual Appliance, the migration failed. The migration installation UI would shut down the source VM, and numerous errors would occur afterwards when the destination vCenter appliance tried to finish configuration.

If you monitor the source vCenter server during the export process, you’ll notice that an error pops up while the source data is being compressed. The error is generated by Windows while creating an archive (zip file) and reads: “The compressed (zipped) folder is invalid or corrupted.” The entire migration process halts until you dismiss this message. At first it appears to continue, but ultimately the migration fails.

If you continued and the migration failed, you’ll need to power off the failed (new) vCenter appliance (it’s garbage now) and power on the source (original) vCenter server. The Active Directory trust will no longer exist at this point, so you’ll need to log on with a local (non-domain) account on the source server and re-create the computer trust on the domain using the netdom command:

netdom resetpwd /s:SERVERNAMEOFDOMAINCONTROLLER /ud:DOMAIN\ADMINACCOUNT /pd:*

After re-creating the trust, restart the original vCenter server. You have now reverted to your original vCenter instance and can retry the migration.

Now back to the main issue. I tried a bunch of different things and wasted an entire evening (checking character lengths on paths/filenames, trying different settings, pausing processes in case timeouts were being hit, etc.). Finally, I noticed that the compression archive would crash/fail on a file called “vum_registry”.

VUM brings VMware Update Manager to mind, which I do have installed, configured, and running.

I went ahead and uninstalled VMware Update Manager off my source server (as it’s easy enough to re-configure from scratch after the migration). I then proceeded to initiate a migration. To my surprise, the “data to migrate” went from 7.9GB to 2.4GB. This is a huge sign that something was messed up with my VMware Update Manager deployment (even though it was working fine). I’m assuming there were either filenames that were too long (exceeding the 260 character limit on paths and filenames), special characters being used where they shouldn’t be, or something else was messed up.

After the uninstall of Update Manager, the migration completed successfully. Leave a comment!