Jan 062018
 

Last night I updated my VMware VDI envionrment to VMware Horizon 7.4.0. For the most part the upgrade went smooth, however I discovered an issue (probably unrelated to the upgrade itself, and more so just previously overlooked). When connecting with Google Chrome to  VMware Horizon HTML Access via the UAG (Unified Access Gateway), an error pops up after pressing the button saying “Failed to connected to the connection server”.

The Problem:

This error pops up ONLY when using Chrome, and ONLY when connecting through the UAG. If you use a different browser (Firefox, IE), this issue will not occur. If you connect using Chrome to the connection server itself, this issue will not occur. It took me hours to find out what was causing this as virtually nothing popped up when searching for a solution.

Finally I stumbled across a VMware document that mentions on View Connection Server instances and security servers that reside behind a gateway (such as a UAG, or Access Point), the instance must be aware of the address in which browsers will connect to the gateway for HTML access.

The VMware document is here: https://docs.vmware.com/en/VMware-Horizon-7/7.0/com.vmware.horizon-view.installation.doc/GUID-FE26A9DE-E344-42EC-A1EE-E1389299B793.html

To resolve this:

On the view connection server, create a file called “locked.properties” in “install_directory\VMware\VMware View\Server\sslgateway\conf\”.

If you have a single UAG/Access Point, populate this file with:

portalHost=view-gateway.example.com

If you have multiple UAG/Access Points, populate the file with:

portalHost.1=view-gateway-1.example.com
portalHost.2=view-gateway-2.example.com

Restart the server

The issue should now be resolved!

On a side note, I also deleted my VMware Unified Access Gateways VMs and deployed the updated version that ship with Horizon 7.4.0. This means I deployed VMware Unified Access Gateway 3.2.0. There was an issue importing the configuration from the export backup I took from the previous version, so I had to configure from scratch (installing certificates, configuring URLs, etc…), be aware of this issue importing configuration.

 

Oct 272017
 

I went to re-deploy some vDP appliances today and noticed a newer version was made available a few months ago (vSphere Data Protection 6.1.5). After downloading the vSphereDataProtection-6.1.5.ova file, I went to deploy it to my vSphere cluster and it failed due to an invalid certificate and a message reading “The OVF package is signed with an invalid certificate”.

I went ahead and downloaded the certificate to see what was wrong with it. While the publisher was valid, the certificate was only valid from September 5th, 2016 to September 7th, 2017, and today was October 27th, 2017. It looks like the guys at VMware should have generated a new cert before releasing it.

 

 

To resolve this, you need to repackage the OVA file and skip the certificate using the VMware Open Virtualization Format Tool (ovftool) available at https://code.vmware.com/tool/ovf/4.1.0

Once you download and install this, the executable can be found in your Program Files\VMware\VMware OVF Tool folder.

Open a command prompt and change to the above directory and run the following:

ovftool.exe --skipManifestCheck c:\folder\vSphereDataProtection-6.1.5.ova c:\folder\vdpgood.ova

This command will repackage and remove the certificate from the OVA and save it as the new file named vdpgood.ova above. Afterwards deploy it to your vSphere environment and all should be working!

 

Feb 182017
 
Windows Server Volume Shadow Copy Volumes Snapshot Screenshot

On VMware vSphere ESXi 6.5, 6.7, and 7.0, a condition exists where one is unable to take a quiesced snapshot. This is an issue that effects quite a few people and numerous forum threads can be found on the internet by those searching for the solution.

This issues can occur both when taking manual snapshots of virtual machines when one chooses “Quiesce guest filesystem”, or when using snapshot based backup applications such as vSphere Data Protection (vSphere vDP), Veeam, or other applications that utilize quiesced snapshots.

The Issue

I experienced this problem on one of my test VMs (Windows Server 2012 R2), however I believe it can occur on newer versions of Windows Server as well, including Windows Server 2016 and Windows Server 2019.

When this issue occurs, the snapshot will fail and the following errors will be present:

An error occurred while taking a snapshot: Failed to quiesce the virtual machine.
An error occurred while saving the snapshot: Failed to quiesce the virtual machine.

Performing standard troubleshooting, I restarted the VM, checked for VSS provider errors, and confirmed that the Windows Services involved with snapshots were in their correct state and configuration. Unfortunately this had no effect, and everything was configured the way it should be.

I also tried to re-install VMWare tools, which had no effect.

PLEASE NOTE: If you experience this issue, you should confirm the services are in their correct state and configuration, as outlined in VMware KB: 1007696. Source: https://kb.vmware.com/s/article/1007696

The Fix

In the days leading up to the failure when things were running properly, I did notice that the quiesced snapshots for that VM were taking a long time process, but were still functioning correctly before the failure.

This morning during troubleshooting, I went ahead and deleted all the Windows Volume Shadow Copies (VSS Snapshots) which are internal and inside of the Virtual Machine itself. These are the shadow copies that the Windows guest operating system takes on it’s own filesystem (completely unrelated to VMware).

To my surprise after doing this, not only was I able to create a quiesced snapshot, but the snapshot processed almost instantly (200x faster than previously when it was functioning).

If you’re comfortable deleting all your snapshots, it may also be a good idea to fully disable and then re-enable the VSS Snapshots on the volume to make sure they are completely deleted and reset.

I’m assuming this was causing a high load for the VMware snapshot to process and a timeout was being hit on snapshot creation which caused the issue. While Windows volume shadow copies are unrelated to VMware snapshots, they both utilize the same VSS (Volume Shadow Copy Service) system inside of windows to function and process. One must also keep in mind that the Windows volume shadow copies will of course be part of a VMware snapshot since they are stored inside of the VMDK (the virtual disk) file.

PLEASE NOTE: Deleting your Windows Volume Shadow copies will delete your Windows volume snapshots inside of the virtual machine. You will lose the ability to restore files and folders from previous volume shadow copy snapshots. Be aware of what this means and what you are doing before attempting this fix.

Feb 142017
 

Years ago, HPE released the GL200 firmware for their HPE MSA 2040 SAN that allowed users to provision and use virtual disk groups (and virtual volumes). This firmware came with a whole bunch of features such as Read Cache, performance tiering, thin provisioning of virtual disk group based volumes, and being able to allocate and commission new virtual disk groups as required.

(Please Note: On virtual disk groups, you cannot add a single disk to an already created disk group, you must either create another disk group (best practice to create with the same number of disks, same RAID type, and same disk type), or migrate data, delete and re-create the disk group.)

The biggest thing with virtual storage, was the fact that volumes created on virtual disk groups, could span across multiple disk groups and provide access to different types of data, over different disks that offered different performance capabilities. Essentially, via an automated process internal to the MSA 2040, the SAN would place highly used data (hot data) on faster media such as SSD based disk groups, and place regularly/seldom used data (cold data) on slower types of media such as Enterprise SAS disks, or archival MDL SAS disks.

(Please Note: To use the performance tier either requires the purchase of a performance tiering license, or is bundled if you purchase an HPE MSA 2042 which additionally comes with SSD drives for use with “Read Cache” or “Performance tier.)

When the firmware was first released, I had no impulse to try it out since I have 24 x 900GB SAS disks (only one type of storage), and of course everything was running great, so why change it? With that being said, I’ve wanted and planned to one day kill off my linear storage groups, and implement the virtual disk groups. The key reason for me being thin provisioning (the MSA 2040 supports the “DELETE” VAAI function), and virtual based snapshots (in my environment, I require over-commitment of the volume). As a side-note, as of ESXi 6.5, ESXi now regularly unmaps unused blocks when using the VMFS-6 filesystem (if left enabled), which is great for SANs using thin provision that support the “DELETE” VAAI function.

My environment consisted of 2 linear disk groups, 12 disks in RAID5 owned by controller A, and 12 disks in RAID5 owned by controller B (24 disks total). Two weekends ago, I went ahead and migrated all my VMs to the other datastore (on the other volume), deleted the linear disk group, created a virtual disk group, and then migrated all the VMs back, deleted my second linear volume, and created a virtual disk group.

Overall the process was very easy and fast. No downtime is required for this operation if you’re licensed for Storage vMotion in your vSphere environment.

During testing, I’ve noticed absolutely no performance loss using virtual vs linear, except for some functions that utilize the VAAI storage providers which of course run faster on the virtual disk groups since it’s being offloaded to the SAN. This was a major concern for me as block linear based storage is accessed more directly, then virtual disk groups which add an extra level of software involvement between the controllers and disks (block based access vs file based access for the iSCSI targets being provided by the controllers).

Unfortunately since I have no SSDs and no extra room for disks, I won’t be able to try the performance tiering, but I’m looking forward to it in the future.

I highly recommend implementing virtual disk groups on your HPE MSA 2040 SAN!

Feb 082017
 

When running vSphere 6.5, 6.7, or 7.0 (or later) and utilizing a VMFS6 datastore, we now have access to automatic LUN reclaim (this unmaps unused blocks on your LUN), which automatically unmaps unused storage on your LUNs. This is very handy for thin provisioned storage.

Essentially when you unmap blocks, it “tells” the storage (SAN) that unused (deleted or moved data) blocks aren’t being used anymore and to unmap them, which decreases the allocated size on the storage layer and frees up storage space. Your storage LUN must support VAAI and the “Delete” function.

Now taking this a step further, most of you have noticed that storage reclaim in the vSphere client has two settings for priority in the web client; none, or low.

For those of you who feel daring or want to spice life up a bit, you can manually increase the priority of the automated space reclamation through the esxcli command. While I can’t recommend this (obviously VMware chose to hide these options due to performance considerations), you can follow these instructions to change the priority higher.

Manually Configure Storage Reclaim (UNMAP) Priority

To view the current settings:

esxcli storage vmfs reclaim config get --volume-label=DATASTORENAME

To set ESXi reclaim/unmap priority to medium:

esxcli storage vmfs reclaim config set --volume-label=DATASTORENAME --reclaim-priority=medium

To set ESXi reclaim/unmap priority to high:

esxcli storage vmfs reclaim config set --volume-label=DATASTORENAME --reclaim-priority=high

You can confirm these settings took effect by running the first “get” command to view the settings, or view the datastore in the storage section of the vSphere client. While the vSphere client will reflect the higher priority setting, if you change it lower and then want to change it back higher, you’ll need to use the esxcli command to bring it up to a higher priority again.

Happy Virtualizing! Leave a comment!