Jul 25, 2023
 

When it comes to virtualized workloads, one thing I commonly see overlooked in the design of the solution is the placement of workloads. In this post, I want to cover VMware vSphere VM placement rules using the “VM/Host Rules” feature.

This feature is often left unconfigured, especially in smaller single-cluster environments; however, I’ve also seen this happen in very large-scale environments.

Let’s cover the why, what, who, and how…

VM Workloads

While VMware vSphere does have a number of technologies built in for redundancy, load-balancing, and availability, as part of the larger solution we often find our workloads, specifically 3rd party platforms, with their own solutions that accomplish the same thing.

We need to identify which HA (High Availability) or redundancy solution to use, based on the application, service, and how it works.

For example, with VMware vSphere HA (High Availability), if the HA agents in the cluster detect that a host has gone offline, HA can restart the workloads on the remaining online hosts. Detection and guest boot both take time, resulting in a loss of service during this period.

Third-party solutions often have their own high availability or redundancy built into the solution, such as Microsoft Active Directory. In this case, with a standard configuration, any domain controller can respond to a client’s request for resources at any time. If one DC goes offline, other DCs can respond to the request, resulting in no downtime.

Obviously, in the case of Active Directory Domain Controllers, you’d much prefer to have multiple DCs in your environment, instead of using one with vSphere HA.

Additionally, if you did have multiple domain controllers, you’d want to make sure they aren’t all placed on the same ESXi host. This is where we start to incorporate VM placement into our solution.

VM Placement

When it comes to 3rd party solutions like those mentioned above, we need to identify these workloads and factor them into the design of the solution we are either implementing, maintaining, or improving.

Example of VM workloads used with VM Placement

A few examples of these workloads with their own load-balancing and availability technologies:

  • Microsoft Windows Active Directory Domain Controllers
  • Microsoft Windows Servers running DNS/DHCP Servers
  • Virtualized Active/Active or Active/Passive Firewall Appliances
  • VMware Horizon UAG (Unified Access Gateway) configured in HA mode
  • Other servers/services that have their own availability systems

As you can see, these applications all have their own availability solution, so we must ensure the different “nodes” or “instances” are running on different ESXi hosts to avoid a host failure bringing down the entire solution.

Unless otherwise specified by the 3rd party vendor, I would recommend using VM/Host Rules in combination with vSphere DRS and HA.

Configuring VM Placement with VM/Host Rules

To configure these rules, follow the instructions below:

  1. Log on to your VMware vCenter Server
  2. Select a Cluster
  3. Click on the “Configure” tab, and then “VM/Host Rules”
    • Here you can Add/Edit/Delete VM Host Rules
  4. Click on “Add”, and give the rule a new name (Example: Domain Controllers)
  5. For “Type”, select “Separate Virtual Machines”
  6. Click “Add” and select your Domain Controllers and add them to the rule.
Domain Controller VM Placement VM Host Rule

After you click “OK”, the rule should now be saved, and DRS will make sure these VMs are now running on separate hosts.
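To make the effect of a “Separate Virtual Machines” rule concrete, here is a minimal Python sketch that checks a placement against such a rule. This is not how DRS works internally and it does not talk to vCenter; the VM names, host names, and the `find_rule_violations` helper are all hypothetical and only illustrate the constraint the rule expresses.

```python
from collections import defaultdict


def find_rule_violations(rule_vms, placement):
    """Return {host: [vms]} for hosts running more than one VM from the rule."""
    vms_by_host = defaultdict(list)
    for vm in rule_vms:
        vms_by_host[placement[vm]].append(vm)
    # A "Separate Virtual Machines" rule is violated when two or more
    # of its member VMs share the same host.
    return {host: vms for host, vms in vms_by_host.items() if len(vms) > 1}


# Hypothetical inventory: VM name -> ESXi host it currently runs on.
placement = {"DC01": "esxi01", "DC02": "esxi01", "DC03": "esxi02"}

violations = find_rule_violations(["DC01", "DC02", "DC03"], placement)
print(violations)
```

In this made-up inventory, DC01 and DC02 share a host, so the rule would be reported as violated until one of them is moved.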

Below you can see another example of a configured system, separating 2 Active/Passive Firewall appliances.

VM/Host Rules for Firewall Appliances

As you can see, VM placement with VM/Host Rules is very easy to configure and deploy.

Additional Considerations

Note, if you implement these rules and do not have enough hosts to fulfill the requirements, DRS may fail to evacuate a host when you place it in maintenance mode or remediate it with vLCM (vSphere Lifecycle Manager).

In this case, you’ll need to manually vMotion the VMs to other hosts (violating the rule) or shut some down.
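The host-count problem above comes down to simple arithmetic: every VM in a separation rule needs its own host, so the hosts left after one enters maintenance mode must still number at least as many as the VMs in the rule. The sketch below is a deliberate simplification. It ignores CPU and memory capacity, other rules, and powered-off VMs, and the function name is my own.

```python
def can_evacuate_host(total_hosts, vms_in_rule, hosts_in_maintenance=1):
    """Check whether a separation rule can still be honored during maintenance.

    Simplified: only counts hosts, and ignores capacity and other rules.
    """
    remaining_hosts = total_hosts - hosts_in_maintenance
    return remaining_hosts >= vms_in_rule


# Two hosts, two domain controllers: the rule cannot be honored on one host.
print(can_evacuate_host(total_hosts=2, vms_in_rule=2))

# Three hosts, two domain controllers: one host can be evacuated safely.
print(can_evacuate_host(total_hosts=3, vms_in_rule=2))
```

A two-host cluster with a two-VM rule is the typical small-environment case where maintenance mode gets stuck.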

Jul 24, 2023
 
Picture of a DL360p Gen8 1U Rack Server with IO-PEX40152 Installed

A few months ago, you may have seen my post detailing my experience with ESXi 7.0 on HP Proliant DL360p Gen8 servers. I now have an update as I have successfully loaded ESXi 8.0 on HPE Proliant DL360p Gen8 servers, and want to share my experience.

It wasn’t as eventful as one would have expected, but I wanted to share what’s required, what works, and stability observations.

Please note, this is NOT supported and NOT recommended for production environments. Use the information at your own risk.

A special thank you goes out to William Lam and his post on Homelab considerations for vSphere 8, which provided me with the boot parameter required to allow legacy CPUs.

ESXi on the DL360p Gen8

With the release of vSphere 8.0 Update 1, and all the new features and functionality that come with the vSphere 8 release as a whole, I decided it was time to attempt to update my homelab.

In my setup, I have the following:

  • 2 x HPE Proliant DL360p Gen8 Servers
    • Dual Intel Xeon E5-2660v2 Processors in each server
    • USB and/or SD for booting ESXi
    • No other internal storage
    • NVIDIA A2 vGPU (for use with VMware Horizon)
  • External SAN iSCSI Storage

Since I have 2 servers, I decided to do a fresh install using the generic installer, and then use the HPE addon to install all the HPE addons, drivers, and software. I would perform these steps on one server at a time, continuing to the next if all went well.

I went ahead and documented the configuration of my servers beforehand, and had already upgraded my VMware vCenter vCSA appliance from 7U3 to 8U1. Note that you should always upgrade your vCenter Server first, and then your ESXi hosts.

To my surprise, the install went very smoothly (after enabling legacy CPUs in the installer) on one of the hosts, and after a few days with no stability issues, I proceeded to upgrade the 2nd host.

Both hosts have now been running ESXi 8.0 for 25+ days without any issues.

The process – Installing ESXi 8.0

I used the following steps to install VMware vSphere ESXi 8 on my HP Proliant Gen8 Server:

  1. Download the Generic ESXi installer from VMware directly.
    1. Link: Download VMware vSphere
  2. Download the “HPE Custom Addon for ESXi 8.0 U1”.
    1. Link: HPE Custom Addon for ESXi 8.0 U1 June 2023
    2. Other versions of the Addon are here: HPE Customized ESXi Image.
  3. Boot server with Generic ESXi installer media (CD or ISO)
    • IMPORTANT: Press “Shift + o” (Shift key, and letter “o”) to interrupt the ESXi boot loader, and add “AllowLegacyCPU=true” to the kernel boot parameters.
  4. Continue to install ESXi as normal.
    • You may see warnings about using a legacy CPU; you can ignore these.
  5. Complete initial configuration of ESXi host
  6. Mount NFS or iSCSI datastore.
  7. Copy HPE Custom Addon for ESXi zip file to datastore.
  8. Enable SSH on host (or use console).
  9. Place host into maintenance mode.
  10. Run “esxcli software vib install -d /vmfs/volumes/datastore-name/folder-name/HPE-801.0.0.11.3.1.1-Jun2023-Addon-depot.zip” from the command line.
  11. The install will run and complete successfully.
  12. Restart your server as needed. You’ll now notice that not only were the HPE drivers installed, but also agents like the Agentless Management agent and iLO integrations.
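The command-line portion of the steps above (maintenance mode, addon install, restart) can be sketched as a small Python helper that assembles the commands for a given depot location. It only builds the strings and does not run anything on a host. The helper name is my own, and I’m assuming maintenance mode is entered from the shell with `esxcli system maintenanceMode set` rather than from the UI.

```python
from pathlib import PurePosixPath


def addon_install_commands(datastore, folder, depot_zip):
    """Build the ESXi shell commands for installing an addon depot, in order."""
    depot = PurePosixPath("/vmfs/volumes") / datastore / folder / depot_zip
    return [
        # Assumption: maintenance mode is entered from the shell, not the UI.
        "esxcli system maintenanceMode set --enable true",
        # Install command as used in step 10 above.
        f"esxcli software vib install -d {depot}",
        # Restart the host once the install has completed.
        "reboot",
    ]


commands = addon_install_commands(
    "datastore-name",
    "folder-name",
    "HPE-801.0.0.11.3.1.1-Jun2023-Addon-depot.zip",
)
for command in commands:
    print(command)
```

Replace the placeholder datastore and folder names with your own before pasting the output into an SSH session.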

After that, everything was good to go… Here you can see version information from one of the ESXi hosts:

VMware ESXi version 8.0.1 Build 21813344 on HPE Proliant DL360p Gen8 Server

What works, and what doesn’t

I was surprised to see that everything works, including the P420i embedded RAID controller. Please note that I am not using the RAID controller, so I have not performed extensive testing on it.

HPE P420i RAID Controller with VMware vSphere ESXi 8

All Hardware health information is present, and ESXi is functioning as one would expect if running a supported version on the platform.

Additional Information

Note that with vSphere 8, VMware is deprecating vLCM baselines. Your focus should be on updating your ESXi hosts using cluster images. You can incorporate vendor add-ons and components to create a customized image for deployment.

Jul 23, 2023
 
Azure AD SSO with Horizon

With the release of VMware Horizon 2303, VMware Horizon now supports Hybrid Azure AD Join with Azure AD Connect using Instant Clones and non-persistent VDI.

So what exactly does this mean? It means you can now use Azure SSO with a PRT (Primary Refresh Token) to authenticate to and access on-premises and cloud-based applications and resources.

What else? It allows you to use conditional access!

What is Hybrid Azure AD Join, and why would we want to do it with Azure AD Connect?

Historically, it was a bit challenging when it came to Understanding Microsoft Azure AD SSO with VDI (click to read the post and/or see the video), and special considerations had to be made when an organization wished to implement SSO between their on-prem non-persistent VDI deployment and Azure AD.

Hybrid Azure AD Joined Login

Azure AD SSO, the old way

The old way to accomplish this was to either implement Azure AD with ADFS, or use Seamless SSO. ADFS was bulky and annoying to manage, and Seamless SSO was actually intended to enable SSO on “downlevel devices” (older operating systems before Windows 10).

For customers without ADFS, I would always recommend using Seamless SSO to enable SSO on non-persistent VDI Instant Clones, until now!

Azure AD SSO, the new way with Azure AD Connect and Azure SSO PRTs

According to the release notes for VMware Horizon 2303:

Hybrid Azure Active Directory for SSO is now supported on instant clone desktop pools. See KB 89127 for details.

This means we can now enable and use Azure SSO with PRTs (Primary Refresh Tokens) using Azure AD Connect and non-persistent VDI Instant Clones.

Azure SSO with PRT and Non-Persistent VDI

This is actually a huge deal because not only does it allow us to use the preferred method for performing SSO with Azure, but it also allows us to start using fancy Azure features like conditional access!

Requirements for Hybrid Azure AD Join with non-persistent VDI and Azure AD Connect

In order to utilize Hybrid Join and PRTs with non-persistent VDI on Horizon, you’ll need the following:

  • VMware Horizon 2303 (or later)
  • Active Directory
  • Azure AD Connect (Implemented, Configured, and Functioning)
    • Azure AD Hybrid Domain Join must be enabled
    • OU and Object filtering must include the non-persistent computer objects and computer accounts
  • Create a VMware Horizon Non-Persistent Desktop Pool for Instant Clones
    • “Allow Reuse of Existing Computer Accounts” must be checked

When you configure this, you’ll notice that after provisioning a desktop pool (or pushing a new snapshot), there may be a delay before PRTs are issued. This is expected; the PRT will be issued eventually, and subsequent desktops shouldn’t experience issues unless you have a limited number available.

*Please note: VMware still notes that ADFS is the preferred way for fast issuance of the PRT.

While VMware recommends ADFS for faster PRT issuance, I had no problems or complaints in my own testing. However, because of the PRT delay after deploying a pool or pushing a new snapshot, I’d recommend doing this after hours in production; otherwise, SSO will not function for users who immediately get a new desktop.

Additional Considerations

Please note the following:

  • When switching from ADFS to Azure AD Connect, the sign-in process may change for users.
    • You must prepare the users for the change.
  • When using locally stored identities and/or cached credentials, enabling Azure SSO may change the login process, or cause issues for users signing in.
    • You may have to delete saved credentials in the user’s persistent profile
    • You may have to adjust GPOs to account for Azure SSO
    • You may have to modify settings in your profile persistence solution
      • Example: “RoamIdentity” on FSLogix
  • I recommend testing before implementing
    • Test Environment
    • Test with new/blank user profiles
    • Test with existing users

If you’re coming from an environment that was previously using Seamless SSO for non-persistent VDI, you can create new test desktop pools that use newly created Active Directory OU containers and adjust the OU filtering appropriately to include the test OUs for synchronization to Azure AD with Azure AD Connect. This way you’re only syncing the test desktop pool, while allowing Seamless SSO to continue to function for existing desktop pools.

How to test Azure AD Hybrid Join, SSO, and PRT

To test the current status of Azure AD Hybrid Join, SSO, and PRT, you can use the following command:

dsregcmd /status

To check if the OS is Hybrid Domain joined, you’ll see the following:

+----------------------------------------------------------------------+
| Device State                                                         |
+----------------------------------------------------------------------+

             AzureAdJoined : YES
          EnterpriseJoined : NO
              DomainJoined : YES
                DomainName : DOMAIN

As you can see above, both “AzureAdJoined” and “DomainJoined” are “YES”, which indicates the device is Hybrid Azure AD joined.

Further down the output, you’ll find information related to SSO and PRT Status:

+----------------------------------------------------------------------+
| SSO State                                                            |
+----------------------------------------------------------------------+

                AzureAdPrt : YES
      AzureAdPrtUpdateTime : 2023-07-23 19:46:19.000 UTC
      AzureAdPrtExpiryTime : 2023-08-06 19:46:18.000 UTC
       AzureAdPrtAuthority : https://login.microsoftonline.com/XXXXXXXX-XXXX-XXXXXXX
             EnterprisePrt : NO
    EnterprisePrtAuthority :
                 OnPremTgt : NO
                  CloudTgt : YES
         KerbTopLevelNames : XXXXXXXXXXXXX

Here we can see that “AzureAdPrt” is YES which means we have a valid Primary Refresh Token issued by Azure AD SSO because of the Hybrid Join.
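If you want to check these values across many desktops, the output is easy to parse. The following is a minimal Python sketch that reads the `Key : Value` lines from `dsregcmd /status` into a dictionary and checks the fields discussed above. The function names are my own, and the sample text is a trimmed copy of the output shown in this post; on a real desktop you would capture the command’s output yourself (for example with `subprocess`) and pass it in.

```python
def parse_dsregcmd(output):
    """Parse 'Key : Value' lines from dsregcmd /status output into a dict."""
    state = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "+-----+" / "| Section |" banner lines.
        if not line or line[0] in "+|":
            continue
        # Split on the first colon only; values such as timestamps and URLs
        # contain colons of their own.
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def is_hybrid_joined(state):
    """Hybrid join means both the Azure AD and the domain join report YES."""
    return state.get("AzureAdJoined") == "YES" and state.get("DomainJoined") == "YES"


def has_prt(state):
    """True when a Primary Refresh Token has been issued."""
    return state.get("AzureAdPrt") == "YES"


sample = """
+----------------------------------------------------------------------+
| Device State                                                         |
+----------------------------------------------------------------------+

             AzureAdJoined : YES
          EnterpriseJoined : NO
              DomainJoined : YES

+----------------------------------------------------------------------+
| SSO State                                                            |
+----------------------------------------------------------------------+

                AzureAdPrt : YES
      AzureAdPrtUpdateTime : 2023-07-23 19:46:19.000 UTC
    EnterprisePrtAuthority :
"""

state = parse_dsregcmd(sample)
print(is_hybrid_joined(state), has_prt(state))
```

Because a freshly provisioned instant clone may not have its PRT yet, a check like `has_prt` is most useful when run a little while after the user logs on.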

Mar 12, 2023
 

Are you running an HPE Nimble or HPE Alletra 6000 SAN in your VMware environment with iSCSI? A commonly overlooked component of the solution architecture and configuration when using these SANs is HPE Nimble and HPE Alletra 6000 SAN IP Zoning with an ISL (Inter-Switch Link).

When it comes to implementing these SANs, it’s all about data availability, performance, optimizations, and making sure it’s done properly.

I want to share with you some information, as I feel this important and required configuration consideration is often ignored, with many IT professionals not being aware it exists.

HPE Alletra 6000 SAN

I recently had a customer that purchased and deployed two HPE Alletra 6010 SANs for their VMware environment, where I was contracted to implement these SANs. Even though the customer purchased HPE Technical Installation and Startup Services, the HPE installer was not aware of IP Address Zoning and its purpose, and advised us to disable it.

I actually had to advise the technician that numerous HPE technical documents recommend enabling and configuring it when you have an ISL. He then researched it and confirmed that we should have it enabled and configured.

IP Address Zoning

When you have SAN switches that include an ISL (inter-switch link) that are connected to an HPE Nimble or HPE Alletra SAN, it’s preferred not to have traffic go across that interlink, as it creates additional hops for packets, as well as increases latency.

However, in the event of a switch, NIC, and/or path failure, we do want to have the interlink available to facilitate data access and be available when required.

Using NCM (Nimble Connection Manager) and SCM (Storage Connection Manager) on your VMware ESXi hosts, the HPE Nimble and HPE Alletra storage solution can intelligently choose when to use the interlink depending on paths available, and the current health of SAN connectivity. It does this through IP Address Zones.

You must have the NCM or SCM plugin installed on your ESXi hosts to be able to use IP Address Zones, and use the HPE Nimble Storage path selection policy (NIMBLE_PSP_DIRECTED).

Implementing IP Address Zones

To implement this, you’ll need to assign an IP Zone to each of your switches. Please see below for a table from HPE Alletra documentation:

HPE IP Address Zone Types for ISL Configuration

You can choose to either bisect the subnet, or use a method of dedicating even numbered IPs to one switch/zone, and dedicating odd numbered IPs to the other switch/zone.

This allows you to zone each switch and keep traffic within the zone, avoiding use of the interlink, which would add hops and latency. You’ll need to configure the Zone Type you selected on the storage array.

In the event of a failure, the interlink will be available for non-optimized path access to ensure continued data access.
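To illustrate the two zoning methods described above, here is a short Python sketch that works out which zone an IP address would fall into when you bisect the subnet or split it by even and odd addresses. This is only my illustration of the addressing scheme, not the array’s or NCM’s actual logic. The zone numbers (0 and 1), the `zone_type` strings, and the example subnet are all assumptions.

```python
import ipaddress


def ip_zone(address, subnet, zone_type):
    """Return 0 or 1 for the zone an address belongs to (illustrative only)."""
    ip = ipaddress.ip_address(address)
    net = ipaddress.ip_network(subnet)
    if ip not in net:
        raise ValueError(f"{address} is not in {subnet}")
    if zone_type == "bisect":
        # Lower half of the subnet is one zone, upper half is the other.
        offset = int(ip) - int(net.network_address)
        return 0 if offset < net.num_addresses // 2 else 1
    if zone_type == "even_odd":
        # Even-numbered addresses are one zone, odd-numbered the other.
        return int(ip) % 2
    raise ValueError(f"unknown zone type: {zone_type}")


# Hypothetical iSCSI subnet shared by host NICs and array ports.
print(ip_zone("192.168.10.10", "192.168.10.0/24", "bisect"))
print(ip_zone("192.168.10.200", "192.168.10.0/24", "bisect"))
print(ip_zone("192.168.10.11", "192.168.10.0/24", "even_odd"))
```

When planning addresses, the idea is that a host NIC and the array ports cabled to the same switch should land in the same zone, so their traffic stays off the interlink.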

Mar 06, 2023
 

You might ask what the procedure is for updating vCenter Server instances in Enhanced Linked Mode, or whether any special considerations apply.

vCenter Enhanced Linked Mode is a feature that allows you to link up to 15 vCenter instances into a single vSphere Single Sign-On (SSO) domain. This allows you to use a single set of credentials to manage all linked instances, and to manage all of them from a single pane of glass.

When it comes to environments with multiple vCenter instances and/or vCSA appliances, this really helps manageability and visibility.

Enhanced Linked Mode Upgrade Considerations

To answer the question above: Yes, when you’re running Enhanced Linked Mode (ELM) to link multiple vCenter Servers, special considerations and requirements exist when it comes to updating or upgrading your vCenter Server instances and vCSA appliances.

Multiple VMware vCenter Server Instances (vCSA) Running in Enhanced Linked Mode (ELM)

Not only have these procedures been documented in older VMware documentation, but I recently reviewed and confirmed the best practices with VMware GSS while on a support case.

Procedure for updating vCenter with ELM

  1. Confirm that vCenter File-Based Backup is configured in the VAMI, functioning, and producing valid file-based backups.
  2. Create a manual file-based backup with VAMI
  3. Power down all vCenter Instances and vCSA Appliances in your environment
  4. Perform a cold snapshot of all vCenter Instances and vCSA appliances
    • *This is critical* – You need a valid offline snapshot taken of all appliances powered off at the same point in time
  5. Power on the vCenter/vCSA Virtual Machines (VMs)
  6. Perform the update or upgrade

Recovering from a failed Update

IMPORTANT: In the event that an update or upgrade fails, you must revert all vCenter Instances and/or vCSA appliances back to the previous snapshot!

You cannot selectively choose single or individual instances, as this may cause mismatches in data and configuration between the instances as they have databases that are not in sync, and are from different points in time.

Additionally, if you are in a situation where you’re considering or planning to restore previous snapshots to recover from a failed update, you should do so sooner rather than later. As time progresses, service accounts and identifiers update in the VMware vSphere infrastructure. Delaying the restore too long could cause this information to get out of sync with the ESXi hosts after performing a snapshot restore/revert.
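The all-or-nothing nature of the snapshot and revert rules can be sketched as two small checks. This is a planning aid written in plain Python, not something that talks to vCenter. The instance names and function names are hypothetical.

```python
def ready_to_upgrade(elm_members, cold_snapshots):
    """True only if every ELM member has a cold (powered-off) snapshot."""
    return set(elm_members) <= set(cold_snapshots)


def instances_to_revert(elm_members, failed_instances):
    """If any member failed its update, every member must be reverted."""
    return sorted(elm_members) if failed_instances else []


# Hypothetical ELM group of three vCSA appliances.
members = ["vcsa-site-a", "vcsa-site-b", "vcsa-site-c"]

# One appliance was missed during the cold snapshot window.
print(ready_to_upgrade(members, ["vcsa-site-a", "vcsa-site-b"]))

# A single failed upgrade still means reverting the whole group.
print(instances_to_revert(members, ["vcsa-site-b"]))
```

The point of the second function is that the answer never depends on which instance failed, only on whether any of them did.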