Oct 10 2020
 

If you’re like me and use an older Nvidia GRID K1 or K2 vGPU video card for your VDI homelab, you may notice that VMware Horizon no longer offloads VMware Blast h264 encoding to the GPU and instead encodes it on the CPU.

The Problem

Originally, when an environment was configured with an Nvidia GRID K1 or K2 card, the card not only provided 3D acceleration and rendering, but also encoded the VMware BLAST h264 stream (the visual session) so that the CPU didn’t have to. This resulted in less CPU usage and provided a streamlined experience for the user.

This functionality was handled via NVFBC (Nvidia Frame Buffer Capture) which was part of the Nvidia Capture SDK (formerly known as GRID SDK). This function allowed the video card to capture the video frame buffer and encode it using NVENC (Nvidia Encoder).

Ultimately, after spending hours troubleshooting, I learned that NVFBC has been deprecated and is no longer supported, which is why it’s no longer functioning. I also checked and noticed that tools (such as nvfbcenable) were no longer bundled with the VMware Horizon agent. One can assume that the agent doesn’t even attempt to check or use this function.

Symptoms

Before I was aware of this, I noticed that while 3D Acceleration and graphics were functioning, I was experiencing high CPU usage. Upon further investigation I noticed that my VMware BLAST sessions were not offloading h264 encoding to the video card.

VMware Horizon Performance Tracker
VMware Horizon Performance Tracker with NVidia GRID K1

You’ll notice above that under the “Encoder” section, the “Encoder Name” was listed as “h264 4:2:0”. Normally this would say “NVIDIA NvEnc H264” (or “NVIDIA NvEnc HEVC” on newer cards) if it was being offloaded to the GPU.

Looking at a VMware Blast session (Blast-Worker-SessionId1.log), the following lines can be seen.

[INFO ] 0x1f34 bora::Log: NvEnc: VNCEncodeRegionNvEncLoadLibrary: Loaded NVIDIA SDK shared library "nvEncodeAPI64.dll"
[INFO ] 0x1f34 bora::Log: NvEnc: VNCEncodeRegionNvEncLoadLibrary: Loaded NVIDIA SDK shared library "nvml.dll"
[WARN ] 0x1f34 bora::Warning: GetProcAddress: Failed to resolve nvmlDeviceGetEncoderCapacity: 127
[WARN ] 0x1f34 bora::Warning: GetProcAddress: Failed to resolve nvmlDeviceGetProcessUtilization: 127
[WARN ] 0x1f34 bora::Warning: GetProcAddress: Failed to resolve nvmlDeviceGetGridLicensableFeatures: 127
[INFO ] 0x1f34 bora::Log: NvEnc: VNCEncodeRegionNvEncLoadLibrary: Some NVIDIA nvml functions unavailable, unloading
[INFO ] 0x1f34 bora::Log: NvEnc: VNCEncodeRegionNvEncUnloadLibrary: Unloading NVIDIA SDK shared library "nvEncodeAPI64.dll"
[INFO ] 0x1f34 bora::Log: NvEnc: VNCEncodeRegionNvEncUnloadLibrary: Unloading NVIDIA SDK shared library "nvml.dll"
[WARN ] 0x1f34 bora::Warning: GetProcAddress: Failed to resolve nvmlDeviceGetEncoderCapacity: 127
[WARN ] 0x1f34 bora::Warning: GetProcAddress: Failed to resolve nvmlDeviceGetProcessUtilization: 127
[WARN ] 0x1f34 bora::Warning: GetProcAddress: Failed to resolve nvmlDeviceGetGridLicensableFeatures: 127

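You’ll notice that the agent loads the NVIDIA libraries, but it fails to resolve three nvml functions (nvmlDeviceGetEncoderCapacity, nvmlDeviceGetProcessUtilization, and nvmlDeviceGetGridLicensableFeatures), so it unloads the libraries again.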
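If you want to check a log for the same pattern without reading it line by line, here is a minimal sketch. It assumes you have copied the Blast worker log to a machine with a POSIX shell (WSL, Git Bash, or a Linux box), and the function name check_blast_nvenc is my own invention for illustration. It only looks for the two message strings shown in the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a Blast worker log contains the
# "nvml functions unavailable, unloading" message, and list the nvml
# functions that failed to resolve.
# Returns 1 if the unload message is present, 0 if not, 2 if unreadable.
check_blast_nvenc() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read: $log_file" >&2
        return 2
    fi
    if grep -q 'nvml functions unavailable, unloading' "$log_file"; then
        echo "NVENC libraries were unloaded; unresolved nvml functions:"
        # Pull the function name out of each warning and de-duplicate.
        grep 'Failed to resolve nvml' "$log_file" \
            | sed 's/.*Failed to resolve \(nvml[A-Za-z]*\).*/\1/' \
            | sort -u
        return 1
    fi
    echo "no NVENC unload message found"
    return 0
}
```

Call it as: check_blast_nvenc Blast-Worker-SessionId1.log. A clean result only means those strings were not found in that one log, not that encoding is definitely being offloaded.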

The Solution

Unfortunately the only solution is to upgrade to newer or different hardware.

The GRID K1 and GRID K2 cards have reached their EOL (End of Life) and are no longer supported. The drivers are not being maintained or updated, so I doubt they will take advantage of the newer frame buffer capture functions of Windows 10.

Newer hardware and solutions have incorporated this change and use a different means of frame buffer capture.

To resolve this in my own homelab, I plan to migrate to an AMD FirePro S7150x2.

Jul 22 2020
 
VMware Logon

When upgrading from any version of VMware vCSA to version 7.0, you may encounter a problem during the migration phase and be asked to specify a new “Export Directory”.

I’ve seen this occur on numerous upgrades and often find the same culprit causing the issue. I’ve found a very simple fix compared to other solutions online.

The full prompt for this issue is: “Enter a new export directory on the source machine below”

The Problem

When you upgrade the vCenter vCSA, the process migrates all data over from the source appliance, to the new vCSA 7 appliance.

This data can include the following (depending on your selection):

  • Configuration
  • Configuration and historical data (events and tasks)
  • Configuration and historical data (events, tasks, and performance metrics)
  • vSphere Update Manager (updates, configuration, etc.)

This data can accumulate, especially the VMware vSphere Update Manager.

In the most recent upgrade I performed, I noticed that the smallest option (configuration only) was around 8GB, which is way over the 4.7GB default limit.

Could it be vSphere Update Manager?

I’ve seen VMware VUM cause numerous issues over the years with upgrades. VUM has caused issues upgrading from earlier versions to 6.x, and in this case it caused this issue upgrading to vCSA 7.x as well.

In my diagnosis, I logged in to the SSH console of the source appliance, and noticed that the partition containing the VUM data (which includes update files) was around 7.4GB. This is the “/storage/updatemgr/” partition.

I wasn’t sure if this was included, but the 8GB of configuration, minus the 7.4GB of VUM data, could technically get me to around 0.6GB for migration if this was in fact included.
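To make that back-of-the-envelope check repeatable, here is a rough sketch. Both function names are hypothetical, and it assumes du and awk are available in the appliance shell. The first function reports how much data sits under a directory, and the second does the subtraction from the paragraph above against a limit you supply.

```shell
#!/bin/sh
# Hypothetical helper: size of a directory tree in whole megabytes
# (du -sk reports the total in 1K blocks).
dir_size_mb() {
    du -sk "$1" | awk '{ printf "%d\n", $1 / 1024 }'
}

# Hypothetical helper: subtract the VUM data from the reported export
# size and say whether the remainder fits under a limit (all in GB).
export_after_vum_reset() {
    awk -v total="$1" -v vum="$2" -v limit="$3" 'BEGIN {
        rest = total - vum
        printf "%.1f GB remaining, %s the %.1f GB limit\n", rest,
               (rest <= limit ? "under" : "over"), limit
    }'
}
```

With the numbers from this upgrade, export_after_vum_reset 8 7.4 4.7 prints "0.6 GB remaining, under the 4.7 GB limit", and dir_size_mb /storage/updatemgr shows what the VUM data is currently using. This is only an estimate, since it assumes the VUM data really is counted in the export size.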

In my environment, I have the default (and simple) implementation of VUM with the only customization being the HPE VIBs depot. I figured maybe I should blast away the VUM and start from scratch on VMware vCSA 7.0 to see if this fixes the issue.

The Fix

To fix this issue, I simply completely reset the VMware Update Manager Database.

For details on this process and before performing these steps, please see VMware KB 2147284.

Let’s get to it:

  1. Close the migration window (you can reopen this later)
  2. Log in to your vCSA source appliance via SSH or console
  3. Run the applicable steps as defined in the VMware KB 2147284 to reset VUM (WARNING: commands are version specific). In my case on vCSA 6.5 I ran the following commands:
    1. shell
    2. service-control --stop vmware-updatemgr
    3. /usr/lib/vmware-updatemgr/bin/updatemgr-util reset-db
    4. rm -rf /storage/updatemgr/patch-store/*
    5. service-control --start vmware-updatemgr
  4. Open your web browser and navigate to https://new-vcsa-IP:5480 and resume the migration. You will now notice a significant space reduction and won’t need to specify a new mount point.

That’s it! You have a shiny new clean VUM instance, and can successfully upgrade to vCSA 7.0 without having to specify a new mount point.

To reconfigure and restore any old configuration to VUM, you’ll do so in the “VMware Lifecycle Management” section of the VMware vCenter Server Appliance interface.

Alternatively, in the rare event it’s not related to the VUM data, you can set the export directory to somewhere in “/tmp/”, which is another workaround for this issue that may allow you to continue.

May 26 2020
 

So you want to add NVMe storage capability to your HPE Proliant DL360p Gen8 (or other Proliant Gen8 server) and don’t know where to start? Well, I was in the same situation until recently. However, after much research and a little bit of spending, I now have 8TB of NVMe storage in my HPE DL360p Gen8 Server thanks to the IOCREST IO-PEX40152.

Unsupported you say? Well, there are some of us who like to live life dangerously, and there are also those of us with really cool homelabs. I like to think I’m the latter.

PLEASE NOTE: This is not a supported configuration. You’re doing this at your own risk. Also, note that consumer/prosumer NVME SSDs do not have PLP (Power Loss Protection) technology. You should always use supported configurations and enterprise grade NVME SSDs in production environments.

Update – May 2nd 2021: Make sure you check out my other post where I install the IOCREST IO-PEX40152 in an HPE ML310e Gen8 v2 server for Version 2 of my NVMe Storage Server.

Update – June 21 2022: I’ve received numerous comments, chats, and questions about whether you can boot your server or computer using this method. Please note that this is all dependent on your server/computer, the BIOS/EFI, and capabilities of the system. In my specific scenario, I did not test booting since I was using the NVME drives purely as additional storage.

DISCLAIMER: If you attempt what I did in this post, you are doing it at your own risk. I won’t be held liable for any damages or issues.

NVMe Storage Server – Use Cases

There’s a number of reasons why you’d want to do this. Some of them include:

  • Server Storage
  • VMware Storage
  • VMware vSAN
  • Virtualized Storage (SDS as example)
  • VDI
  • Flash Cache
  • Special applications (database, high IO)

Adding NVMe capability

Well, after all that research I mentioned at the beginning of the post, I installed an IOCREST IO-PEX40152 inside of an HPE Proliant DL360p Gen8 to add NVMe capabilities to the server.

IOCREST IO-PEX40152 with 4 x 2TB Sabrent Rocket 4 NVME

At first I was concerned about dimensions as technically the card did fit, but technically it didn’t. I bought it anyways, along with 4 X 2TB Sabrent Rocket 4 NVMe SSDs.

The end result?

Picture of an HPE DL360p Gen8 with NVME SSD
HPE DL360p Gen8 with NVME SSD

IMPORTANT: Due to the airflow of the server, I highly recommend disconnecting and removing the fan built in to the IO-PEX40152. The DL360p server will create more than enough airflow and could cause the fan to spin up, generate electricity, and damage the card and NVME SSD.

Also, do not attempt to install the case cover, additional modification is required (see below).

The Fit

Installing the card inside of the PCIe riser was easy, but snug. The metal heatsink actually comes in to contact with the metal on the PCIe riser.

Picture of an IO-PEX40152 installed on DL360p PCIe Riser
IO-PEX40152 installed on DL360p PCIe Riser

You’ll notice how the card just barely fits inside of the 1U server. Some effort needs to be put in to get it installed properly.

Picture of a DL360p Gen8 1U Rack Server with IO-PEX40152 Installed
HPE DL360p Gen8 with IO-PEX40152 Installed

There are ribbon cables (and plastic fittings) directly where the end of the card goes, so you need to gently push these down and push the cables to the side, where there’s a small amount of room available.

We can’t put the case back on… Yet!

Unfortunately, just when I thought I was in the clear, I realized the case of the server cannot be installed. The metal bracket and locking mechanism on the case cover needs the space where a portion of the heatsink goes. Attempting to install this will cause it to hit the card.

Picture of the HPE DL360p Gen8 Case Locking Mechanism
HPE DL360p Gen8 Case Locking Mechanism

The above photo shows the locking mechanism protruding out of the case cover. This will hit the card (with the IOCREST IO-PEX40152 heatsink installed). If the heatsink is removed, the case might gently touch the card in its unlocked and recessed position, but from my measurements it clears the card when fully closed and locked.

I had to come up with a temporary fix while I figured out what to do: flip the lid and weigh it down.

Picture of an HPE DL360p Gen8 case cover upside down
HPE DL360p Gen8 case cover upside down

For stability and other tests, I simply put the case cover on upside down and weighed it down with weights. Cooling is working great and even under high load I haven’t seen the SSDs go above 38 Celsius.

The plan moving forward was to remove the IO-PEX40152 heatsink, and install individual heatsinks on the NVME SSD as well as the PEX PCIe switch chip. This should clear up enough room for the case cover to be installed properly.

The fix

I went on to Amazon and purchased the following items:

4 x GLOTRENDS M.2 NVMe SSD Heatsink for 2280 M.2 SSD

1 x BNTECHGO 4 Pcs 40mm x 40mm x 11mm Black Aluminum Heat Sink Cooling Fin

They arrived within days with Amazon Prime. I started to install them.

Picture of Installing GLOTRENDS M.2 NVMe SSD Heatsink on Sabrent Rocket 4 NVME
Installing GLOTRENDS M.2 NVMe SSD Heatsink on Sabrent Rocket 4 NVME
Picture of IOCREST IO-PEX40152 with GLOTRENDS M.2 NVMe SSD Heatsink on Sabrent Rocket 4 NVME
IOCREST IO-PEX40152 with GLOTRENDS M.2 NVMe SSD Heatsink on Sabrent Rocket 4 NVME

And now we install it in the DL360p Gen8 PCIe riser and install it in to the server.

You’ll notice it’s a nice fit! I had to compress some of the heat conductive goo on the PEX chip heatsink as the heatsink was slightly too high by 1/16th of an inch. After doing this it fit nicely.

Also, note one of the cable/ribbon connectors by the SAS connections. I re-routed one of the cables between the SAS connectors so it could be folded and lie under the card instead of pushing straight up in to the end of the card.

As I mentioned above, the locking mechanism on the case cover may come in to contact with the bottom of the IOCREST card when it’s in the unlocked and recessed position. With this setup, do not unlock the case or open the case when the server is running/plugged in as it may short the board. I have confirmed when it’s closed and locked, it clears the card. To avoid “accidents” I may come up with a non-conductive cover for the chips it hits (to the left of the fan connector on the card in the image).

And with that, we’ve closed the case on this project…

Picture of an HPE DL360p Gen8 Case Closed
HPE DL360p Gen8 Case Closed

One interesting thing to note is that the NVME SSDs are running around 4-6 Celsius cooler post-modification with custom heatsinks than with the stock heatsink. I believe this is due to the awesome airflow achieved in the Proliant DL360 servers.

Conclusion

I’ve been running this configuration for 6 days now while stress-testing and it’s been working great. With the server running VMware ESXi 6.5 U3, I am able to pass through the individual NVME SSDs to virtual machines. Best of all, installing this card did not cause the fans to spin up, which is often the case when using non-HPE PCIe cards.

This is the perfect mod to add NVME storage to your server, or even try out technology like VMware vSAN. I have a number of cool projects coming up using this that I’m excited to share.

May 25 2020
 
vSphere Logo Image

When troubleshooting connectivity issues with your vMotion network (or vMotion VLAN), you may notice that you’re unable to ping using the ping or vmkping command on your ESXi and VMware hosts.

This occurs when you’re using the vMotion TCP/IP stack on your vmkernel (vmk) adapters that are configured for vMotion.

This also applies if you’re using long distance vMotion (LDVM).

Why

The vMotion TCP/IP stack requires special syntax for ping and ICMP tests on the vmk adapters.

A screenshot of vmk adapters, one of which is using the vMotion TCP/IP Stack
VMK using vMotion TCP/IP Stack

Above is an example where a vmk adapter (vmk3) is configured to use the vMotion TCP/IP stack.

How

To “ping” and test your vMotion network that uses the vMotion TCP/IP stack, you’ll need to use the special command below:

esxcli network diag ping -I vmk1 --netstack=vmotion -H ip.add.re.ss

In the command above, change “vmk1” to the vmkernel adapter you want to send the pings from. Additionally, change “ip.add.re.ss” to the IP address of the host you want to ping.

Using this method, you can fully verify network connectivity between the vMotion vmks using the vMotion stack.
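If you have several hosts to test, a small wrapper saves retyping the command. This is only a sketch: the function names, the DRY_RUN switch, and the example addresses are my own assumptions, and it uses nothing beyond the esxcli options shown above. It prints each command and then runs it, leaving you to read the ping results yourself.

```shell
#!/bin/sh
# Hypothetical helper: build the esxcli ping command for one peer on
# the vMotion TCP/IP stack (same options as the command in this post).
vmotion_ping_cmd() {
    echo "esxcli network diag ping -I $1 --netstack=vmotion -H $2"
}

# Hypothetical helper: run the ping against each peer in turn.
# First argument is the vmk adapter, the rest are peer IP addresses.
# Set DRY_RUN=1 to print the commands without running them.
vmotion_ping_all() {
    vmk_if="$1"
    shift
    for peer_ip in "$@"; do
        cmd=$(vmotion_ping_cmd "$vmk_if" "$peer_ip")
        echo "== $cmd"
        [ "${DRY_RUN:-0}" = "1" ] || $cmd
    done
}
```

For example, vmotion_ping_all vmk3 10.0.0.11 10.0.0.12 would ping two (made up) vMotion addresses from vmk3. Run it with DRY_RUN=1 first to confirm the commands look right for your environment.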

Additional information and examples can be found at https://kb.vmware.com/s/article/59590.

Apr 07 2020
 
VMware Horizon View Icon

In response to COVID-19, VMware has extended its VMware Horizon 7 trial offering to up to 90 days, and it includes 100 users. This includes both VMware Horizon 7 On-Premise, as well as VMware Cloud on AWS.

This is great if you’re planning or about to implement and deploy VMware Horizon 7.

In its simplest form, Horizon 7 allows an organization to virtualize their end user computing. No more computers, no more desktops, only Zero clients and software clients. Not only does this streamline the end user computing experience, but it enables a beautiful remote access solution as well.

And Horizon isn’t limited to VDI… You can install the VMware Horizon Agent on a Physical PC so you can use VDI technologies like Blast Extreme to remote in to physical desktops at your office. It makes the perfect remote access solution. Give it a try today with an evaluation license!

To get your evaluation license, please visit https://my.vmware.com/en/web/vmware/evalcenter?p=horizon-7.

Update: VMware Horizon 8 has been released. To get the latest evaluation, visit https://my.vmware.com/en/web/vmware/evalcenter?p=horizon-eval-8.