May 142019
 

There may be a point in time where you may wish to clear and rebuild the search index catalog on your Microsoft Exchange 2016 Server. This will cause the server to rebuild the search index from scratch.

In my case, for the past month or so Outlook 2019 (Office 365) clients connecting to an on-premise Microsoft Exchange 2016 Server, have been seeing the message “We’re having trouble fetching results from the server…”. The user can click on “Let’s look on your computer instead.” and the search will complete.

When troubleshooting this issue, I tried all of the following:

  • Clearing and rebuilding the Search index on the client computers
  • Deleted the OST files to re-create the local cached copy on the client computers
  • Restarting the Exchange Server
  • Restarting the Client Computers
  • Analyzing the Event Log for any errors (none)

None of the above helped in troubleshooting.

We're having trouble fetching results from the server...
Outlook: “We’re having trouble fetching results from the server…”

Because of this, I decided to clear and rebuild the Search Index catalog for the mailbox database on the Exchange Server.

To check the status and to see if your index is corrupt, run the following command:

Get-MailboxDatabaseCopyStatus |  ft ContentIndexState

“ContentIndexState” will report as “Corrupt” if it is corrupt, or “Healthy” if it is healthy.

[PS] C:\Windows\system32>Get-MailboxDatabaseCopyStatus |  ft ContentIndexState

ContentIndexState
-----------------
          Healthy

My server reported as healthy, but I still chose to run the instructions below to rebuild the index.

Instructions

To do delete and re-create your Exchange Server Mailbox Database Search Index Catalog, please perform the following instructions.

Please Note: This is only for Exchange servers that are not part of a DAG. Do not perform these steps if your server is part of an Exchange cluster. Always make sure you have a complete backup of your server.

  1. Log on to your Exchange server.
  2. From the Start Menu, expand “Microsoft Exchange Server 2016”, and right-click on “Exchange Management Shell”, and select “Run as Administrator”.
    Administrative Exchange Management Shell
  3. Type the following commands to stop required search services.
    Stop-Service MSExchangeFastSearch
    Stop-Service HostControllerService
  4. Open a file browser (you may need Administrative privilege) and navigate to your Exchange mailbox directory.
    C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database NumberNumberNumber
  5. You’ll see a folder inside of the mailbox folder with a GUID type name with Single at the end. Delete or move this (preferred is move to alternate location). I’ve put an example below.
    12854239C-1823-8c32-ODJQ-SSDFK123CSDFG.1.Single

    This is the folder you want to move/delete.
  6. Go back to the “Exchange Management Shell”, and run the following commands to start the services.
    Start-Service MSExchangeFastSearch
    Start-Service HostControllerService
  7. As mentioned above, you can check the status of the rebuild by running the “Get-MailboxDatabaseCopyStatus” command, and looking at the “ContentIndexState” status.

That’s it! After running the command, you may notice your server will experience heavy CPU usage due to Exchange rebuilding the search index.

After rebuilding the search index, I noticed that my Outlook clients were able to successfully search on the server without having to select “Let’s look on your computer instead.”.

May 142019
 

On a fresh or existing WSUS install, you may notice that the WSUS Administrator MMC applet stops functioning and present the error “Error: Connection Error – An error occurred trying to connect to the WSUS Server.”

I originally experienced this on Windows Server Update Services running on Windows Server 2012 R2 and applied the fix. Recently, I deployed Windows Server Update Services on a new Windows Server 2019 – Server Core install, and experienced this issue during the first synchronization. Before realizing what the issue was, I attempted to re-install WSUS and IIS from scratch numerous times until I came across old notes. One would have thought they would have resolved this issue on a new server operating system.

When the issue occurs, all processes will appear to be running on the server. Looking at the server event log, you’ll notice multiple application errors:

  • Event ID: 13042 - Windows Server Update Services
    Description: Self-update is not working.
  • Event ID: 12002 - Windows Server Update Services
    Description: The Reporting Web Service is not working.
  • Event ID: 12012 - Windows Server Update Services
    Description: The API Remoting Web Service is not working.
  • Event ID: 12032 - Windows Server Update Services
    Description: The Server Synchronization Web Service is not working.
  • Event ID: 12022 - Windows Server Update Services
    Description: The Client Web Service is not working.
  • Event ID: 12042 - Windows Server Update Services
    Description: The SimpleAuth Web Service is not working.
  • Event ID: 12052 - Windows Server Update Services
    Description: The DSS Authentication Web Service is not working.
  • Event ID: 12072 - Windows Server Update Services
    Description: The WSUS content directory is not accessible.
    System.Net.WebException: The remote server returned an error: (503) Server Unavailable.
       at System.Net.HttpWebRequest.GetResponse()
       at Microsoft.UpdateServices.Internal.HealthMonitoring.HmtWebServices.CheckContentDirWebAccess(EventLoggingType type, HealthEventLogger logger)

You will also see the below error message when attempting to use the WSUS MMC.

WSUS Connection Error presented when memory issue occurs
WSUS 503 Error: Connection Error

The Problem

This issue occurs because the WSUS application pool in IIS “WsusPool” has reached it’s maximum private memory limit and attempts to recycle the memory usage.

Ultimately I believe this causes the IIS worker process to crash since it has run out of memory, and the pending command (whether it’s a synchronization or something else) fails to complete.

Previously, I noticed database corruption on a WSUS SQL Express database when this issue occurred, so I recommend applying the fix on a fresh install of WSUS.

The Fix

To resolve this issue, we need to adjust the max

  1. On the server running WSUS and IIS, open the “Internet Information Services (IIS) Manager” inside of the “Windows Administrative Tools” (found in the start menu, or Control Panel).
    Internet Information (IIS) Services in Start Menu
  2. On the left hand side under “Connections”, expand the server, and select “Application Pools”.
    IIS Application Pools Selected
  3. On the right hand side under “Application Pools” heading, right-click on “WsusPool” and select “Advanced Settings”.
    WsusPool Application Pool Selected with Right-Click
  4. In the “Advanced Settings” window, scroll down until you see “Private Memory Limit (KB)”. Either change this to “0” (as shown below) to set no memory limit, or increase the limit to the value you prefer.
    Private Memory Limit set to "0" in WsusPool IIS Application Pool
  5. Select “Ok” to close the window.
  6. Restart IIS by running “iisreset” from an administrative command prompt, restarting the server, or selecting “Restart” under “Manage Server” when looking at the default pane in IIS when the server is selected.

The issue should now be resolved and your WSUS server should no longer be crashing.

If you are applying this fix on a Server Core install, you’ll need to connect remotely to the IIS instance to apply the fix.

May 022019
 
Nvidia GRID Logo

I can’t tell you how excited I am that after many years, I’ve finally gotten my hands on and purchased an Nvidia Quadro K1 GPU. This card will be used in my homelab to learn, and demo Nvidia GRID accelerated graphics on VMware Horizon View. In this post I’ll outline the details, installation, configuration, and thoughts. And of course I’ll have plenty of pictures below!

The focus will be to use this card both with vGPU, as well as 3D accelerated vSGA inside in an HPE server running ESXi 6.5 and VMware Horizon View 7.8.

Please Note: As of late (late 2020), hardware h.264 offloading no longer functions with VMware Horizon and VMware BLAST with NVidia Grid K1/K2 cards. More information can be found at https://www.stephenwagner.com/2020/10/10/nvidia-vgpu-grid-k1-k2-no-h264-session-encoding-offload/

Please Note: Some, most, or all of what I’m doing is not officially supported by Nvidia, HPE, and/or VMware. I am simply doing this to learn and demo, and there was a real possibility that it may not have worked since I’m not following the vendor HCL (Hardware Compatibility lists). If you attempt to do this, or something similar, you do so at your own risk.

Nvidia GRID K1 Image

For some time I’ve been trying to source either an Nvidia GRID K1/K2 or an AMD FirePro S7150 to get started with a simple homelab/demo environment. One of the reasons for the time it took was I didn’t want to spend too much on it, especially with the chances it may not even work.

Essentially, I have 3 Servers:

  1. HPE DL360p Gen8 (Dual Proc, 128GB RAM)
  2. HPE DL360p Gen8 (Dual Proc, 128GB RAM)
  3. HPE ML310e Gen8 v2 (Single Proc, 32GB RAM)

For the DL360p servers, while the servers are beefy enough, have enough power (dual redundant power supplies), and resources, unfortunately the PCIe slots are half-height. In order for me to use a dual-height card, I’d need to rig something up to have an eGPU (external GPU) outside of the server.

As for the ML310e, it’s an entry level tower server. While it does support dual-height (dual slot) PCIe cards, it only has a single 350W power supply, misses some fancy server technologies (I’ve had issues with VT-d, etc), and only a single processor. I should be able to install the card, however I’m worried about powering it (it has no 6pin PCIe power connector), and having ESXi be able to use it.

Finally, I was worried about cooling. The GRID K1 and GRID K2 are typically passively cooled and meant to be installed in to rack servers with fans running at jet engine speeds. If I used the DL360p with an external setup, this would cause issues. If I used the ML310e internally, I had significant doubts that cooling would be enough. The ML310e did have the plastic air baffles, but only had one fan for the expansion cards area, and of course not all the air would pass through the GRID K1 card.

The Purchase

Because of a limited budget, and the possibility I may not even be able to get it working, I didn’t want to spend too much. I found an eBay user local in my city who had a couple Grid K1 and Grid K2 cards, as well as a bunch of other cool stuff.

We spoke and he decided to give me a wicked deal on the Grid K1 card. I thought this was a fantastic idea as the power requirements were significantly less (more likely to work on the ML310e) on the K1 card at 130 W max power, versus the K2 card at 225 W max power.

NVIDIA GRID K1 and K2 Specifications
NVIDIA GRID K1 and K2 Specification Table

The above chart is a capture from:
https://www.nvidia.com/content/cloud-computing/pdf/nvidia-grid-datasheet-k1-k2.pdf

We set a time and a place to meet. Preemptively I ran out to a local supply store to purchase an LP4 power adapter splitter, as well as a LP4 to 6pin PCIe power adapter. There were no available power connectors inside of the ML310e server so this was needed. I still thought the chances of this working were slim…

These are the adapters I purchased:

Preparation and Software Installation

I also decided to go ahead and download the Nvidia GRID Software Package. This includes the release notes, user guide, ESXi vib driver (includes vSGA, vGPU), as well as guest drivers for vGPU and pass through. The package also includes the GRID vGPU Manager. The driver I used was from:
https://www.nvidia.com/Download/driverResults.aspx/144909/en-us

To install, I copied over the vib file “NVIDIA-vGPU-kepler-VMware_ESXi_6.5_Host_Driver_367.130-1OEM.650.0.0.4598673.vib” to a datastore, enabled SSH, and then ran the following command to install:

esxcli software vib install -v /path/to/file/NVIDIA-vGPU-kepler-VMware_ESXi_6.5_Host_Driver_367.130-1OEM.650.0.0.4598673.vib

The command completed successfully and I shut down the host. Now I waited to meet.

We finally met and the transaction went smooth in a parking lot (people were staring at us as I handed him cash, and he handed me a big brick of something folded inside of grey static wrap). The card looked like it was in beautiful shape, and we had a good but brief chat. I’ll definitely be purchasing some more hardware from him.

Hardware Installation

Installing the card in the ML310e was difficult and took some time with care. First I had to remove the plastic air baffle. Then I had issues getting it inside of the case as the back bracket was 1cm too long to be able to put the card in. I had to finesse and slide in on and angle but finally got it installed. The back bracket (front side of case) on the other side slid in to the blue plastic case bracket. This was nice as the ML310e was designed for extremely long PCIe expansion cards and has a bracket on the front side of the case to help support and hold the card up as well.

For power I disconnected the DVD-ROM (who uses those anyways, right?), and connected the LP5 splitter and the LP5 to 6pin power adapter. I finally hooked it up to the card.

I laid the cables out nicely and then re-installed the air baffle. Everything was snug and tight.

Please see below for pictures of the Nvidia GRID K1 installed in the ML310e Gen8 V2.

Host Configuration

Powering on the server was a tense moment for me. A few things could have happened:

  1. Server won’t power on
  2. Server would power on but hang & report health alert
  3. Nvidia GRID card could overheat
  4. Nvidia GRID card could overheat and become damaged
  5. Nvidia GRID card could overheat and catch fire
  6. Server would boot but not recognize the card
  7. Server would boot, recognize the card, but not work
  8. Server would boot, recognize the card, and work

With great suspense, the server powered on as per normal. No errors or health alerts were presented.

I logged in to iLo on the server, and watched the server perform a BIOS POST, and start it’s boot to ESXi. Everything was looking well and normal.

After ESXi booted, and the server came online in vCenter. I went to the server and confirmed the GRID K1 was detected. I went ahead and configured 2 GPUs for vGPU, and 2 GPUs for 3D vSGA.

ESXi Graphics Settings for Host Graphics and Graphics Devices
ESXi Host Graphics Devices Settings

VM Configuration

I restarted the X.org service (required when changing the options above), and proceeded to add a vGPU to a virtual machine I already had configured and was using for VDI. You do this by adding a “Shared PCI Device”, selecting “NVIDIA GRID vGPU”, and I chose to use the highest profile available on the K1 card called “grid_k180q”.

Virtual Machine Edit Settings with NVIDIA GRID vGPU and grid_k180q profile selected
VM Settings to add NVIDIA GRID vGPU

After adding and selecting ok, you should see a warning telling you that must allocate and reserve all resources for the virtual machine, click “ok” and continue.

Power On and Testing

I went ahead and powered on the VM. I used the vSphere VM console to install the Nvidia GRID driver package (included in the driver ZIP file downloaded earlier) on the guest. I then restarted the guest.

After restarting, I logged in via Horizon, and could instantly tell it was working. Next step was to disable the VMware vSGA Display Adapter in the “Device Manager” and restart the host again.

Upon restarting again, to see if I had full 3D acceleration, I opened DirectX diagnostics by clicking on “Start” -> “Run” -> “dxdiag”.

DirectX Diagnostic Tool (dxdiag) showing Nvidia Grid K1 on VMware Horizon using vGPU k180q profile
dxdiag on GRID K1 using k180q profile

It worked! Now it was time to check the temperature of the card to make sure nothing was overheating. I enabled SSH on the ESXi host, logged in, and ran the “nvidia-smi” command.

nvidia-smi command on ESXi host showing GRID K1 information, vGPU information, temperatures, and power usage
“nvidia-smi” command on ESXi Host

According to this, the different GPUs ranged from 33C to 50C which was PERFECT! Further testing under stress, and I haven’t gotten a core to go above 56. The ML310e still has an option in the BIOS to increase fan speed, which I may test in the future if the temps get higher.

With “nvidia-smi” you can see the 4 GPUs, power usage, temperatures, memory usage, GPU utilization, and processes. This is the main GPU manager for the card. There are some other flags you can use for relevant information.

nvidia-smi with vgpu flag for vgpu information
“nvidia-smi vgpu” for vGPU Information
nvidia-smi with vgpu -q flag
“nvidia-smi vgpu -q” to Query more vGPU Information

Final Thoughts

Overall I’m very impressed, and it’s working great. While I haven’t tested any games, it’s working perfect for videos, music, YouTube, and multi-monitor support on my 10ZiG 5948qv. I’m using 2 displays with both running at 1920×1080 for resolution.

I’m looking forward to doing some tests with this VM while continuing to use vGPU. I will also be doing some testing utilizing 3D Accelerated vSGA.

The two coolest parts of this project are:

  • 3D Acceleration and Hardware h.264 Encoding on VMware Horizon
  • Getting a GRID K1 working on an HPE ML310e Gen8 v2

Highly recommend getting a setup like this for your own homelab!

Uses and Projects

Well, I’m writing this “Uses and Projects” section after I wrote the original article (it’s now March 8th, 2020). I have to say I couldn’t be impressed more with this setup, using it as my daily driver.

Since I’ve set this up, I’ve used it remotely while on airplanes, working while travelling, even for video editing.

Some of the projects (and posts) I’ve done, can be found here:

Leave a comment and let me know what you think! Or leave a question!

Feb 192019
 

Upgrading to Exchange 2016 CU12 may fail when using Let’s Encrypt SSL Certificates

On a Microsoft Exchange 2016 Server, utilizing Let’s Encrypt SSL Certificates, an upgrade to Cumulative Update 12 may fail. This is due to security permissions on the SSL certificate.

I later noticed that this occurs on all cumulative updates when using the Let’s Encrypt SSL certificates. This includes Exchange 2016 CU13 and CU14.

The CU install will fail, some services may function, but the server will not accept e-mail, or allow connections from Microsoft Outlook, or ActiveSync devices. PowerShell and EAC will not function.

The issue can be identified on this failure log:

[02/18/2019 19:24:28.0862] [2] Beginning processing Install-AuthCertificate
[02/18/2019 19:24:28.0867] [2] Ending processing Install-AuthCertificate
[02/18/2019 19:24:28.0868] [1] The following 1 error(s) occurred during task execution:
[02/18/2019 19:24:28.0868] [1] 0. ErrorRecord: Could not grant Network Service access to the certificate with thumbprint XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX because a cryptographic exception was thrown.
[02/18/2019 19:24:28.0868] [1] 0. ErrorRecord: Microsoft.Exchange.Management.SystemConfigurationTasks.AddAccessRuleCryptographicException: Could not grant Network Service access to the certificate with thumbprint XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX because a cryptographic exception was thrown. ---> System.Security.Cryptography.CryptographicException: Access is denied.
at Microsoft.Exchange.Security.Cryptography.X509Certificates.TlsCertificateInfo.CAPIAddAccessRule(X509Certificate2 certificate, AccessRule rule)
at Microsoft.Exchange.Security.Cryptography.X509Certificates.TlsCertificateInfo.AddAccessRule(X509Certificate2 certificate, AccessRule rule)
at Microsoft.Exchange.Management.SystemConfigurationTasks.ManageExchangeCertificate.EnableForServices(X509Certificate2 cert, AllowedServices services, String websiteName, Boolean requireSsl, ITopologyConfigurationSession dataSession, Server server, List`1 warningList, Boolean allowConfirmation, Boolean forceNetworkService)
--- End of inner exception stack trace ---
at Microsoft.Exchange.Configuration.Tasks.Task.ThrowError(Exception exception, ErrorCategory errorCategory, Object target, String helpUrl)
at Microsoft.Exchange.Configuration.Tasks.Task.WriteError(Exception exception, ErrorCategory category, Object target)
at Microsoft.Exchange.Management.SystemConfigurationTasks.InstallExchangeCertificate.EnableForServices(X509Certificate2 cert, AllowedServices services)
at Microsoft.Exchange.Management.SystemConfigurationTasks.InstallExchangeCertificate.InternalProcessRecord()
at Microsoft.Exchange.Configuration.Tasks.Task.b__91_1()
at Microsoft.Exchange.Configuration.Tasks.Task.InvokeRetryableFunc(String funcName, Action func, Boolean terminatePipelineIfFailed)
[02/18/2019 19:24:28.0883] [1] [ERROR] The following error was generated when "$error.Clear();
Install-ExchangeCertificate -services "IIS, POP, IMAP" -DomainController $RoleDomainController
if ($RoleIsDatacenter -ne $true -And $RoleIsPartnerHosted -ne $true)
{
Install-AuthCertificate -DomainController $RoleDomainController
}
" was run: "Microsoft.Exchange.Management.SystemConfigurationTasks.AddAccessRuleCryptographicException: Could not grant Network Service access to the certificate with thumbprint XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX because a cryptographic exception was thrown. ---> System.Security.Cryptography.CryptographicException: Access is denied.
at Microsoft.Exchange.Security.Cryptography.X509Certificates.TlsCertificateInfo.CAPIAddAccessRule(X509Certificate2 certificate, AccessRule rule)
at Microsoft.Exchange.Security.Cryptography.X509Certificates.TlsCertificateInfo.AddAccessRule(X509Certificate2 certificate, AccessRule rule)
at Microsoft.Exchange.Management.SystemConfigurationTasks.ManageExchangeCertificate.EnableForServices(X509Certificate2 cert, AllowedServices services, String websiteName, Boolean requireSsl, ITopologyConfigurationSession dataSession, Server server, List`1 warningList, Boolean allowConfirmation, Boolean forceNetworkService)
--- End of inner exception stack trace ---
at Microsoft.Exchange.Configuration.Tasks.Task.ThrowError(Exception exception, ErrorCategory errorCategory, Object target, String helpUrl)
at Microsoft.Exchange.Configuration.Tasks.Task.WriteError(Exception exception, ErrorCategory category, Object target)
at Microsoft.Exchange.Management.SystemConfigurationTasks.InstallExchangeCertificate.EnableForServices(X509Certificate2 cert, AllowedServices services)
at Microsoft.Exchange.Management.SystemConfigurationTasks.InstallExchangeCertificate.InternalProcessRecord()
at Microsoft.Exchange.Configuration.Tasks.Task.b__91_1()
at Microsoft.Exchange.Configuration.Tasks.Task.InvokeRetryableFunc(String funcName, Action func, Boolean terminatePipelineIfFailed)".
[02/18/2019 19:24:28.0883] [1] [ERROR] Could not grant Network Service access to the certificate with thumbprint XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX because a cryptographic exception was thrown.
[02/18/2019 19:24:28.0883] [1] [ERROR] Access is denied.
[02/18/2019 19:24:28.0883] [1] [ERROR-REFERENCE] Id=CafeComponent___ece23aa8c6744163B617570021d78090 Component=EXCHANGE14:\Current\Release\Shared\Datacenter\Setup
[02/18/2019 19:24:28.0895] [1] Setup is stopping now because of one or more critical errors.
[02/18/2019 19:24:28.0895] [1] Finished executing component tasks.
[02/18/2019 19:24:28.0925] [1] Ending processing Install-CafeRole
[02/18/2019 19:35:09.0688] [0] CurrentResult setupbase.maincore:396: 0
[02/18/2019 19:35:09.0689] [0] End of Setup

The Fix

Unfortunately because Exchange is not working, you won’t be able to use Powershell or the EAC to configure SSL certs.

To resolve this, open up the IIS Manager, right click on the Exchange Web Site, click “Edit Bindings”

IIS Exchange Edit Bindings
IIS Exchange Edit Bindings

Once the “Edit Bindings” windows is open, you’ll want to open BOTH https bindings, and click “Edit”, and then change the SSL Certificate from the Let’s Encrypt SSL cert, to the self-signed Exchange certificate that ships on the brand new install. The self-signed certification most likely will be labelled as the computer name.

Exchange SSL Bindings
Exchange SSL Bindings

If you configured the Let’s Encrypt SSL certificate on the “Exchange Backend” IIS site, you’ll also need to repeat these steps on that as well.

You can now restart the server, run the “setup.exe” on CU12 again, and it will attempt to continue and repair Exchange 2016 Cumulative Update 12.

Final Note

After the update is complete, you’ll want to restart the server. You’ll notice that the acme script, whether run automatically or manually, will not set the Let’s Encrypt certificate up again (because it’s not due for renewal). You’ll need to run the letsencrypt.exe file, and force an auto renewal which will kick off the Exchange configuration scripts (or you can manually set the certificate if you’re comfortable applying Exchange SSL certificates via PowerShell.

Additional Resources

Oct 282018
 

I have noticed an issue when after upgrading Microsoft Exchange 2016 CU10 to Exchange 2016 CU11, services may fail to start. This issue can be intermittent, where some restarts are able to start more services, and others restarts fewer. I have observed this on 2 separate Exchange upgrades, both were CU10 to CU11.

The Problem

Recently, a customer had an issue where a Microsoft Exchange security update bricked their entire Exchange CU10 installation. Files were missing and services would not start (even after manually re-configuring all system services to their prior settings, and force starting). To fix this, we weighed our options and decided the best course of action would be to attempt the latest CU (CU11). This is because each Microsoft Exchange Cumulative update is actually a full installer that completely removes the old version, and installs the new version cleanly.

After installing CU11 we were able to rescue the Exchange installation (services could now start, and functioned), however numerous errors and warnings were now present, and we also noticed that there were some new issues with services.

One service in particular called “Net.Tcp Port Sharing Service”, would occasionally not start in time and cause all the Exchange Services not to start (Exchange is dependent on this services). Other times, this service would start, however random Exchange services would timeout.

Some of the errors and warnings included:

Event ID 7000
Source: Service Control Manager
Description:
The MSComplianceAudit service failed to start due to the following error: 
The service did not respond to the start or control request in a timely fashion.

Event ID 7009
Source: Service Control Manager
Description:
A timeout was reached (30000 milliseconds) while waiting for the MSComplianceAudit service to connect.

Event ID 7000
Source: Service Control Manager
Description:
The MSExchangeRepl service failed to start due to the following error: 
The service did not respond to the start or control request in a timely fashion.

Event ID 7009
Source: Service Control Manager
Description:
A timeout was reached (30000 milliseconds) while waiting for the MSExchangeRepl service to connect.

I also observed that on a few restarts, the services that failed would eventually end up restarting 10-15 minutes later (this only occurred 50% of the time).

Originally I was concerned and believed these issues were related to the original issues the customer experienced, however I upgraded my own Exchange 2016 server to CU11 and experienced the same problems (my instance was a clean fully functioning install). I also attempted to upgrade .NET to version 4.7.2 to see if this had any effect, but it did not.

When you go in to services (services.msc) and manually start the services, Exchange functions perfectly and everything works.

The Solution

As of yet, I don’t have a proper solution. I did however notice that with my customer’s environment, after it was left to sit overnight (around 8 hours), that subsequent restarts actually were able to start the majority of the services properly. It almost seemed as if it just needed time to fix itself. I’m not sure if this is because of IO load, or some type of Exchange database maintenance, but I’m waiting to see if it clears up on my instance as well after an amount of time. I’ll be keeping this post updated.

UPDATE – October 29th: I’ve confirmed for the 2nd time that the issue resolves at least 6-8 hours after the upgrade. At the end of the day I restarted my machine and everything was functioning properly.

If you are experiencing this issue, or can make a comment on it, please leave a comment on this post!

Additional Resources