Jan 21 2018
 
Azure AD

This weekend I configured Azure AD Connect for pass-through authentication for my on-premises Active Directory domain. This was a first for me and extremely easy to do; however, there were a few issues with my firewall's SSL content filtering and scanning rules, which were blocking the connection. I figured I'd create a post providing the information you'll need to get this set up and running quickly.

In my environment, I have a Sophos UTM firewall which provides firewall services (port blocking), as well as HTTP and HTTPS scanning and filtering (web filtering).

The Problem

After running the Azure AD Connect wizard, everything went well; however, there was an error at the end of the wizard stating that synchronization was configured but was not occurring due to a firewall issue. It provided a link for more information (which didn't actually contain the information needed).

While this issue is occurring, you’ll notice:

-Azure AD Connect in the Azure portal reports that pass-through authentication is Enabled; however, after expanding the item, the Authentication Agent reports a status of Inactive on your internal domain controllers.

-In the Event log, under “Applications and Services Logs”, then “Microsoft”, then “AzureADConnect”, then “AuthenticationAgent”, and finally “Admin”, you’ll see the following error event:

Event ID: 12019

Source: Microsoft Azure AD Connect Authentication Agent (Microsoft-AzureADConnect-AuthenticationAgent)

Event:
The Connector stopped working because the client certificate is not valid. Uninstall the Connector and install it again. Request ID: '{WAJAJAJA-OHYA-YAAA-YAAAA-WAKAKAKAKAKAKAK}'

The event log error above is caused by the SSL and HTTPS content filtering.

-Azure AD Pass-through Authentication won't work

The Fix

After doing some research, I came up with the following list of ports and hosts that you'll need to allow through, unfiltered.

Ports

The following ports are used by Azure AD Connect:

Port 443 – SSL

Port 5671 – TCP (outbound from the host running Azure AD Connect to the internet)

Hosts (DNS Hosts)

Here’s the host list:

*blob.core.windows.net
*servicebus.windows.net
*adhybridhealth.azure.com
*management.azure.com
*policykeyservice.dc.ad.msft.net
*login.windows.net
*login.microsoftonline.com
*secure.aadcdn.microsoftonline-p.com
*microsoftonline.com
*windows.net
*msappproxy.net
*mscrl.microsoft.com
*crl.microsoft.com
*ocsp.msocsp.com
*www.microsoft.com
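
Before touching your firewall rules, it can help to confirm which of these endpoints are actually reachable from the Azure AD Connect server. Here's a minimal sketch in Python that attempts a TCP connection on port 443 to a few representative hostnames; the hosts below are my own stand-ins for the wildcard entries above, so adjust them for your environment. Port 5671 can be tested the same way once you know your tenant-specific *.servicebus.windows.net hostname.

  import socket

  # Representative endpoints standing in for the wildcard entries above
  # (assumption: your tenant may resolve to different hosts behind them)
  HOSTS = [
      "login.windows.net",
      "login.microsoftonline.com",
      "management.azure.com",
      "www.microsoft.com",
  ]

  for host in HOSTS:
      try:
          # Attempt a plain TCP connection with a short timeout
          with socket.create_connection((host, 443), timeout=5):
              print(f"OK      {host}:443")
      except OSError as e:
          print(f"BLOCKED {host}:443 ({e})")

Keep in mind this only proves the TCP path is open; it won't catch the SSL inspection problem itself, which is what the scanning exception below addresses.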

If you're running a Sophos UTM like I am, you'll need to create an HTTP(S) scanning exception and then import this list into a rule "Matching these URLs":

^https?://([A-Za-z0-9.-]*\.)?blob\.core\.windows\.net/
^https?://([A-Za-z0-9.-]*\.)?servicebus\.windows\.net/
^https?://([A-Za-z0-9.-]*\.)?adhybridhealth\.azure\.com/
^https?://([A-Za-z0-9.-]*\.)?management\.azure\.com/
^https?://([A-Za-z0-9.-]*\.)?policykeyservice\.dc\.ad\.msft\.net/
^https?://([A-Za-z0-9.-]*\.)?login\.windows\.net/
^https?://([A-Za-z0-9.-]*\.)?login\.microsoftonline\.com/
^https?://([A-Za-z0-9.-]*\.)?secure\.aadcdn\.microsoftonline-p\.com/
^https?://([A-Za-z0-9.-]*\.)?microsoftonline\.com/
^https?://([A-Za-z0-9.-]*\.)?windows\.net/
^https?://([A-Za-z0-9.-]*\.)?msappproxy\.net/
^https?://([A-Za-z0-9.-]*\.)?mscrl\.microsoft\.com/
^https?://([A-Za-z0-9.-]*\.)?crl\.microsoft\.com/
^https?://([A-Za-z0-9.-]*\.)?ocsp\.msocsp\.com/
^https?://([A-Za-z0-9.-]*\.)?www\.microsoft\.com/
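
If you want to sanity-check these patterns before importing them, Python's regex syntax is close enough to the UTM's for a quick test. A small sketch (the sample URLs are purely illustrative):

  import re

  # One pattern from the exception list above; dots are escaped so they
  # only match literal dots
  pattern = re.compile(r"^https?://([A-Za-z0-9.-]*\.)?login\.microsoftonline\.com/")

  should_match = [
      "https://login.microsoftonline.com/",
      "https://device.login.microsoftonline.com/some/path",
  ]
  should_not_match = [
      "https://login-microsoftonline.com/",              # different domain
      "https://example.com/login.microsoftonline.com/",  # domain only in the path
  ]

  for url in should_match:
      assert pattern.match(url), url
  for url in should_not_match:
      assert not pattern.match(url), url
  print("All pattern checks passed")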

The exception I created skips:

  • Authentication
  • Caching
  • Antivirus
  • Extension Blocking
  • MIME type blocking
  • URL Filter
  • Content Removal
  • SSL Scanning
  • Certificate trust check
  • Certificate date check

After creating the exceptions, I restarted the "Microsoft Azure AD Connect Authentication Agent" service. The errors stopped and Azure AD Pass-through Authentication started to function correctly! The Authentication Agent also now reports a status of Active.
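
For reference, if you'd rather restart the agent from a script than from services.msc, here's a minimal sketch in Python wrapping the built-in net.exe commands (run as Administrator; the service display name is the one quoted above):

  import subprocess

  # Display name of the agent service, as shown in services.msc
  SERVICE = "Microsoft Azure AD Connect Authentication Agent"

  # net stop/start accept display names; check=True raises if either fails
  subprocess.run(["net", "stop", SERVICE], check=True)
  subprocess.run(["net", "start", SERVICE], check=True)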

Oct 18 2017
 

Well, it's October 18th, 2017, and the Fall Creators Update (Feature update to Windows 10, version 1709) is now available for download. In my particular environment, I use WSUS to deploy and manage updates.

Update: It’s now May 2018, and this article also applies to Windows 10 April 2018 update version 1803 as well!

Update: It’s now October 2018, and this article also applies to Windows 10 October 2018 update version 1809 as well!

Update: It’s now May 2019, and this article also applies to Windows 10 May 2019 update version 1903 as well!

Earlier today I went ahead and approved the updates for deployment; however, I noticed an issue on multiple Windows 10 machines where the Windows Update client would get stuck at a "Downloading updates 0%" status.

I checked a bunch of things, but noticed that the clients simply couldn't download the updates from my WSUS server. Further investigation found that the feature updates are packaged in .esd files, and IIS may not be able to serve these properly without a minor modification. I remember applying this fix in the past; however, I'm assuming it was removed by a prior update on my Windows Server 2012 R2 server.

If you are experiencing this issue, here’s the fix:

  1. On your server running WSUS and IIS, open the IIS Manager.
  2. Expand Sites, and select "WSUS Administration".
  3. On the right side, under IIS, select "MIME Types".
  4. Make sure there is not already a MIME type for .esd. If there is, you're having a different issue; if not, continue with these instructions.
  5. Click on "Add" in the right Actions pane.
  6. The file name extension will be ".esd" (without quotations), and the MIME type will be "application/octet-stream" (without quotations).
  7. Reset IIS or restart the WSUS/IIS server. A scripted equivalent of steps 5 to 7 follows below.
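
Here's the scripted equivalent mentioned in step 7: a short Python sketch driving IIS's built-in appcmd.exe. It assumes the default appcmd path and applies the MIME type at the server level; scope it to the "WSUS Administration" site instead if you prefer.

  import subprocess

  # Default location of appcmd.exe (assumption: standard IIS install path)
  APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"

  # Add the .esd MIME type to the server-wide staticContent section
  subprocess.run([
      APPCMD, "set", "config",
      "/section:staticContent",
      "/+[fileExtension='.esd',mimeType='application/octet-stream']",
  ], check=True)

  # Restart IIS so the change takes effect (step 7)
  subprocess.run(["iisreset"], check=True)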

You’ll notice the clients will now update without a problem! Happy Updating!

Feb 18 2017
 
Screenshot: Windows Server Volume Shadow Copies (volume snapshot list)

On VMware vSphere ESXi 6.5, 6.7, and 7.0, a condition exists where one is unable to take a quiesced snapshot. This is an issue that affects quite a few people, and numerous forum threads can be found on the internet by those searching for a solution.

This issue can occur both when taking manual snapshots of virtual machines with "Quiesce guest filesystem" selected, and when using snapshot-based backup applications such as vSphere Data Protection (vDP), Veeam, or other applications that utilize quiesced snapshots.

The Issue

I experienced this problem on one of my test VMs (Windows Server 2012 R2), however I believe it can occur on newer versions of Windows Server as well, including Windows Server 2016 and Windows Server 2019.

When this issue occurs, the snapshot will fail and the following errors will be present:

An error occurred while taking a snapshot: Failed to quiesce the virtual machine.
An error occurred while saving the snapshot: Failed to quiesce the virtual machine.

Performing standard troubleshooting, I restarted the VM, checked for VSS provider errors, and confirmed that the Windows Services involved with snapshots were in their correct state and configuration. Unfortunately this had no effect, and everything was configured the way it should be.

I also tried to re-install VMware Tools, which had no effect.

PLEASE NOTE: If you experience this issue, you should confirm the services are in their correct state and configuration, as outlined in VMware KB: 1007696. Source: https://kb.vmware.com/s/article/1007696

The Fix

In the days leading up to the failure, when things were still running properly, I did notice that the quiesced snapshots for that VM were taking a long time to process, but they were still completing successfully.

This morning during troubleshooting, I went ahead and deleted all the Windows Volume Shadow Copies (VSS snapshots) stored inside of the virtual machine itself. These are the shadow copies that the Windows guest operating system takes of its own filesystem (completely unrelated to VMware snapshots).
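
For reference, this is roughly what I ran from an elevated prompt inside the guest, wrapped in Python for consistency with the other sketches; vssadmin is the built-in Windows tool for managing shadow copies. Heed the warning at the end of this post before running the delete.

  import subprocess

  # Show the existing shadow copies inside the guest OS first
  subprocess.run(["vssadmin", "list", "shadows"], check=True)

  # Delete all shadow copies on all volumes (destructive; see warning below)
  subprocess.run(["vssadmin", "delete", "shadows", "/all", "/quiet"], check=True)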

To my surprise after doing this, not only was I able to create a quiesced snapshot, but the snapshot processed almost instantly (200x faster than previously when it was functioning).

If you’re comfortable deleting all your snapshots, it may also be a good idea to fully disable and then re-enable the VSS Snapshots on the volume to make sure they are completely deleted and reset.

I'm assuming the accumulated shadow copies were creating a high load for the VMware snapshot to process, and a timeout was being hit on snapshot creation, which caused the issue. While Windows volume shadow copies are unrelated to VMware snapshots, both utilize the same VSS (Volume Shadow Copy Service) system inside of Windows to function. One must also keep in mind that the Windows volume shadow copies will of course be part of a VMware snapshot, since they are stored inside the VMDK (virtual disk) file.

PLEASE NOTE: Deleting your Windows Volume Shadow copies will delete your Windows volume snapshots inside of the virtual machine. You will lose the ability to restore files and folders from previous volume shadow copy snapshots. Be aware of what this means and what you are doing before attempting this fix.

Sep 23 2016
 

Well, one of the servers I monitor and maintain in a remote oil town recently started throwing a Windows event log warning:

Event ID: 129

Source: HpCISSs2

Description: Reset to device, \Device\RaidPort0, was issued.

The server is an HP ML350p Gen8 (Windows Server 2008 R2) running the latest firmware and management software. It has two RAID arrays (RAID1 and RAID5) and a total of 6 disks.

Researching this error, I read that most people had it occur when running the latest HP WBEM providers, as well as anti-virus software. In our case, I actually tried downgrading to an older version of the providers, but the warning still occurred. While we do have anti-virus, it's not actively scanning (only weekly scheduled scans).

In the process of troubleshooting, I noticed that under the HP Systems Management Homepage, one of the drives in the RAID1 array, had the following stats:

Hard Read Errors:  150
Recovery Read Errors:  7
Total Seeks:  0
Seek Errors:  0

I found these numbers to be very high in my experience. None of the other drives had anything close to this: in 4 years of running, only one other disk had a single read error, while this disk had tons. For some reason the drive was still reporting as operational, when I'd expect it to be marked as a predicted failure, or failed.
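
If you'd rather pull these counters from a command line than from the System Management Homepage, the Smart Array CLI can report per-drive error counters. A rough sketch, assuming hpssacli (later renamed ssacli) is installed and the controller is in slot 0:

  import subprocess

  # Query physical drive details, including read/seek error counters
  # (assumption: controller in slot 0, hpssacli on the PATH)
  result = subprocess.run(
      ["hpssacli", "ctrl", "slot=0", "pd", "all", "show", "detail"],
      capture_output=True, text=True, check=True,
  )

  # Print only the lines containing error counters for a quick overview
  for line in result.stdout.splitlines():
      if "Error" in line:
          print(line.strip())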

While all the online documentation pointed towards software locks on the array, from my own experience I think the controller was actually waiting on a read operation, and this single disk was causing a threshold to be hit in the driver, which issued the reset to retry and recover the read operation.

I called up HPE support and mentioned I'd like to have the drive replaced. The support engineer consulted her senior engineer and reviewed the evidence I presented (along with ADU reports and Active Monitoring health reports); the senior engineer concurred that the drive should be replaced.

Replacing the drive resolved the issue. I'm also noticing a performance increase on the array.

Make sure to always check the stats on the individual components of your RAID arrays, even if everything appears to be operating soundly.

Sep 10 2016
 

When initiating manual backups or occasionally when automatic/scheduled backups run, a user may notice that Windows Server Backup may appear to “hang” when the status is reporting: “Preparing media to store backups…”.

In some rare cases, it may actually be in a hang state, however most of the time, it’s actually consolidating and/or checking previous backups on the destination media.

To confirm this:

Open the Task Manager as Administrator, click on the "Performance" tab, and then click "Open Resource Monitor". Flip over to the "Disk" tab, expand "Disk Activity", and sort by name. You should see the read requests on the destination media; you'll also notice that it is slowly progressing consecutively through each backup set (in increments of 1, accessing multiple at a time).

This confirms that the Windows Server Backup services are functioning and it is in fact running. In one case, I had 723 previous backups, and it took around 50 minutes to count from 1 to 723, and then the backup finally proceeded.
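
Another way to confirm the job is progressing rather than hung is the built-in wbadmin tool, which reports the status of the currently running backup operation. A minimal sketch, run from an elevated prompt:

  import subprocess

  # "wbadmin get status" reports the progress of the currently running
  # backup or recovery operation (or prints an error if none is running)
  subprocess.run(["wbadmin", "get", "status"])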

I have also seen this occur when a previous backup failed or was cancelled. This occurs with Windows Server Backup on Windows Server 2008, Windows Server 2008 R2, and Windows Server 2012 R2.