Jul 302016
 

I have identified and confirmed with 2 different HPE MSA 2040 SANs an issue with SMTP notifications. I’ve identified the issue with multiple firmware versions (even the latest version as of the date of this article being written). The issue stops e-mail notifications from being sent from the MSA 2040 when the SAN is configured with some SMTP relays. This issue also occurs on HPE MSA 2050 arrays, as well as HPE MSA 2052 arrays.

The main concern is that some administrators may configure the notification service believing it is working, when in fact it is not. This could cause problems if the SAN isn’t regularly monitored and if e-mail notifications alone are being used to monitor its health.

Configuration:

-MSA 2040 (2050/2052) Dual Controller SAN configured with SMTP notifications

-SMTP destination server configured as EXIM mail proxy (in my case a Sophos UTM firewall)

Symptoms:

-Test notifications are not received (even though the MSA confirms OK on transmission)

-Real notifications are not received

-Occasionally if numerous tests are sent in a short period of time (5+ tests within 3 seconds), one of the tests may actually go through.

Events and Logs observed:

/var/log/smtp/2016/06/smtp-2016-06-20.log.gz:2016:06:20-20:44:29 SERVERNAME exim-in[16539]: 2016-06-20 20:44:29 SMTP connection from [SAN:CONTROLLER:IP:ADDY]:36977 (TCP/IP connection count = 1)

/var/log/smtp/2016/06/smtp-2016-06-20.log.gz:2016:06:20-20:44:29 SERVERNAME exim-in[18615]: 2016-06-20 20:44:29 SMTP protocol synchronization error (input sent without waiting for greeting): rejected connection from H=[SAN:CONTEROLLER:IP:ADDY]:36977 input=”NOOP\r\n”

Resolution:

To resolve this issue, I tried numerous things however the only fix I could come up with, is configuring the SAN to relay SMTP notifications through a Exchange 2013 Server. To do this, you must create a special connector to allow SMTP relaying of anonymous messages (security must be configured on this connector to stop SPAM), and further modify security permissions on that send connector to allow transmission to external e-mail addresses. After doing this, e-mail notifications (and weekly SMTP reports) from the SAN are being received reliably.

Additional Notes:

-While in my case the issue was occurring with EXIM on a Sophos UTM firewall, I believe this issue may occur with other E-mail servers or SMTP relay servers.

-Tried configuring numerous exceptions on the SMTP relay with no effect.

-Rejected e-mail messages do not appear in the mail manager, only the SMTP relay log on the Sophos UTM.

-Always test SMTP notifications on a regular basis.

Jul 182016
 

Last Friday I read online Shaw had released a new offering for their coax (cable) customers. Speeds of 150mbps down and 15mbps up. Checked out their website and found the accompanying business package (Shaw Business Internet 150).

Called up, requested a quote and pulled the trigger. As always Shaw sweetened the deal for me as I’ve been a long time customer and have quite a few additional services (phone, extra cable modem, numerous static IPs, etc…).

Had the install booked for today, just got everything setup. Here’s some initial speed tests I want to share with you:

 

Speedtest.Net test of Business Internet 150

Speedtest.Net test of Business Internet 150

Speedtest.shaw.ca test of Business Internet 150

Speedtest.shaw.ca test of Business Internet 150

 

I have to say I’m quite impressed! I actually had to do some tweaking on my firewalls IPS system to handle the bandwidth.

The residential plan offers 1TB of data per month, whereas I believe the business plan offers unlimited data.

Happy downloading!

 

Update: August 13th, 2016

I just wanted to post an update after running with this service for a while now. It’s been great, no changes in speed, and latency is great!

I have however identified one issue (observed at some client sites): When scheduled or emergency maintenance is performed on Shaw’s side, when the maintenance completes, the cable modem reports as being online, however the internet connection is lost and doesn’t come back up. A restart or power cycle is required on the Hitron modem to bring services back online. I noticed this around a month ago with a client, and found out as of 2 weeks ago it is a confirmed issue, and Shaw is working on resolving this with the Hitron modems.

Also, some users may be noticing issues with VPN connections. When packets go in/out that are larger than 1500 bytes and are fragmented, I noticed on one Hitron modem that the cable modem was dropping these fragmented packets. This is noticeable on VPN connections. Typically a power cycle temporarily resolves this issue, however it occurs again within a couple days. Shaw confirmed this was a firmware related issue and rolled back the cable modem’s firmware for that specific client and it resolved the issue. I have not seen this issue occur on my Hitron modem. To test for this issue, send a ping from the effected site towards the internet to a host using this command, or send a ping from the internet to an IP at the effected site:

ping enterhosthere -l 2000

This command will send a 2000 byte ICMP packet to a host. Typically MTUs on network are 1500, so the packet will be fragmented and should go through. If it drops and you know the destination should accept it, then you are experiencing this issue. You should place a support call, explain the issue and request a firmware downgrade. This may have been resolved by the time I posted this note.

Apr 102016
 

For those of you that use HP’s vibsdepot with VMWare Update Manager, you may have noticed that as of late you have not been able to synchronize patch definitions from the HP vibsdepot source.

I suspected this may have had something to do with the fact that in the past, the hp.com domain was being used to host these files, and with the company split, all enterprise related hosting has now moved to hpe.com

To fix this, simply log in to a vSphere client, jump to the “Admin View”, then “Download Settings” on the left. Right click on the HP related Download sources and simply update the URLs from hp.com to hpe.com and the problem is solved. After clicking on test, connectivity status updates to “Connected”.

Old URLS:

http://vibsdepot.hp.com/index.xml

http://vibsdepot.hp.com/index-drv.xml

New URLS:

http://vibsdepot.hpe.com/index.xml

http://vibsdepot.hpe.com/index-drv.xml

VMWare HPe vibsdepot

VMWare HPE vibsdepot

I later noticed this “notice” on HPE’s website (http://vibsdepot.hpe.com/):

HPE vibsdepot notice

HPE vibsdepot notice

Mar 262016
 

An issue that’s been making me rip my hair apart for some time… And a fix for you experiencing the same.

 

Equipment:

HP Proliant DL360 G6 Server (with a P800 Controller) running Server 2012 R2 and Backup Exec 2014

HP MSL-2024 Tape Library with a single HP SAS LTO-6 Tape Drive

 

Symptoms:

-After a clean restart, a backup job completes successfully. Subsequent jobs fail until server or services restarted.

-While the initial backup does complete, errors/warnings can be seen in the adamm.log and the Event Viewer even when successful.

-Subsequent backups failing report that the device is offline. The Windows Device Manager reports everything is fine.

-Windows Server itself does not report any device errors whatsoever.

 

Observations:

[5648] 03/05/16 07:50:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Device error 1167 on “\\.\Tape0”, SCSI cmd 0a, 1 total errors
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle 214, error 0
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Retry logic was engaged on device: HP       Ultrium 6-SCSI
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Original settings restored on device: HP       Ultrium 6-SCSI

Event ID 58053
Backup Exec Alert: Storage Error
(Server: “WhatsMySRVRname”) The device state has been set to offline because the device attached to the computer is not responding.

Ensure that the drive hardware is turned on and is properly cabled. After you correct the problem, right-click the device, and then click Offline to clear the check mark and bring the device online.

[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 17, new handle ffffffff, error 32
[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[09968] 03/05/16 01:22:07.867 PvlSession::DismountMedia( 0, 0, 0 )
Job = {JOBHEXNUMBERZZZZZZ} “ServerBackup-Full”
Changer    = {CHANGERZZZZ} “Robotic library 0001”
Drive      = {MYBACKUPDRVXZZZZZ} “Tape drive 0001”
Slot       = 13
Media      = {MEDIAZIDZZZZ} “BARCODEID”
ERROR = 0xE0008114 (E_PVL_CHANGER_NOT_AVAILABLE)

[19812] 03/05/16 01:42:12.613 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.129 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.645 PvlDrive::DisableAccess() – ReserveDevice failed, offline device
Drive = 1007 “Tape drive 0001”
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[19812] 03/05/16 01:42:13.645 PvlDrive::UpdateOnlineState()
Drive = 1007 “Tape drive 0001”
ERROR = The device is offline!

[19812] 03/05/16 01:42:12.613 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.129 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.645 PvlDrive::DisableAccess() – ReserveDevice failed, offline device
Drive = 1007 “Tape drive 0001”
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[19812] 03/05/16 01:42:13.645 PvlDrive::UpdateOnlineState()
Drive = 1007 “Tape drive 0001”
ERROR = The device is offline!

Event ID 1000
Faulting application name: wmiprvse.exe, version: 6.3.9600.17415, time stamp: 0x54505614
Faulting module name: MSVCR110.dll, version: 11.0.51106.1, time stamp: 0x5098826e

 

Research:

I spent a ton of time researching this… Old support threads were pointing me in all different directions, most of the threads being old, mentioning drivers, etc… Initially I thought it was hardware related, until through testing I got the gut feeling it was software related. There was absolutely no articles covering Backup Exec 2014 running on Windows Server 2012 R2 with this specific issue.

Tried a bunch of stuff, including swapping the P800 controller, for another HP P212. While it didn’t fix the issue, I gained some backup speed! 🙂

Updating the HP software (agents, providers, HP SMH, WBEM) had no effect.

Disabling the HP providers, and disabling the HP Monitoring, Insight, Management services had no effect whatsoever. Tried different firmware versions, also tried different drivers for the Library and Tape drive, no effect. Tried factory resets, no effect. Tried Library and Tape tools, all tests passed.

Disabled other monitoring software we have in place to monitor software/hardware on clients servers, no effect.

 

Resolution:

-Uninstalled the HP WBEM Providers and Agents.

-Added a “BusyRetryCount” 32-bit DWORD value of 250 (decimal) to the “Storport” key under “Device Parameters” in all the Tape Library and Tape Drive Registry entries. Example:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Changer&Ven_HP&Prod_MSL_G3_Series\5&334e8424&0&000500\Device Parameters\Storport]
“BusyRetryCount”=dword:000000fa

This needs to be added to ONLY and ALL the tape device entries (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\) for the Tape Library and Tape Drives. You probably will have to create “Storport” key under the devices “Device Parameters” key.

After doing this, the backups run consecutively with absolutely no issues. The event log is CLEAN, and Adamm.log is clean, and the “Faulting application name: wmiprvse.exe” errors in the event log no longer occur.

Fixed!

 

Additional Notes:

-Both “fixes” were applied at the same time. I believe the WBEM providers/agent caused the Event ID 1000 errors on WMIPRVSE.exe. While the registry keys alone may have possibly resolved the backup issues, I believe there still would have been an underlying issue with WMIPRVSE.exe faulting that could have other consequences.

-I do not believe the original installation of the HP WBEM providers caused the issue, I have a feeling a subsequent Windows Update, Backup Exec update, other module update, or an update to the HP software may have caused the issue to occur at a later time than original install. I do remember we didn’t have an issue with the backups for months, until one day it started occurring.

-I will be re-installing the HP providers and agents at a later time. I will be uninstalling all of them, and re-installing from scratch the latest versions. I will post an update with my results.

-There is a chance the registry key is needed for the HP software to co-exist with Backup Exec backups for this configuration.

-There is a chance that the registry key isn’t needed if you never load the HP software.

Mar 052016
 

Just wanted to write about a couple issues that I’ve seen occur after migrating customers from Microsoft Small Business Server to Microsoft Server 2012 R2 (with Essentials Experience role), with Microsoft Exchange 2013 On-Premise.

Migration documents that were available were used at the time of migration. We still observed these issues after following. Please note that since these issues occurred, migration documents may have been updated.

Just an FYI: I provide Small Business Server Migration and consulting services. For more information, click here!

Windows SBS Company Web Connector ServerName

After the migration was complete we started seeing event logs pertaining to a “Windows SBS Company Web Connector ComputerName”, often mentioning it’s referencing an object in the Deleted Items container, also referencing the connector is not being activated due to no routes available.

Event ID: 5016

Microsoft Exchange could not discover any route to connector CN=Windows SBS Company Web Connector SERVERNAME,CN=Connections,CN=Exchange Routing Group (XXXXXXXXXXXXXXXXX),CN=Routing Groups,CN=Exchange Administrative Group (XXXXXXXXXXXXXXXXX),CN=Administrative Groups,CN=First Organization,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=domainname,DC=local in the routing tables with the timestamp 3/5/2016 1:55:34 PM. This connector will not be used.  Total source server count: 1; unknown source server count: 1; unrouted source server count: 0; non-active source server count: 0.

What is happening is that a “Foreign Connector” is still present in the Active Directory and Exchange Configuration for the SBS environments SharePoint e-mail to web feature. In my client’s environments SharePoint is no longer used, so it is safe for us to delete this connector. Only delete this connector if you know you’re not using it (it is used for SharePoint e-mail to web feature).

To list and get information on the orphaned connector, open Exchange Powershell and run:

Get-ForeignConnector | Format-List

To delete the orphaned connector, enter the following command in Exchange Powershell and update the connector name to match the name shown in the command above:

Remove-ForeignConnector “Windows SBS Company Web Connector SERVERNAME”

This will remove the orphaned connector and clean up these errors from occurring. You can also remove the connector using ADSIEDIT, however I prefer to use ADSIEDIT as a last resort, and find this method not only easier, but cleaner.

SMTP rejected a (P1) mail from ‘[email protected]

Initially post-migration we started observing this event on the server. Mail flow was not affected and everything was functioning properly.

Event ID: 1025

SMTP rejected a (P1) mail from ‘[email protected]’ with ‘Client Proxy EXCHSRVR’ connector and the user authenticated as ‘HealthMailboxXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX’. The Active Directory lookup for the sender address returned validation errors. Microsoft.Exchange.Data.ProviderError

Additionally, on our corporate firewall (that provides anti-spam), we would observe numerous undeliverable bouncebacks on outgoing messages to the e-mail address “[email protected]” with the subject “Inbound proxy probe”. These messages occur on exact 5 minute intervals continuously.

Using Exchange powershell to view the active Health Mailboxes, we see that each of these bounce backs are being stored on a particular health mailbox. Essentially the mailbox will continue to grow. Due to the growth, this issue needs to be resolved so the mailbox doesn’t continue to grow in size.

Numerous things can cause this, however in our case looking at transport logs, it is seen that a HealthMailbox is sending e-mail to another HealthMailbox but using an incorrect e-mail address. The Health Mailboxes on the Exchange server have “domain.com” e-mail addresses, while according to the transport logs, the e-mails are being sent to “domain.local”.

Something got mixed up, either with provisioning the Exchange E-Mail address policies, or the domain configured as “default domain”. Either way, Exchange is configured and running, so I wanted to correct this in a manor that would have minimal consequences or changes to the system.

To correct this issue, we need to go in to ADSI edit and modify the “ProxyAddresses” value for the HealthMailbox. Note that any type of mailbox can have numerous aliases and a single default alias. Inside of ADSIEdit for “ProxyAddresses” the value/format is case-sensitive, and uppercase SMTP configures default e-mail address, while lowercase smtp configures alternative aliases. An example value: “SMTP:[email protected]” for default, or “smtp:[email protected]” for an alternative alias.

Identifying the account from the event log (note the XXXXXXXXXXXXXXXX in the example), we found the account in the Monitoring Mailboxes container inside of ADSIEdit. We right-clicked on the specific HealthMailbox account, went to properties, and found the “ProxyAddresses” value. We then proceeded to create a new alias by clicking edit, using lowercase smtp and created “smtp:[email protected]” and added it to the list, we did not modify or delete any existing values. All we did is create an alternative alias.

So now the Health Mailbox is receiving e-mail for both “@domain.com”, and “@domain.local”. Immediately the bounce-backs stopped, and event logs disappeared.

PLEASE NOTE: For this fix to work, you MUST confirm that the issue is due to the domain .com and .local mismatch. I’m not quite sure, but this issue may also occur after changing the default domain, or default e-mail address policies, in which case you still could use this technique to resolve the issue.

Hope this helps some of you, cheers!