Nov 23 2024

In some scenarios, you may encounter an issue where the Veeam WAN Accelerator service fails to start.

This causes backup and backup copy jobs that use the Veeam WAN Accelerator to fail, which is usually how the issue is first noticed.

In this post I’ll explain the problem, what can cause it, and how to resolve the issue.

The Problem

When this issue occurs, a Backup or Backup Copy job will typically fail with the following error in the Veeam console:

Error: The RPC server is unavailable. RPC function call failed. Function name: [InvokerTestConnection]. Target machine: [IP.OF.WAN.ACC:6464].

Failed to process (VM Name).

See the screenshot below for an example:

Veeam Backup Copy Job Failing due to Veeam WAN Accelerator Service failing

From the error above, the next step is usually to check the logs to find out what’s happening. Because this Backup Copy job uses the WAN accelerator, we’ll look at the log for the Veeam WAN Accelerator Service.

Svc.VeeamWANSvc.log

[23.11.2024 11:46:24.251] <  3440> srv      | RootFolder = V:\VeeamWAN
[23.11.2024 11:46:24.251] <  3440> srv      | SendFilesPath = V:\VeeamWAN\Send
[23.11.2024 11:46:24.251] <  3440> srv      | RecvFilesPath = V:\VeeamWAN\Recv
[23.11.2024 11:46:24.251] <  3440> srv      | EnablePerformanceMode = true
[23.11.2024 11:46:24.255] <  3440> srv      | ERR |Fatal error
[23.11.2024 11:46:24.255] <  3440> srv      | >>  |boost::filesystem::create_directories: The system cannot find the path specified: "V:\"
[23.11.2024 11:46:24.255] <  3440> srv      | >>  |Unable to apply settings. See log for details.
[23.11.2024 11:46:24.255] <  3440> srv      | >>  |An exception was thrown from thread [3440].
[23.11.2024 11:46:24.255] <  3440> srv      | Stopping service...
[23.11.2024 11:46:24.256] <  3440> srv      | Stopping retention thread...
[23.11.2024 11:46:24.257] <  4576>          | Thread started.  Role: 'Retention thread', thread id: 4576, parent id: 3440.
[23.11.2024 11:46:24.258] <  4576>          | Thread finished. Role: 'Retention thread'.
[23.11.2024 11:46:24.258] <  3440> srv      | Waiting for a client('XXX-Veeam-WAN:6165')
[23.11.2024 11:46:24.258] <  3440> srv      | Stopping server listening thread.
[23.11.2024 11:46:24.258] <  3440> srv      |   Stopping command handler threads.
[23.11.2024 11:46:24.258] <  3440> srv      |   Command handler threads have stopped.
[23.11.2024 11:46:24.258] <  4580>          | Thread started.  Role: 'Server thread', thread id: 4580, parent id: 3440.

In the Veeam WAN Accelerator Service log above, you’ll note a fatal error: the service cannot find its configured paths (the V:\ volume in this case), which causes it to stop.

In some configurations, iSCSI is used to access Veeam backup repository storage hosted on iSCSI targets. Furthermore, some iSCSI configurations use special vendor plugins to access the storage and configure items like MPIO (Multipath I/O), which can take additional time to initialize.

In this scenario, the Veeam WAN Accelerator Service was starting before the Windows iSCSI service, MPIO Service, and Nimble Windows Connection Manager plugin had time to initialize, resulting in the WAN accelerator failing because it couldn’t find the directories it was expecting.

The Solution

To resolve this issue, we want to configure the Veeam WAN Accelerator Service for a delayed start in the Windows Server boot sequence.

  1. Open Windows Services
  2. Select “Veeam WAN Accelerator Service”
  3. Change “Startup Type” to “Automatic (Delayed Start)”
  4. Click “Apply” to save, and then click “Start” to start the service.

As per the screenshot below:

Veeam WAN Accelerator Service Properties

The Veeam WAN Accelerator Service will now have a delayed start on system bootup, giving the iSCSI initiator time to connect and mount all iSCSI target block devices before the WAN accelerator service starts.
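If you’d rather make the change from the command line, the same thing can be done with sc.exe. This is a minimal sketch; it assumes the service’s internal name is VeeamWANSvc (matching the log file prefix above), so verify the name with sc query or the service properties dialog first:

REM Set the service to Automatic (Delayed Start); the space after "start=" is required by sc.exe
sc config VeeamWANSvc start= delayed-auto

REM Start the service now rather than waiting for the next reboot
sc start VeeamWANSvc

Alternatively, a hard dependency on the Microsoft iSCSI Initiator service could be declared with sc config VeeamWANSvc depend= MSiSCSI, but note that depend= replaces the entire existing dependency list, so the delayed start is the safer option.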

Oct 02 2022

So, there’s a common problem where, when performing a backup, the job completes with the warning “Unable to truncate Microsoft SQL Server transaction logs”.

This is usually due to permission problems, either with the account used for guest processing or with permissions inside of your SQL instance. In most cases this can be resolved by referencing the appropriate Veeam KB article, which outlines the permissions required for proper guest processing of Microsoft SQL servers.
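As a quick illustration of the simplest permission fix, granting the guest processing account the sysadmin server role (see Veeam’s KB for the least-privilege alternative), something like the following can be run against the instance. The server and account names below are placeholders, not from this environment:

sqlcmd -S SQLSERVER\INSTANCE -Q "CREATE LOGIN [DOMAIN\VeeamGuestProcessing] FROM WINDOWS"
sqlcmd -S SQLSERVER\INSTANCE -Q "ALTER SERVER ROLE [sysadmin] ADD MEMBER [DOMAIN\VeeamGuestProcessing]"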

However, in some rare cases you may have everything configured properly, yet the backup continues to present these warnings where it’s unable to truncate the Microsoft SQL Server transaction logs.

The Problem

I recently deployed an SQL Server in a domain, and of course made sure to set up the proper backup procedures, as I’ve done a million times.

However, when performing a backup, the backup would present a warning with the following message:

Veeam Backup Warning – Unable to Truncate Microsoft SQL Server transaction logs.
Unable to truncate Microsoft SQL Server transaction logs. Details: Failed to call RPC function 'Vss.TruncateSqlLogs': Error code: 0x80004005. Failed to invoke func [TruncateSqlLogs]: Unspecified error. Failed to process 'TruncateSQLLog' command. Failed to logon user [ReallyLongDomainName\Admin-Account]. Win32 error:The user name or password is incorrect. Code: 1326.

This was very odd, as I had configured everything properly and even confirmed it against the Veeam KB referenced above in this post.

So I decided to look at this as if it were something different: possibly a problem with the credentials themselves.

I noticed that in this specific customer environment, the domain’s FQDN was long enough that the NetBIOS domain name did not match the FQDN domain name.

In this example, the following was observed:

FQDN: LongCompanyName.com
NETBIOS DOMAIN: LCNDOMAIN

Due to the length of the domain name, they had shortened the NetBIOS domain name to an abbreviation.
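A quick way to spot this kind of mismatch is to compare the two names directly on a domain-joined machine. The commands below are standard Windows tools; the output shown uses the example values from this environment:

C:\>echo %USERDOMAIN%
LCNDOMAIN

C:\>wmic computersystem get domain
Domain
LongCompanyName.com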

When configuring the Veeam credentials for guest processing, one would assume that the “AD Search” function would pull the “LCNDOMAIN\BackupAdminProcessing” account; however, when using the check feature, it actually created an entry for “LongCompanyName\BackupAdminProcessing”, which was technically incorrect, as it didn’t match the SAM logon format for the account.

The Fix

Because of the observation noted above, I created a manual credential entry for “LCNDOMAIN\BackupAdminProcessing”, reconfigured the backup job to use those new credentials, and it worked!

The issue arises because, when using the AD search function in the credential manager, Veeam doesn’t translate and pull the NetBIOS domain name; it builds the SAM logon format from the UPN and assumes the UPN domain matches the NetBIOS domain name.

While this holds true in most environments, there are rare situations (like the one above) where the NetBIOS domain name does not match the domain used in the UPN suffix.

May 11 2018

This morning I noticed that CentOS 7.5 (1804) was available for upgrade via yum. After upgrading my CentOS instance from 7.4 to 7.5 on Microsoft Azure, I noticed that when running a backup using the Veeam Agent for Linux, the system would crash and become completely unresponsive, and I would have to manually restart the OS.

Upon reboot, I analyzed the console messages log and ran the backup again to see what was happening.

Here’s a look at my /var/log/messages:

May 11 07:24:46 HOSTNAME kernel: Request for unknown module key 'Veeam Software AG: 9d063645550b483bec752cb3c0249d5ede714b3e' err -11
May 11 07:24:46 HOSTNAME kernel: veeamsnap: loading out-of-tree module taints kernel.
May 11 07:24:46 HOSTNAME kernel: WARNING: module 'veeamsnap' built without retpoline-enabled compiler, may affect Spectre v2 mitigation
May 11 07:24:46 HOSTNAME kernel: veeamsnap: module verification failed: signature and/or required key missing - tainting kernel
May 11 07:24:46 HOSTNAME kernel: veeamsnap: applying kernel_stack fix up
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init Loading
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init Version: 2.0.0.400
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init Author: Veeam Software AG
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init licence: GPL
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init description: Veeam Snapshot Kernel Module
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init zerosnapdata: 1
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init debuglogging: 0
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init snapstore enabled
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init start. container_alloc_counter=0
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init start. container_sl_alloc_counter=0
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init start. mem_cnt=0
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init start. vmem_cnt=0
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:ctrl_pipe_init .
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init Module major=243
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:blk_direct_bioset_create Specific bio set created.
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:blk_redirect_bioset_create Specific bio set created.
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:blk_deferred_bioset_create Specific bio set created.
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:snapimage_init .
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:snapimage_init Snapimage block device was registered. major=252
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init end. container_alloc_counter=0
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init end. container_sl_alloc_counter=0
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init end. mem_cnt=1
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:veeamsnap_init end. vmem_cnt=0
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:ctrl_open file=0xffff95e4543b1800
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:ctrl_pipe_new .
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:ioctl_compatibility_flags Get compatibility flags
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:ioctl_tracking_collect Collecting tracking device:
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:tracking_collect Have not device under CBT.
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:tracking_add Adding. dev_id=8:1
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:tracker_Create dev_id 8:1
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:tracker_Create SectorStart    =0x800
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:tracker_Create SectorsCapacity=0xfa000
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:tracker_cbt_start .
May 11 07:24:46 HOSTNAME kernel:    veeamsnap:cbt_map_create CBT map create.
May 11 07:24:47 HOSTNAME kernel: general protection fault: 0000 [#1] SMP
May 11 07:24:47 HOSTNAME kernel: Modules linked in: veeamsnap(OE) nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_security ext4 mbcache jbd2 dm_mirror dm_region_hash dm_log dm_mod sb_edac iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr i2c_piix4 sg hv_utils i2c_core ptp pps_core hv_balloon ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi ata_piix hv_storvsc hv_netvsc libata scsi_transport_fc hid_hyperv hyperv_keyboard scsi_tgt hyperv_fb crct10dif_pclmul crct10dif_common crc32c_intel hv_vmbus floppy serio_raw
May 11 07:24:47 HOSTNAME kernel: CPU: 1 PID: 1712 Comm: Lpb Server thre Tainted: G           OE  ------------   3.10.0-862.2.3.el7.x86_64 #1
May 11 07:24:47 HOSTNAME kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
May 11 07:24:47 HOSTNAME kernel: task: ffff95e447378000 ti: ffff95e45cbe0000 task.ti: ffff95e45cbe0000
May 11 07:24:47 HOSTNAME kernel: RIP: 0010:[]  [] page_array_memset+0x4d/0xa0 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: RSP: 0018:ffff95e45cbe3d60  EFLAGS: 00010246
May 11 07:24:47 HOSTNAME kernel: RAX: 0000000000000000 RBX: ffff95e449615200 RCX: 0000000000000200
May 11 07:24:47 HOSTNAME kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff187288716000
May 11 07:24:47 HOSTNAME kernel: RBP: ffff95e45cbe3d60 R08: ffffffffbe274fef R09: ffff95e460affa60
May 11 07:24:47 HOSTNAME kernel: R10: ffff95e460affa60 R11: 0000000000000000 R12: 0000000000000001
May 11 07:24:47 HOSTNAME kernel: R13: 00000000000fa000 R14: 0000000000000000 R15: 0000000000800001
May 11 07:24:47 HOSTNAME kernel: FS:  00007f336f7fe700(0000) GS:ffff95e495640000(0000) knlGS:0000000000000000
May 11 07:24:47 HOSTNAME kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 11 07:24:47 HOSTNAME kernel: CR2: 0000000000738df0 CR3: 00000002d3afc000 CR4: 00000000001406e0
May 11 07:24:47 HOSTNAME kernel: Call Trace:
May 11 07:24:47 HOSTNAME kernel: [] cbt_map_allocate+0x6e/0x160 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: [] cbt_map_create+0x73/0x100 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: [] tracker_cbt_start+0x5a/0xc0 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: [] tracker_Create+0x16a/0x650 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: [] tracking_add+0x2e0/0x450 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: [] ioctl_tracking_add+0x6c/0x170 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: [] ctrl_unlocked_ioctl+0x4e/0x60 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: [] do_vfs_ioctl+0x350/0x560
May 11 07:24:47 HOSTNAME kernel: [] ? __sb_end_write+0x31/0x60
May 11 07:24:47 HOSTNAME kernel: [] ? vfs_write+0x182/0x1f0
May 11 07:24:47 HOSTNAME kernel: [] SyS_ioctl+0xa1/0xc0
May 11 07:24:47 HOSTNAME kernel: [] system_call_fastpath+0x1c/0x21
May 11 07:24:47 HOSTNAME kernel: Code: 01 49 89 f9 48 0f af c2 49 8b 79 10 ba 00 10 00 00 40 f6 c7 01 75 47 40 f6 c7 02 75 51 40 f6 c7 04 75 2b 89 d1 c1 e9 03 83 e2 07  48 ab 74 0e 41 89 c8 83 c1 01 39 d1 42 88 34 07 72 f2 49 83
May 11 07:24:47 HOSTNAME kernel: RIP  [] page_array_memset+0x4d/0xa0 [veeamsnap]
May 11 07:24:47 HOSTNAME kernel: RSP 
May 11 07:24:47 HOSTNAME kernel: ---[ end trace 96b51a664f4baef9 ]---

It appeared the veeamsnap module was triggering a general protection fault in the kernel, crashing the system.
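To confirm which version of the veeamsnap module is installed, modinfo can be used; filtered to the version field, this should match the 2.0.0.400 seen in the log above:

# Query the installed veeamsnap kernel module metadata
modinfo veeamsnap | grep -i "^version"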

In an effort to troubleshoot, I uninstalled and reinstalled Veeam and tried rebuilding the kernel module; however, the issue persisted. After some searching, I came across these two posts:

https://forums.veeam.com/veeam-agent-for-linux-f41/veeam-agent-for-linux-and-rhel-7-5-crash-t50170.html

https://www.veeam.com/kb2569

According to the KB, the veeamsnap module isn’t compatible with kernel version 3.10.0-862.

Checking my system after upgrading to CentOS 7.5:

[root@HOSTNAME ~]# uname -a
Linux HOSTNAME.somedomain.com 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@HOSTNAME ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)


Essentially, as of right now, the Veeam Agent for Linux is not yet supported on CentOS 7.5, RHEL 7.5, or Oracle Linux 7.5 (RHCK). If you’re running any of these, hold off and do not install the Veeam Agent for Linux, and if you are scheduling an upgrade, do not perform it unless you want to break your backups. It sounds like this should be fixed in a future update.
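If you need to keep patching the rest of the system while staying on a compatible kernel until that fix lands, one option is to exclude kernel packages from yum. A minimal sketch:

# One-off: update everything except kernel packages
yum update --exclude="kernel*"

# Persistent: append an exclude line to /etc/yum.conf (it normally contains only the [main] section)
echo "exclude=kernel*" >> /etc/yum.conf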

Apr 29 2018

When running Veeam Backup and Replication, a Microsoft Windows Server Domain Controller may boot into Safe Mode and Directory Services Restore Mode.

About a week ago, I loaded Veeam Backup and Replication into my test environment. It’s a fantastic product and it’s working great; however, today I had a bit of an issue with a DC running Windows Server 2016 Server Core.

I woke up to a notification that the backup had failed due to a VSS snapshot issue. I know that VSS can be a little picky at times, so I decided to restart the guest VM. Upon restarting, it came back up, was pingable, and appeared to be running fine; however, the backup kept failing with new errors, the server’s event log looked very strange, and numerous services set to Automatic were not starting.

This specific server was installed in Server Core mode, so it has no GUI and is administered via command prompt over RDP or via remote management utilities. After RDP’ing into the server, I noticed the “Safe Mode” branding on each corner of the display, which was very odd. I restarted the server again, this time trying to start Active Directory Domain Services manually via services.msc.

This presented:

Event ID: 16652
Source: Directory-Services-SAM
General Description: The domain controller is booting to directory services restore mode.

Screenshot:

Directory Services Restore Mode

The domain controller is booting to directory services restore mode.


This surprised me (and scared me for that matter). I immediately started searching the internet to find out what would have caused this…

To my relief, I read on numerous sites that when an active backup is running on a guest VM that is a domain controller, Veeam temporarily sets the VM to boot into Directory Services Restore Mode, so that in the event of a restore, it will boot into this mode automatically. In my case, the switch was not changed back when the backup failed.

Running the following command in a command prompt verifies whether the safeboot switch is set to DsRepair:

bcdedit /v
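In the output, under the Windows Boot Loader entry, you should see a safeboot line similar to the following when DSRM is engaged (a trimmed, illustrative example; identifiers and paths will differ on your system):

Windows Boot Loader
-------------------
identifier              {guid}
device                  partition=C:
path                    \Windows\system32\winload.exe
safeboot                DsRepair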

To disable directory services restore mode, type the following in a command prompt:

bcdedit /deletevalue safeboot

Restart the server and the issue should be resolved!
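To confirm after the restart, you can re-run bcdedit and filter for the safeboot value; no output means it has been removed:

bcdedit /v | findstr /i safeboot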