Fix Blue Screen of Death STOP error on Veeam Backup Server

We recently fixed a very nasty server crash on a Veeam Backup Server that took months to resolve. Here's a summary of the environment:

  1. Veeam Backup Server. HPE DL360e G8 with two CPUs and 64 GB of RAM, running Windows Server 2016 and Veeam 9.5 Update 4. An HPE H221 SAS Host Bus Adapter connected the server to the MSL 4048 Tape Library.
  2. Tape Library. HPE MSL 4048 with two LTO7 SAS drives.

The server had the following problems:

  1. The server would crash every few days with STOP: 0x0000000A (0x0000000000000000, 0x00000000000000FF, 0x0000000000000000, 0xFFFFF800D3C7D179), the IRQL_NOT_LESS_OR_EQUAL bug check.
  2. System Event Error 129. Source: lsa_sas2. Message: Reset to device, \Device\RaidPort3, was issued.
  3. System Event Error 11. Source: lsa_sas2. Message: The driver detected a controller error on \Device\RaidPort3.
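To make the STOP line easier to read, here is a minimal Python sketch that splits it into named fields. The parameter meanings (referenced address, IRQL, access type, faulting instruction address) follow Microsoft's documentation for bug check 0xA as we understand it. The function name `decode_stop_0a` is our own and is not part of any tool.

```python
import re


def decode_stop_0a(line):
    """Parse a 'STOP: 0x0000000A (p1, p2, p3, p4)' line into named fields.

    Field meanings are taken from the bug check 0xA reference; treat them
    as an aid for reading the dump, not as a substitute for a debugger.
    """
    values = [int(h, 16) for h in re.findall(r"0x[0-9A-Fa-f]+", line)]
    if len(values) != 5:
        raise ValueError("expected a bug check code and four parameters")
    code, address, irql, access, instruction = values
    if code != 0xA:
        raise ValueError("not a 0xA (IRQL_NOT_LESS_OR_EQUAL) bug check")

    # Parameter 3 is a bit field: bit 3 set means execute, bit 0 set means
    # write, and neither set means read.
    if access & 0x8:
        operation = "execute"
    elif access & 0x1:
        operation = "write"
    else:
        operation = "read"

    return {
        "referenced_address": address,
        "irql": irql,
        "operation": operation,
        "instruction_address": instruction,
    }


stop_line = (
    "STOP: 0x0000000A (0x0000000000000000, 0x00000000000000FF, "
    "0x0000000000000000, 0xFFFFF800D3C7D179)"
)
decoded = decode_stop_0a(stop_line)
for name, value in decoded.items():
    print(name, hex(value) if isinstance(value, int) else value)
```

For the crash above, this reports a read of address 0x0 at IRQL 0xFF. It does not identify the faulting driver; that still requires opening the memory dump in a debugger.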

The lsa_sas2 errors happened only when Veeam was backing up data to the MSL 4048 tape library. The STOP errors happened while a Veeam backup job was running to disk, to tape, or sometimes to both. We performed the following troubleshooting steps:

  1. Updated the firmware on the Backup Server.
  2. Updated the drivers on the Backup Server.
  3. Updated the firmware on the MSL 4048 Tape Library.
  4. Updated the drivers for the MSL 4048 Tape Library.
  5. Replaced the motherboard.
  6. Replaced the memory.
  7. Left the hard drives in place because they were only two months old.

Unfortunately, the STOP and lsa_sas2 errors continued. We then tried a complete reload of Windows Server 2016. This did resolve the STOP error, but the lsa_sas2 errors continued. HPE sent out two more items:

  1. Riser card that connects the motherboard to the expansion cards.
  2. New SAS Cable.

After these items were replaced, the lsa_sas2 errors still continued during the tape backups. When we replaced the cables, we noticed that each SAS tape drive had two connections: Channel A and Channel B. A SAS cable was plugged into both Channel A and Channel B on each tape drive. We also noticed that the Tape Library and Drives were showing up twice in the Windows Device Manager (i.e. two Tape Libraries and four Drives).

Evidently, for SAS tape drives installed in the MSL 4048 Tape Library, ONLY Channel A should be connected. Channel B is for testing ONLY. We shut down the server and tape library, disconnected Channel B on both tape drives, and re-ran the tape backup jobs, and the lsa_sas2 errors went away!

In retrospect, we should have purchased a new server instead of trying to fix the old one, but a new server still wouldn't have fixed the lsa_sas2 errors until the Channel B SAS connectors were removed from both LTO7 tape drives. We put a label on the back of the MSL 4048 saying to connect only Channel A so we won't repeat this mistake.
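The duplicate entries in Device Manager were the clue we missed for months, so a check for them is worth sketching. The Python example below groups a device inventory by serial number and flags any serial that is listed more than once. The device names and serial numbers are made-up placeholders, not values from our server, and the sketch assumes that a drive reached over both channels reports the same serial number on each path. How you collect the inventory (for example, copying it out of Device Manager by hand) is up to you.

```python
from collections import defaultdict


def find_duplicate_devices(devices):
    """Return {serial: [names]} for serial numbers listed more than once."""
    by_serial = defaultdict(list)
    for name, serial in devices:
        by_serial[serial].append(name)
    return {s: names for s, names in by_serial.items() if len(names) > 1}


# Hypothetical inventory: one library and two drives, each seen twice
# because both Channel A and Channel B are cabled.
devices = [
    ("Medium Changer 0", "LIB-0001"),
    ("Medium Changer 1", "LIB-0001"),
    ("Tape Drive 0", "DRV-0001"),
    ("Tape Drive 1", "DRV-0002"),
    ("Tape Drive 2", "DRV-0001"),
    ("Tape Drive 3", "DRV-0002"),
]

duplicates = find_duplicate_devices(devices)
for serial, names in duplicates.items():
    print(f"{serial} appears {len(names)} times: {', '.join(names)}")
```

An empty result would suggest each device is visible over a single path, which is what we expect to see now that only Channel A is connected.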
