…and after the amount of time I’ve wasted getting XenServer to play nicely with LIO iSCSI failover I tend to agree.
One oddity of Xen / XenServer’s storage subsystem is that it identifies iSCSI storage repositories via a calculated SCSI ID rather than the iSCSI serial, which would be the sane thing to do.
Citrix’s less than ideal take on dealing with SCSI ID changes is for you to take your VMs offline, disconnect the storage repositories, recreate them, then go through all your VMs and re-attach their orphaned disks, hoping that you remembered to add some sort of hint as to which VM each one belongs to. Then, finally, wipe the sweat and tears from your face.
From CTX11641 - ‘How to Identify If SCSI Storage Repository has Changed SCSI IDs’:
“The SCSI ID of the logical unit number (LUN) changed. When this happened, the iSCSI storage repository became unplugged after a XenServer reboot.”
“To correct the issue you must recreate a PBD with the entry to reflect the right SCSI ID.”
A big thank you to Nicholas A. Bellinger from the Kernel SCSI mailing list who helped me a lot in this thread where he explained:
“The Company ID, VSI, and VSIE are generated by LIO based upon the current vpd_unit_serial configfs attribute value. So as long as vpd_unit_serial is persistent, and the same value for backend devices across export failover to different nodes, Xen will always see the same EVPD information.”
An example SCSI ID of 0x6001405bff3f42a49d84cfcb64e2b933 would thus break down as:
NAA 6, IEEE Company_id: 0x1405
Vendor Specific Identifier: 0xbff3f42a4
Vendor Specific Identifier Extension: 0x9d84cfcb64e2b933
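To make the breakdown above easier to check against your own LUNs, here is a minimal Python sketch that splits a 128-bit NAA 6 (IEEE Registered Extended) identifier into those fields. The names `Naa6Id` and `parse_naa6` are made up for this illustration and are not part of XenServer or LIO. The field widths (4-bit NAA, 24-bit company ID, 36-bit vendor specific identifier, 64-bit extension) follow the standard NAA 6 layout.

```python
from typing import NamedTuple


class Naa6Id(NamedTuple):
    """Fields of an NAA 6 (IEEE Registered Extended) identifier."""
    naa: int            # 4 bits, always 6 for this format
    company_id: int     # 24-bit IEEE company ID (OUI)
    vendor_id: int      # 36-bit vendor specific identifier
    vendor_id_ext: int  # 64-bit vendor specific identifier extension


def parse_naa6(scsi_id: str) -> Naa6Id:
    """Split a 32-hex-digit SCSI ID (optional 0x prefix) into its fields."""
    digits = scsi_id.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) != 32:
        raise ValueError("expected 32 hex digits (128 bits)")

    value = int(digits, 16)  # raises ValueError on non-hex input
    naa = value >> 124
    if naa != 6:
        raise ValueError("not an NAA 6 identifier")

    return Naa6Id(
        naa=naa,
        company_id=(value >> 100) & 0xFFFFFF,
        vendor_id=(value >> 64) & 0xFFFFFFFFF,
        vendor_id_ext=value & 0xFFFFFFFFFFFFFFFF,
    )


if __name__ == "__main__":
    parsed = parse_naa6("0x6001405bff3f42a49d84cfcb64e2b933")
    print("NAA:", parsed.naa)
    print("Company ID: 0x%06x" % parsed.company_id)
    print("Vendor specific identifier: 0x%09x" % parsed.vendor_id)
    print("Vendor specific identifier extension: 0x%016x" % parsed.vendor_id_ext)
```

Running it against the SCSI ID seen before and after a failover is a quick way to spot which field changed.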
In addition to the vpd_unit_serial, we found that the iblock number must also remain the same between failovers.
If you’re using Corosync / Pacemaker for your target failover, the vpd_unit_serial and iblock number must both be set in the iSCSILogicalUnit OCF provider: