
Synergy Server connections to vmware iSCSI and LACP

 
vitg
Senior Member

Re: Synergy Server connections to vmware iSCSI and LACP

The general rule is not to mix LACP with iSCSI. It is supported by some storage vendors on particular systems.

LACP introduces an extra layer of complexity between the Initiator and the target, and a range of variables you need to account for.

Generally speaking, MPIO does increase throughput and behaves more consistently during a link failure. Consider the following:

In VMware environments, using LACP introduces a greater potential for APD (All Paths Down) or PDL (Permanent Device Loss), because the time the network takes to re-converge following a link failure depends on the LACP timers themselves and on other features such as Spanning Tree.

If the host, interconnect modules, switches, and the storage appliance are all using LACP, then even with fast timers that's potentially a theoretical minimum of 4 seconds just for LACP, with Spanning Tree re-convergence on top.
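To make that ~4 second floor concrete, here is a back-of-the-envelope sketch. The timer values are assumptions based on the IEEE 802.1AX "fast rate" (1 s LACPDU interval, partner aged out after 3 missed PDUs) plus an assumed ~1 s for the LAG to re-hash flows onto surviving members; actual values vary by platform:

```python
# Back-of-the-envelope LACP failure-detection budget.
# All constants below are assumptions, not vendor-specific figures.
LACP_FAST_INTERVAL_S = 1   # "fast rate" LACPDU interval
MISSED_PDUS = 3            # silent intervals before a link is aged out

def lacp_failover_floor(rehash_s: int = 1) -> int:
    """Rough lower bound (seconds) on LACP-only failover time.

    A dead link is only declared down after MISSED_PDUS silent
    intervals; after that, the LAG still has to re-hash flows onto
    the surviving members (rehash_s is an assumed extra second).
    """
    return LACP_FAST_INTERVAL_S * MISSED_PDUS + rehash_s

print(lacp_failover_floor())  # -> 4 seconds, before any Spanning Tree delay
```

With "slow rate" timers (30 s interval) the same arithmetic lands in the 90+ second range, which is well past typical iSCSI session timeouts.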

Usually this results in the host not receiving a PDL or other SCSI sense codes; it will eventually time out and mark the datastore as down, requiring manual intervention.

From experience, this is the real-world behavior: the initiator loses a link that happens to be carrying iSCSI traffic, and the delay from LACP and Spanning Tree re-convergence knocks the datastore offline. If there are running VMs, it can require force-powering them off before the host will mark the datastore as active again, or even a reboot of the host itself.

We have Synergy in our environment configured with port channels with LACP fallback (SONiC) on our F32 modules. The blades themselves have multiple ports per mezzanine across multiple bays, and we don't use any LACP at the blade/host level. MPIO is used with round-robin to distribute I/O across the bound iSCSI adapter ports. We usually see 1-4 dropped I/O requests during a loss of path, but I/O sessions on other links are not impacted.
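For anyone wanting to reproduce the host side of this, a minimal ESXi sketch looks like the below. The adapter, vmkernel, and device names (`vmhba64`, `vmk1`/`vmk2`, `naa.xxxx`) are placeholders you'd substitute for your own:

```shell
# Bind each iSCSI vmkernel port to the software iSCSI adapter
# (one vmknic per physical uplink, no LACP on these uplinks).
esxcli iscsi networkportal add --adapter vmhba64 --nic vmk1
esxcli iscsi networkportal add --adapter vmhba64 --nic vmk2

# Set the round-robin path selection policy on the device.
esxcli storage nmp device set --device naa.xxxx --psp VMW_PSP_RR

# Optional: switch paths every I/O instead of the default 1000 IOPS,
# which spreads load more evenly across the bound ports.
esxcli storage nmp psp roundrobin deviceconfig set \
    --device naa.xxxx --type iops --iops 1
```

With this in place, a failed link just drops one path; NMP retries the in-flight I/Os on the surviving paths rather than waiting on LAG re-convergence.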

With LACP, throughput didn't improve (as expected), and behavior during a loss of link almost always led to APD or PDL on the impacted host.

This didn't account for iSER at the time we tested it.