In mid-2018, my friend (and, back then, dearest colleague) Andres Villalobos and I ran into an issue with the Linux teamd driver and Cisco Nexus 9000 switches when using LACP over vPC (virtual port channels). We were doing validation and verification of the baseline control infrastructure for the Large Synoptic Survey Telescope (Update 2020: now Rubin Observatory). The issue triggered a traffic-blackhole scenario on the server side after reloading one of the Nexus switches.
The PoC setup was the following:
- A basic Clos topology with 2x leaf switches (Nexus 93180TC-EX) and 2x spine switches (Nexus 9504) running Cisco ACI 3.2(2I).
- An LACP vPC with standard parameters from the leaf pair towards the server, 10GBASE-T.
- A Dell PowerEdge R430 running CentOS 7 with a 3.10.x kernel (I can’t remember the exact version at the moment) and teamd version 1.27.
- A teamd LACP configuration for 2x physical interfaces with standard parameters towards the leaf switches, 10GBASE-T on-board.
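
For reference, a teamd LACP configuration of the kind described in the last item could look roughly like the output of the sketch below. This is not the exact configuration we used: the device name team0, the port names em1 and em2, and the helper function itself are assumptions for illustration. The keys (runner, tx_hash, link_watch, ports) follow the teamd.conf format. Note that the ethtool link watcher only looks at the carrier state of each port.

```python
import json


def build_teamd_lacp_config(device, ports):
    """Return a teamd configuration dict for an LACP team over `ports`.

    Hypothetical helper: device and port names are placeholders.
    """
    return {
        "device": device,
        "runner": {
            "name": "lacp",      # use the LACP (802.3ad) runner
            "active": True,      # actively send LACPDUs
            "tx_hash": ["eth", "ipv4", "ipv6"],  # fields used for Tx hashing
        },
        # Carrier-based link monitoring: a port counts as "up" once the
        # Ethernet link is negotiated.
        "link_watch": {"name": "ethtool"},
        "ports": {name: {} for name in ports},
    }


if __name__ == "__main__":
    config = build_teamd_lacp_config("team0", ["em1", "em2"])
    # The printed JSON could be saved to a file and passed to teamd.
    print(json.dumps(config, indent=2))
```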
The issue is triggered by reloading one of the leaf switches in the vPC. As soon as the server detected the interface as “up” (i.e. the Ethernet link had been negotiated successfully), teamd started forwarding traffic over the link facing the switch we had just reloaded, without waiting for any LACP control packets from the switch side. On the switch side, meanwhile, the switchport was in a suspended state due to a hardcoded behavior in Cisco ACI called vPC auto-recovery, which waits 240 seconds (by default) for its peer (the other leaf switch) to be stable before forwarding traffic, in order to avoid split-brain scenarios. Additionally, once the leaf moved the switchport from suspended to operational, teamd did not react accordingly and rebalance the traffic between the server ports in the bundle.
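
To make the timing concrete, here is a toy model of the blackhole window. It is only a sketch: the function name and the single "switch ready" timestamp are simplifications I made up for illustration, not anything teamd or ACI exposes. It compares a host that forwards as soon as the carrier is up with one that waits for the switch side of the LACP bundle to be ready.

```python
AUTO_RECOVERY_HOLD_S = 240  # default hold time mentioned in the text


def blackhole_seconds(link_up_at, switch_ready_at, wait_for_lacp):
    """Seconds during which the host sends traffic into a suspended switchport.

    link_up_at:      time (s) at which the Ethernet link is negotiated
    switch_ready_at: time (s) at which the switchport leaves the suspended state
    wait_for_lacp:   True if the host gates forwarding on LACP, False if it
                     only looks at the carrier state
    """
    # A carrier-only host starts forwarding at link-up; an LACP-gated host
    # starts only once the switch side is ready.
    host_forwards_at = switch_ready_at if wait_for_lacp else link_up_at
    return max(0, switch_ready_at - host_forwards_at)


if __name__ == "__main__":
    # Link comes up at t=0 while the leaf holds the port for 240 s.
    print(blackhole_seconds(0, AUTO_RECOVERY_HOLD_S, wait_for_lacp=False))
    print(blackhole_seconds(0, AUTO_RECOVERY_HOLD_S, wait_for_lacp=True))
```

In this model only the flows hashed onto the affected port are lost during the window; flows on the other port of the bundle keep working.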
Andres reported the issue here: https://github.com/jpirko/libteam/issues/37. For RHEL there was already a workaround as of December 2016, https://access.redhat.com/solutions/2362921, but it is associated with switches running NX-OS, not Cisco ACI.
This “issue” was actually two separate issues. The first was the teamd driver not playing nice and forwarding traffic over a port of the LACP bundle before the switch had confirmed that the link was ready. The second was Cisco ACI having the vPC auto-recovery feature enabled by default and hardcoded, as confirmed by Cisco TAC in August 2018. On Nexus switches running NX-OS instead of ACI you can disable this feature, which could have helped with the troubleshooting process.
Above you can find the answer from Cisco TAC. I removed the engineer’s name since she was very patient and helpful during the almost-debugging session, which took over 8 hours. We found the issue and a workaround ourselves, but kudos to her: none of this was her fault (I’m talking to all of you “I hate ACI and SDN” wise old men).
The workaround we implemented was to switch to the bonding driver in mode 4 (LACP), which works just fine in all the scenarios where teamd didn’t: shutting down LACP links that are part of the vPC, shutting down an entire leaf, or reloading it.
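
As a rough idea of what that looks like on CentOS 7, the sketch below prints network-scripts style ifcfg contents for a mode 4 bond. The names bond0, em1 and em2, the miimon=100 value and the helper functions are assumptions for illustration, not a copy of our production files.

```python
def bond_ifcfg(bond, opts="mode=4 miimon=100"):
    """Return ifcfg contents for the bond master (mode 4 = 802.3ad/LACP)."""
    lines = [
        f"DEVICE={bond}",
        "TYPE=Bond",
        "BONDING_MASTER=yes",
        f'BONDING_OPTS="{opts}"',  # miimon value is an assumed example
        "BOOTPROTO=none",
        "ONBOOT=yes",
    ]
    return "\n".join(lines) + "\n"


def slave_ifcfg(port, bond):
    """Return ifcfg contents for a physical port enslaved to `bond`."""
    lines = [
        f"DEVICE={port}",
        f"MASTER={bond}",
        "SLAVE=yes",
        "BOOTPROTO=none",
        "ONBOOT=yes",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder names; each string would go into its own ifcfg-<name> file.
    print(bond_ifcfg("bond0"))
    for port in ("em1", "em2"):
        print(slave_ifcfg(port, "bond0"))
```

IP addressing is left out on purpose, since it depends on the environment.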
Hopefully both teamd and Cisco will fix their respective issues in the near future.