Catalyst 9500 and StackWise Virtual

Hi chaps. First of all, I would like to apologize for the lack of activity on this blog. The company where I work was hit by the NotPetya ransomware last summer, and as a result we worked absolutely crazy hours for many months to recover all our services and secure our network. I simply had no spare time to contribute to this blog. Anyway, things are much more stable now, so I will try to get back to my hobbies.

Today I would like to give you a brief overview of StackWise Virtual technology, which Cisco introduced in IOS-XE Denali 16.3.3.

Originally, only the Catalyst 3850-48XS switching platform supported this feature. At the time of writing, Cisco has announced support for the feature across the entire Catalyst 9500 series.

So, what is it then?

StackWise Virtual is basically a new name for the well-known Virtual Switching System (VSS) feature: Cisco simplified VSS and gave it a different name. This is how SWV was born 😀

First of all, let’s cover the main components of StackWise Virtual and Virtual Switching System. From now on I will refer to both technologies as SWV (this post is about SWV, after all).

  • SWV Domain – basically the ID of the virtual entity. It can be anything from 1 to 255, but both switches must use the same ID, or SWV won’t be formed;
  • SWV Link – the logical link that interconnects the switches in an SWV Domain. It is used to synchronize stateful information between the Active and Hot-Standby switches, as well as to forward data; NSF and SSO rely on this link for their state synchronization. The SWV Link can be made of multiple physical interfaces, normally 2 or 4 (up to 8). It uses Port-Channel 128, which is reserved for this purpose. All data sent over this link is encapsulated with a 64-byte SWV Header (SVH);
  • SWV Dual Active Detection Link – this link is required to detect a Dual Active scenario, where the SWV Link fails but both switches are still alive. If that happens, both switches assume the Active Supervisor role. This is bad, ok? Hence a dedicated link is used to provide heartbeat functionality. Note! Recent IOS-XE versions support Dual Active Detection over an ePAgP MEC (see below).
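The Domain rules above lend themselves to a quick pre-flight check before touching the switches. The Python sketch below is a hypothetical helper (`SwitchPlan` and `check_swv_pair` are illustrative names, not part of any Cisco tooling) that validates a planned pair against the constraints stated in this post: exactly two switches, a domain ID between 1 and 255, and the same ID on both.

```python
from dataclasses import dataclass


@dataclass
class SwitchPlan:
    hostname: str
    domain_id: int  # value you intend to use with "domain <id>" under stackwise-virtual


def check_swv_pair(switches: list[SwitchPlan]) -> list[str]:
    """Return a list of problems with a planned SWV pair; empty means it looks valid."""
    # SWV joins two, and only two, physical switches.
    if len(switches) != 2:
        return [f"expected exactly 2 switches, got {len(switches)}"]
    problems = []
    for sw in switches:
        # The domain ID must be in the 1-255 range.
        if not 1 <= sw.domain_id <= 255:
            problems.append(f"{sw.hostname}: domain {sw.domain_id} outside 1-255")
    a, b = switches
    # Both switches must carry the same domain ID, otherwise SWV is not formed.
    if a.domain_id != b.domain_id:
        problems.append(f"domain mismatch: {a.hostname}={a.domain_id}, {b.hostname}={b.domain_id}")
    return problems


print(check_swv_pair([SwitchPlan("sw1", 10), SwitchPlan("sw2", 10)]))  # []
print(check_swv_pair([SwitchPlan("sw1", 10), SwitchPlan("sw2", 11)]))  # ['domain mismatch: sw1=10, sw2=11']
```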

SWV combines two (and only two) physical switches into a single logical entity, hence the new name: StackWise Virtual. It literally looks like two switches have been stacked using fiber cables instead of Cisco’s StackWise-480 option. One switch becomes the Active Supervisor, which is responsible for all management and control functions (L2 and L3 protocols). Both switches perform forwarding, though. One of the biggest differences compared to standard StackWise-480 is ring bandwidth. Cisco StackWise-480 on Catalyst 3850 and Catalyst 9300 switches offers 480Gbps of ring bandwidth (the speed of the backplane), and it’s practically impossible to achieve the same with SWV. For example, the Catalyst C9500-24Q switch has 24 40Gbps ports, and it is very likely that only 2, or at most 4, of them will be allocated to the SWV Link. This results in a maximum virtual ring bandwidth of 4 x 40Gbps = 160Gbps.
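The arithmetic behind the 160Gbps figure is simply member ports multiplied by port speed. The short Python sketch below (the constant and function names are illustrative) reproduces it for the 2-port and 4-port cases and compares the result with the 480Gbps StackWise-480 ring, treating both as the nominal figures quoted above.

```python
PORT_SPEED_GBPS = 40        # C9500-24Q front-panel port speed
STACKWISE_480_GBPS = 480    # StackWise-480 ring bandwidth on 3850/9300


def svl_bandwidth_gbps(svl_ports: int, port_speed_gbps: int = PORT_SPEED_GBPS) -> int:
    """Aggregate SWV Link capacity: number of member ports times port speed."""
    return svl_ports * port_speed_gbps


for ports in (2, 4):
    bw = svl_bandwidth_gbps(ports)
    share = bw / STACKWISE_480_GBPS
    print(f"{ports} x {PORT_SPEED_GBPS}G SVL = {bw} Gbps ({share:.0%} of StackWise-480)")
# 2 x 40G SVL = 80 Gbps (17% of StackWise-480)
# 4 x 40G SVL = 160 Gbps (33% of StackWise-480)
```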

With classic EtherChannels (and StackWise-480), switches consult hash functions to decide which port will be used to forward the traffic. It is not unusual for traffic to arrive on one switch and be forwarded via a port on another switch; the 480Gbps backplane capacity is what makes this practically harmless. With SWV this behavior needs some improvement, so that switches avoid sending data to the remote switch over the SWV Link. Multichassis EtherChannel (MEC) helps to overcome this problem.

With MEC, a switch tries to send data using one of its local interfaces that are members of the same MEC. Hash functions are still consulted, but their scope is limited to local interfaces. If an MEC consists of two physical interfaces (one per switch), each switch sends data over its local interface. If an MEC consists of four physical interfaces (two per switch, as in the picture above), the hash functions load balance traffic across the two local interfaces. In other words, the SWV Link is not used to forward data towards dual-homed devices. In the topology shown above, if the switch on the left receives traffic destined to the downstream switch (top), it load balances between Fo1/0/1 and Fo1/0/2 and does not use Fo2/0/1 or Fo2/0/2, as that would involve the SWV Link.

Data traverses the SWV Link towards the remote switch only when (a) all local MEC members have failed, or (b) data comes from or goes to single-homed devices (not recommended).
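The forwarding preference described above can be modelled in a few lines. In the Python sketch below, a CRC32 of a flow string stands in for the real hardware hash (an assumption; on a real switch the hash inputs come from the port-channel load-balancing configuration), and the port names follow the topology in the picture. `pick_egress` is a hypothetical helper. It covers both cases in which the SWV Link is crossed: all local members down, and a single-homed device attached to the peer switch.

```python
import zlib
from collections.abc import Collection


def pick_egress(members: dict[int, list[str]], ingress_switch: int, flow: str,
                down: Collection[str] = frozenset()) -> tuple[str, bool]:
    """Return (egress port, crosses_svl) for a flow arriving on ingress_switch.

    members maps an SWV switch number to that switch's ports in the channel.
    Operational local ports are always preferred; only when none is left does
    the frame have to cross the SWV Link to a port on the peer switch.
    """
    local = [p for p in members.get(ingress_switch, []) if p not in down]
    remote = [p for sw, ports in members.items() if sw != ingress_switch
              for p in ports if p not in down]
    candidates, crosses_svl = (local, False) if local else (remote, True)
    if not candidates:
        raise RuntimeError("no operational member ports")
    # Stand-in for the hardware hash: CRC32 of a flow key, modulo the candidates.
    return candidates[zlib.crc32(flow.encode()) % len(candidates)], crosses_svl


# MEC towards the downstream switch: two members on each SWV switch.
mec = {1: ["Fo1/0/1", "Fo1/0/2"], 2: ["Fo2/0/1", "Fo2/0/2"]}
flow = "10.1.1.10->10.2.2.20:443"

print(pick_egress(mec, 1, flow))                               # Fo1/0/1 or Fo1/0/2, False
print(pick_egress(mec, 1, flow, down={"Fo1/0/1", "Fo1/0/2"}))  # Fo2/0/1 or Fo2/0/2, True
# Single-homed device hanging off switch 2 only (hypothetical port).
print(pick_egress({2: ["Fo2/0/10"]}, 1, flow))                 # ('Fo2/0/10', True)
```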

Configuration of SWV is easy: just follow these steps on both switches to create the SWV entity (the example is based on Catalyst 9500-24Q-A switches):

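Switch(config)#stackwise-virtual
Switch(config-stackwise-virtual)#domain 10
Switch(config-stackwise-virtual)#exit
Switch(config)#interface range Fo1/0/23 - 24
Switch(config-if-range)#stackwise-virtual link 1
Switch(config-if-range)#end
Switch#write memory
Switch#reload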

Once rebooted, the switches come up as a single logical entity, just like a StackWise-480 stack. If you’re familiar with VSS, you probably know that switch numbers have to be configured within the VSS domain configuration. With SWV, you use the standard StackWise-480 commands to manage the stack instead, such as:

  • switch num priority value
  • switch num renumber new-num

The configuration shown above creates the SWV Link and enables Dual Active Detection using Enhanced PAgP over an MEC (the default behavior, see below). It is possible to configure a dedicated Dual Active Detection link, but personally I see no point: on 40Gbps switches, why would you waste two 40Gbps ports (one per switch) on a heartbeat function? Assuming there is at least one MEC (which is very likely), no dedicated interfaces are needed for Dual Active Detection to work. ePAgP Dual Active Detection is enabled by default, but you must statically define which MECs can be used for this feature (i.e. trusted MECs). To configure Dual Active Detection over a specific MEC, apply the following (the port-channel must be administratively shut down first):

Switch(config)#interface Port-Channel ID
Switch(config-if)#shutdown
Switch(config-if)#exit
Switch(config)#stackwise-virtual
Switch(config-stackwise-virtual)#dual-active detection pagp   << default behavior
Switch(config-stackwise-virtual)#dual-active detection pagp trust channel-group ID
Switch(config-stackwise-virtual)#interface Port-Channel ID
Switch(config-if)#no shutdown

Instead of exchanging hello packets over a dedicated link, the Catalyst 9500 uses new ePAgP TLVs to notify the downstream switch of its presence, and the downstream switch relays this information to the other SWV member. Both members of the SWV Domain do this. In simple terms, the downstream switch acts as a repeater for fast hello packets. If the SWV Link fails, the two switches can still see each other as long as at least one trusted MEC is functional. Once a Dual Active condition is detected, all ports except the SWV Link ports on the (original) Active switch are immediately err-disabled, so single-homed devices connected to that switch become unreachable until SWV is fixed.
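The detection logic can be pictured with a simplified model. In the Python sketch below, which is an assumption modelled on how ePAgP dual-active detection is documented for VSS rather than on Cisco's code, the downstream switch relays the most recently advertised active ID to both SWV members, and a switch that believes it is Active but hears a different active ID enters recovery mode. The class names are hypothetical; the two IDs are borrowed from the log output later in this post.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SwvMember:
    number: int
    own_id: str            # hypothetical ID this switch advertises while Active
    role: str = "standby"  # "active", "standby" or "recovery"

    def on_epagp(self, active_id: str) -> None:
        """Handle an active ID relayed by the downstream switch."""
        if self.role == "active" and active_id != self.own_id:
            # Another switch also claims the Active role: dual-active detected.
            # A real switch would now err-disable every non-SVL port.
            self.role = "recovery"


@dataclass
class DownstreamSwitch:
    """Relays the most recently advertised active ID to every SWV member."""
    members: list[SwvMember] = field(default_factory=list)
    last_active_id: Optional[str] = None

    def receive(self, active_id: str) -> None:
        self.last_active_id = active_id
        for member in self.members:
            member.on_epagp(active_id)


sw1 = SwvMember(1, "68ca.e462.b680", role="active")
sw2 = SwvMember(2, "68ca.e462.b700")
downstream = DownstreamSwitch([sw1, sw2])

downstream.receive(sw1.own_id)  # normal operation: nothing changes
# The SWV Link fails: switch 2 takes over and advertises its own ID.
sw2.role = "active"
downstream.receive(sw2.own_id)
print(sw1.role, sw2.role)  # recovery active
```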

To confirm that Dual Active Detection over ePAgP is operational, run the following:

Switch#show stackwise-virtual dual-active-detection pagp
Pagp dual-active detection enabled: Yes
In dual-active recovery mode: No

Channel group 1
        Dual-Active    Partner       Partner Partner
Port    Detect Capable Name          Port    Version
Fo1/0/1 Yes            SwitchA       Fo1/1/1 1.1
Fo2/0/1 Yes            SwitchA       Fo1/1/2 1.1
No interfaces configured in the channel group
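If you check this on many switches, the output is easy to parse. Below is a minimal Python sketch, assuming the column layout shown above stays the same across releases; `parse_pagp_dad` and `dad_healthy` are hypothetical helpers, and treating "every listed port is Dual-Active capable" as healthy is my own rule of thumb rather than a Cisco definition.

```python
import re

# One data row: port, Yes/No, partner name, partner port, partner version.
PORT_ROW = re.compile(r"^(\S+)\s+(Yes|No)\s+(\S+)\s+(\S+)\s+(\S+)$")


def parse_pagp_dad(output: str) -> dict:
    """Parse 'show stackwise-virtual dual-active-detection pagp' text into a dict."""
    info = {"enabled": False, "recovery_mode": False, "ports": []}
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Pagp dual-active detection enabled:"):
            info["enabled"] = line.endswith("Yes")
        elif line.startswith("In dual-active recovery mode:"):
            info["recovery_mode"] = line.endswith("Yes")
        else:
            m = PORT_ROW.match(line)
            if m:
                port, capable, name, partner_port, version = m.groups()
                info["ports"].append({
                    "port": port,
                    "capable": capable == "Yes",
                    "partner": name,
                    "partner_port": partner_port,
                    "partner_version": version,
                })
    return info


def dad_healthy(info: dict) -> bool:
    """Detection on, not in recovery, and at least one port, all of them capable."""
    return (info["enabled"] and not info["recovery_mode"]
            and bool(info["ports"])
            and all(p["capable"] for p in info["ports"]))


SAMPLE = """\
Pagp dual-active detection enabled: Yes
In dual-active recovery mode: No

Channel group 1
        Dual-Active    Partner       Partner Partner
Port    Detect Capable Name          Port    Version
Fo1/0/1 Yes            SwitchA       Fo1/1/1 1.1
Fo2/0/1 Yes            SwitchA       Fo1/1/2 1.1
"""

print(dad_healthy(parse_pagp_dad(SAMPLE)))  # True
```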

Let’s simulate a Dual Active condition by unplugging the cables from both interfaces that make up the SWV Link. The moment you do this, the following events are generated on the switch with the Active Supervisor (some output is omitted for simplicity):

*Apr 27 11:10:47.167: %NIF_MGR-6-PORT_LINK_DOWN: Switch 1 R0/0: nif_mgr: Port 0, LPN 23 on front side stack link 0 is DOWN.
*Apr 27 11:10:47.167: %NIF_MGR-6-PORT_CONN_DISCONNECTED: Switch 1 R0/0: nif_mgr: Port 0, LPN 23 on front side stack link 0 connection has DISCONNECTED: CONN_ERR_PORT_LINK_DOWN_EVENT
*Apr 27 11:11:04.004: %NIF_MGR-6-PORT_LINK_DOWN: Switch 1 R0/0: nif_mgr: Port 1, LPN 24 on front side stack link 0 is DOWN.
*Apr 27 11:11:04.004: %NIF_MGR-6-PORT_CONN_DISCONNECTED: Switch 1 R0/0: nif_mgr: Port 1, LPN 24 on front side stack link 0 connection has DISCONNECTED: CONN_ERR_PORT_LINK_DOWN_EVENT
Dual-active condition detected: Starting recovery-mode,all non-SVL interfaces have been shut down
*Apr 27 11:11:05.269: %PAGP_DUAL_ACTIVE-1-RECOVERY: PAgP running on Fo1/0/1 triggered dual-active recovery: active id 68ca.e462.b700 received, expected 68ca.e462.b680
*Apr 27 11:11:05.269: SV In DUAL ACTIVE RECOVERY Process
*Apr 27 11:11:05.273: %PM-4-ERR_DISABLE: dual-active-recovery error detected on Fo1/0/1, putting Fo1/0/1 in err-disable state
(this message is repeated for EVERY port)
*Apr 27 11:11:10.497: %RF-5-RF_RELOAD: Peer reload. Reason: EHSA standby down
*Apr 27 11:11:10.505: %IOSXE_REDUNDANCY-6-PEER_LOST: Active detected switch 2 is no longer standby
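When you troubleshoot a split after the fact, the two key messages are the PAgP recovery trigger and the per-port err-disable messages. The Python sketch below pulls both out of saved console or syslog text with regular expressions written against the exact message formats shown above; other IOS-XE releases may word them differently, so treat the patterns as assumptions to verify. `summarize_dad_log` is a hypothetical helper name.

```python
import re

TRIGGER_RE = re.compile(
    r"%PAGP_DUAL_ACTIVE-1-RECOVERY: PAgP running on (?P<port>\S+) triggered "
    r"dual-active recovery: active id (?P<received>[0-9a-f.]+) received, "
    r"expected (?P<expected>[0-9a-f.]+)"
)
# Port names are followed by a comma in this message, so stop at "," or whitespace.
ERRDISABLE_RE = re.compile(
    r"%PM-4-ERR_DISABLE: dual-active-recovery error detected on ([^\s,]+)"
)


def summarize_dad_log(log: str) -> dict:
    """Pull dual-active triggers and err-disabled ports out of console/syslog text."""
    return {
        "triggers": [m.groupdict() for m in TRIGGER_RE.finditer(log)],
        "err_disabled": ERRDISABLE_RE.findall(log),
    }


LOG = """\
*Apr 27 11:11:05.269: %PAGP_DUAL_ACTIVE-1-RECOVERY: PAgP running on Fo1/0/1 triggered dual-active recovery: active id 68ca.e462.b700 received, expected 68ca.e462.b680
*Apr 27 11:11:05.273: %PM-4-ERR_DISABLE: dual-active-recovery error detected on Fo1/0/1, putting Fo1/0/1 in err-disable state
"""

print(summarize_dad_log(LOG))
# {'triggers': [{'port': 'Fo1/0/1', 'received': '68ca.e462.b700', 'expected': '68ca.e462.b680'}], 'err_disabled': ['Fo1/0/1']}
```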

As you can see, the Standby switch takes over the Active Supervisor role, so for a moment both switches are Active. The original Active switch then detects the Dual Active scenario over ePAgP (Fo1/0/1 is its local MEC member) about a second after the second SWV Link port goes down, and shuts down all its non-SVL interfaces. The following output confirms that the Dual Active condition was triggered by PAgP on the original Active switch.

Switch(recovery-mode)#show stackwise-virtual dual-active-detection pagp
Pagp dual-active detection enabled: Yes
In dual-active recovery mode: Yes

Channel group 1
        Dual-Active    Partner Partner Partner
Port    Detect Capable Name    Port    Version
Fo1/0/1 No             None    None    N/A
No interfaces configured in the channel group
 Triggered by: PAgP
 Triggered on Interface: Fo1/0/1
 Triggered Time: 11:11:05.000 UTC Fri Apr 27 2018
 Received id: 68ca.e462.b700
 Expected id: 68ca.e462.b680

To recover from the Dual Active condition (once the SWV Link is operational again), reload the switch that is in recovery mode (the ex-Active Supervisor). Ignore the warning that it is the Active switch and that the whole stack will be reloaded: this will not happen, because in a split-stack scenario the reload command only reboots that one switch. Once the switch comes up and its configuration is synchronized, the stack is formed again.

Note! The reloaded switch becomes the Standby Supervisor and stays in that role until the whole stack is rebooted and the election process selects a new Active Supervisor (using the configured priorities).
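The rule in the note can be captured in a few lines. This is a simplified Python model, not Cisco's election algorithm: it assumes that a switch joining a running SWV never preempts the current Active, and that during a full-stack boot the highest priority wins with the lowest MAC address as a tie-breaker (the tie-breaker is an assumption borrowed from classic StackWise behavior). The class, the function names and the MAC addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    number: int
    priority: int  # as set with "switch <num> priority <value>"
    mac: str       # hypothetical base MAC, used here as the (assumed) tie-breaker


def elect_on_full_boot(members: list[Member]) -> tuple[Member, Member]:
    """Whole stack boots together: highest priority wins, lowest MAC breaks ties."""
    ranked = sorted(members, key=lambda m: (-m.priority, m.mac))
    return ranked[0], ranked[1]  # (active, standby)


def join_running_stack(current_active: Member, joiner: Member) -> tuple[Member, Member]:
    """A switch joining an already-formed SWV never preempts the current Active."""
    return current_active, joiner  # (active, standby)


sw1 = Member(1, priority=15, mac="0011.2233.4401")
sw2 = Member(2, priority=1, mac="0011.2233.4402")

# After the dual-active recovery, switch 2 is Active and switch 1 is reloaded.
active, standby = join_running_stack(current_active=sw2, joiner=sw1)
print(active.number, standby.number)  # 2 1

# Only a reboot of the whole stack lets the priorities decide again.
active, standby = elect_on_full_boot([sw1, sw2])
print(active.number, standby.number)  # 1 2
```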

SWV configuration has to be removed manually; it is not removed when the configuration is erased with ‘write erase’. To get rid of the SWV configuration, do the following:

Switch(config)#interface range Fo1/0/23 - 24 , Fo2/0/23 - 24
Switch(config-if-range)#no stackwise-virtual link 1
Switch(config-if-range)#exit
Switch(config)#no stackwise-virtual
Switch(config)#end
Switch#write memory

I hope this information was useful for you.

