The journey to successful SD-WAN deployment

I’ve decided to put this article together as we’re approaching the end of our SD-WAN project. There are still a couple of challenging sites left to deploy (the delay is caused by hardware delivery), but these are not going to have any negative impact on the overall picture. I am pretty sure everyone in our company will agree that this project was a great success from a design and delivery perspective. We had a few challenges, but managed to adjust our approach in a timely manner, ensuring agility was not compromised. I’d like to say we could have done this quicker, but the legal sector (at least our company!) doesn’t appreciate disruptive changes, so we had to strike a balance.


Meet Ethanalyzer – Cisco NX-OS Control Plane Packet Capture

I was always under the impression that the Cisco N5Ks deployed in our environment lacked packet capture capabilities. Well, this is true for the data plane, but it turns out they can capture CPU-bound (control plane) traffic, such as ARP, BGP, EIGRP, OSPF, ICMP, etc.

A bit of background. We faced an issue with Silver-Peak SD-WAN appliances, which (under the hood) use unicast ARP frames as a next-hop reachability check. Basically, they send a unicast ARP request every 30-50 seconds and expect a response within 1 second. If the ARP response doesn’t come through, they retry 3 times before marking the next hop unreachable and invalidating all relevant routes. Once this happens, the Silver-Peak appliances fall back to discovery mode and use broadcast ARP requests to find the next hop’s MAC address. It took us a while to understand this, and of course we had to use the packet capture feature on the Silver-Peak appliances to get there.
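To make the failure logic concrete, here is a minimal Python model of the next-hop check as described above. It is a hypothetical sketch, not Silver-Peak code: the class and method names are made up, the 30-50 second probe timing is left to the caller, and treating three consecutive missed replies as "unreachable" is an assumption about how the retries are counted.

```python
from typing import Optional


class NextHopMonitor:
    """Hypothetical model of a unicast-ARP next-hop check (not vendor code)."""

    def __init__(self, max_misses: int = 3, reply_timeout: float = 1.0) -> None:
        self.max_misses = max_misses        # assumed: consecutive misses before "down"
        self.reply_timeout = reply_timeout  # seconds to wait for each ARP reply
        self.misses = 0
        self.mode = "unicast"               # "unicast" checks, or "discovery" mode

    @property
    def reachable(self) -> bool:
        # Routes via this next hop are only valid while unicast checks succeed.
        return self.mode == "unicast"

    def probe_type(self) -> str:
        """Kind of ARP request the next probe should use."""
        return "unicast ARP" if self.mode == "unicast" else "broadcast ARP"

    def record(self, reply_delay: Optional[float]) -> None:
        """Feed one probe result: reply delay in seconds, or None if nothing came back."""
        if reply_delay is not None and reply_delay <= self.reply_timeout:
            self.misses = 0
            self.mode = "unicast"  # MAC known (again): resume unicast checks
            return
        self.misses += 1
        if self.mode == "unicast" and self.misses >= self.max_misses:
            # Next hop declared unreachable: routes are invalidated and the
            # appliance falls back to broadcast ARP to rediscover the MAC.
            self.mode = "discovery"


if __name__ == "__main__":
    monitor = NextHopMonitor()
    for delay in (0.002, None, 1.5, None):  # a late reply (1.5 s) counts as a miss
        monitor.record(delay)
        print(monitor.probe_type(), monitor.reachable)
```

Note that a reply arriving after the 1-second window is treated exactly like no reply at all, which is why a slow or dropped unicast ARP response somewhere in the path is enough to tear down routes.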

But then we engaged Cisco TAC to see where the problem lay (the Cisco Nexus, or the Cisco FTD deployed in transparent mode). To my surprise, Cisco TAC used the Ethanalyzer tool to capture traffic destined to the N5K’s control plane (CPU). The commands are:

ethanalyzer local interface inbound-hi display-filter <filter>
ethanalyzer local interface inbound-low display-filter <filter>

From the N5K’s perspective, CPU-bound traffic is either high priority (inbound-hi) or low priority (inbound-low). The high-priority set includes, but is not limited to, unicast ARP, STP, BGP, EIGRP and OSPF: anything that affects Layer 2 or Layer 3 convergence. The low-priority set includes ICMP, SSH and broadcast ARP.
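The classification above boils down to a lookup table. The sketch below encodes only the protocols listed in this post (the key names such as "arp-unicast" are hypothetical labels, and the list is not exhaustive); anything not in the table should be looked for on both interfaces. The two ARP entries are the practical gotcha: the same protocol lands in different queues depending on whether the request is unicast or broadcast.

```python
# Queue mapping as listed in this post for the N5K; not an exhaustive Cisco list.
ETHANALYZER_QUEUE = {
    "arp-unicast": "inbound-hi",
    "stp": "inbound-hi",
    "bgp": "inbound-hi",
    "eigrp": "inbound-hi",
    "ospf": "inbound-hi",
    "icmp": "inbound-low",
    "ssh": "inbound-low",
    "arp-broadcast": "inbound-low",
}


def capture_interface(traffic: str) -> str:
    """Return the ethanalyzer interface to capture a given traffic type on."""
    try:
        return ETHANALYZER_QUEUE[traffic.strip().lower()]
    except KeyError:
        # Unknown types: the safest bet is to try both inbound-hi and inbound-low.
        raise ValueError(f"unclassified traffic type: {traffic!r}") from None


if __name__ == "__main__":
    for t in ("ARP-Unicast", "BGP", "ICMP", "ARP-Broadcast"):
        print(f"{t:>14} -> {capture_interface(t)}")
```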

Display filters use the same syntax as Wireshark display filters, so if you’re familiar with the tool, you won’t have any trouble filtering the output. Keep in mind that these filters only control what’s sent to the VTY; in the background everything is captured (some Nexus platforms also support capture filters, which limit what’s captured). Cisco recommends using display filters on N5Ks. Jeremy Stretch (packetlife) has put together an excellent Wireshark Display Filters cheat sheet. The following example captures unicast ARP traffic (inbound-hi) from the endpoint with MAC address AA:BB:CC:DD:EE:FF.

ethanalyzer local interface inbound-hi display-filter arp.src.hw_mac==aabb.ccdd.eeff

By default, the capture runs until it has processed 10 packets. If you want it to run longer, use the limit-captured-frames X keyword-value pair. Set X to any number, or to 0 if you want the capture to run indefinitely (until you press Ctrl+C).
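If you build these commands often (for example, pasting them into a runbook or an automation tool), a small helper keeps the MAC format and keywords consistent. This is a hypothetical Python sketch, not a Cisco API: it normalises a MAC address to the dotted format used in the example above and assembles the keywords described in this post. Quoting filters that contain spaces is an assumption about the NX-OS CLI parser.

```python
import re


def cisco_mac(mac: str) -> str:
    """Normalise 'AA:BB:CC:DD:EE:FF', 'aa-bb-...' or 'aabb.ccdd.eeff' to 'aabb.ccdd.eeff'."""
    digits = re.sub(r"[:.\-]", "", mac.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def ethanalyzer_cmd(interface: str, display_filter: str = "", frames: int = 10) -> str:
    """Build an ethanalyzer command line (hypothetical helper, not a Cisco API)."""
    if interface not in ("inbound-hi", "inbound-low"):
        raise ValueError("interface must be 'inbound-hi' or 'inbound-low'")
    if frames < 0:
        raise ValueError("frames must be >= 0 (0 means no limit)")
    parts = ["ethanalyzer", "local", "interface", interface]
    if display_filter:
        # Assumption: filters containing spaces need double quotes on the CLI.
        quoted = f'"{display_filter}"' if " " in display_filter else display_filter
        parts += ["display-filter", quoted]
    if frames != 10:  # 10 is the default, so only add the keyword when it changes
        parts += ["limit-captured-frames", str(frames)]
    return " ".join(parts)


if __name__ == "__main__":
    mac = cisco_mac("AA:BB:CC:DD:EE:FF")
    print(ethanalyzer_cmd("inbound-hi", f"arp.src.hw_mac=={mac}"))
    print(ethanalyzer_cmd("inbound-low", f"eth.src=={cisco_mac('00:1B:BC:11:5B:5C')}", frames=0))
```

The first call reproduces the unicast ARP example above; the second builds an unlimited low-priority capture for a single source MAC.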

Here are a couple of examples of real packet captures I ran in our environment (IP addresses have been removed, of course):

N5K-1# ethanalyzer local interface inbound-low display-filter eth.src==001B.BC11.5B5C limit-captured-frames 0
 Capturing on inband
 2020-12-04 00:46:11.454501 00:1b:bc:11:5b:5c -> ff:ff:ff:ff:ff:ff ARP Who has  Tell
 2020-12-04 00:46:12.586073 -> ICMP Echo (ping) request
 2020-12-04 00:46:22.586722 -> ICMP Echo (ping) request
 2020-12-04 00:46:32.587417 -> ICMP Echo (ping) request
 2020-12-04 00:46:41.493429 00:1b:bc:11:5b:5c -> ff:ff:ff:ff:ff:ff ARP Who has  Tell

As you can see, this capture was taken for low-priority traffic, so only ICMP and broadcast ARP are shown. The following capture shows high-priority control traffic.
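Before looking at it, note that the text output is easy to post-process off-box. The hypothetical Python helper below (not part of NX-OS) parses lines in the format shown in the low-priority capture and reports the gap between consecutive packets of each protocol; the set of protocol keywords it looks for is an assumption based on these two captures. On that sample, it shows the two broadcast ARP requests roughly 30 seconds apart and the ICMP echo requests roughly 10 seconds apart.

```python
from collections import defaultdict
from datetime import datetime

# Protocol keywords to look for after the "->" marker (assumed, based on the samples).
KNOWN_PROTOCOLS = {"ARP", "ICMP", "BGP", "TCP", "OSPF", "EIGRP", "STP"}


def gaps_by_protocol(lines):
    """Return {protocol: [seconds between consecutive packets]} for ethanalyzer text output."""
    stamps = defaultdict(list)
    for line in lines:
        parts = line.split()
        if "->" not in parts or len(parts) < 3:
            continue  # e.g. the "Capturing on inband" banner
        try:
            ts = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue
        after_arrow = parts[parts.index("->") + 1:]
        proto = next((tok for tok in after_arrow if tok in KNOWN_PROTOCOLS), None)
        if proto is not None:
            stamps[proto].append(ts)
    return {
        proto: [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        for proto, times in stamps.items()
    }


SAMPLE = """\
 Capturing on inband
 2020-12-04 00:46:11.454501 00:1b:bc:11:5b:5c -> ff:ff:ff:ff:ff:ff ARP Who has  Tell
 2020-12-04 00:46:12.586073 -> ICMP Echo (ping) request
 2020-12-04 00:46:22.586722 -> ICMP Echo (ping) request
 2020-12-04 00:46:32.587417 -> ICMP Echo (ping) request
 2020-12-04 00:46:41.493429 00:1b:bc:11:5b:5c -> ff:ff:ff:ff:ff:ff ARP Who has  Tell
"""

if __name__ == "__main__":
    for proto, gaps in gaps_by_protocol(SAMPLE.splitlines()).items():
        print(proto, [round(g, 3) for g in gaps])
```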

N5K-2# ethanalyzer local interface inbound-hi display-filter eth.src==001B.BC11.5B6C limit-captured-frames 0
 Capturing on inband
 2020-12-03 09:12:29.476910 -> BGP KEEPALIVE Message
 2020-12-03 09:12:31.053569 -> TCP bgp > 28839 [ACK] Seq=20 Ack=20 Win=444 Len=0
 2020-12-03 09:12:33.476911 00:1b:bc:11:5b:6c -> 00:2a:6a:20:79:c1 ARP Who has  Tell
 2020-12-03 09:12:38.680526 -> BGP KEEPALIVE Message
 2020-12-03 09:12:41.071191 -> TCP bgp > 28839 [ACK] Seq=39 Ack=39 Win=444 Len=0
 2020-12-03 09:12:48.199107 -> BGP KEEPALIVE Message
 2020-12-03 09:12:51.090945 -> TCP bgp > 28839 [ACK] Seq=58 Ack=58 Win=444 Len=0

Here you can see BGP packets (keepalives and TCP ACKs), as well as unicast ARP requests. Keep in mind that this capture only shows ingress traffic; egress traffic cannot be captured. It is also possible to write the capture to a file and review it in Wireshark, for example:

ethanalyzer local interface inbound-hi limit-captured-frames 1000 write bootflash:capture-file.pcap

Note that display filters cannot be used when the captured packets are written to a file, so you will have to use Wireshark to filter out the packets you need.
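If Wireshark isn’t handy, the same source-MAC filtering can be done with a few lines of standard-library Python. This is a sketch under stated assumptions: it expects a classic libpcap file (not pcapng) with the Ethernet link type, and the function name is hypothetical. If your file turns out to be pcapng, open it in Wireshark instead.

```python
import re
import struct
from typing import Iterator, Tuple

LINKTYPE_ETHERNET = 1


def frames_from_mac(pcap: bytes, src_mac: str) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) for Ethernet frames sent by src_mac in a classic libpcap file."""
    target = bytes.fromhex(re.sub(r"[:.\-]", "", src_mac))
    if len(target) != 6:
        raise ValueError(f"not a MAC address: {src_mac!r}")
    if len(pcap) < 24:
        raise ValueError("file too short for a pcap global header")
    magic = struct.unpack("<I", pcap[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    network = struct.unpack(endian + "I", pcap[20:24])[0]
    if network != LINKTYPE_ETHERNET:
        raise ValueError(f"unexpected link type {network}, expected Ethernet")
    offset = 24
    while offset + 16 <= len(pcap):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap[offset:offset + 16])
        offset += 16
        frame = pcap[offset:offset + incl_len]
        offset += incl_len
        # Ethernet header: destination MAC (bytes 0-5), then source MAC (bytes 6-11).
        if frame[6:12] == target:
            yield ts_sec + ts_usec / 1_000_000, frame
```

After copying capture-file.pcap off the switch, something like `frames_from_mac(open("capture-file.pcap", "rb").read(), "001B.BC11.5B5C")` yields only the frames sent by that host.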

More information can be found in Cisco’s “Nexus 3000/5000/7000 Use of the Ethanalyzer Tool” document.

I hope it was useful!

Azure Networking through the eyes of Cisco engineer

It can be very confusing for an old-school network engineer to grasp the software-defined networking concepts that exist in public clouds. In this article I will attempt to explain Azure’s network building blocks in Cisco’s terms. Because, let’s face it, in the modern world the focus is moving from on-prem networking towards public clouds, and there’s nothing we can do apart from educating ourselves to stay in demand.

Hopefully, after reading this blog post you will be able to digest intermediate-to-advanced level articles, or hold a conversation with the cloud team in your company.


Cisco Live 2020 Techtorial Sessions Takeaways, Day 1

I’ve just returned from Cisco Live 2020. It was my third CL since I first went in 2018. It is an amazing place to learn new things, discuss problems with TAC engineers to find appropriate solutions, and try out multiple technologies in a lab environment. I attended a few breakout sessions this year and just want to post my notes here for my own benefit.


Cisco Smart License Conversion Trick

It’s been a while since Cisco announced Smart Licensing to replace traditional PAK-based licensing. Overall, the new system brings loads of benefits, such as the ability to track license utilization, see all managed instances by hostname, transfer licenses and product instances between virtual accounts (i.e. sub-domains), and many other features. However, it also brings some challenges and surprises that everyone must be aware of.


Cisco DNAC Journey Part 2 – SWIM

I tried to upgrade a number of my lab devices using DNAC today. Generally speaking, it works perfectly fine when everything is compatible (remember my note about greenfield deployments in the last blog post?). I managed to upgrade three Catalyst 3850 switches without any issues. All switches were running IOS-XE 16.6.4 prior to the upgrade and were upgraded to 16.6.4a.

The following SWIM methods have been tried:

  • Distribution, followed by instant Activation
  • Distribution, followed by delayed Activation
  • Distribution only, followed by Activation as a separate task


Cisco DNAC Journey Part 1 – Expectations and Reality

We’ve just finished a Cisco DNA Assurance PoV… what can I say about it? Not much… It feels like the right product, but it is far too immature for mass adoption. We only deployed the appliance itself, with the aim of trying DNA Assurance. However, we faced a number of issues from Day 1, even though it was a mentored install. I thought I would share some information, and maybe it will make someone’s life easier 🙂


Perpetual and Fast PoE are friends, m’kay?

Have you ever heard about the two new PoE enhancements that are available on some Catalyst 3850 and all Catalyst 9K platforms? If not, you’re not alone. I only found out about them a few months back, by accident, while troubleshooting some issues. There are two great features available for you to configure on PoE-enabled switches:

  • Perpetual PoE: preserves power during a warm reload
  • Fast PoE: provides power to the port after a cold start, at pre-outage levels


802.11 Duration/ID Field

I always knew that the Duration/ID field is used by CSMA/CA to predict when the wireless medium becomes free. However, I was confused by some publications which stated that this field is set to the amount of time (in microseconds) required to transmit the current frame, wait SIFS and then receive the ACK. The CWNA Sybex book finally helped me understand this better.

Even though the Duration/ID field tells a STA how long it has to wait before the medium becomes free, for a single unicast data frame it is set to the duration of SIFS + ACK. It doesn’t include the time required to transmit the current frame. This makes sense: to read the value from the field, a STA needs to receive the frame in full and check its FCS before it can set its NAV to a legitimate value.
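A quick worked example makes the point. The sketch below computes the Duration value for a single unicast data frame using 5 GHz OFDM (802.11a-style) timing: SIFS of 16 µs, a 16 µs preamble, a 4 µs SIGNAL field, 4 µs symbols and a 14-byte ACK. The ACK rate is a parameter because it depends on the basic rates in use; 2.4 GHz ERP-OFDM uses a different SIFS plus a signal extension, so those numbers would change. Note that the length of the data frame itself never enters the calculation.

```python
import math

# 5 GHz OFDM (802.11a-style) timing, in microseconds.
SIFS_US = 16
PREAMBLE_US = 16
SIGNAL_US = 4
SYMBOL_US = 4
ACK_BYTES = 14  # Frame Control + Duration + RA + FCS

# Data bits per OFDM symbol for the legacy 20 MHz rates (Mbps -> N_DBPS).
N_DBPS = {6: 24, 9: 36, 12: 48, 18: 72, 24: 96, 36: 144, 48: 192, 54: 216}


def ofdm_airtime_us(psdu_bytes: int, rate_mbps: int) -> int:
    """Airtime of one OFDM PPDU: preamble + SIGNAL + data symbols."""
    bits = 16 + 8 * psdu_bytes + 6  # SERVICE field + PSDU + tail bits
    symbols = math.ceil(bits / N_DBPS[rate_mbps])
    return PREAMBLE_US + SIGNAL_US + SYMBOL_US * symbols


def data_frame_duration_us(ack_rate_mbps: int) -> int:
    """Duration/ID of a single unicast data frame: SIFS + ACK, nothing else."""
    return SIFS_US + ofdm_airtime_us(ACK_BYTES, ack_rate_mbps)


if __name__ == "__main__":
    for rate in (6, 12, 24):
        print(f"ACK at {rate} Mbps -> Duration = {data_frame_duration_us(rate)} us")
```

With the ACK sent at 24 Mbps this gives 44 µs, whether the data frame carried 100 bytes or 1,500.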

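Also, I didn’t know that ACK/Block ACK frames normally have their Duration/ID field set to 0 (an ACK to a fragment with the More Fragments bit set is an exception: it carries the remaining reservation).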

Again, this makes sense. The transmission is complete, so there is nothing left to reserve: the medium is ready for the next transmission.

Good to know: historically, this field was defined to carry either the STA’s association ID, in PS-Poll frames (legacy power management), or a duration, in every other frame. In practice, legacy power management is rarely used nowadays, so the field is mostly used for duration only. However, the field is still named Duration/ID.
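The dual meaning is encoded in the top two bits of the 16-bit field. The simplified decoder below is an illustrative sketch that ignores the frame subtype (a real parser would check it first): with bit 15 clear, bits 0-14 are a duration in microseconds; with bits 14 and 15 both set, as in PS-Poll, bits 0-13 carry the association ID; the value 32768 (only bit 15 set) is the fixed value used during the contention-free period.

```python
from typing import Optional, Tuple


def decode_duration_id(value: int) -> Tuple[str, Optional[int]]:
    """Simplified Duration/ID decoding; ignores the frame type a real parser would check."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("Duration/ID is a 16-bit field")
    if not value & 0x8000:
        return ("duration", value & 0x7FFF)  # bit 15 clear: microseconds
    if value & 0x4000:
        return ("aid", value & 0x3FFF)       # bits 14-15 set: association ID (PS-Poll)
    if value == 0x8000:
        return ("cfp", None)                 # fixed value sent during the CFP
    return ("reserved", value)


if __name__ == "__main__":
    for raw in (44, 0, 0xC001, 0x8000):
        print(hex(raw), decode_duration_id(raw))
```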

EEM Tricks: Automatic Failover (Internet)

When our company decided to deploy local Internet breakouts in every single office (cloud readiness), there was a design concern around high availability. Even though our firewalls are deployed in HA pairs, a decision was made not to overdesign the service provider (SP) edge sublayer. In particular, we decided not to deploy more than one external switch. Even if we had, 99% of branches would have had only one circuit, delivered over a single physical medium. If the switch and/or ISP failed, manual intervention would be required (recabling or routing adjustments). Since we also had regional Internet breakouts, including them in the design as a failover component was the obvious choice. The question was: how do we make the user experience as seamless as possible if the local Internet breakout fails? EEM was there to help!