Cisco DNAC Journey Part 2 – SWIM

I’ve tried to upgrade a number of my lab devices using DNAC today. Generally speaking, it works perfectly fine, when everything is compatible (remember my note about Greenfield deployments in the last blog post?). I’ve managed to upgrade three Catalyst 3850 switches without any issues. All switches were running IOS-XE version 16.6.4 prior to upgrade and were upgraded to 16.6.4a.

The following SWIM methods have been tried:

  • Distribution, followed by instant Activation
  • Distribution, followed by delayed Activation
  • Distribution only, followed by Activation as a separate task

I had no problems with the first and second approaches. The software was pushed down to the switches and then activated automatically. In the third case, however, DNAC initially refused to upgrade my switch and complained about the available space on the flash drive. I had to manually clean up multiple directories, including crash logs, to free up space. I expected DNAC to offer me a few clean-up options via the GUI. Anyway, I managed to get the green light from DNAC on the Upgrade Readiness Check, but decided to try a different distribution option.

If you’re familiar with Prime, you probably know that it supports software distribution and activation as separate tasks, i.e. it is possible to distribute software to a set of switches and skip activation completely. This allows engineers to perform distribution well in advance of activation (when the activation date is not yet known). Once the change is approved and local support teams are made aware, engineers can find the completed distribution task in the task list and use it to perform activation!

DNAC has you covered if you decide to use the same approach. It is done in a slightly different (more intelligent) way though.

To activate images that were distributed earlier as a separate task, simply select the same devices and start a new upgrade operation. Ignore the fact that you will not be able to skip the distribution phase; just make sure it is set to Distribute Now and then click Next. Define the activation date and time and press Finish. DNAC is smart enough to recognize that the image already exists on the device. It will skip the distribution step automatically and proceed with activation.

There are a couple of things that you need to know before you can start using the SWIM tool in DNAC:

  • Integration with cisco.com is highly recommended (using CCO credentials with appropriate access) – this will enable DNAC to automatically download images from the official repository.
  • DNAC shows a list of available images from cisco.com, but limits it to the Latest and Suggested ones only. If you want to add a custom or old image to the repository, you can import it manually from a device (importing from switches usually won’t work as they unpack the bin file – check Install Mode (16.6.3) on the picture above and the two crosses on the right). Otherwise, you can upload a previously downloaded image directly into the DNAC repository from the GUI. Please note that in some cases a manual import can fail due to a hash mismatch. Go to Settings, Integrity Verification and make sure the KGV file is up to date; if not, update it manually. Usually this has to be done only once. DNAC will keep this file up to date on a regular basis.
  • There has to be at least one Image marked as Golden (per device type/model). Simply click on a star icon to mark Image as golden and initiate automatic download from cisco.com if image is not yet in the repository and assuming CCO credentials have been provided.
  • You can define multiple Golden images per Device Type/Model, one per Device Role. This offers some flexibility, e.g. you can have one image defined for Access switches, but different for Core switches (check pen icon on the picture above, ALL means applicable to ALL roles, but can be set to ACCESS, DISTRIBUTION, CORE, BORDER ROUTER, UNKNOWN or any combination of these).
  • An upgrade can only be initiated if the Upgrade Readiness check passes. There are critical failures (red cross icon) and non-critical ones (amber triangle with exclamation mark). For example, you can still perform an upgrade if there’s an NTP clock synchronization problem. But it won’t even let you start if at least one critical check hasn’t passed, such as available flash space or problems with SCP/HTTPS services.
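The gating rule from the last bullet can be sketched in a few lines of Python. This is only my illustration of the behaviour described above, not DNAC code: the `CheckResult` class, its field names and the sample check names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    # Hypothetical record for one readiness check (not a DNAC data model).
    name: str
    passed: bool
    critical: bool


def can_start_upgrade(results):
    """Allow the upgrade unless at least one critical check has failed."""
    return not any(r.critical and not r.passed for r in results)


def non_critical_warnings(results):
    """Names of failed checks that do not block the upgrade."""
    return [r.name for r in results if not r.passed and not r.critical]


checks = [
    CheckResult("flash space", passed=True, critical=True),
    CheckResult("SCP/HTTPS services", passed=True, critical=True),
    CheckResult("NTP clock synchronization", passed=False, critical=False),
]

print(can_start_upgrade(checks))
print(non_critical_warnings(checks))
```

A failed NTP check only ends up in the warning list here, while a failed flash space check would make `can_start_upgrade` return `False`.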

There’s also one important thing, and it is kind of a chicken-and-egg situation. DNAC requires TLS1.2 for SWIM to work. Pre-Denali IOS-XE doesn’t have TLS1.2 support. Unfortunately, DNAC is unable to detect this and it shows the following message in the Upgrade Readiness check results:

As you can see, according to DNAC it is not critical (amber triangle). If you then try to upgrade a device, DNAC will let you do this. However, the upgrade will get stuck at the very first step (CPU Health Check) for 20 minutes and will eventually fail (see pic below).

It made us mad during the PoV as no one seemed to understand what was going on. Eventually, a TAC engineer confirmed that TLS1.2 is mandatory for DNAC to work. That’s why I said it’s a chicken-and-egg situation. You will have to upgrade all your network devices to Denali (16.3.x) or above (Everest / Fuji) before DNAC will allow you to use its SWIM tool. Prime will be there to help you 😈
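If you keep an inventory with software versions, a quick pre-check can flag devices that are too old for DNAC SWIM. Below is a minimal sketch, assuming the 16.3 threshold mentioned above is the right cut-off and that version strings start with `major.minor` (for example `16.6.4a` or `03.07.05E`). The function names are mine.

```python
import re

# Denali 16.3.x is the minimum release named in this post (per TAC).
MIN_SWIM_VERSION = (16, 3)


def parse_version(version):
    """Extract (major, minor) from strings such as '16.6.4a' or '03.07.05E'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_dnac_swim(version):
    """True if the release is Denali (16.3) or later."""
    return parse_version(version) >= MIN_SWIM_VERSION


for version in ("16.6.4a", "16.3.1", "03.07.05E"):
    print(version, supports_dnac_swim(version))
```

Anything the function rejects would have to be upgraded another way (Prime, in my case) before DNAC can take over.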

To check if your device supports TLS1.2, execute:

show ip http server status
show ip http client secure status

In the output you should see TLS related information, such as shown below.

If the device doesn’t support TLS1.2, the output won’t show any meaningful information about it.
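If you collect the output of these commands from many devices, the check can be automated. The sketch below simply searches the captured text for a TLS 1.2 marker; the exact wording of the output differs between releases, so the pattern is an assumption, and the sample strings are placeholders rather than real device output.

```python
import re

# Assumption: a TLS 1.2 capable device mentions it as "TLSv1.2", "TLS 1.2"
# or similar somewhere in the command output.
TLS12_PATTERN = re.compile(r"TLS\s*v?\s*1\.2", re.IGNORECASE)


def mentions_tls12(command_output):
    """True if the captured command output mentions TLS 1.2 in any common spelling."""
    return bool(TLS12_PATTERN.search(command_output))


# Placeholder strings for illustration only, NOT real device output.
print(mentions_tls12("placeholder line mentioning TLSv1.2"))
print(mentions_tls12("placeholder line without version details"))
```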

Once you get all your devices upgraded to a compatible version of IOS-XE in an alternative way (good old Prime), you can start enjoying the 21st century feel of DNAC’s GUI 🙂

I would suggest the following improvements to Cisco:

  • Make DNAC smart enough to offer a mitigation plan when it detects critical problems during the Upgrade Readiness Check. For example, why not let people clean up space on the switch’s flash from the GUI? Surely it’s not a big deal to add a piece of code which can detect useless files in the flash (core dumps, old images, backups) – engineers shouldn’t have to jump between GUI and CLI, really
  • Ideally, if DNAC detects legacy IOS-XE, why not let it upgrade without forcing TLS1.2 (but enforce it for Denali+)?

Apart from this, the tool is sleek. I will make sure we use it in our environment (once we get rid of legacy devices and upgrade everything to a compatible version). It is worth mentioning that it is very easy to export the Upgrade Readiness status for ALL devices as a CSV-formatted file and then feed it into a custom script/application to do some custom magic.
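As an example of such custom magic, here is a small Python sketch that groups problem checks per device. I am not reproducing the real export layout here: the column names (`device`, `check`, `status`), the status values and the sample rows are all made up, so adjust them to whatever your export actually contains.

```python
import csv
import io
from collections import defaultdict

# Made-up sample; the real DNAC export has its own columns and values.
SAMPLE_EXPORT = """device,check,status
sw-access-01,Flash check,PASS
sw-access-01,NTP check,WARNING
sw-core-01,Flash check,FAIL
"""


def failing_checks(csv_file, bad_statuses=("FAIL", "WARNING")):
    """Map each device to its (check, status) pairs that did not pass."""
    problems = defaultdict(list)
    for row in csv.DictReader(csv_file):
        status = row["status"].strip().upper()
        if status in bad_statuses:
            problems[row["device"]].append((row["check"], status))
    return dict(problems)


report = failing_checks(io.StringIO(SAMPLE_EXPORT))
for device, issues in report.items():
    print(device, issues)
```

With a real file you would pass `open("export.csv", newline="")` instead of the `io.StringIO` sample.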

One last thing: I performed an upgrade of a WLC 2504 using DNAC. Even though it successfully uploaded the image and rebooted the controller to activate it, DNAC wasn’t able to detect that the upgrade had completed successfully. It reported a failure, but the device was up and running with the latest AireOS. I am not sure if it’s a DNAC, unsupported WLC, or AireOS version problem/limitation. I will do a few more tests and will update this post with findings at a later point.

Hope this was useful

Update #1 (11/11/2018): Apparently, if software is distributed without activation (intentionally skipped), DNAC will show that distribution is completed if you try to distribute/activate the software at a later stage. The only reason I hadn’t noticed this behavior is that DNAC needs to re-sync the device after distribution – this is how the detection of flash contents is done. In my case, however, I distributed the software and then immediately tried to activate it as a separate task. DNAC didn’t show me that the software was already on the device, but as I said earlier you can simply ignore this, as DNAC will skip the distribution step anyway. It is intelligent enough to detect the image on flash on the fly.


Update #2 (17/11/2018, DNAC v1.2.6): Today I’ve tried to upgrade one site with 4 switch stacks (all Cat9Ks) from 16.6.3 to 16.6.4a. This operation was performed as a scheduled task, overnight. Unfortunately, DNAC only upgraded two stacks out of four, and reported two different issues for failed ones:

  • Software install command execution Failed: Failed to reload the device and distribute the image to the device due to pending reboot. Please reboot the device for the next software upgrade
  • Software install command execution Failed: Failed to distribute image to the device while installing software because the selected image is not extracted properly in the device

I am not quite sure what happened, but I managed to manually upgrade one of them (by using the activate / commit sequence) without any problems. I then tried to upgrade the second switch via DNAC, but got another error:

  • Installation fails as failed files available, for next run select “Erase” option to remove failed files so installation may get successful. Software install command execution Failed: Error occured while executing the command install add file flash:cat9k_iosxe.16.06.04a.SPA.bin activate commit. Command Output : install_add_activate_commit: START Sat Nov 17 07:14:33 UTC 2018 install_add_activate_commit: Adding PACKAGE FAILED: install_add_activate_commit : Super package already added. Add operation not allowed. ‘install remove inactive’ can be used to discard added packages

As you can see, DNAC was not able to perform a successful upgrade. Unlike in the previous example, the activate / commit sequence didn’t quite work here.

I tried to manually abort the upgrade operation and remove inactive packages, but that also failed. I had to reload the switch stack first, remove inactive packages, delete the bin file from flash, resync with DNAC and then repeat the distribution / activation tasks from scratch. It fixed the problem, but how painful is that?
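For my own notes, the recovery sequence above can be written down as an ordered checklist. The Python sketch below only builds the list of steps; it does not connect to anything. The `recovery_steps` helper is hypothetical, and on a stack the bin file may also sit on the member switches’ flash, which this sketch does not cover.

```python
def recovery_steps(image_file):
    """Ordered (where, action) steps mirroring the manual recovery described above."""
    return [
        ("cli", "reload"),
        ("cli", "install remove inactive"),
        ("cli", f"delete flash:{image_file}"),
        ("dnac", "resync the device"),
        ("dnac", "repeat distribution and activation"),
    ]


for where, action in recovery_steps("cat9k_iosxe.16.06.04a.SPA.bin"):
    print(f"[{where}] {action}")
```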

Please note the following. In Cisco Prime, when engineers upgrade devices, they can select a consecutive or parallel deployment method. The consecutive approach allows them to perform upgrades one by one and abort the upgrade operation for the whole batch as soon as any failure occurs. I haven’t found anything like this in DNAC v1.2.6. All upgrades happen in parallel. I can only assume that Cisco has improved IOS-XE to make sure no critical failures can happen, hence this extra layer of protection is not required anymore.
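To illustrate what the consecutive method buys you, here is a minimal Python sketch of "one by one, abort the batch on the first failure". It is neither Prime nor DNAC code: `upgrade_one` is a hypothetical callable that returns `True` on success, and the stack names are invented.

```python
def upgrade_sequentially(devices, upgrade_one):
    """Upgrade devices in order and stop the whole batch at the first failure."""
    upgraded = []
    for index, device in enumerate(devices):
        if not upgrade_one(device):
            # Everything after the failed device is left untouched.
            return {
                "upgraded": upgraded,
                "failed": device,
                "skipped": devices[index + 1:],
            }
        upgraded.append(device)
    return {"upgraded": upgraded, "failed": None, "skipped": []}


def fake_upgrade(device):
    # Stand-in for a real upgrade call; pretends that stack-3 fails.
    return device != "stack-3"


result = upgrade_sequentially(
    ["stack-1", "stack-2", "stack-3", "stack-4"], fake_upgrade
)
print(result)
```

With a parallel rollout all four stacks would have been attempted regardless of the failure on the third one.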

Conclusion: Upgrade on 50% of devices failed, same model, same IOS-XE. Would you use it? Let me know in the comments about your own experience.
