Cisco DNAC Journey Part 2 – SWIM

I’ve tried to upgrade a number of my lab devices using DNAC today. Generally speaking, it works perfectly fine, when everything is compatible (remember my note about Greenfield deployments in the last blog post?). I’ve managed to upgrade three Catalyst 3850 switches without any issues. All switches were running IOS-XE version 16.6.4 prior to upgrade and were upgraded to 16.6.4a.

The following SWIM methods have been tried:

  • Distribution, followed by instant Activation
  • Distribution, followed by delayed Activation
  • Distribution only, followed by Activation as a separate task

I had no problems with the first and second approaches. The software was pushed down to the switches and then activated automatically. In the third case, however, DNAC initially refused to upgrade my switch and complained about the available space on flash. I had to manually clean up multiple directories, including crash logs, to free up space. I expected DNAC to offer me a few clean-up options via the GUI. Anyway, I managed to get a green light from DNAC on the Upgrade Readiness Check, but decided to try a different distribution option.

If you’re familiar with Prime, you probably know that it supports software distribution and activation as separate tasks, i.e. it is possible to distribute software to a set of switches and skip activation completely. This allows engineers to perform distribution well in advance of activation (when the activation date is not yet known). Once the change is approved and the local support teams are made aware, engineers can find the completed distribution task in the task list and use it to perform activation!

DNAC has you covered if you decide to use the same approach. It is done in a slightly different (more intelligent) way though.

To activate images that were distributed earlier as a separate task, simply select the same devices and start a new upgrade operation. Ignore the fact that you cannot skip the distribution phase; just make sure it is set to Distribute Now and then click Next. Define the activation date and time and press Finish. DNAC is smart enough to recognize that the image already exists on the device. It will skip the distribution step automatically and proceed with activation.

There are a couple of things that you need to know before you can start using the SWIM tool in DNAC:

  • Integration with cisco.com is highly recommended (using CCO credentials with appropriate access) – this will enable DNAC to automatically download images from the official repository.
  • DNAC shows a list of available images from cisco.com, but limits it to the Latest and Suggested only. If you want to add a custom/old image to the repository, you can do it by manually importing it from a device (importing from switches is usually not supported, as they unpackage the bin file – check Install Mode (16.6.3) on the picture above and the two crosses on the right). Otherwise, you can upload a previously downloaded image directly into the DNAC repository from the GUI. Please note that in some cases manual import can fail due to a hash mismatch. Go to Settings, Integrity Verification and make sure the KGV file is up to date; if not, update it manually. Usually this has to be done only once, as DNAC will then keep this file up to date on a regular basis.
  • There has to be at least one image marked as Golden (per device type/model). Simply click on the star icon to mark an image as Golden; this also initiates an automatic download from cisco.com if the image is not yet in the repository, assuming CCO credentials have been provided.
  • You can define multiple Golden images per Device Type/Model, one per Device Role. This offers some flexibility, e.g. you can have one image defined for Access switches, but different for Core switches (check pen icon on the picture above, ALL means applicable to ALL roles, but can be set to ACCESS, DISTRIBUTION, CORE, BORDER ROUTER, UNKNOWN or any combination of these).
  • An upgrade can only be initiated if the Upgrade Readiness check passes. There are critical failures (red cross icon) and non-critical ones (amber triangle with exclamation mark). For example, you can still perform an upgrade if there’s an NTP clock synchronization problem. But DNAC won’t even let you start if at least one critical check hasn’t passed, such as available flash space or problems with SCP/HTTPS services.

There’s also one important thing, a kind of chicken-and-egg situation. DNAC requires TLS1.2 for SWIM to work. Pre-Denali IOS-XE doesn’t have TLS1.2 support. Unfortunately, DNAC is unable to detect this and shows the following message in the Upgrade Readiness check results:

As you can see, according to DNAC it is not critical (amber triangle). If you then try to upgrade a device, DNAC will let you do this. However, the upgrade will get stuck at the very first step (CPU Health Check) for 20 minutes and will eventually fail (see pic below).

It made us mad during the PoV, as no one seemed to understand what was going on. Eventually, a TAC engineer confirmed that TLS1.2 is mandatory for DNAC to work. That’s why I said it’s a chicken-and-egg situation. You will have to upgrade all your network devices to Denali (16.3.x) or above (Everest / Fuji) before DNAC will allow you to use its SWIM tool. Prime will be there to help you 😈

To check if your device supports TLS1.2, execute:

show ip http server status
show ip http client secure status

In the output you should see TLS related information, such as shown below.

If the device doesn’t support TLS1.2, it won’t show any meaningful information about it.
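As a rough pre-check before logging in to each device, the version rule mentioned above (TLS1.2 from Denali 16.3.x onwards) can be sketched in a few lines of Python. This is only an illustration: the function name and the threshold constant are my own, the threshold is taken from this post rather than from Cisco documentation, and the show commands remain the authoritative check.

```python
import re

# Assumption taken from this post: TLS1.2 is available from Denali (16.3.x) onwards.
MIN_TLS12_VERSION = (16, 3)


def supports_tls12(version: str) -> bool:
    """Return True if an IOS-XE version string is 16.3 or later (hypothetical helper)."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major first, then minor.
    return (major, minor) >= MIN_TLS12_VERSION


for version in ("16.6.4a", "16.3.1", "3.6.6E"):
    print(version, supports_tls12(version))
```

Anything that comes back False would need to be upgraded by other means (e.g. Prime) first.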

Once you get all your devices upgraded to a compatible version of IOS-XE using an alternative way (good old Prime), you can start enjoying the 21st century feel of DNAC’s GUI 🙂

I would suggest the following improvements to Cisco:

  • Make DNAC smart enough to offer a mitigation plan when it detects critical problems during the Upgrade Readiness Check. For example, why not let people clean up space on the switch’s flash from the GUI? Certainly it’s not a big deal to add a piece of code which can detect useless files in the flash (core dumps, crash logs, old images, backups) – engineers shouldn’t have to jump between GUI and CLI, really
  • Ideally, if DNAC detects legacy IOS-XE, why not let it upgrade without forcing TLS1.2 (but enforce it for Denali+)?

Apart from this, the tool is sleek. I will make sure we use it in our environment (once we get rid of legacy devices and upgrade everything to a compatible version). It is worth mentioning that it is very easy to export the Upgrade Readiness status for ALL devices as a CSV-formatted file and then feed it into a custom script/application to do some custom magic.
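Here is a minimal sketch of what such a script could look like. I am assuming a CSV with device, check and result columns and result values of PASS, WARNING and CRITICAL; these names are hypothetical, so adjust them to whatever your DNAC version actually exports.

```python
import csv
import io
from collections import defaultdict


def blocked_devices(csv_text: str) -> dict:
    """Map each device to its failed critical checks (column names are assumed)."""
    blocked = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only critical failures stop an upgrade; warnings are ignored here.
        if row["result"].strip().upper() == "CRITICAL":
            blocked[row["device"]].append(row["check"])
    return dict(blocked)


# Made-up sample data in the assumed format, not a real DNAC export.
sample = """device,check,result
sw-access-01,Flash space,CRITICAL
sw-access-01,NTP clock sync,WARNING
sw-core-01,Flash space,PASS
sw-core-01,NTP clock sync,WARNING
"""

print(blocked_devices(sample))
```

The result is a list of devices that need attention (e.g. a flash clean-up) before the upgrade window, which is exactly the information I would want before scheduling an overnight task.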

One last thing: I performed an upgrade of a WLC 2504 using DNAC. Even though it successfully uploaded the image to the controller and rebooted it to activate the image, it wasn’t able to detect that the upgrade had completed successfully. It reported a failure, but the device was up and running with the latest AireOS. I am not sure if it’s a DNAC, unsupported WLC, or AireOS version problem/limitation. I will do a few more tests and will update this post with my findings at a later point.

Hope this was useful

Update #1 (11/11/2018): Apparently, if software is distributed without activation (intentionally skipped), DNAC will show that distribution is completed if you try to distribute/activate the software at a later stage. The only reason I hadn’t noticed this behavior is that DNAC needs to re-sync the device after distribution – this is how the detection of flash contents is done. In my case, however, I distributed the software and then immediately tried to activate it as a separate task. DNAC didn’t show me that the software was already on the device, but as I said earlier you can simply ignore this, as DNAC will skip the step anyway. It is intelligent enough to detect the image on flash on the fly.

 

Update #2 (17/11/2018, DNAC v1.2.6): Today I’ve tried to upgrade one site with 4 switch stacks (all Cat9Ks) from 16.6.3 to 16.6.4a. This operation was performed as a scheduled task, overnight. Unfortunately, DNAC only upgraded two stacks out of four, and reported two different issues for failed ones:

  • Software install command execution Failed: Failed to reload the device and distribute the image to the device due to pending reboot. Please reboot the device for the next software upgrade
  • Software install command execution Failed: Failed to distribute image to the device while installing software because the selected image is not extracted properly in the device

I am not quite sure what happened, but I managed to manually upgrade one of them (by using the activate / commit sequence) without any problems. I then tried to upgrade the second switch via DNAC, but got another error:

  • Installation fails as failed files available, for next run select “Erase” option to remove failed files so installation may get successful. Software install command execution Failed: Error occured while executing the command install add file flash:cat9k_iosxe.16.06.04a.SPA.bin activate commit. Command Output : install_add_activate_commit: START Sat Nov 17 07:14:33 UTC 2018 install_add_activate_commit: Adding PACKAGE FAILED: install_add_activate_commit : Super package already added. Add operation not allowed. ‘install remove inactive’ can be used to discard added packages

As you can see DNAC was not able to perform a successful upgrade. As opposed to the previous example, activate / commit sequence didn’t quite work here.

I tried to manually abort the upgrade operation and remove inactive packages, but that also failed. I had to reload the switch stack first, remove inactive packages, delete the bin file from flash, resync with DNAC and then repeat the distribution / activation tasks from scratch. It fixed the problem, but how painful is that?

Please note the following. In Cisco Prime, when engineers upgrade devices, they can select a consecutive or parallel deployment method. The consecutive approach allows them to perform upgrades one by one and abort the upgrade operation for the whole batch once any failure occurs. I haven’t found anything like this in DNAC v1.2.6. All upgrades happen in parallel. I can only assume that Cisco has improved IOS-XE to make sure no critical failures can happen, hence this extra layer of protection is not required anymore.
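To illustrate why the consecutive method matters, here is a small Python sketch of the idea: upgrade one device at a time and stop the batch at the first failure. It is not how Prime implements it; the function names and the fake upgrade callback are purely hypothetical.

```python
from typing import Callable, Iterable


def upgrade_consecutively(devices: Iterable[str], upgrade: Callable[[str], bool]) -> dict:
    """Upgrade devices one by one and skip the rest of the batch after a failure."""
    results = {}
    pending = list(devices)
    for index, device in enumerate(pending):
        ok = upgrade(device)
        results[device] = "success" if ok else "failed"
        if not ok:
            # Abort: everything still queued is left untouched.
            for skipped in pending[index + 1:]:
                results[skipped] = "skipped"
            break
    return results


def fake_upgrade(device: str) -> bool:
    """Stand-in for a real upgrade call; pretends that stack-2 fails."""
    return device != "stack-2"


print(upgrade_consecutively(["stack-1", "stack-2", "stack-3", "stack-4"], fake_upgrade))
```

With this approach a bad image or a broken procedure takes out one stack rather than half of the site.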

Conclusion: Upgrade on 50% of devices failed, same model, same IOS-XE. Would you use it? Let me know in the comments about your own experience.

3 Comments

  1. ismail kalolwala says:

    Hi There,

    We are not on 16.6.X Train as of now. we are on old IOS Model – 3.6.6E Series. I am aware that it doesnt support TLS. But SCP Transfer is too slow and fails after 1 hour and couple of mins.

    Question 1 : We have ample bandwidth available, yet SCP is not picking up the bandwidth.
    HTTPS Transfer will definately fail due to un support of TLS on the IOS XE platform.

    Is there a way to use FTP / TFTP from DNAC side and make this distribution task happen.

    Regards
    Ismail kalolwala

  2. -Chris says:

    I am finding a few issues with DNAC and I am installing 1.2.10 currently. One install the certificate was signed by the MS CA and installed on DNAC. When adding a WLC the DNAC was installing a CA Trust list from before the cert install, causing the controller to not trust DNAC and not send assurance. I had to load the Trust-list manually on the controller but to find out if we provision the controller the DNAC over-rides that CA Trust and we don’t get assurance again.
    I have also had an issue pushing out the application policy (QoS) to 9500’s in a virtual stack-wise configuration. The DNAC attempts to push the config to the ports configured for stackwise and completely fails. I ended up taking the output and creating my own CLI template to push out the QoS successfully.

    There are still some growing pains but I see a lot of potential in the product. Cisco you just need to get these little issues worked out.

    • Thanks for your feedback. I do agree. This product has a great potential, but currently it is only suitable for greenfield deployments – i.e. where all devices are of compatible HW/SW and sites are fully deployed using DNAC (not a mix of CLI/Prime/etc). Cisco is trying to resurrect this project with us and help us get most of this product – hopefully I will be able to update in few weeks. At the moment, even though we upgraded to the latest version, I still run into a number of issues. For example, WLC 5520 with compatible AireOS (v8.5.131) is seen as Unsupported device and I have nothing in Assurance tab. I also had to go through the same pain as yourself – manually upload certificate to WLC to enable assurance with little luck. We had few issues with HA WLCs, where you have to make sure the discovery is done without using Loopback. DNAC is not smart enough to ignore this setting for WLCs and it adds HA cluster using secondary WLC IP – which makes the whole cluster Inactive/Offline from DNAC perspective immediately after discovery 🙂 They’ve completely changed how automation and 0-day provisioning works in the latest releases, so I’ve lost track on it and haven’t tried this feature as of yet – I can’t and don’t want to learn how feature works multiple times. It just shows that even Cisco sees this product as one for ‘early adopters’ at the moment. I cannot explain massive changes in GUI with each version (if it was a stable platform)
