Troubleshooting NX-OS Config Sync

I have spent the last few weeks configuring our new Cisco Nexus 5596UP switches in two data centers. Using the configuration synchronization feature (also known as Switch Profile) seemed logical, as our new DC infrastructure design calls for dual-homed FEXes with an Active/Passive NIC teaming topology. This scenario (like any dual-homed one) requires almost all configuration to be identical on both switches in the vPC domain. Overall, I like this neat feature. In my humble opinion, Cisco should have come up with it years ago. It works like a charm if you are working on a clean deployment and follow Cisco’s guidelines. But… when it comes to migrating from legacy configuration mode to Switch Profile mode, with both vPC domain switches already configured separately… well, you’ll definitely face some issues! I personally spent a few days trying to solve one puzzle that drove me nuts!

Before reading further, I highly recommend that you familiarize yourself with my previous blog post and the official Cisco documents on Config Sync.

Now, imagine that you have both vPC domain switches fully configured without using Config Sync mode. Nothing special. This is a perfectly normal scenario, especially for early deployments (i.e. NX-OS 5.x), when Config Sync mode had many limitations. Unfortunately (well, for you), the management/TA department has decided to implement this shiny feature to reduce the administrative overhead of future configuration tasks.

If you carefully follow Cisco’s guidelines while importing the configuration, you probably won’t have any problems. But… you can still hit some odd issues, as I did, and spend crazy hours trying to understand what went wrong. Here’s a good example of my own…

I started importing the configuration and followed all the recommendations. Everything seemed OK, with the exception of a single FEX interface that I wasn’t able to configure at all, no matter which configuration mode I tried:

N5K-01(config)# interface ethernet101/1/15
N5K-01(config-if)# inherit port-profile PP-Servers
Error: Command is not mutually exclusive

N5K-01(config)# conf sync
N5K-01(config-sync)# switch-profile vPC
N5K-01(config-sync-sp)# interface ethernet101/1/15
N5K-01(config-sync-sp-if)# inherit port-profile PP-Servers
N5K-01(config-sync-sp-if)# verify
Failed: Verify Failed

N5K-01(config-sync-sp-if)# show switch-profile status

switch-profile  : vPC
----------------------------------------------------------

Start-time: 633904 usecs after Mon Sep 30 09:37:15 2013
End-time: 640502 usecs after Mon Sep 30 09:37:15 2013

Profile-Revision: 17
Session-type: Verify
Session-subtype: -
Peer-triggered: No
Profile-status: Verify Failed

Local information:
----------------
Status: Verify Failure
Error(s): 
Following commands failed mutual-exclusion checks:
interface Ethernet101/1/15
        inherit port-profile PP-Servers

Both switches were reporting the same errors. If you have carefully read Cisco’s Config Sync Troubleshooting guide, you probably know that “Command is not mutually exclusive” or “commands failed mutual-exclusion checks” means that some configuration is not consistent between the two configuration modes across the two vPC domain switches. In particular, it often happens when you import the config from Global mode into Switch Profile (Config Sync) mode on one switch but forget to do the same on the other one before re-enabling synchronization… But that was not my case! I checked both the Global and the Switch Profile configuration for this interface and found them to be identical (empty, to be more precise) on both switches:

N5K-01# sh run include-switch-profile | section 101/1/15
interface Ethernet101/1/15
  interface Ethernet101/1/15

N5K-02# sh run include-switch-profile | section 101/1/15
interface Ethernet101/1/15
  interface Ethernet101/1/15

So, what was wrong? I Googled for days, read dozens of different posts and finally found one comment that covered a slightly different issue. The idea behind that post was that NX-OS maintains internal databases that represent the running configuration in a kind of NX-OS-friendly structure… These databases exist for both configuration modes, Global and Switch Profile. As it turned out, NX-OS uses those structures for mutual-exclusion checks (among other things)! And, to make things worse… these databases can fall out of sync with the running configuration. That is what happened in my case: I had checked the running-config and confirmed it looked OK, but had no idea about those internal structures, which didn’t reflect some of my recent changes. Every time you see unexplained behavior from the Switch Profile feature, use the following commands to confirm that the internal databases are consistent with the running configuration for the part of the config you are having trouble with:

  • show system internal csm info switch-profile cfgd-db cmd-tbl
    Display internal Switch Profile configuration database.
  • show system internal csm info global-db cmd-tbl
    Display internal Global configuration database.

These two commands will save you hours when troubleshooting odd Config Sync issues, so remember them well! The basic idea is that the contents of the internal databases must reflect the state of the running configuration. In my case these internal databases had the following records:

N5K-02# show system internal csm info switch-profile cfgd-db cmd-tbl | sec 101/1/15
parent_seq_no= 0,  seq_no= 1079,  clone_seq_no= 0, cmd= 'interface Ethernet101/1/15'

N5K-02# show system internal csm info global-db cmd-tbl | sec 101/1/15
  clone_seq_no= 0, cmd= 'interface Ethernet101/1/15'
    parent_seq_no= 811,  seq_no= 812,  clone_seq_no= 0, cmd= 'description Servers'
    parent_seq_no= 811,  seq_no= 815,  clone_seq_no= 0, cmd= 'duplex full'
    parent_seq_no= 811,  seq_no= 814,  clone_seq_no= 0, cmd= 'speed 1000'
    parent_seq_no= 811,  seq_no= 813,  clone_seq_no= 0, cmd= 'switchport access vlan 150'

Although only N5K-02’s output is shown here, both switches had identical records in their internal databases. This doesn’t look right if you compare it to the running-configuration contents shown a few paragraphs earlier: the Global database still lists four commands under the interface, while the running configuration shows none.
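The by-eye comparison above can also be scripted. What follows is only a rough sketch, assuming you have pasted the `cmd-tbl` output and the matching running-config section into strings; the function names (`db_commands`, `config_commands`, `stale_entries`) are made up for illustration and are not part of any Cisco tool. It simply pulls out every `cmd= '...'` value and reports the ones that do not appear in the running configuration.

```python
import re

# Matches the trailing  cmd= '...'  field of a cmd-tbl record.
CMD_RE = re.compile(r"cmd= '(.*)'\s*$")


def db_commands(cmd_tbl_output: str) -> list[str]:
    """Return the cmd= values from pasted cmd-tbl output, in order."""
    cmds = []
    for line in cmd_tbl_output.splitlines():
        match = CMD_RE.search(line)
        if match:
            cmds.append(match.group(1))
    return cmds


def config_commands(running_config_section: str) -> list[str]:
    """Return the non-empty lines of a running-config section, stripped."""
    return [ln.strip() for ln in running_config_section.splitlines() if ln.strip()]


def stale_entries(cmd_tbl_output: str, running_config_section: str) -> list[str]:
    """Commands present in the internal database but absent from running-config."""
    in_config = set(config_commands(running_config_section))
    return [cmd for cmd in db_commands(cmd_tbl_output) if cmd not in in_config]


# Sample data copied from the outputs shown in this post.
GLOBAL_DB = """\
  clone_seq_no= 0, cmd= 'interface Ethernet101/1/15'
    parent_seq_no= 811,  seq_no= 812,  clone_seq_no= 0, cmd= 'description Servers'
    parent_seq_no= 811,  seq_no= 815,  clone_seq_no= 0, cmd= 'duplex full'
    parent_seq_no= 811,  seq_no= 814,  clone_seq_no= 0, cmd= 'speed 1000'
    parent_seq_no= 811,  seq_no= 813,  clone_seq_no= 0, cmd= 'switchport access vlan 150'
"""

RUNNING_CONFIG = """\
interface Ethernet101/1/15
  interface Ethernet101/1/15
"""

if __name__ == "__main__":
    for cmd in stale_entries(GLOBAL_DB, RUNNING_CONFIG):
        print("stale in global-db:", cmd)
```

Note the simplification: the running-config section is treated as one flat bag of lines, so Global and Switch Profile entries are not distinguished. For a single-interface check like this one that should be enough to spot leftovers.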

I recalled that the commands still listed in the Global database (description, duplex, speed, access VLAN) had been part of the Global configuration before I reset the port’s configuration to default! I had wanted to try a different configuration method instead of the import/verify/commit sequence. Basically, I wanted to reset the port’s configuration to default in Global configuration mode on both switches and then configure it on one of the switches via Switch Profile. I checked my PuTTY session log and found this output right after I applied the default configuration to the interface:

Interface config wipeout failed for 0x2
Interface config wipeout failed for 0x1f690000

Even though I see this error every time I wipe out an interface’s configuration using the default interface command, I never paid any special attention to it, mainly because I relied on the running-configuration contents, which always reflected my changes. Now it became obvious what this error means… It tells the network engineer that the configuration was not wiped from the internal configuration structures! As a result, the switches were stuck in an internally unsynchronized, inconsistent state.
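Since this message is so easy to overlook, it may be worth scanning saved session logs for it. Here is a small sketch along those lines, assuming the log is available as plain text; `find_wipeout_failures` is a hypothetical helper name, and the prompt line in the sample is reconstructed for illustration rather than copied from my log.

```python
import re

# The message text is taken from the session log quoted in this post.
WIPEOUT_RE = re.compile(r"Interface config wipeout failed for (0x[0-9a-fA-F]+)")


def find_wipeout_failures(log_text: str) -> list[tuple[int, str]]:
    """Return (line number, hex code) for every wipeout failure in a log."""
    hits = []
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        match = WIPEOUT_RE.search(line)
        if match:
            hits.append((line_no, match.group(1)))
    return hits


# Illustrative sample; the first line is an assumed prompt, not real log data.
SAMPLE_LOG = """\
N5K-01(config)# default interface ethernet101/1/15
Interface config wipeout failed for 0x2
Interface config wipeout failed for 0x1f690000
"""

if __name__ == "__main__":
    for line_no, code in find_wipeout_failures(SAMPLE_LOG):
        print(f"line {line_no}: wipeout failed for {code}")
```

Any hit is a hint to compare the internal databases against the running configuration for the interface that was being defaulted at that point in the session.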

Luckily, there is a command that forces the internal databases to resynchronize: resync-database. It has to be executed on both switches separately, from the root of Config Sync mode (not from within a Switch Profile). A complete example follows:

N5K-01(config-sync)# resync-database
Re-synchronization of switch-profile db takes a few minutes...
Re-synchronize switch-profile db completed successfully.

N5K-02(config-sync)# resync-database
Re-synchronization of switch-profile db takes a few minutes...
Re-synchronize switch-profile db completed successfully.

N5K-02# show system internal csm info switch-profile cfgd-db cmd-tbl | sec 101/1/15
parent_seq_no= 0,  seq_no= 1079,  clone_seq_no= 0, cmd= 'interface Ethernet101/1/15'

N5K-02# show system internal csm info global-db cmd-tbl | sec 101/1/15
  clone_seq_no= 0, cmd= 'interface Ethernet101/1/15'

As you can see, the configuration is now consistent between the running-configuration and the internal databases on both switches (only the N5K-02 verification is shown). You can now apply the configuration to the problematic port via Switch Profile (or Global mode, if that is your requirement) and successfully commit all changes!

Hope this saves you some time!

16 Comments

  1. LTLnetworker says:

    Well done, my friend.

    • Tim Dmitrenko says:

      Thank you for your feedback. Much appreciated, especially taking into account this is the first comment on my blog 🙂 Take care!

  2. jajo says:

    great!
    thanks

  3. Priba says:

    Great stuff Tim.
    Thank you, been struggling with getting some info about the “resync-database” command

  4. Costyn says:

    This is fantastic, thank you for this! Helped us out of a sticky situation.

  5. Blake says:

    This worked great… thank you. Couldn’t figure out why an interface that was defaulted on both 5k’s wouldn’t sync together.

  6. Andrew R. says:

    Good one, can you tell me what nx-os version did you have on this 55xx? We got the same problem after recent firmware update

    • Hi Andrew
      This issue was spotted on 6.0(2)N1(2). We recently upgraded one of our DCs to 6.0(2)N2(2), but we haven’t made any major changes there since, so I haven’t been able to check whether the bug is still there… Which version are your N5Ks running? Cheers

  7. Andrew R. says:

    Nexus 5548, upgrade from 5.1(3)N2(1b) to 5.2(1)N1(5), a response we got from TAC is to re-create switch-profile. Unfortunately, haven’t tried this solution yet – boxes are in production.

  8. Gene Mosley says:

    Thank you! I had the same problem and it was very annoying.

  9. Petya Karayaneva says:

    Thank you!
    I have a pair of N5K switches system version: 6.0(2)N1(2) with the same problem. This solution helped me!

  10. Frans Brinkman says:

    Very useful ! Thanks for sharing ! (and still seen in 6.0(2)N2(3) )

  11. Hi
    Do you know if this process is disruptive?

    • I am not sure I understand the question. Process of checking or synchronizing the config? In fact, everything that’s covered in this article is NON disruptive. It is possible to disable synchronization between peers at any time without affecting live environment.
