I would like to make you all aware of a catastrophic bug that affects (or may affect) you. It applies to legacy EoL/EoS wireless controllers (Cisco WLC 4400 series), which are still widely deployed in many companies. A friend of mine asked me to help troubleshoot a very strange problem they had been experiencing for some time. Here's the story and a workaround.
My friend's company runs an H-REAP wireless environment using two 4402 controllers (active/active), with APs split equally across both controllers. Everything had been working fine until they had a power outage in one office. Once power was restored, the APs were unable to join the primary controller and kept flapping on the secondary controller, so service was unavailable to end users. The IT team struggled to find the reason for this behaviour. To make things worse, the APs were unmanageable: because of the constant flapping, there was no way to enable SSH on them or reconfigure anything else.
I spent a few hours looking for the root cause. All debug output pointed to a problem establishing the DTLS connection. I then happened upon references to new Cisco bugs, which finally shed light on the problem. Basically, all controllers and APs ship with a MIC (manufacturer installed certificate), and that certificate is valid for 10 years. Very old controllers (and/or APs) may therefore have an expired certificate. They can still work normally... until a reboot takes place and the AP has to establish a new DTLS session with the controller.
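To make the failure mode concrete, here is a minimal sketch of the date check a certificate goes through during a handshake, assuming a MIC is valid from its issue date for exactly 10 years (as stated above). The helper names and the sample dates are hypothetical; real MICs carry explicit notBefore/notAfter fields, and the controller's actual validation logic is not shown here.

```python
from datetime import date

# Assumption (from the text): a MIC is valid for 10 years from its issue date.
MIC_LIFETIME_YEARS = 10


def mic_expiry(issued: date) -> date:
    """Hypothetical helper: the date a 10-year MIC stops being valid."""
    try:
        return issued.replace(year=issued.year + MIC_LIFETIME_YEARS)
    except ValueError:
        # Issued on 29 February and the expiry year is not a leap year.
        return issued.replace(year=issued.year + MIC_LIFETIME_YEARS, day=28)


def mic_is_valid(issued: date, now: date) -> bool:
    """Mimic the notBefore <= now <= notAfter check done when a session is set up."""
    return issued <= now <= mic_expiry(issued)


if __name__ == "__main__":
    issued = date(2004, 6, 15)  # hypothetical issue date
    print(mic_expiry(issued))                       # 2014-06-15
    print(mic_is_valid(issued, date(2014, 9, 1)))   # False: expired
    print(mic_is_valid(issued, date(2014, 1, 1)))   # True: clock set back
```

The last line also illustrates why setting the clock back can work as a workaround: the certificate itself is unchanged, but the "now" it is checked against falls inside its validity window again (only if the chosen date is still before the expiry date).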
Troubleshooting can be complicated, as it was in my case. Keep these general rules in mind:
- The WLC provides enough debug information if an AP with an expired MIC is trying to join it.
- The WLC does not provide enough debug information if its own MIC is expired. You must check the AP's console debug output for relevant messages.
In my case, I was dealing with an expired WLC MIC. That is why it took a while to gather enough information to search for: I had already checked everything else, and the AP's console was my last resort.
Cisco's official workaround is to disable NTP and set the controller's date back to 2014. This allows the APs to join the controller again.
There is a fix, released in AireOS 7.252.0. However, according to Bug Toolkit reports, it doesn't resolve the problem for very old WLCs. Hence, there are two nearly identical bugs in the database:
- CSCuq19142: LAP/WLC MIC or SSC lifetime expiration causes DTLS failure
- CSCuq19142 workaround doesn’t work on very old 4400s with Airespace MIC
Please refer to bug #1 for more information, including how to confirm your WLC's manufacturing date from its serial number.
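As a rough first check before opening the bug, here is a sketch that estimates when a unit's MIC expires from its serial number. It assumes the commonly documented 11-character Cisco serial format (3-letter site code, 2-digit year as an offset from 1996, 2-digit week, 4-character unit ID) and the 10-year MIC lifetime mentioned above. The serial "FOC0915X0NM" and the helper names are made up for illustration; bug #1 remains the authoritative procedure, and the actual MIC issue date may differ from the build date.

```python
import re
from typing import Tuple

# Assumed format: LLL YY WW SSSS, where YY is the build year minus 1996
# and WW is the build week. Not every Cisco serial necessarily follows this.
SERIAL_RE = re.compile(r"^[A-Z]{3}(\d{2})(\d{2})[A-Z0-9]{4}$")


def manufacture_year_week(serial: str) -> Tuple[int, int]:
    """Decode (year, week) from a serial in the assumed format."""
    m = SERIAL_RE.match(serial.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised serial format: {serial!r}")
    year = 1996 + int(m.group(1))
    week = int(m.group(2))
    if not 1 <= week <= 53:
        raise ValueError(f"implausible week {week} in {serial!r}")
    return year, week


def approx_mic_expiry_year(serial: str, lifetime_years: int = 10) -> int:
    """Rough MIC expiry year: build year plus the assumed 10-year lifetime."""
    year, _week = manufacture_year_week(serial)
    return year + lifetime_years


if __name__ == "__main__":
    sn = "FOC0915X0NM"  # hypothetical serial number
    print(manufacture_year_week(sn))   # (2005, 15)
    print(approx_mic_expiry_year(sn))  # 2015
```

A unit whose estimated expiry year is at or before the current year is a good candidate for this problem, but confirm against the bug notes and the AP console output rather than relying on the estimate alone.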
For now, it's fixed for current controllers, but not fully fixed for legacy 4400s. If AireOS 7.252.0 hasn't fixed your issues, check and monitor bug #2.
Hope this saves you some time!