Benefits of using a Standby Cisco APIC Controller

You may wonder why you should consider purchasing a standby APIC controller for your current APIC cluster, and a fair question is why you would need one if you already have a Cisco 24x7x4 SmartNet contract. SmartNet is great, but:

  1. With the current hardware shortages, Cisco sometimes cannot commit to delivering a replacement APIC within 4 hours.
  2. If you do get an APIC replacement, it will most likely not ship with the code version of your current APIC cluster. As a result, you will spend time updating the CIMC (~1 hour) plus updating the APIC application code to match; how long that takes depends on the code versions involved. In some cases the appliance may ship with a newer code version and you will have to downgrade instead. You cannot assume that SmartNet 24x7x4 means you will have an operational APIC within 4 hours, because code upgrades/downgrades will be required.
  3. When you get a replacement APIC, you must still take time to unrack the failed APIC and install the new one, so there is physical labor involved before the replacement is provisioned.
  4. A standby APIC is automatically upgraded during your normal upgrade cycle, so if an APIC in the cluster fails, you can easily replace the failed unit with the standby controller, as sketched below. No new physical connections are needed, because all of this has already been planned and provisioned.
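If memory serves, the switchover can also be driven from the APIC NX-OS-style CLI with something along these lines; the controller ID (2) and the standby serial number are placeholders, and you should verify the exact replace-controller syntax against the APIC Getting Started Guide for your release before relying on it:

apic1# show controller
! Note the ID of the failed APIC and the serial number of the standby controller
apic1# replace-controller replace 2 FCH1234ABCD
! Confirm when prompted; the standby takes over the failed controller's slot in the cluster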

Cisco ACI APIC OOB Management MAC Address is flapping

Recently I came across a request to rename the APIC controllers, and directly after the APICs were renamed and rebooted, OOB management access started flaking out. Basic ping tests revealed OOB reachability issues. What I discovered was that the bond1 interface was constantly failing over between Eth1/1 and Eth1/2; the OOB management switch would continuously relearn the bond1 MAC address on these ports, which created a management access issue.

This was experienced on ACI code version 5.2(6e).

Cisco APIC Physical Interfaces

I confirmed this by checking where the bond1 MAC address was being learned on the management switch; after multiple refreshes it was obvious that it was flapping back and forth between the two ports, which caused the ping tests to fail intermittently.
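A minimal sketch of those checks (the MAC address and switch platform below are placeholders/assumptions; the bonding status file is the standard Linux path the APIC uses for bond1):

! On the OOB management switch (IOS/IOS-XE), watch which port the bond1 MAC is learned on
show mac address-table address aaaa.bbbb.cccc

# On the APIC bash shell, check which member interface bond1 is currently active on
cat /proc/net/bonding/bond1
ip link show bond1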

Some web searching turned up a reference from an older 1.x release describing LLDP issues when changing the hostname, where the Cisco workaround was to shut down one of the bond1 interfaces. That is not acceptable for my customers, so I kept digging. The ultimate fix was to decommission and wipe each APIC, one at a time, and then re-add it to the cluster. After these steps were performed, the APIC OOB MAC flapping was resolved.

Steps

  1. Document the preferred hostname, fabric name, fabric ID, Pod ID, VTEP pool, OOB management IP and gateway, the infra VLAN used, and the local admin credentials. These are the parameters you will need to recommission the APICs.
  2. Decommission one APIC at a time from the cluster. Wait ~5 minutes to make sure the change has replicated properly.
  3. Console into the target APIC via CIMC and wipe it using the commands below:
apic# acidiag touch clean
This command will wipe out this device. Proceed? [y/N] y
apic# acidiag touch setup
This command will reset the device configuration, Proceed? [y/N] y
apic# acidiag reboot
This command will restart this device, Proceed? [y/N] y

4. After the APIC reboots, the setup dialog should run and you can reprovision the APIC via CIMC using the parameters from step 1. Wait ~2-3 minutes for the APIC to converge after the settings are applied.

5. Finally, commission the target APIC in the GUI by right-clicking the old APIC in the controller list and clicking Commission. Even though the old APIC name may still be shown, after ~5 minutes the recommissioned APIC will converge into the cluster and be displayed properly.
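Once the APIC is recommissioned, a quick health check from the APIC CLI might look like this (a sketch; output formats vary by release):

apic1# acidiag avread
! All controllers should be registered and report as healthy/fully fit
apic1# acidiag verifyapic
! Basic local certificate and connectivity verification
apic1# show controller
! The renamed APIC should appear with the correct name and IP, and the cluster should report healthy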

I am also asked sometimes how the OOB management switch ports should be configured for the APIC OOB bond1 interfaces, since they are "bonded". These ports should be treated like regular access ports:

!---OOB-MGMT-SWITCH---
!
interface GigabitEthernet1/0/1
 description <APIC-HOSTNAME> OOB bond1 management interface
 switchport mode access
 switchport access vlan <OOB-MGMT-VLAN>
 spanning-tree portfast
!

Reasons to use Cisco Nexus instead of Cisco Catalyst in the Data Center

Time and time again I have customers wanting to understand the true benefit of a Cisco Nexus switch versus a Cisco Catalyst switch in the data center for connecting servers. Customers may argue that they just need simple 1G or 10G speeds, dual-homed with a port-channel, and that they can achieve this with a stack of Catalyst 9300 or Catalyst 9500 switches. So here are a few reasons to ponder:

Code upgrade

If you have servers that are dual-homed for redundancy across two separate Cisco Catalyst switches, and those switches are stacked so you can run a cross-switch port-channel, that sounds fine and dandy. But when it comes time to upgrade the stack (and there will come a time when you have to upgrade), the whole stack needs to be reloaded, resulting in an outage for your servers. This is not the case with Cisco Nexus, where a vPC pair keeps two independent control planes and each switch can be upgraded on its own.
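For contrast, here is a minimal NX-OS vPC sketch (the domain ID, keepalive IPs, VLAN, and interface numbers are made-up placeholders): the server still gets a single port-channel dual-homed across two switches, but each Nexus runs independently and can be reloaded or upgraded without taking the pair down.

feature lacp
feature vpc

vpc domain 10
  peer-keepalive destination 10.0.0.2 source 10.0.0.1

! vPC peer-link between the two Nexus switches
interface port-channel1
  switchport mode trunk
  vpc peer-link

! Server-facing port-channel, dual-homed across both switches
interface port-channel20
  switchport mode access
  switchport access vlan 100
  vpc 20

interface Ethernet1/20
  description Server NIC, one leg of the dual-homed port-channel
  switchport mode access
  switchport access vlan 100
  channel-group 20 mode active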

Lower latency = better performance

VeloCloud in AWS

After a few hours of troubleshooting, I found out that when using the 3.3 brownfield CloudFormation template, entering the VCO as an IP address does not work; you must use the FQDN for the VCO instead. I also made sure to set the version to 331 instead of 321 and used the c5.4xlarge instance type. After the vEdge joins the orchestrator, you can then upgrade it to newer code.

Cisco VXLAN troubleshooting

ERROR after you configure EVPN

No VLAN id configured, unable to generate auto RD

This happens because your NVE interface is down. Shut down your NVE source loopback and the NVE interface, then bring the loopback back up followed by the NVE interface.
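A minimal sketch of that bounce sequence, assuming loopback1 is the NVE source interface:

interface loopback1
  shutdown
interface nve1
  shutdown
!
! Bring them back up: loopback first, then the NVE interface
interface loopback1
  no shutdown
interface nve1
  no shutdown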

Another issue: the border leaf receives an advertisement from the external router and advertises it to the spine, but the spine does not advertise it to the other leafs. A review of the BGP L2VPN EVPN table shows “Path type: internal, path is invalid(no RMAC or L3VNI), no labeled nexthop”.

Why is this happening? Because the L3VNI is not configured properly. On the border leaf, verify that the L3VNI VLAN is defined, that the VNI is assigned to that VLAN, and that the corresponding VLAN interface (SVI) is defined with the VRF membership and ip forward.

Example:

vlan 2500
  name L3VNI-VLAN
  vn-segment 50000

vrf context PROD
  vni 50000
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
  address-family ipv6 unicast
    route-target both auto
    route-target both auto evpn

interface Vlan2500
  description L3VNI-SVI
  no shutdown
  mtu 9216
  vrf member PROD
  no ip redirects
  ip forward
  no ipv6 redirects

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback1
  member vni 50000 associate-vrf
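Once the L3VNI pieces are in place, these are the verification commands I would reach for (a sketch; exact output varies by NX-OS release):

! Confirm the L3VNI is up and associated with the VRF
show nve vni
! Confirm remote VTEP peers are being learned
show nve peers
! Confirm EVPN sessions are established and the routes now install in the tenant VRF
show bgp l2vpn evpn summary
show ip route vrf PROD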