The Safest Way to Convert to LACP

November 18, 2014

I have been having “discussions” with our network team regarding the network design for our new datacenter. The networking team was pushing LACP hard, and I was pushing back. My reasons for not doing it:

  1. Almost every online resource I found said LACP was difficult to get right end-to-end
  2. VMware has found that most customers use LBT rather than LACP (based on a meeting with VMware, though some support folks have seen LACP)
  3. Not recommended for multi-NIC vMotion (requires two independent links) or iSCSI port binding
  4. LACP configuration settings are possibly not present in host profiles (I actually believe this is nonsense: all of the LACP configuration is at the VDS level, so there would be nothing to maintain at the host profile level)
  5. This datacenter would be different from our other datacenters and would be a one-off

The network team’s reasons for implementing LACP:

  1. Improve link utilization of NFS
  2. Will not overload 5k interlink (networking has a straight-through design)
  3. Improve NIC utilization monitoring via SNMP (since it’s straight-through, each NIC will only report to a single port on the 5k)
  4. Simplified setup for network operations to manage
  5. Faster MAC relearning for virtual fabric path (microseconds vs 2-3 seconds)

Long story short, I tried to implement LBT but had major issues due to the network design that was already put in place. Testing with LACP was perfect, but this meant I had to reconfigure everything to LACP.

The VMware KB article describes a short process for converting a VDS’s uplinks, hosts, and portgroups to LACP. Basically, the steps VMware lists are:

  1. Create the LAG
  2. Assign all of the distributed port groups to use the LAG uplink as standby
  3. Assign the host uplinks to the LAG
  4. Assign all of the distributed port groups to use the LAG as active and the other links as unused

In my testing I had issues right after completing step 3. I’m not sure whether you have to move the uplinks one at a time or both at once, but either way things start to get really weird: half the things work, half the things don’t. It’s basically classic port-channel issues. Since the host uplinks are using the LAG, the hashing algorithm is in effect. Since the distributed portgroups still have the LAG as standby, those portgroups will use one link or the other, which doesn’t seem to work with the LAG. This is how I reasoned it in my head; in reality it may have been some other issue. Regardless, I have developed my own series of steps that seem to work for me. YMMV, so always, ALWAYS test on a small scale with your own setup.
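
As a rough illustration of that reasoning (not a diagnosis of what actually happened), here is a minimal Python sketch. It assumes a simple source/destination IP XOR hash, which is only one of several hashing modes a switch or VDS can use. The function names and IP addresses are made up for the example. The idea: the hash spreads flows across both LAG members, while a portgroup that still treats the LAG as standby only uses one link.

```python
import ipaddress


def lag_member_for_flow(src_ip, dst_ip, link_count):
    """Pick a LAG member by XOR-ing the two addresses, modulo the link count.

    Assumption: a plain src/dst IP hash; real devices offer many other modes.
    """
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % link_count


def flows_on_other_link(flows, link_count, pinned_link):
    """Return the flows whose hashed link is not the one link the portgroup uses."""
    return [
        (src, dst)
        for src, dst in flows
        if lag_member_for_flow(src, dst, link_count) != pinned_link
    ]


# Hypothetical sample flows between two hosts and two peers.
flows = [
    ("10.0.0.10", "10.0.0.20"),
    ("10.0.0.10", "10.0.0.21"),
    ("10.0.0.11", "10.0.0.20"),
    ("10.0.0.11", "10.0.0.21"),
]

stranded = flows_on_other_link(flows, link_count=2, pinned_link=0)
print(f"{len(stranded)} of {len(flows)} flows hash to the other link")
```

With two links and this hash, half of the sample flows land on the link the portgroup is not using, which is at least consistent with the half-working behavior described above. The steps themselves follow.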

  1. Back up the VDS
    1. Distributed Switches -> Right Click on the VDS -> All vCenter Actions -> Export
    2. Choose all port groups -> OK
    3. Click Yes
    4. Choose a filename
  2. Import the VDS (we will call this VDS-temp)
    1. Datacenters Inventory List -> Right Click on the Datacenter -> All vCenter Actions -> Import Distributed Switch
    2. Browse to zip file
    3. Do not choose “Preserve original distributed switch and port group identifiers”
    4. Click Next -> Finish
  3. Add all of the hosts to VDS-temp and assign one vmnic to the VDS-temp uplink (wizard)
    1. Distributed Switches -> Right Click on VDS-temp -> Add and Manage Hosts
    2. Choose Add Hosts -> Next -> New Hosts -> Choose all hosts that are on the source VDS
    3. Select “Configure identical network settings on multiple hosts (template mode)” -> Next
    4. Select one host as a template -> Next
    5. Deselect “Manage VMkernel adapters (template mode)” so that only “Manage physical adapters (template mode)” is selected -> Next
    6. Click on one of the vmnics (I like to move vmnic1) -> Assign uplink -> Choose an uplink (I choose Uplink 2 for consistency)
    7. Now click “Apply to All” -> Next -> Next -> Finish
  4. Migrate all of the VMkernels to the VDS-temp portgroups (wizard)
    1. Distributed Switches -> Right Click on VDS-temp -> Add and Manage Hosts
    2. Choose Manage host networking -> Next -> Attached Hosts -> Choose all hosts that are on the VDS-temp -> OK -> Next
      1. Note that I do not use template mode for migrating the VMkernels, because template mode forces you to enter all of the IP addresses of the management VMkernels
    3. Deselect “Manage Physical adapters” -> Next
    4. Click on each VMkernel and assign the matching port group from VDS-temp -> Next -> Next -> Finish
  5. Flip all of the portgroups for all of the VMs to use the matching portgroup from VDS-temp (I used a script for this; unfortunately, I can’t share it since I didn’t write it)
  6. Migrate the last vmnic to the VDS-temp uplink (this is probably not necessary, but I do it anyway)
  7. Now create the LAG on the VDS
    1. Click on VDS -> Manage -> Settings -> LACP -> “Plus sign”
    2. Name your LAG, then select your ports, mode, and load balancing mode.
      1. I selected Active since my network team has their ports set to Passive
      2. Be sure that your hashing algorithm matches the one configured on the switch ports
  8. Modify all of the distributed port groups’ teaming/failover to have the LAG as active and all other links as unused (wizard)
    1. Distributed Switches -> Right Click on VDS -> Manage Distributed Port Groups
    2. Select Teaming and failover -> Next
    3. Select all of the port groups to modify (in my case all of them) -> Next
    4. Change failback to No (there is no need to fail back with LACP; I don’t even know if it applies, since it’s a LAG)
    5. Move the LAG to the Active uplink and move all other uplinks to unused
  9. Assign a single uplink from each host to the VDS LAG (wizard)
  10. Migrate all of the VMkernels to the VDS portgroups (wizard)
  11. Flip all of the portgroups for all of the VMs to use the portgroup from the VDS
  12. Migrate the last vmnic to the VDS LAG uplink
  13. Remove the hosts from VDS-temp
  14. Remove VDS-temp from inventory
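
The ordering above can be sanity-checked with a small model. This is a sketch under assumptions, not a tool. It assumes each host has exactly two uplinks (called vmnic0 and vmnic1 here), treats a switch as usable whenever it has at least one vmnic, and ignores teaming and LAG state entirely. The step numbers refer to the list above; the function and variable names are made up for illustration. It also encodes the one LACP rule that matters for step 7: a LAG only negotiates if at least one side is in Active mode.

```python
# Moves from the procedure above: (step number, kind, name, target switch).
# "nic" moves a physical uplink; "workload" moves VMkernels or VM portgroups.
STEPS = [
    ("3", "nic", "vmnic1", "VDS-temp"),
    ("4", "workload", "vmkernels", "VDS-temp"),
    ("5", "workload", "vms", "VDS-temp"),
    ("6", "nic", "vmnic0", "VDS-temp"),
    ("9", "nic", "vmnic1", "VDS"),  # VDS uplinks are LAG ports from here on
    ("10", "workload", "vmkernels", "VDS"),
    ("11", "workload", "vms", "VDS"),
    ("12", "nic", "vmnic0", "VDS"),
]


def lacp_will_negotiate(host_mode, switch_mode):
    """LACP needs at least one Active side; Passive on both ends never forms a LAG."""
    return "active" in (host_mode.lower(), switch_mode.lower())


def steps_that_strand_traffic(steps):
    """Replay the moves and return the step numbers after which some workload
    sits on a switch that has no physical uplink."""
    vmnics = {"vmnic0": "VDS", "vmnic1": "VDS"}      # assumed two-uplink host
    placement = {"vmkernels": "VDS", "vms": "VDS"}   # everything starts on the VDS
    stranded = []
    for number, kind, name, target in steps:
        # Apply the move to the uplink map or the workload map.
        (vmnics if kind == "nic" else placement)[name] = target
        # Every switch that carries a workload must still have an uplink.
        if not all(switch in vmnics.values() for switch in placement.values()):
            stranded.append(number)
    return stranded


print(steps_that_strand_traffic(STEPS))          # prints []
print(lacp_will_negotiate("active", "passive"))  # prints True
```

Reordering the list, for example moving both vmnics to VDS-temp before the VMkernels and VMs, makes the function report the steps where traffic would be stranded. The point of the procedure is that both switches always keep an uplink while anything is still attached to them.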
