I have been having “discussions” with our network team regarding the network design for our new datacenter. The networking team was pushing LACP hard; I was pushing back. My reasons for not doing it:
- Almost every online resource I found said LACP was difficult to get right end-to-end
- VMware has found that most customers use LBT over LACP (based on a meeting with VMware, though some of their support folks do see LACP in the field)
- Not recommended for multi-NIC vMotion (which requires two independent links) or iSCSI port binding
- LACP configuration settings are possibly not present in host profiles (I actually believe this is nonsense: all of the LACP configuration lives at the VDS level, so there would be nothing to maintain at the host profile level)
- This datacenter would be different from our other datacenters, making it a one-off
The network team’s reasons for implementing LACP:
- Improve link utilization of NFS
- Will not overload the 5K interlink (the network has a straight-through design)
- Improve NIC utilization monitoring via SNMP (since it’s straight-through, each NIC will only report to a single port on the 5K)
- Simplified setup for network operations to manage
- Faster MAC relearning for FabricPath (microseconds vs. 2-3 seconds)
Long story short, I tried to implement LBT but had major issues due to the network design that was already in place. Testing with LACP was perfect, but that meant I had to reconfigure everything to LACP.
The VMware KB article describes the process to change a VDS’s uplinks, hosts, and port groups over to LACP. Basically, the steps VMware lists are:
1. Create the LAG
2. Set the LAG uplink as standby on all of the distributed port groups
3. Assign the host uplinks to the LAG
4. Set the LAG as active and the other uplinks as unused on all of the distributed port groups
In my testing I had issues right after completing step 3. I’m not sure whether you are supposed to move the uplinks one at a time or both at once, but either way things start to get really weird: half the things work, half the things don’t. It’s basically a classic port-channel mismatch. Once the host uplinks are in the LAG, the hashing algorithm is in effect; but since the distributed port groups still have the LAG as standby, those port groups keep using one standalone link or the other, which doesn’t play well with the LAG. That is how I reasoned it in my head; in reality it may have been some other issue. Regardless, I have developed my own series of steps that seem to work for me. YMMV, so always, ALWAYS test on a small scale with your own setup.
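To picture the mismatch: a LAG pins every flow to one member link by hashing its headers, so each packet of a given flow always egresses the same physical link. The sketch below is my own illustration of that idea; the CRC32-over-IP-pairs hash is an arbitrary stand-in, not VMware’s or Cisco’s actual algorithm. The point is that the switch side hashes per flow, while a port group that still has the LAG in standby sends on a single standalone uplink, so the two ends disagree about which link carries what.

```python
import zlib

def lag_egress_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Pick the LAG member link for a flow, like an 'IP src+dst' hash.

    Hypothetical illustration only; real switches use vendor-specific
    hash inputs (MAC, IP, L4 ports), but all share this shape:
    deterministic per flow, spread across members.
    """
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % num_links

# Four flows from VMs on one host across a 2-link LAG: each flow is
# pinned to exactly one link, and only many flows use both links.
flows = [("10.0.0.5", "10.0.1.9"), ("10.0.0.5", "10.0.1.10"),
         ("10.0.0.6", "10.0.1.9"), ("10.0.0.7", "10.0.1.11")]
for src, dst in flows:
    print(f"{src} -> {dst}: link {lag_egress_link(src, dst, 2)}")
```

Because the hash is deterministic, return traffic for a flow arrives on whichever member the switch hashes it to, regardless of which standalone uplink the host happens to be transmitting on, which is exactly the half-works/half-doesn’t behavior I saw.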
- Backup the VDS
- Import the backup as a new VDS (we will call this VDS-temp)
- Add all of the hosts to VDS-temp and assign one vmnic to the VDS-temp uplink (wizard)
- Distributed Switches -> Right Click on VDS-Temp -> Add and Manage Hosts
- Choose Add Hosts -> Next -> New Hosts -> Choose all hosts that are on the source VDS
- Select the “Configure identical network settings on multiple hosts (template mode)” option -> Next
- Select one host as a template -> Next
- Deselect “Manage VMkernel adapters (template mode)”; you should only have “Manage physical adapters (template mode)” selected -> Next
- Click on one of the vmnics (I like to move vmnic1) -> Assign uplink -> Choose an uplink (I choose Uplink 2 for consistency)
- Now click “Apply to All” -> Next -> Next -> Finish
- Distributed Switches -> Right Click on VDS-Temp -> Add and Manage Hosts
- Migrate all of the VMkernels to the VDS-temp portgroups (wizard)
- Distributed Switches -> Right Click on VDS-Temp -> Add and Manage Hosts
- Choose Manage host networking -> Next -> Attached Hosts -> Choose all hosts that are on the VDS-temp -> OK -> Next
- Note that I do not use template mode for migrating the VMkernels; template mode forces you to enter all of the IP addresses of the management kernels
- Deselect “Manage Physical adapters” -> Next
- Click on each vmkernel and assign the source port group from VDS-temp -> Next -> Next -> Finish
- Flip all of the port groups for all of the VMs to use the port group from VDS-temp (I used a script for this; unfortunately I can’t share it since I didn’t write it)
- Migrate the last vmnic to the VDS-temp uplink (this is probably not necessary, but I do it anyway)
- Now create the lag on the VDS
- Modify all of the distributed port groups teaming/failover to have the lag as active and all other links as unused (wizard)
- Distributed Switches -> Right Click on VDS -> Manage Distributed Port Groups
- Select Teaming and failover -> Next
- Select all of the port groups to modify (in my case all of them) -> Next
- Change failback to No (there is no need to fail back with LACP; I’m not even sure it applies since it’s a LAG)
- Move the LAG to the Active uplink and move all other uplinks to unused
- Distributed Switches -> Right Click on VDS -> Manage Distributed Port Groups
- Assign a single uplink from each host to the VDS lag (wizard)
- Migrate all of the vmkernels to the VDS portgroups (wizard)
- Flip all of the portgroups for all of the VMs to use the portgroup from the VDS
- Migrate the last vmnic to the VDS lag uplink
- Remove the host from VDS-temp
- Remove VDS-temp from inventory
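The ordering above can be sanity-checked with a toy model. This is purely my own sketch (not any VMware tooling) of the invariant the steps preserve: at every point in the migration, the switch currently carrying the VMkernels and VMs still has at least one physical uplink attached, so nothing goes dark.

```python
# Each step: (description, switch carrying the workloads, uplink counts).
# The counts mirror a two-vmnic host walking through the steps above.
steps = [
    ("start",                          "VDS",      {"VDS": 2, "VDS-temp": 0}),
    ("add vmnic1 to VDS-temp",         "VDS",      {"VDS": 1, "VDS-temp": 1}),
    ("migrate VMkernels and VMs",      "VDS-temp", {"VDS": 1, "VDS-temp": 1}),
    ("move last vmnic to VDS-temp",    "VDS-temp", {"VDS": 0, "VDS-temp": 2}),
    ("create LAG, move one vmnic in",  "VDS-temp", {"VDS": 1, "VDS-temp": 1}),
    ("migrate everything back to VDS", "VDS",      {"VDS": 1, "VDS-temp": 1}),
    ("move last vmnic into the LAG",   "VDS",      {"VDS": 2, "VDS-temp": 0}),
]

for desc, active, uplinks in steps:
    # Invariant: the switch hosting the workloads always has connectivity.
    assert uplinks[active] >= 1, f"outage risk at: {desc}"
    print(f"ok: {desc} (workloads on {active}, {uplinks[active]} uplink(s))")
```

The VMware KB ordering breaks this invariant in a subtler way: the uplinks and the port group teaming policy briefly disagree on the same switch, which the two-switch approach avoids entirely.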
Hi, will using LACP in VMware 5.5 and up really increase speed/throughput? Let’s say I have 4 uplinks (4 NICs) and I configure LACP on the Cisco side and also do the VMware config.
Will a VM get more than a 1 Gb link? Usually the transfer speed is 100 to 130 MB/s, but will I get almost 400 MB/s when copying files or in any transfer activity?
Thanks
Short answer: no. If you are talking about one VM transferring a file to some other VM, that session will be limited to one link. LACP spreads sessions across the active uplinks through a hashing mechanism. At best, multiple sessions will be spread across your 4 links, but any one session will not exceed the transfer speed of a single uplink. If you are transferring files to multiple destinations, then it is possible that the aggregate of all transfers across all links would be higher.
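To put rough numbers on that, here is a back-of-the-envelope sketch (the 1 Gbit/s ≈ 125 MB/s conversion ignores protocol overhead, which is why real-world copies land around 100-120 MB/s):

```python
LINK_GBPS = 1   # each physical uplink
NUM_LINKS = 4   # members in the LAG

def max_mbytes_per_sec(gbps: float) -> float:
    """Convert a link rate in Gbit/s to a rough MB/s ceiling (no overhead)."""
    return gbps * 1000 / 8

# One session is hashed onto a single member link, so its ceiling is the
# single-link rate; only many concurrent sessions can approach the aggregate.
single_session = max_mbytes_per_sec(LINK_GBPS)
aggregate = max_mbytes_per_sec(LINK_GBPS * NUM_LINKS)
print(f"one session tops out around {single_session:.0f} MB/s")
print(f"many sessions combined can approach {aggregate:.0f} MB/s")
```

Which matches the numbers in the question: ~125 MB/s per copy no matter how many links are in the channel.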
Thanks a lot. I just wanted to be sure. My customer has 4 uplinks with LACP and sees the VMXNET3 adapter reporting a 10 Gb link speed, and he asks why, with 4 NICs and LACP, he gets no more than 120 MB/s when copying or doing anything related to transferring files.
Of course I told him that, but he doesn’t believe it; he still wonders why he isn’t getting the whole 4 Gb when the channel is configured…
Yeah, I already told him you’d need 10 Gb uplinks for that.
Thanks a lot