I have been having “discussions” with our network team regarding the network design for our new datacenter. The networking team was pushing LACP hard; I was pushing back. My reasons for not doing it:
- Almost every online resource I found said LACP was difficult to get right end-to-end
- VMware has found that most customers use LBT over LACP (based on a meeting with VMware, though some of their support folks do see LACP in the field)
- Not recommended for multi-NIC vMotion (which requires two independent links) or iSCSI port binding
- LACP configuration settings are possibly not present in host profiles (I actually believe this is nonsense: all of the LACP configuration lives at the VDS level, so there would be nothing to maintain at the host profile level)
- This datacenter would be different from our other datacenters, making it a one-off
The network team’s reasons for implementing LACP:
- Improve link utilization of NFS
- Will not overload the 5K interlink (the network has a straight-through design)
- Improve NIC utilization monitoring via SNMP (since it’s straight-through, each NIC will only report to a single port on the 5K)
- Simplified setup for network operations to manage
- Faster MAC relearning for FabricPath (microseconds vs. 2-3 seconds)
Long story short, I tried to implement LBT but had major issues due to the network design that was already in place. Testing with LACP was perfect, but that meant I had to reconfigure everything to LACP.
The VMware KB article describes the process to change a VDS’s uplinks, hosts, and port groups over to LACP. Basically, the steps VMware lists are:
1. Create the LAG
2. Set the LAG uplink as standby on all of the distributed port groups
3. Assign the host uplinks to the LAG
4. Set the LAG as active and the other uplinks as unused on all of the distributed port groups
In my testing I had issues right after completing step 3. I’m not sure whether you are supposed to move the uplinks one at a time or both at once, but either way things start to get really weird: half the things work, half the things don’t. It’s basically a classic port-channel mismatch. Once the host uplinks are in the LAG, the hashing algorithm is in effect; but since the distributed port groups still have the LAG as standby, those port groups keep using one standalone link or the other, which doesn’t play well with the LAG. That is how I reasoned it in my head; in reality it may have been some other issue. Regardless, I have developed my own series of steps that seem to work for me. YMMV, so always, ALWAYS test on a small scale with your own setup.
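To picture the mismatch: a LAG pins every flow to one member link by hashing its headers, so each packet of a given flow always egresses the same physical link. The sketch below is my own illustration of that idea; the CRC32-over-IP-pairs hash is an arbitrary stand-in, not VMware’s or Cisco’s actual algorithm. The point is that the switch side hashes per flow, while a port group that still has the LAG in standby sends on a single standalone uplink, so the two ends disagree about which link carries what.

```python
import zlib

def lag_egress_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Pick the LAG member link for a flow, like an 'IP src+dst' hash.

    Hypothetical illustration only; real switches use vendor-specific
    hash inputs (MAC, IP, L4 ports), but all share this shape:
    deterministic per flow, spread across members.
    """
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % num_links

# Four flows from VMs on one host across a 2-link LAG: each flow is
# pinned to exactly one link, and only many flows use both links.
flows = [("10.0.0.5", "10.0.1.9"), ("10.0.0.5", "10.0.1.10"),
         ("10.0.0.6", "10.0.1.9"), ("10.0.0.7", "10.0.1.11")]
for src, dst in flows:
    print(f"{src} -> {dst}: link {lag_egress_link(src, dst, 2)}")
```

Because the hash is deterministic, return traffic for a flow arrives on whichever member the switch hashes it to, regardless of which standalone uplink the host happens to be transmitting on, which is exactly the half-works/half-doesn’t behavior I saw.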
- Backup the VDS
- Import the backup as a new VDS (we will call this VDS-temp)
- Add all of the hosts to VDS-temp and assign one vmnic to the VDS-temp uplink (wizard)
- Distributed Switches -> Right Click on VDS-Temp -> Add and Manage Hosts
- Choose Add Hosts -> Next -> New Hosts -> Choose all hosts that are on the source VDS
- Select the “Configure identical network settings on multiple hosts (template mode)” option -> Next
- Select one host as a template -> Next
- Deselect “Manage VMkernel adapters (template mode)”; you should only have “Manage physical adapters (template mode)” selected -> Next
- Click on one of the vmnics (I like to move vmnic1) -> Assign uplink -> Choose an uplink (I choose Uplink 2 for consistency)
- Now click “Apply to All” -> Next -> Next -> Finish
- Distributed Switches -> Right Click on VDS-Temp -> Add and Manage Hosts
- Migrate all of the VMkernels to the VDS-temp portgroups (wizard)
- Distributed Switches -> Right Click on VDS-Temp -> Add and Manage Hosts
- Choose Manage host networking -> Next -> Attached Hosts -> Choose all hosts that are on the VDS-temp -> OK -> Next
- Note that I do not use template mode for migrating the VMkernels; template mode forces you to enter all of the IP addresses of the management kernels
- Deselect “Manage Physical adapters” -> Next
- Click on each vmkernel and assign the source port group from VDS-temp -> Next -> Next -> Finish
- Flip all of the port groups for all of the VMs to use the port group from VDS-temp (I used a script for this; unfortunately I can’t share it since I didn’t write it)
- Migrate the last vmnic to the VDS-temp uplink (this is probably not necessary, but I do it anyway)
- Now create the lag on the VDS
- Modify all of the distributed port groups teaming/failover to have the lag as active and all other links as unused (wizard)
- Distributed Switches -> Right Click on VDS -> Manage Distributed Port Groups
- Select Teaming and failover -> Next
- Select all of the port groups to modify (in my case all of them) -> Next
- Change failback to No (there is no need to fail back with LACP; I’m not even sure it applies since it’s a LAG)
- Move the LAG to the Active uplink and move all other uplinks to unused
- Distributed Switches -> Right Click on VDS -> Manage Distributed Port Groups
- Assign a single uplink from each host to the VDS lag (wizard)
- Migrate all of the vmkernels to the VDS portgroups (wizard)
- Flip all of the portgroups for all of the VMs to use the portgroup from the VDS
- Migrate the last vmnic to the VDS lag uplink
- Remove the host from VDS-temp
- Remove VDS-temp from inventory
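The ordering above can be sanity-checked with a toy model. This is purely my own sketch (not any VMware tooling) of the invariant the steps preserve: at every point in the migration, the switch currently carrying the VMkernels and VMs still has at least one physical uplink attached, so nothing goes dark.

```python
# Each step: (description, switch carrying the workloads, uplink counts).
# The counts mirror a two-vmnic host walking through the steps above.
steps = [
    ("start",                          "VDS",      {"VDS": 2, "VDS-temp": 0}),
    ("add vmnic1 to VDS-temp",         "VDS",      {"VDS": 1, "VDS-temp": 1}),
    ("migrate VMkernels and VMs",      "VDS-temp", {"VDS": 1, "VDS-temp": 1}),
    ("move last vmnic to VDS-temp",    "VDS-temp", {"VDS": 0, "VDS-temp": 2}),
    ("create LAG, move one vmnic in",  "VDS-temp", {"VDS": 1, "VDS-temp": 1}),
    ("migrate everything back to VDS", "VDS",      {"VDS": 1, "VDS-temp": 1}),
    ("move last vmnic into the LAG",   "VDS",      {"VDS": 2, "VDS-temp": 0}),
]

for desc, active, uplinks in steps:
    # Invariant: the switch hosting the workloads always has connectivity.
    assert uplinks[active] >= 1, f"outage risk at: {desc}"
    print(f"ok: {desc} (workloads on {active}, {uplinks[active]} uplink(s))")
```

The VMware KB ordering breaks this invariant in a subtler way: the uplinks and the port group teaming policy briefly disagree on the same switch, which the two-switch approach avoids entirely.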
Hi, will using LACP in VMware 5.5 and up really increase speed/throughput? Let’s say I have 4 uplinks (4 NICs) and I configure LACP on the Cisco side and also do the VMware config.
Will a VM get more than a 1 Gb link? Usually the transfer speed is 100 to 130 MB/s, but will I get almost 400 MB/s when copying files or in any transfer activity?
Thanks
Short answer: no. If you are talking about one VM transferring a file to some other VM, that session will be limited to one link. LACP spreads sessions across the active uplinks through a hashing mechanism. At best, multiple sessions will be spread across your 4 links, but any one session will not exceed the transfer speed of a single uplink. If you are transferring files to multiple destinations, then it is possible that the aggregate of all transfers across all links would be higher.
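To put rough numbers on that, here is a back-of-the-envelope sketch (the 1 Gbit/s ≈ 125 MB/s conversion ignores protocol overhead, which is why real-world copies land around 100-120 MB/s):

```python
LINK_GBPS = 1   # each physical uplink
NUM_LINKS = 4   # members in the LAG

def max_mbytes_per_sec(gbps: float) -> float:
    """Convert a link rate in Gbit/s to a rough MB/s ceiling (no overhead)."""
    return gbps * 1000 / 8

# One session is hashed onto a single member link, so its ceiling is the
# single-link rate; only many concurrent sessions can approach the aggregate.
single_session = max_mbytes_per_sec(LINK_GBPS)
aggregate = max_mbytes_per_sec(LINK_GBPS * NUM_LINKS)
print(f"one session tops out around {single_session:.0f} MB/s")
print(f"many sessions combined can approach {aggregate:.0f} MB/s")
```

Which matches the numbers in the question: ~125 MB/s per copy no matter how many links are in the channel.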
Thanks a lot. I just wanted to be sure. My customer has 4 uplinks with LACP and sees the VMXNET3 adapter reporting a 10 Gb link speed, and he asks why, with 4 NICs and LACP, he gets no more than 120 MB/s when copying or doing anything related to transferring files.
Of course I told him that, but he doesn’t believe it; he still wonders why he isn’t getting the whole 4 Gb when the channel is configured…
Yeah, I already told him you’d need 10 Gb uplinks for that.
Thanks a lot