Site Recovery Manager with Multiple NFS addresses

July 12, 2014

I have been working on a project at work to implement Site Recovery Manager (SRM) within our environment. We primarily use Netapp storage and are currently converting to clustered Data ONTAP (also called cluster mode, C-MODE, or CDOT).

Netapp has a great document that describes some best practices.

One of the best practices for clustered Data ONTAP serving NFS is to assign an IP address/LIF to each volume, so that a single volume can be moved along with its LIF to another node for load balancing. If the LIF and the volume are not on the same node, then the ESXi host accesses the LIF, which then accesses the volume through the cluster interconnect switch.
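
As a rough illustration of the direct-versus-indirect distinction, here is a minimal Python sketch. The node names and lookup tables are made up for illustration; this is just my mental model, not a Netapp API.

```python
# Sketch: a volume is accessed "directly" when the NFS LIF used to mount it
# lives on the same cluster node that owns the volume; otherwise the I/O
# crosses the cluster interconnect. All names below are illustrative only.

volume_home_node = {
    "datastore1_vol": "node-01",
    "datastore2_vol": "node-02",
}

lif_home_node = {
    "10.1.1.11": "node-01",
    "10.1.1.22": "node-02",
}

def access_path(volume: str, lif_ip: str) -> str:
    """Return 'direct' if the LIF and volume share a node, else 'indirect'."""
    if volume_home_node[volume] == lif_home_node[lif_ip]:
        return "direct"
    return "indirect (via cluster interconnect)"

print(access_path("datastore1_vol", "10.1.1.11"))  # direct
print(access_path("datastore1_vol", "10.1.1.22"))  # indirect (via cluster interconnect)
```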

In the SRA you can either define IP addresses for SRM to use or let the SRA auto-discover them (preferred). For auto-discovery to work, the LIF must have NFS as its only protocol; otherwise, you must hard-code every IP address.
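
To give a feel for that discovery rule, here is a small Python sketch of the filter as I understand it. The LIF data is invented and this is not the SRA’s actual code, just the logic it appears to follow.

```python
# Sketch: auto-discovery only keeps LIFs whose data protocol list is NFS and
# nothing else. A LIF that also serves another protocol would have to be
# hard-coded in the array manager instead. Sample data is made up.

lifs = [
    {"ip": "10.1.1.11", "protocols": {"nfs"}},
    {"ip": "10.1.1.22", "protocols": {"nfs"}},
    {"ip": "10.1.1.33", "protocols": {"nfs", "iscsi"}},  # would not be auto-discovered
]

auto_discovered = [lif["ip"] for lif in lifs if lif["protocols"] == {"nfs"}]
print(auto_discovered)  # ['10.1.1.11', '10.1.1.22']
```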

So how does SRM treat multiple IPs? The answer gets murky depending on how you set up the SRA initially.

Array Manager with Fully Discovered or Defined IPs
When you first add the array manager via the SRA, the SRA runs a discovery to find what “storage ports” (NFS IPs) are available. If you hard-coded addresses, it will skip any that you did not define. If you auto-discovered addresses, it will use every address found that serves only NFS.

After the discovery has run, SRM takes the IP addresses on the protected side and makes a static mapping to the IP addresses on the recovery side.

Example: two IP addresses available on both the protected and recovery sides
Protected Datastore1 – 10.1.1.11 -> Recovery Datastore1 – 10.1.2.21
Protected Datastore2 – 10.1.1.22 -> Recovery Datastore2 – 10.1.2.22

So when you do a test recovery or any other failover, Datastore1 will mount on 10.1.2.21 and Datastore2 will mount on 10.1.2.22 every time.
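
My mental model of that behavior looks roughly like the Python sketch below (the datastores and addresses come from the example above; this is an illustration of what I observed, not SRM’s actual implementation).

```python
# Sketch: at discovery time SRM pairs each protected-side address with a
# recovery-side address and stores that mapping statically. Every later test
# or real failover mounts through the same recovery IP.

protected_ips = ["10.1.1.11", "10.1.1.22"]   # Datastore1, Datastore2
recovery_ips  = ["10.1.2.21", "10.1.2.22"]

ip_mapping = dict(zip(protected_ips, recovery_ips))  # built once, reused forever
print(ip_mapping["10.1.1.11"])  # Datastore1 always mounts on 10.1.2.21
print(ip_mapping["10.1.1.22"])  # Datastore2 always mounts on 10.1.2.22
```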

Array Manager Missing IP

If for some reason device discovery did not pick up all of the IPs, or you did not hard-code all of the IPs into the recovery-side array manager, then the mappings are made to the known addresses and they are PERMANENT unless you remove the array manager. If you try to correct the addresses in the array manager, it will not fix the existing mappings; it will only use the new IP(s) for new mappings.

Example: 10.1.2.21 was the only address available (either discovered or hard-coded) in the recovery-side array manager
Protected Datastore1 – 10.1.1.11 -> Recovery Datastore1 – 10.1.2.21
Protected Datastore2 – 10.1.1.22 -> Recovery Datastore2 – 10.1.2.21

In this example, 10.1.2.21 was either the only address discovered in the recovery array manager or the only one hard-coded. Both Datastore1 and Datastore2 will mount using 10.1.2.21.
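
To make the “permanent” part concrete, here is a sketch of the behavior as I understand it: the existing entries stay as they are even after the array manager is corrected, and only mappings created afterwards can use the newly added address (again my mental model, not SRM’s code; the extra 10.1.1.33 address is purely illustrative).

```python
# Sketch: only 10.1.2.21 was known at discovery, so every protected IP was
# mapped to it. Adding 10.1.2.22 later does not rewrite existing entries.

ip_mapping = {
    "10.1.1.11": "10.1.2.21",
    "10.1.1.22": "10.1.2.21",   # stuck here unless the array manager is removed
}

# After correcting the recovery-side array manager to include 10.1.2.22,
# only a *new* protected address (hypothetical 10.1.1.33) picks it up.
ip_mapping.setdefault("10.1.1.33", "10.1.2.22")

print(ip_mapping)
# {'10.1.1.11': '10.1.2.21', '10.1.1.22': '10.1.2.21', '10.1.1.33': '10.1.2.22'}
```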

Optimal Mappings/Direct Path
This behavior can result in suboptimal access to the volumes over the cluster interconnect. I am not a storage person, but my storage team estimates it may be about a 10% performance hit. I have submitted a feature request to VMware to allow for optimal pathing. This could possibly happen manually or by updating the SRA spec to allow for the designation of “optimal” LIFs to use.
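
Something like the sketch below is roughly what I have in mind: given the node that owns the recovered volume, prefer a LIF on that same node and only fall back to the interconnect path when no local LIF exists. This is a hypothetical selection routine, not an existing SRA feature, and the node names are made up.

```python
# Sketch: pick an "optimal" recovery LIF by matching the volume's home node.

recovery_lifs = {
    "10.1.2.21": "node-01",
    "10.1.2.22": "node-02",
}

def pick_lif(volume_node: str) -> str:
    """Prefer a LIF on the volume's node; otherwise take any available LIF."""
    for ip, node in recovery_lifs.items():
        if node == volume_node:
            return ip
    return next(iter(recovery_lifs))  # indirect access via the interconnect

print(pick_lif("node-02"))  # 10.1.2.22 -> direct path to the volume
```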

EDIT 7/21/2014:
In the last example I showed an N:1 IP address mapping, which is a valid way you may want to handle IPs. For example, for DR we did not want to provision an IP address/LIF per recovery volume, just one per node in the C-MODE cluster (this is N:1). We were hoping that SRM would be able to determine the IP/LIF that was on the same node as the volume/datastore being recovered. A problem you may run into is failback: I don’t believe that SRM will demux your mappings (1:N), so you may not fail back to the same (optimized) IP address that was used previously. You also run into the same issue as before, where a misconfiguration leaves an IP address unused. As part of my feature request I also asked for the option to fail back to the same IP addresses (I’d say this is most important for disaster avoidance or workload migration scenarios where you want to do a failback).
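
The demux problem is easy to see with a tiny sketch: once several protected addresses collapse onto one recovery address, the inverse mapping loses information, so there is no way to know which original IP each datastore should go back to (a sketch of the logic, not verified SRM behavior).

```python
# Sketch: an N:1 mapping cannot be inverted uniquely. Both protected IPs
# were mapped to 10.1.2.21, so on failback one of them is lost.

forward = {
    "10.1.1.11": "10.1.2.21",
    "10.1.1.22": "10.1.2.21",
}

inverse = {}
for protected_ip, recovery_ip in forward.items():
    inverse[recovery_ip] = protected_ip  # later entries overwrite earlier ones

print(inverse)  # {'10.1.2.21': '10.1.1.22'} -- 10.1.1.11 is no longer recoverable
```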

Conclusions

Site Recovery Manager with Netapp C-MODE storage is a great combination, but I am looking forward to when all of the features of C-MODE can be utilized by SRM. Scaling SRM while maintaining application-level recovery plans is very difficult with ABR (array-based replication). My current design is based on bulk failover, so I will fail over whatever SRM lets me enroll (current soft limit of 1,500 VMs with ESXi 5.5u1 and a specific patch). I am very interested in how VVOLs will play into SRM; I believe for Netapp each VVOL will be a volume, so if there are individual replication policies per volume, that would greatly improve the ability to design Recovery Plans. Hopefully there is an option to put VVOLs into a consistency group for replication purposes.

Note: in the next post I will probably detail how you can hack into the SRM database to update the IP address mappings. Totally NOT SUPPORTED, but a fun exercise.

5 thoughts on “Site Recovery Manager with Multiple NFS addresses”

  1. Pingback: Site Recovery Manager IP Address Hacking | Virtual Chris

  2. mark

    In all your time playing with CDOT and SRM, have you come across the following error message?

    Unable to export the NAS device
    Ensure that the correct export rules are specified in the ontap_config.txt file.

    And since the documentation does not say you have to add info to the ontap_config.txt file, it’s driving me nuts.

    1. Chris (post author)

      I don’t think I ran into that, but do you think it’s a permissions issue? SRM will modify the export policy directly for each host. Earlier versions may have used a txt file.

  3. twodot0h

    Four years later, but I have this exact requirement, whereby we fail over datastores and they are not detecting the NFS LIF local to the node that is serving the volume being recovered. We are using pretty much up-to-date versions of everything and it still doesn’t seem possible. Have you had any further joy with it?

    1. Chris (post author)

      No, I gave up trying. My storage team said that they were concerned about the local LIF but then stopped being concerned for some reason (I think Netapp changed its documentation?).
