Debugging and Modifying NetApp CMODE SRA for Kicks and Giggles

By | June 15, 2015

I’m not sure who enjoys using the Perl debugger on Windows (certainly not I), but the process is still interesting.

The Situation:

So you are adding an array manager for a NetApp CMODE array. You click “Enable”, but then you get an error:

SRA command ‘discoverDevices’ failed. Storage system configuration error ensure that the storage system has proper configuration of FCP, iSCSI, and storage IP.

At first you may think that you did not set the array IP address correctly, but you did.

After diving into the vmware-dr.log you see this:

2015-06-08T17:08:07.160+01:00 [04188 info 'SraCommand' opID=30f39891] discoverDevices's stdout:
-->
--> 08-06-2015T17:08:05 NetApp FAS/V-Series Storage Replication Adapter 2.1 for clustered Data ONTAP Build Date 11-11-2013
--> 08-06-2015T17:08:05 Discover Devices Started
--> 08-06-2015T17:08:05 1
--> 08-06-2015T17:08:06 Version in filer is 1 21
--> 08-06-2015T17:08:06 Collecting storage IP address for SVMNAME-1
--> 08-06-2015T17:08:06 Collecting storage IP address failed
--> 08-06-2015T17:08:06 Collecting world wide node name
--> 08-06-2015T17:08:06 entry doesn't exist
--> 08-06-2015T17:08:06 Collecting iscsi port list
--> 08-06-2015T17:08:07 entry doesn't exist
--> 08-06-2015T17:08:07 Collecting fcp port list
--> 08-06-2015T17:08:07 fcp ,iscsi and storage ip are not configured
--> 08-06-2015T17:08:07 DiscoverDevices completed with errors
--> 08-06-2015T17:08:07 Generate autosupport event in filer
--> 08-06-2015T17:08:07 Autosupport Event is disabled
2015-06-08T17:08:07.160+01:00 [04188 verbose 'SraCommand' opID=30f39891] Stopped listening for updates to file 'C:\Users\MYUSER\AppData\Local\Temp\vmware-MYUSER\sra-status-97-211'
2015-06-08T17:08:07.160+01:00 [04188 verbose 'SraCommand' opID=30f39891] Cancelling SRA command timeout
2015-06-08T17:08:07.160+01:00 [04188 info 'SraCommand' opID=30f39891] discoverDevices exited with exit code 0
2015-06-08T17:08:07.160+01:00 [04188 verbose 'SraCommand' opID=30f39891] discoverDevices responded with:
--> <?xml version="1.0" encoding="UTF-8"?>
--> <Response xmlns="http://www.vmware.com/srm/sra/v2">
--> <Error code="1006" />
--> </Response>
-->
2015-06-08T17:08:07.160+01:00 [04188 verbose 'Storage' opID=30f39891] XML validation succeeded
2015-06-08T17:08:07.160+01:00 [04188 error 'Storage' opID=30f39891] SRA command discoverDevices failed: (dr.storage.fault.LocalizableAdapterFault) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> code = "70d7e5fb-4684-49cc-b099-e8f0386c17cb.1006",
--> msg = "

It still looks like something is wrong with the IP address of the SRA, but that doesn’t make sense.

Since you cannot look at the NetApp config directly (another team manages the storage), you are left with no option but to debug the SRA.

The NetApp SRA is a set of Perl scripts that take in XML and return XML. They are typically located in C:\program files\vmware vcenter site recovery manager\sra\cmode\bin. The script that gets run is command.pl, a launcher that reads the XML and determines which script to run next.
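Since everything the SRA returns is XML, the error code can be pulled out of a response programmatically. The following is a minimal Python sketch (not part of the SRA, which is Perl) that reads the response captured in the log above. The function name `sra_error_codes` is made up for illustration; the namespace and the `Error` element come straight from that log excerpt.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <Response> element in the vmware-dr.log excerpt.
SRA_NS = "{http://www.vmware.com/srm/sra/v2}"


def sra_error_codes(response_xml):
    """Return the 'code' attribute of every <Error> element in an SRA response."""
    root = ET.fromstring(response_xml)
    return [err.get("code") for err in root.iter(SRA_NS + "Error")]


# The response from the log, with the leading arrow markers removed.
response = b"""<?xml version="1.0" encoding="UTF-8"?>
<Response xmlns="http://www.vmware.com/srm/sra/v2">
<Error code="1006" />
</Response>"""

print(sra_error_codes(response))  # prints ['1006']
```

The same code, 1006, is the suffix of the fault code that SRM reports at the end of the log excerpt.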

To start, go back to the vmware-dr.log and look for the line right after “Input for discoverDevices”:

Paste the text into Notepad, remove the leading arrow markers, enter the array username and password, and save the file as temp.xml.
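If you would rather not clean up the arrow markers by hand, a few lines of Python can do it. This is only a sketch: the helper name `strip_markers` and the sample text are hypothetical, and the real input block from your own vmware-dr.log will look different. It accepts both the ASCII `-->` marker and the en-dash variant that shows up when a log has been pasted through a web page.

```python
import re

# Matches a leading "-->" (or the en-dash form) plus one optional space.
MARKER = re.compile(r"^\s*(?:-->|\u2013>)\s?")


def strip_markers(text):
    """Remove the leading arrow marker from every line of a pasted log block."""
    return "\n".join(MARKER.sub("", line) for line in text.splitlines())


# Placeholder text only; this is NOT the real discoverDevices input.
sample = "--> <Example>\n--> <Item/>\n--> </Example>"
print(strip_markers(sample))
```

You still need to fill in the array username and password yourself before saving the result as temp.xml.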

Before running the Perl debugger, open discoverDevices.pl, preferably in an editor that shows line numbers.

To use the Perl debugger, you need to add the “-d” flag when launching the perl script:

"C:\Program Files\VMware\VMware vCenter Site Recovery Manager\external\perl-5.14.4\bin\perl.exe" -d "C:\Program Files\VMware\VMware vCenter Site Recovery Manager\storage\sra\CMODE_ONTAP\discoverDevices.pl" < temp.xml

I’m not going to get too deep into how to use the built-in Perl debugger; you can google that.

I will say that the Perl debugger brings me back to my college days, using GDB with C and C++. It wasn’t fun then and it isn’t fun now. As with any debugger, you can set breakpoints so that you can skip ahead to the sections of code that are probably having issues (‘b’ sets a breakpoint at a specific line number and ‘c’ continues to the next breakpoint). Once you reach the section of code you care about, use ‘s’ to step into a function or ‘n’ to step over it. From there, use ‘p’ to print variables and check both the input and how the script is handling it. The Perl debugger also has ‘w’ to set a watch expression, which stops execution and shows the old and new values whenever the watched value changes.

Having looked at the code previously, I focused on lines 2798-2812.

I printed out $protocol and $isOnlyNFS, and as the script looped through the data LIFs, it appeared that there was an additional data protocol defined on these LIFs (CIFS in particular).
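To make the idea concrete, here is a rough Python sketch of what an "NFS only" check on a LIF might look like. It is not the SRA's Perl and should not be read as its actual logic: the function name `is_only_nfs`, the LIF names, and the rule that a LIF only qualifies when NFS is its sole data protocol are all assumptions for illustration, inferred from what the debugger printed.

```python
def is_only_nfs(protocols):
    """True when NFS is the only data protocol in the list (assumed rule)."""
    return {p.lower() for p in protocols} == {"nfs"}


# Hypothetical LIF names and protocol lists, for illustration only.
lifs = {
    "lif_data_1": ["nfs", "cifs"],  # extra protocol, so it would be skipped
    "lif_data_2": ["nfs"],          # NFS only, so it would qualify
}

for name, protocols in lifs.items():
    print(name, protocols, is_only_nfs(protocols))
```

Under that assumption, if every data LIF carries CIFS alongside NFS, no LIF qualifies, which would fit the “Collecting storage IP address failed” line in the log.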

I remember this issue from before, and it was an error on the storage admin’s side. The problem, though, is that remediating it is not as simple as turning off CIFS on the LIF; I believe the LIF has to be taken offline (at least from what I’m told). I COULD edit the discoverDevices.pl code and change this:
