Getting to grips with NFSv4.1 and Kerberos

Over the past few weeks, I’ve been looking to update some of our older white papers on core storage topics. One of the outdated papers was on NFS, and a lot had changed in this space since the paper was last updated. Most notable was the introduction of support for NFS v4.1 in vSphere 6.0, along with Kerberos-based authentication. In vSphere 6.5, we also added Kerberos integrity checking. I decided to have a go at configuring this in my own lab. Before going any further, I need to thank Justin Parisi of NetApp for his guidance through this setup. He has even gone ahead and written up an excellent blog post describing the steps using the NetApp ONTAP appliance. That should be the first place to go for guidance on how to do this setup. As Justin states in his post, setting this up is a PITA. You’ll see why soon. What follows are some of my own observations, trials and tribulations on trying to get this working in my own lab.

First off, let’s consider the lab. My environment consisted of:

  • Active Directory (AD) and DNS services running (in my lab, these were on MS Windows Server 2012R2)
  • 4 x ESXi hosts running ESXi v6.5, managed by vCenter Server v6.5
  • NetApp Simulator Appliance version 9.2.

Simple setup steps for ESXi hosts:

  • All 4 ESXi hosts and vCenter were in DNS, with forward and reverse lookups resolving correctly (see the quick check after this list).
  • All 4 ESXi hosts are joined to the AD domain.
  • Domain user credentials for adding NFS Kerberos credentials on each of the ESXi hosts (in my case, these credentials belonged to a Domain Admin).
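
A quick way to sanity-check those lookups from a Windows box is with Resolve-DnsName, part of the DnsClient module that ships with Windows Server 2012 R2. The hostname below is one of my lab hosts and the IP address is just a placeholder – substitute your own values:

PS C:\> Resolve-DnsName esxi-dell-e.rainpole.com     # forward lookup - should return the host's A record
PS C:\> Resolve-DnsName <esxi-host-ip-address>       # reverse lookup - should return the matching PTR record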

More advanced setup steps for ESXi hosts:

  • You now need to go into Active Directory, select each ESXi host in turn, open its Properties, go to the Extensions section, click the Attribute Editor, and scroll down to the msDS-SupportedEncryptionTypes field. Edit this field and give it a value of 24 (0x18), which corresponds to the two AES encryption types.
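
If you would rather not click through the Attribute Editor on every host, the same attribute can be set with the ActiveDirectory PowerShell module. This is just a sketch (the computer account name is one of my lab hosts – substitute your own), but it saves some clicking when there are a few hosts to update:

PS C:\> Import-Module ActiveDirectory
PS C:\> Set-ADComputer -Identity "esxi-dell-e" -Replace @{"msDS-SupportedEncryptionTypes"=24}
PS C:\> Get-ADComputer "esxi-dell-e" -Properties msDS-SupportedEncryptionTypes    # verify it now reads 24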

See what I mean about this being a PITA! Anyways, that is pretty much everything that needs to be done on the ESXi host side of things for the moment. Let’s turn our attention to the target side of things next. In my case, this is a NetApp ONTAP Simulator (version 9.2).

Setup steps for NetApp simulator:

  • After initial deploy of the appliance, Control-C into the console, and select option 4 from the boot menu to claim/initialize disks.
  • Two static IP addresses will be needed, one for the management interface and the other for the cluster interface (even if you deploy a single node, it seems).
  • Once complete, a web interface is available. Login to the UI by pointing a browser at the IP address of the node/cluster.
  • Add appropriate licenses, e.g. NFS.
  • Assign previously initialized disks to the node.
  • Create an aggregate using new disks (aggr1).
  • Setup NTP, DNS.
  • Create a new Storage Virtual Machine (SVM) with NFS protocol support enabled – this will require one or more data interfaces to be added. A rough CLI equivalent of these steps is sketched after this list.
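
For reference, here is roughly what those steps look like from the ONTAP command line. Treat this purely as a sketch – the aggregate, SVM, LIF and port names are made up for illustration, and the exact options vary a little between ONTAP releases – Justin’s post remains the authoritative reference:

cluster1::> storage aggregate create -aggregate aggr1 -diskcount 24
cluster1::> vserver create -vserver svm_nfs -rootvolume svm_root -aggregate aggr1 -rootvolume-security-style unix
cluster1::> vserver nfs create -vserver svm_nfs -v4.1 enabled
cluster1::> network interface create -vserver svm_nfs -lif netappc_data -role data -data-protocol nfs -home-node cluster1-01 -home-port e0c -address <data-ip> -netmask <netmask>
cluster1::> cluster time-service ntp server create -server <ntp-server>
cluster1::> vserver services name-service dns create -vserver svm_nfs -domains rainpole.com -name-servers <dns-server-ip>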

With all of this in place, we are ready to go through the final few steps to support Kerberos-based authentication for NFS v4.1 datastores.

A word of advice: at this point, create a volume with an export policy and verify that you can successfully mount this NFS v4.1 volume from your ESXi hosts using AUTH_SYS authentication rather than Kerberos. It is worth validating your data paths and exports before trying any Kerberos-related stuff and adding more complexity to the mix.
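
A quick way to run that sanity test from an ESXi host is with esxcli. The names below are the ones from my lab (vol2 exported as /nas02_data_1 on interface netappc); the -a option selects the authentication type. The first two lines do the AUTH_SYS test, and the last two show how the same volume gets remounted with Kerberos later on, once everything else is in place:

[root@esxi-dell-e:~] esxcli storage nfs41 add -H netappc -s /nas02_data_1 -v vol2 -a AUTH_SYS
[root@esxi-dell-e:~] esxcli storage nfs41 list
[root@esxi-dell-e:~] esxcli storage nfs41 remove -v vol2
[root@esxi-dell-e:~] esxcli storage nfs41 add -H netappc,netappd -s /nas02_data_1 -v vol2 -a SEC_KRB5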

Kerberos setup steps on the NetApp simulator:

Step 1 is to set up the Kerberos realm. This basically mirrors my Active Directory configuration. I named it after my AD domain (rainpole.com) but in uppercase, so it is RAINPOLE.COM. The setup simply involves adding details about the AD environment (KDC, AD server, and so on).
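
From the ONTAP CLI, this is the vserver nfs kerberos realm create command. A rough sketch only – the placeholders are my AD/DNS server, which also acts as the KDC in a typical AD setup:

cluster1::> vserver nfs kerberos realm create -vserver svm_nfs -realm RAINPOLE.COM -kdc-vendor Microsoft -kdc-ip <ad-server-ip> -adserver-name <ad-server-fqdn> -adserver-ip <ad-server-ip>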

This next bit was the one that really had me confused. It is the Kerberos interface step, which essentially enables Kerberos on the data interfaces (LIFs) of the SVM. Here is how one of my interfaces was configured.

In my setup, my SVM had two data interfaces, netappc and netappd. These were both in DNS, with forward and reverse lookups. In this example, we are looking at interface netappc. The Kerberos Realm is RAINPOLE.COM, mentioned previously. The Service Principal Name takes the following format: nfs/<fqdn-of-my-interface>@Kerberos-Realm. Therefore my SPN is nfs/netappc.rainpole.com@RAINPOLE.COM. The Admin username and password are only required when enabling or disabling Kerberos on the interface, since this is when the SPN is added to (or removed from) AD. Note the odd, truncated names that these objects take in AD, shown in the query output further down.
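
From the CLI, this is the kerberos interface enable step. Again, just a sketch – the LIF name is whatever you called the data interface on the SVM, and the command prompts for the AD administrator password before it creates the SPN:

cluster1::> vserver nfs kerberos interface enable -vserver svm_nfs -lif netappc_data -spn nfs/netappc.rainpole.com@RAINPOLE.COM -admin-username administrator
cluster1::> vserver nfs kerberos interface show -vserver svm_nfs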

The SPN can now be queried from AD using the following command (thanks to Justin again for his help here).

PS C:\Users\Administrator> Get-ADComputer nfs-netappd-rai -Properties servicePrincipalName

DistinguishedName : CN=NFS-NETAPPD-RAI,CN=Computers,DC=rainpole,DC=com
DNSHostName : NFS-NETAPPD-RAI.RAINPOLE.COM
Enabled : True
Name : NFS-NETAPPD-RAI
ObjectClass : computer
ObjectGUID : 18454529-c6ef-4d93-bc33-99c6f2d830b8
SamAccountName : NFS-NETAPPD-RAI$
servicePrincipalName : {nfs/netappd.rainpole.com, nfs/nfs-netappd-rai.rainpole.com, nfs/NFS-NETAPPD-RAI,
 HOST/nfs-netappd-rai.rainpole.com...}
SID : S-1-5-21-1660322180-797832923-1225732573-5694
UserPrincipalName :

Note the servicePrincipalName line: the first entry is nfs/netappd.rainpole.com, which matches the FQDN of our interface. As long as these match up, you are good to go (I spun my wheels here for the longest time, trying to figure out what was correct and what was not). Again, these entries only appear in AD once the Kerberos interfaces are configured. Make sure they are visible and correct before going any further. You’ll have a unique entry for each interface.

The final piece of this setup is to make the same change that we made to the ESXi hosts previously. In Active Directory Administrative Centre, select each NetApp SPN computer object in turn, open its Properties, go to the Extensions section, click the Attribute Editor, and scroll down to the msDS-SupportedEncryptionTypes field. Edit this field and give it a value of 24 (0x18).
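
The same PowerShell shortcut shown earlier for the ESXi hosts works here too, pointed at the machine accounts that ONTAP created for the SPNs (NFS-NETAPPD-RAI in my case – your truncated names will differ). Repeat for each SPN computer object:

PS C:\> Set-ADComputer -Identity "NFS-NETAPPD-RAI" -Replace @{"msDS-SupportedEncryptionTypes"=24}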

Exporting an NFS volume

This can be summarized in 3 steps:

  • Create a volume
  • Create an export policy
  • Assign the export policy to the namespace of the volume

The export policy is critical. This is where I had the most difficulty. You need to be aware of the first field, the Client Specification. It does not seem to like CIDR formats (other than 0.0.0.0/0) or hostnames/FQDNs. I spent ages figuring out why, every time I tried to mount a volume, it failed as follows:

WARNING: NFS41: NFS41FSGetRootFH:4234: Lookup nas02_data_1 failed for volume vol2: Permission denied
WARNING: NFS41: NFS41FSCompleteMount:3762: NFS41FSGetRootFH failed: Permission denied
WARNING: NFS41: NFS41FSDoMount:4399: First attempt to mount the filesystem failed: Permission denied
WARNING: NFS41: NFS41_FSMount:4683: NFS41FSDoMount failed: Permission denied

Once I used the IP addresses of the ESXi hosts in the Client Specification, it all started to work as expected.
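
For completeness, the equivalent export-policy rule from the ONTAP CLI looks something like the lines below. The policy name is made up and the client list is a placeholder for the four ESXi host IP addresses; the point is that -clientmatch lists the host IPs explicitly, and the rules cover sys, krb5 and krb5i so that both AUTH_SYS and Kerberos mounts work:

cluster1::> vserver export-policy rule create -vserver svm_nfs -policyname esxi_krb -protocol nfs4 -clientmatch <esxi01-ip>,<esxi02-ip>,<esxi03-ip>,<esxi04-ip> -rorule sys,krb5,krb5i -rwrule sys,krb5,krb5i -superuser sys,krb5,krb5i
cluster1::> volume modify -vserver svm_nfs -volume vol2 -policy esxi_krb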

Setting up NFS users

The final part of the puzzle is the requirement to create some users on the NetApp. One of these is the SPN user called “nfs” and the other is the user we used on the ESXi side (“chogan”) to establish the NFS Kerberos credentials. Interestingly, I seemed to be able to mount my NFS volumes without having the “nfs” user, but I definitely needed the NFS Kerberos credentials user (“chogan”) created on the NetApp side. Without this user defined, I got the following when trying to mount NFS v4.1 volumes using Kerberos authentication:

WARNING: NFS41: NFS41FSWaitForCluster:3637: Failed to wait for the cluster to be located: Timeout
WARNING: NFS41: NFS41_FSMount:4683: NFS41FSDoMount failed: Timeout
StorageApdHandler: 1062: Freeing APD handle 0x430c89c16d70 []
StorageApdHandler: 1147: APD Handle freed!
WARNING: NFS41: NFS41_VSIMountSet:431: NFS41_FSMount failed: Timeout
.
.
WARNING: SunRPC: 742: Failed to send NULLPROC for xid 0x1e42b9be: RPC connection reset 0xe

This is also the failure I got when Kerberos was not configured on the SVM interfaces. So there is some behaviour here that I still need to figure out.
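
Going back to the user creation step: from the ONTAP CLI, each user is a one-liner. Treat the UIDs as arbitrary values picked for illustration – the user names are the “nfs” SPN user and the AD account used for the NFS Kerberos credentials on the ESXi hosts:

cluster1::> vserver services name-service unix-user create -vserver svm_nfs -user nfs -id 500 -primary-gid 0
cluster1::> vserver services name-service unix-user create -vserver svm_nfs -user chogan -id 1000 -primary-gid 1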


Checking the status of NFS v4.1 with Kerberos from the CLI

It is possible to tell the authentication type used to mount an NFS v4.1 volume from the CLI. The Security column in the output of the following command tells you: if it shows SEC_KRB5, then Kerberos has been used; if it shows AUTH_SYS, then Kerberos has not been used and the “normal” authentication mechanism is in place. Ignore the Host(s) listing. As I mentioned, my SVM had two interfaces, and I could mount my volumes on either netappc or netappd. Unfortunately there is no NFS v4.1 multipathing support on the NetApp at this time, so I can’t do much with it. One final note – vol3 is using SEC_KRB5I, i.e. Kerberos authentication with data integrity checking. The setup steps are the same.

[root@esxi-dell-e:~] esxcli storage nfs41 list
Volume Name  Host(s)          Share          Accessible  Mounted  Read-Only  Security    isPE  Hardware Acceleration
-----------  ---------------  -------------  ----------  -------  ---------  ---------  -----  ---------------------
vol4         netappc          vol4                 true     true      false  SEC_KRB5   false  Not Supported
vol3         netappd,netappc  /vol3                true     true      false  SEC_KRB5I  false  Not Supported
vol2         netappd,netappc  /nas02_data_1        true     true      false  AUTH_SYS   false  Not Supported
vol1         netappc          /nas01_data_1        true     true      false  SEC_KRB5   false  Not Supported

Conclusion

I hope you find this useful. As I said at the beginning, please go to Justin’s blog for more in-depth step-by-step instructions. I still have a few questions about how all of this hangs together, and some other weird behaviour that I’m seeing (probably some future blogs). Hopefully my own personal observations on what is involved in this setup will also be beneficial to you in some way.

4 Replies to “Getting to grips with NFSv4.1 and Kerberos”

  1. Thanks for your blog. I also followed Justin’s blog and was able to get NFS datastores mounted on ESXi 6.5 using KRB5i.

    I have an issue with cloning VMs that I’m wondering if you also see? I can create a VM on one of the NFS datastores, but if I try to clone one of the VMs (the destination can be local storage, the same NFS datastore, or another NFS datastore), the clone task fails every time with an error accessing the vmdk.

    The vpxd.log shows errors accessing the destination vmdk, stating “file not found”.

    Are you able to successfully clone VMs on NFS datastores using Kerberos 5? I can clone VMs when mounting using AUTH_SYS instead of Kerberos.

    1. Chris, I can create VMs and files, and clone them just fine, on my datastore. However, I am logged in with the same user credentials as those used for the NFS Kerberos credentials.

      One other thing to check – make sure that the volume you are using in the clone operation has “Configure UNIX credentials” ticked in the settings.

  2. Hi Cormac, thanks for the reply.

    I have been in contact with Justin about this also and, whilst I’m not sure why, I now have this working. The fix has been to change the default export policy, and also the policy for the ESXi volume, and change superuser from “any” to “sys,krb5,krb5i”.
    If I set superuser to any, which should cover all authentication methods, I get errors performing clones. Changing superuser to the defined list resolves the problem.
    This is a very strange issue, but I’ll take the win.
    Thanks,
    Chris
