NFS Server Troubleshooting - FreeOA

2013-10-20 16:25:28

阿炯

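This article collects problems encountered while running an NFS server on Linux, together with their solutions.
[iptables settings on an NFS server]

[Cannot mount an NFS share over TCP]

[Startup failure because a companion service is not running]

[Making NFS listen on a specific NIC on a multi-homed machine]

[NFS network security hardening]

[df hangs because of an NFS problem]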

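[iptables settings on an NFS server]
Installing and configuring NFS is fairly simple, but running it behind a firewall needs some care.
showmount -e 192.168.1.15
rpc mount export: RPC: Unable to receive; errno = No route to host

On the NFS server, iptables had ports 111 (tcp/udp) and 2049 (tcp/udp) open, yet the test above still failed. With the iptables service stopped, the client could connect normally.

In fact the ports of several other services must be opened as well, namely those set by MOUNTD_PORT, STATD_PORT, LOCKD_TCPPORT and LOCKD_UDPPORT, and by default these services pick a random port each time they start. The fix is to pin them to fixed ports and then allow those ports in iptables. The configuration file is /etc/sysconfig/nfs.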
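
As an illustration, the relevant part of /etc/sysconfig/nfs might look like the fragment below. Only MOUNTD_PORT=892 appears in this article; the other three port numbers are example values (assumptions) and can be replaced by any free ports, as long as the same numbers are then opened in iptables.

```shell
# /etc/sysconfig/nfs (fragment): pin the helper daemons to fixed ports.
# Only 892 comes from this article; the other values are illustrative.
MOUNTD_PORT=892
STATD_PORT=662
LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
```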

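After editing, save the file and restart the related services:
# /etc/init.d/portmap restart
# /etc/init.d/nfs restart

Check which ports the services are now using:
# rpcinfo -p 192.168.1.15

This prints each service with its port, which can then be allowed in iptables.

If all you need is a simple NFS server, opening the ports of three daemons is enough:
111: the portmap port, which tells clients which ports the NFS services use
2049: the NFS (nfsd) port, which controls whether a client may log in to the server
892: the mountd port, fixed manually by enabling MOUNTD_PORT=892 in the configuration file; mountd manages access permissions on the exported file systems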
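
The three-port setup could be turned into iptables rules roughly as follows. This is a minimal sketch: the helper name nfs_accept_rules is hypothetical, it only prints the commands so they can be reviewed before being run as root, and it assumes the rules belong in the INPUT chain for both TCP and UDP.

```shell
#!/bin/sh
# nfs_accept_rules is a hypothetical helper: it prints one ACCEPT rule per
# port and protocol instead of changing the firewall directly.
nfs_accept_rules() {
    for port in "$@"; do
        for proto in tcp udp; do
            echo "iptables -A INPUT -p $proto --dport $port -j ACCEPT"
        done
    done
}

# portmap, nfsd, and mountd pinned with MOUNTD_PORT=892
nfs_accept_rules 111 2049 892
```

If the INPUT chain already ends with a catch-all REJECT or DROP rule, appended rules are never reached; in that case the rules would need to be inserted with -I ahead of it rather than appended with -A.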

[Cannot mount an NFS share over TCP]
Both the NFS client and the server are fully up to date.

I've set up an NFS export on the server:
/opt/nfs     10.1.1.0/24(rw,sync,no_root_squash,no_all_squash)

$ service rpcbind status
rpcbind (pid 20079) is running...

$ service nfslock status
rpc.statd (pid 19986) is running...

$ service nfs status
rpc.svcgssd is stopped
rpc.mountd (pid 20034) is running...
nfsd (pid 20031 20030 20029 20028 20027 20026 20025 20024) is running...

On the client, both rpcbind and nfslock report as running.

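However, the mount waits for a long time and finally times out:
mount -t nfs 10.1.1.33:/opt/nfs /opt/test/nfs

Mounting over UDP does not have this problem:
mount -o udp -t nfs 10.1.1.33:/opt/nfs /opt/test/nfs

It instantly succeeds and the mount is accessible.

Since the mount works over UDP, the problem is specific to TCP. Two possible fixes follow: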

1. NFS connections use source ports below 1024, which a network security feature may reject; disable that feature or switch to high ports
Turns out there was a "security" feature enabled on our PowerConnect switch that took offense to NFS SYN packets with source ports < 1024 (dos-control tcpflag). Suffice it to say, disabling the feature solved the issue.

2. Set the SELinux boolean
Although SELinux is permissive, try:
setsebool -P nfs_export_all_rw 1
Then restart rpcbind, nfs and nfslock, and run:
exportfs -a

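[Startup failure because a companion service is not running]
# /etc/init.d/nfs start
Starting NFS services:                                     [ OK ]
Starting NFS mountd:                                       [FAILED]
Starting NFS daemon: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
rpc.nfsd: unable to set any sockets for nfsd [FAILED]

This happens because rpcbind is not running.
# service rpcbind start

The service used to be called portmap; it has since been replaced by rpcbind.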

[root@localhost init.d]# ./nfs restart
Shutting down NFS daemon:                                  [FAILED]
Shutting down NFS mountd:                                  [FAILED]
Shutting down NFS services:                                [ OK ]
Shutting down RPC idmapd:                                  [ OK ]
Starting NFS services:                                     [ OK ]
Starting NFS mountd:                                       [FAILED]
Starting NFS daemon: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
rpc.nfsd: unable to set any sockets for nfsd [FAILED]

The services that NFS depends on are not running; start them first:
service rpcbind restart
service nfs start
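
The start order can be wrapped in a small function, sketched below. The name start_nfs_in_order is hypothetical, and the sketch assumes that `service rpcbind status` exits with a non-zero status when rpcbind is stopped, which is the usual init-script convention but worth checking on your system.

```shell
#!/bin/sh
# start_nfs_in_order is a hypothetical wrapper around the two commands
# above: it starts rpcbind only if it is not already running, then nfs.
start_nfs_in_order() {
    if ! service rpcbind status >/dev/null 2>&1; then
        # rpcbind is required by rpc.mountd and rpc.nfsd; give up if it
        # cannot be started.
        service rpcbind start || return 1
    fi
    service nfs start
}

# Usage (as root):
#   start_nfs_in_order
```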

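[Making NFS listen on a specific NIC on a multi-homed machine]
The goal is to offer NFS only on one network interface or subnet. I first assumed that changing the interface or IP address it listens on would be enough, but that did not work:
-n, --name ipaddr | hostname
Specifies the bind address used for RPC listener sockets. The ipaddr form can be expressed as either an IPv4 or an IPv6 presentation address. If this option is not specified, rpc.statd uses a wildcard address as the transport bind address.

Instead, use iptables to block the relevant ports on the interface that should not serve NFS: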
iptables -A INPUT -i eth1 -p tcp --dport 111 -j DROP
iptables -A INPUT -i eth1 -p udp --dport 111 -j DROP

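[NFS network security hardening]
The section above uses iptables to restrict a specific network interface; iptables can also restrict access by source IP address and so on.

In addition, TCP Wrappers (hosts.allow and hosts.deny) can control access to the portmap service. NFS also has its own built-in access control: the export rules can open different directories, with different permissions, to specific IP ranges.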
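
A TCP Wrappers setup of this kind usually denies everyone first and then allows one network. The sketch below only prints the two entries for review; the helper name wrappers_entries is hypothetical, and the network 10.1.1.0/255.255.255.0 is an assumed example. On systems that run rpcbind instead of portmap, the daemon name would probably need to be rpcbind.

```shell
#!/bin/sh
# wrappers_entries is a hypothetical helper that prints the two entries:
# deny all hosts by default, then allow a single network.
# $1 = daemon name (portmap in this article), $2 = network/netmask
wrappers_entries() {
    echo "# add to /etc/hosts.deny"
    echo "$1: ALL"
    echo "# add to /etc/hosts.allow"
    echo "$1: $2"
}

wrappers_entries portmap 10.1.1.0/255.255.255.0
```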

Reference
http://www.tldp.org/HOWTO/NFS-HOWTO/security.html

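[df hangs because of an NFS problem]

A problem on the NFS server side (usually a network failure or a firewall blocking traffic) can leave the NFS mount, or even the whole root directory, inaccessible on the client. The symptom is that df, or any command that touches the mounted directory, hangs. Tracing df's system calls and signals with strace shows clearly that df hangs in the system call that tries to stat the directory /var/nfs, so the hang is in df's access to the NFS mount point. Listing the running df processes with ps aux shows many of them in state D+ (uninterruptible sleep, in the foreground process group).

Force-unmounting the file system with umount -lf brought df back to normal, and stopping the many leftover df processes with killall df brought the system load back down.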
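
To list only the stuck df processes rather than reading through the whole ps aux output, a small filter along these lines may help. It is a sketch: the function name stuck_df_pids is hypothetical, and it expects three whitespace-separated columns (PID, state, command name) on standard input.

```shell
#!/bin/sh
# stuck_df_pids is a hypothetical filter: it reads "PID STAT COMMAND" lines
# on stdin and prints the PID of every df process whose state starts with D
# (uninterruptible sleep).
stuck_df_pids() {
    awk '$2 ~ /^D/ && $3 == "df" { print $1 }'
}

# Usage: feed it the live process table with the headers suppressed.
#   ps -e -o pid= -o stat= -o comm= | stuck_df_pids
```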

NFS shares hang with the following error(s) in /var/log/messages:
kernel: nfs: server <servername> not responding, still trying
kernel: nfs: server <servername> not responding, timed out

Force-unmounting an NFS mount point on Linux:
umount -f -l /mnt/myfolder

Will sort of fix the problem:
-f Force unmount (in case of an unreachable NFS system). (Requires kernel 2.1.116 or later.)

-l Lazy unmount. Detach the filesystem from the filesystem hierarchy now, and cleanup all references to the filesystem as soon as it is not busy anymore. (Requires kernel 2.4.11 or later.)
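
Before forcing an unmount, it can be useful to confirm that the mount point really is unresponsive. The sketch below does this with a time-limited stat; the function name mount_responds and the 5-second default are assumptions, and it relies on the coreutils timeout command. On older kernels a process blocked on NFS may not be killable at all, in which case the check itself can hang.

```shell
#!/bin/sh
# mount_responds is a hypothetical check: it succeeds only if stat on the
# given path returns successfully within the time limit.
# $1 = path to test, $2 = time limit in seconds (default 5)
mount_responds() {
    timeout -s KILL "${2:-5}" stat "$1" >/dev/null 2>&1
}

# Usage (as root), with the mount point from the example above:
#   mount_responds /mnt/myfolder || umount -f -l /mnt/myfolder
```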

This article was last updated by 阿炯 on 2019-11-20 22:36:49 and is currently at version 2.