NFS server failure

Symptoms

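        Recently, servers that mount NFS have been hanging at irregular intervals. The problem first showed up when an automated ansible deployment hung indefinitely. After logging in to the affected server, commands such as ls and df did not respond. The client's system log showed no errors, and system resources were not exhausted.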
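A hung NFS mount tends to block any command that touches it, which is why ls and df stop responding. The sketch below is one possible way to probe a mount point without tying up the shell. The function name `probe_mount` and the 5-second default limit are illustrative assumptions, not part of the original setup, and it relies on the `timeout` and `stat` utilities from coreutils.

```shell
#!/usr/bin/env bash
# Probe a path with a time limit so a hung NFS mount cannot block the shell.
# probe_mount and the 5-second default are hypothetical, illustrative choices.
probe_mount() {
    local path="$1" limit="${2:-5}"
    # SIGKILL is used because a process waiting on NFS may ignore gentler signals.
    if timeout -s KILL "$limit" stat "$path" >/dev/null 2>&1; then
        echo "ok: $path"
    else
        # Either stat did not return in time or the path does not exist.
        echo "failed: $path"
        return 1
    fi
}
```

For example, `probe_mount /data 3` (with `/data` standing in for the real mount point) reports a failure instead of hanging the way a bare `ls /data` would.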

Resolution

NFS server side

        Logging in to the NFS server and checking the system log shows:

Dec 23 09:01:01 dev-nfs systemd: Removed slice User Slice of root.
Dec 23 09:01:01 dev-nfs systemd: Stopping User Slice of root.
Dec 23 09:21:26 dev-nfs systemd: Created slice User Slice of root.
Dec 23 09:21:26 dev-nfs systemd: Starting User Slice of root.
Dec 23 09:21:26 dev-nfs systemd: Started Session 12836 of user root.
Dec 23 09:21:26 dev-nfs systemd-logind: New session 12836 of user root.
Dec 23 09:21:26 dev-nfs systemd: Starting Session 12836 of user root.
Dec 23 09:22:01 dev-nfs systemd: Stopping RPC bind service...
Dec 23 09:22:01 dev-nfs systemd: Starting RPC bind service...
Dec 23 09:22:01 dev-nfs systemd: Started RPC bind service.
Dec 23 09:22:11 dev-nfs systemd: Started Session 12837 of user root.
Dec 23 09:22:11 dev-nfs systemd-logind: New session 12837 of user root.
Dec 23 09:22:11 dev-nfs systemd: Starting Session 12837 of user root.
Dec 23 09:22:12 dev-nfs systemd: Stopping NFS server and services...
Dec 23 09:22:12 dev-nfs kernel: nfsd: last server has exited, flushing export cache
Dec 23 09:22:12 dev-nfs systemd: Stopping NFS Mount Daemon...
Dec 23 09:22:12 dev-nfs systemd: Stopping NFSv4 ID-name mapping service...
Dec 23 09:22:12 dev-nfs rpc.mountd[29810]: Caught signal 15, un-registering and exiting.
Dec 23 09:22:12 dev-nfs systemd: Starting Preprocess NFS configuration...
Dec 23 09:22:12 dev-nfs systemd: Started Preprocess NFS configuration.
Dec 23 09:22:12 dev-nfs systemd: Starting NFSv4 ID-name mapping service...
Dec 23 09:22:12 dev-nfs systemd: Starting NFS Mount Daemon...
Dec 23 09:22:12 dev-nfs systemd: Started NFSv4 ID-name mapping service.
Dec 23 09:22:12 dev-nfs systemd: Started NFS Mount Daemon.
Dec 23 09:22:12 dev-nfs systemd: Starting NFS server and services...
Dec 23 09:22:12 dev-nfs rpc.mountd[31339]: Version 1.3.0 starting
Dec 23 09:22:12 dev-nfs kernel: NFSD: starting 90-second grace period (net ffffffff81ad9d40)

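        The NFS server log contains abnormal kernel messages from nfsd. A Google search indicated that the number of nfsd threads was too low, and the advice was to increase the thread count.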
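To pull only the kernel-side nfsd messages out of a busy system log, a small filter along these lines may help. This is a sketch: the function name `nfsd_kernel_events` is hypothetical, and the default path `/var/log/messages` is an assumption based on the syslog format shown above.

```shell
#!/usr/bin/env bash
# Print only kernel messages emitted by nfsd from a syslog-style file.
# The default log path is an assumption; pass another file as the first argument.
nfsd_kernel_events() {
    # -i matches both "nfsd:" and "NFSD:" as they appear in the log.
    grep -Ei 'kernel: nfsd:' "${1:-/var/log/messages}"
}
```

Run against the log excerpt above, this would keep the "last server has exited" and "starting 90-second grace period" lines and drop the systemd session noise.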

Current NFS status
# cat /proc/net/rpc/nfsd 
rc 0 0 30463198
fh 0 0 0 0 0
io 1447382326 1456801108
th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 32 0 0 0 0 0 0 0 0 0 0 0
net 30463199 0 30485174 266
rpc 30463198 1 1 0 0
proc3 22 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc4 2 70 30463126
proc4ops 72 0 0 0 570540 491739 28774 6320 0 191550 6463291 255991 0 0 0 0 177217 0 0 523775 0 0 133 6853388 0 230 203547 27936 0 8180 209 0 0 209 0 311026 0 0 0 2126786 0 0 0 266 14491135 185 0 0 0 0 0 0 0 26 15971485 0 3 0 46 204 0 0 0 0 0 0 0 0 0 0 0 0 0
Check the thread count
# cat /proc/fs/nfsd/threads
8

        So NFS starts 8 threads by default, which is probably not enough; the number can be raised manually. The default thread count is changed as follows.
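The thread count also appears as the first number on the `th` line of /proc/net/rpc/nfsd, so it can be read from either file. A minimal sketch, assuming that layout (the function name `nfsd_thread_count` is hypothetical):

```shell
#!/usr/bin/env bash
# Print the nfsd thread count taken from the "th" line of the nfsd stats file.
# Defaults to /proc/net/rpc/nfsd; pass a file path to read a saved copy instead.
nfsd_thread_count() {
    awk '$1 == "th" { print $2 }' "${1:-/proc/net/rpc/nfsd}"
}
```

On the server described here this should agree with the value in /proc/fs/nfsd/threads.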

Changing the default NFS thread count
# vim /etc/sysconfig/nfs
# The default is 8.
RPCNFSDCOUNT=32

        NFS must be restarted for the change to take effect.
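The manual edit can also be scripted, which is convenient when several NFS servers need the same setting. The following is a sketch under stated assumptions: `set_rpcnfsdcount` is a hypothetical helper, it expects GNU sed, and it defaults to the /etc/sysconfig/nfs path used above.

```shell
#!/usr/bin/env bash
# Set RPCNFSDCOUNT in an NFS sysconfig file, replacing an existing (possibly
# commented-out) entry or appending one if none is present.
# set_rpcnfsdcount is a hypothetical helper; requires GNU sed for -i.
set_rpcnfsdcount() {
    local count="$1" conf="${2:-/etc/sysconfig/nfs}"
    # Reject anything that is not a plain non-negative integer.
    case "$count" in
        ''|*[!0-9]*) echo "invalid thread count: $count" >&2; return 2 ;;
    esac
    if grep -Eq '^#?[[:space:]]*RPCNFSDCOUNT=' "$conf"; then
        sed -i -E "s/^#?[[:space:]]*RPCNFSDCOUNT=.*/RPCNFSDCOUNT=${count}/" "$conf"
    else
        echo "RPCNFSDCOUNT=${count}" >> "$conf"
    fi
}
```

A typical use would be `set_rpcnfsdcount 32` followed by a service restart, for instance `systemctl restart nfs-server` (the unit name is an assumption inferred from the "NFS server and services" lines in the log). Running `rpc.nfsd 32` changes the thread count of an already running server, but that change does not survive a restart unless the config file is updated as well.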

Verification

# cat /proc/net/rpc/nfsd
rc 0 0 30464433
fh 0 0 0 0 0
io 1447382326 1456801108
th 32 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 64 0 0 0 0 0 0 0 0 0 0 0
net 30464434 0 30486409 268
rpc 30464433 1 1 0 0
proc3 22 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc4 2 70 30464361
proc4ops 72 0 0 0 570540 491739 28774 6320 0 191550 6463293 255991 0 0 0 0 177217 0 0 523775 0 0 133 6853388 0 232 203547 27936 0 8180 209 0 0 209 0 311026 0 0 0 2126786 0 0 0 268 14492227 185 0 0 0 0 0 0 0 26 15971626 0 3 0 46 206 0 0 0 0 0 0 0 0 0 0 0 0 0

# cat /proc/fs/nfsd/threads
32

# cat /proc/fs/nfsd/pool_threads
32

# cat /proc/fs/nfsd/pool_stats
# pool packets-arrived sockets-enqueued threads-woken threads-timedout
0 485 4 481 0
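
To keep an eye on whether the new thread count is enough, the pool_stats counters can be turned into a ratio. As far as I understand the kernel documentation, sockets-enqueued counts how often a request had to wait because no nfsd thread was free, so a ratio that keeps growing relative to packets-arrived would hint at a thread shortage; the meaning of these columns may differ between kernel versions. The function name `pool_queue_ratio` is hypothetical.

```shell
#!/usr/bin/env bash
# Summarize each pool in pool_stats as "enqueued / arrived" with a percentage.
# Defaults to /proc/fs/nfsd/pool_stats; pass a file path to read a saved copy.
pool_queue_ratio() {
    awk '!/^#/ && NF >= 5 {
        # Guard against division by zero on an idle pool.
        pct = ($2 > 0) ? 100 * $3 / $2 : 0
        printf "pool %s: %d enqueued / %d arrived (%.2f%%)\n", $1, $3, $2, pct
    }' "${1:-/proc/fs/nfsd/pool_stats}"
}
# With the sample line "0 485 4 481 0" this prints:
# pool 0: 4 enqueued / 485 arrived (0.82%)
```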