NFS, How I Loathe Thee!

Keywords: #nfs

NFS. The very bane of my existence. Well, maybe not quite, but it sure makes my life hell. You can’t narrow down problem or runaway areas, because what little statistics gathering it does has no granularity. There’s no authentication, and little to no access control.

Please, don’t start on NFSv4. Linux can barely support NFSv3, and the 2.6 kernel is worse than 2.4. At least 2.4 doesn’t completely lose track of locks the way 2.6 clients do. Somewhat randomly, they forget to clean up NFS locks. Don’t believe me?


web4:~# uname -an
Linux web4 2.6.18-5-amd64 #1 SMP Tue Oct 2 20:37:02 UTC 2007 x86_64 GNU/Linux
web4:~# cat /proc/locks
1: FLOCK  ADVISORY  WRITE 25347 00:33:6591864 0 EOF
2: POSIX  ADVISORY  READ  21417 00:23:3323129 0 426
3: POSIX  ADVISORY  READ  12981 00:23:3323129 0 426
4: POSIX  ADVISORY  READ  14067 00:23:3323129 0 426
5: POSIX  ADVISORY  WRITE 15061 08:09:48938 0 9223372036854775806
6: POSIX  ADVISORY  WRITE 15061 08:09:48936 0 9223372036854775806
7: POSIX  ADVISORY  WRITE 15061 08:09:48934 0 9223372036854775806
8: POSIX  ADVISORY  WRITE 15061 08:09:48932 0 9223372036854775806
9: POSIX  ADVISORY  WRITE 15061 08:09:48929 0 9223372036854775806
10: POSIX  ADVISORY  WRITE 15061 08:09:48895 0 9223372036854775806
11: POSIX  ADVISORY  WRITE 15061 08:09:48893 0 9223372036854775806
12: FLOCK  ADVISORY  WRITE 14354 08:08:402423 0 EOF
13: FLOCK  ADVISORY  WRITE 3263 08:08:96619 0 EOF
14: POSIX  ADVISORY  WRITE 2628 08:08:322011 1024 2047
web4:~# stat /proc/14354
stat: cannot stat `/proc/14354': No such file or directory
web4:~# cat /proc/locks
1: FLOCK  ADVISORY  WRITE 25464 00:24:1986016 0 EOF
2: FLOCK  ADVISORY  WRITE 25463 00:23:106101 0 EOF
3: POSIX  ADVISORY  READ  21417 00:23:3323129 0 426
4: POSIX  ADVISORY  READ  12981 00:23:3323129 0 426
5: POSIX  ADVISORY  READ  14067 00:23:3323129 0 426
6: POSIX  ADVISORY  WRITE 15061 08:09:48938 0 9223372036854775806
7: POSIX  ADVISORY  WRITE 15061 08:09:48936 0 9223372036854775806
8: POSIX  ADVISORY  WRITE 15061 08:09:48934 0 9223372036854775806
9: POSIX  ADVISORY  WRITE 15061 08:09:48932 0 9223372036854775806
10: POSIX  ADVISORY  WRITE 15061 08:09:48929 0 9223372036854775806
11: POSIX  ADVISORY  WRITE 15061 08:09:48895 0 9223372036854775806
12: POSIX  ADVISORY  WRITE 15061 08:09:48893 0 9223372036854775806
13: FLOCK  ADVISORY  WRITE 14354 08:08:402423 0 EOF
14: FLOCK  ADVISORY  WRITE 3263 08:08:96619 0 EOF
15: POSIX  ADVISORY  WRITE 2628 08:08:322011 1024 2047
web4:~# 

Yeah, PID 14354 is LONG GONE but still holding a lock. Yes, I know they’re advisory, but 2.4 doesn’t have a problem cleaning up locks when processes die. Back with 2.6.8 we tried to convert our NFS server from 2.4 to 2.6, and that was a complete disaster. The 2.6.8 NFS server would just randomly spit out permission denied errors to clients under load. We never tried again.

NFS isn’t where it’s at for us. We’re moving to AFS. AFS isn’t without pitfalls either, mind you, and we’ll have to do a LOT of work to get there, including some custom code, most notably to support device inodes. There’s also a pretty bad bug in the Linux AFS client that we’re hoping to see fixed soon.
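
The check done by hand above (read /proc/locks, then stat /proc/<pid> for a suspicious holder) is easy to script. What follows is only a rough sketch, not a tool we actually run. The function names (parse_locks, pid_exists, stale_locks) are made up for illustration, and it assumes the eight-column /proc/locks layout shown in the output above. It does not tell you which filesystem a lock lives on, so anything it reports is a candidate to look at, not proof of an NFS problem.

```python
import os


def parse_locks(text):
    """Parse /proc/locks-style text into a list of dicts.

    Assumes the layout seen above:
    N: CLASS ADVISORY|MANDATORY READ|WRITE PID MAJ:MIN:INODE START END
    """
    locks = []
    for line in text.splitlines():
        fields = line.split()
        # Blocked waiters are listed with a "->" marker after the index; drop it.
        if "->" in fields:
            fields.remove("->")
        if len(fields) < 8:
            continue
        try:
            pid = int(fields[4])
        except ValueError:
            continue  # not a lock line we understand
        locks.append({
            "kind": fields[1],   # FLOCK or POSIX
            "mode": fields[3],   # READ or WRITE
            "pid": pid,
            "file": fields[5],   # major:minor:inode
            "start": fields[6],
            "end": fields[7],
        })
    return locks


def pid_exists(pid):
    """Same test as `stat /proc/<pid>`: does the process still have a /proc entry?"""
    return os.path.exists("/proc/%d" % pid)


def stale_locks(text, alive=pid_exists):
    """Return locks whose holder PID no longer exists.

    `alive` is injectable so the logic can be exercised without a real /proc.
    """
    return [l for l in parse_locks(text) if l["pid"] > 0 and not alive(l["pid"])]


if __name__ == "__main__":
    if os.path.exists("/proc/locks"):
        with open("/proc/locks") as f:
            for lock in stale_locks(f.read()):
                print("%(kind)s %(mode)s lock on %(file)s held by missing PID %(pid)d" % lock)
```

Run it as root on the client. A process can exit between reading /proc/locks and checking /proc/<pid>, so run it twice, the same way the listing above was taken twice, before believing a hit.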