[ILUG] Thorny networking problem

Niall O Broin niall at linux.ie
Wed Mar 6 10:12:44 GMT 2002


I have a very hairy one to start your morning off. I have three boxes, my
deskunder, my notebook, and a big box of disks, all of which are running
SuSE 7.3 with kernel 2.4.10. Let's call the boxes A, B, and C in the
aforementioned order. A is an NFS server, and B can mount a filesystem from
it quite happily, though when stracing the mount (for reasons which will
become apparent) I see the following:

mount("bagend:/export", "/mnt", "nfs", 0xc0ed0000, 0x805a560) = -1 \
ENOSYS (Function not implemented)
mount("bagend:/export", "/mnt", "nfs", 0xc0ed0000, 0x805a560) = -1 \
ENOSYS (Function not implemented)
mount("bagend:/export", "/mnt", "nfs", 0xc0ed0000, 0x805a560) = 0

In /var/log/messages on A I see two identical log messages like this,

Mar  6 08:56:23 bagend rpc.mountd: authenticated mount request from
192.168.1.3:602 for /export (/export)

one immediately after the other.

Can anyone hazard a guess as to why the first two mount calls fail and the
third succeeds, when they are apparently identical?

That, however, is NOT the hairy one. The hairy one is that C takes a long
time (XXX) to mount the filesystem. The mount attempt just hangs and can't be
interrupted with ^C. Stracing it shows that it proceeds like the mount from
B but only gets as far as

mount("bagend:/export", "/mnt", "nfs", 0xc0ed0000, 0x805a560) = -1 \
ENOSYS (Function not implemented)
mount("bagend:/export", "/mnt", "nfs", 0xc0ed0000, 0x805a560

where it waits. I was quite convinced that it had hung, but during one
attempt I had to do something else, so I left it for some time, and when I
came back it had finished. Again there were two log messages as above on A,
but instead of appearing at the same time, they were nearly 10 minutes apart.

Anyone care to hazard a guess as to what might be going on there?

Networking between the boxes seems fine BTW - 124 µsec average ping time
with a flood ping with 0% loss.

I just rebooted the box (q.v.) and did the NFS mount again, this time
timing it with time. It took 5 minutes, and there were no log messages
on box A at all. It gets weirder.

But the weirdness doesn't stop there, oh no. When I was having
problems with NFS I really needed to transfer files from A to C so I
decided to use scp instead - bad idea! I cannot ssh to this machine
at all. When I try, whether from A or from C itself, the machine hangs
up tight and the CapsLock and ScrollLock lights on the keyboard start
to flash. Figure that one out!

I also have a RH7.1 installation on this box with kernel 2.4.6, and it
behaves normally - NFS mounts are instantaneous and ssh works fine, so
there shouldn't be a fundamental hardware oddity.

OK - that's it - chew on that with your morning coffee.



Niall



