of our file servers (running 3.99.11) has started locking up every couple
of days. It gets into a state where any process will run fine until it tries
to access the disk, at which point it stops responding. Updating the kernel
to a -current from a couple of weeks ago makes no difference.
I've had zero luck in tracking down what's causing this.
However, I set up an external-mode watchdog to panic the machine if a loop of
sleep 20; ls -l /a/local/directory /dev/null ; wdogctl -t
failed to tickle the watchdog for a minute, so I now have a core dump from
such a panic. I'd like some suggestions on what to look for/how to poke at
this core dump to try to find what's happening.
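
For reference, the watchdog loop amounts to roughly the sketch below. It is
not the exact script on the server: the variable names, the "run" argument,
and the 60-second period in the comment are placeholders for illustration,
and the timer name has to come from the machine's own wdogctl listing.

```sh
#!/bin/sh
# Sketch of the disk-probe watchdog loop described above.
# Assumption: the timer was armed beforehand in external mode with
# something like "wdogctl -e -p 60 <timer>", where <timer> is whatever
# name wdogctl reports on the machine in question.

PROBE_DIR=${PROBE_DIR:-/a/local/directory}   # directory on the suspect disk
TICKLE_CMD=${TICKLE_CMD:-"wdogctl -t"}       # command that tickles the timer
INTERVAL=${INTERVAL:-20}                     # seconds between probes

# Touch the disk, then tickle the watchdog. If the ls blocks on disk
# access, the tickle is never reached and the timer eventually expires.
probe_and_tickle() {
    ls -l "$PROBE_DIR" > /dev/null
    $TICKLE_CMD
}

# Probe forever; only a hang inside probe_and_tickle stops the tickling.
watch_loop() {
    while :; do
        sleep "$INTERVAL"
        probe_and_tickle
    done
}

# Start the loop only when asked to, so the functions can be tried alone.
if [ "${1:-}" = "run" ]; then
    watch_loop
fi
```

Run with "run" as the first argument to start the loop. TICKLE_CMD can be
pointed at a harmless command to try the probe without a watchdog armed.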
cheers
mark