[KLUG Members] Attempting to fix a server

Adam Bultman adamb at glaven.org
Wed Aug 25 17:17:31 EDT 2004


Last week, I had a server tip over on me.  I believe the SCSI card 
had died, and as a result we had some filesystem corruption and were 
unable to boot.

At the time, the system would start to boot, get partway through, 
then reboot and start over again.

Since then, the SCSI card has been replaced, and we are working on 
mopping things up.  So far, I've fixed the filesystems, made sure that 
things are bootable, and been trying to get things to boot normally.

At one point, login attempts would fail with an error to the tune of 
'unable to open the password store'.  That error has since gone away, 
and now the system will boot most of the way, then fail.  Depending on 
the kernel you choose, it will either boot to a login prompt but refuse 
every login attempt, start booting and freeze with a getpwnam error, or 
get kinda confused partway through bootup.

The only thing I can get it to do with any regularity is to boot into 
single user mode.  In single user mode, I can check the filesystems and 
make sure things are all there, and I've removed the scripts that I 
don't need running and that were causing problems (nfs, nfs mounts in 
fstab, network), but I can't seem to crack this nut.  I can't figure out 
why it will boot but not have a list of usernames, and then kind of fail 
silently.  I'm guessing that my /etc/passwd is tanked, and that I'll 
need to rebuild it somehow.  Does that sound like a reasonable guess?
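
One way to test the "tanked /etc/passwd" guess is to sanity-check the 
file's format.  What follows is only a rough sketch in Python, assuming 
a Python 3 interpreter is available in single user mode (or that the 
file is copied to another machine first).  The function name 
check_passwd_lines is made up for illustration.  It checks only for 
seven colon-separated fields, numeric UID and GID, printable characters, 
and a root entry with UID 0; it says nothing about /etc/shadow or 
anything else involved in login.

```python
import sys


def check_passwd_lines(lines):
    """Return a list of (line_number, problem) tuples for suspect entries.

    Hypothetical helper, not part of any standard tool. Line number 0 is
    used for problems that concern the file as a whole.
    """
    problems = []
    seen_root = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            # Flag blank lines for a manual look.
            problems.append((number, "empty line"))
            continue
        if not line.isprintable():
            # Control characters are a typical sign of filesystem damage.
            problems.append((number, "non-printable characters"))
            continue
        fields = line.split(":")
        if len(fields) != 7:
            problems.append(
                (number, "expected 7 fields, found %d" % len(fields)))
            continue
        name, _password, uid, gid = fields[:4]
        if not name:
            problems.append((number, "empty user name"))
        if not (uid.isdigit() and gid.isdigit()):
            problems.append((number, "non-numeric UID or GID"))
        if name == "root" and uid == "0":
            seen_root = True
    if not seen_root:
        problems.append((0, "no root entry with UID 0"))
    return problems


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/passwd"
    # latin-1 maps every byte to a character, so corrupt bytes cannot
    # make the read itself fail.
    with open(path, encoding="latin-1") as handle:
        for number, problem in check_passwd_lines(handle):
            print("%s:%d: %s" % (path, number, problem))
```

Saved under any file name and run with the path to the passwd file as 
its only argument (the default is /etc/passwd), it prints one line per 
suspect entry and prints nothing if the file looks structurally sane.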

The server's purpose has since been usurped by another, but I'd like to 
find out what went wrong on this thing, so at least I know what to check 
for in the future.

Adam

