[KLUG Members] help with lockup problem on debian sarge
Richard Harding
rick at ricksweb.info
Thu Sep 1 08:09:05 EDT 2005
On Mon, 2005-08-29 at 11:59 -0400, Adam Tauno Williams wrote:
> > > > The last two days at different times (3am Sun and 6am Mon) my sarge
> > > > email server has just gone unresponsive. No video, ssh, anything. For
> > > > all purposes the machine is off, but it is still powered. I cannot seem
> > > > to find any reason in the logs for it. All appears well and then nothing
> > > > reported until a hard reboot.
> > > Do you have syslog marking enabled? Do you see any messages after the
> > > last mark?
> > I have -- Mark -- items in /var/log/messages. There are messages in
> > syslog after the last mark. For today the last mark was 6:06 and the
> > last syslog item is 6:25.
>
> Try increasing your mark frequency to once a minute until you have the
> problem worked out. Then you will know how much time elapsed between
> the last entry and the lockup.
>
> You should be able to use a Linux PC and hook the two together with a
> null modem cable and run minicom (a terminal emulator included in most
> Linux distros). Then google for 'serial console howto' and it should
> walk you through the steps.
I actually got messages at the terminal this time, since I left the
machine logged in. The error, repeated continuously on the machine
screen, is:
APIC error on CPU1: 04 (04)
It seems that if you get this error in dmesg and it occurs on both CPUs,
then it's typically a hardware problem with the mobo, and a firmware
upgrade can fix it. On my end the machine stays up for a day or two and
then bombs, and it had been running with no issues from last Nov. until
now, so I'm assuming I have a problem with CPU1.
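To check whether the error really is confined to CPU1, the APIC error
lines could be counted per CPU. What follows is only a minimal sketch in
Python. It assumes the lines look like the one quoted above; the function
name count_apic_errors and the sample list are made up for illustration
and are not taken from any tool on the box.

```python
import re
from collections import Counter

# Matches lines such as "APIC error on CPU1: 04 (04)". The whitespace
# before the parenthesis is optional, so "04(04)" is accepted as well.
# Assumption: the error lines follow the format quoted in this message.
APIC_RE = re.compile(
    r"APIC error on CPU(\d+):\s*([0-9a-fA-F]+)\s*\(([0-9a-fA-F]+)\)"
)


def count_apic_errors(lines):
    """Return a Counter mapping CPU number to the number of APIC error lines."""
    counts = Counter()
    for line in lines:
        match = APIC_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample input; in practice the lines would come from
# dmesg output or a saved log file.
sample = [
    "APIC error on CPU1: 04 (04)",
    "some unrelated kernel message",
    "APIC error on CPU1: 04 (04)",
]

for cpu, n in sorted(count_apic_errors(sample).items()):
    print(f"CPU{cpu}: {n} APIC error line(s)")
```

If only one CPU number ever shows up in the counts, that would fit the
single-CPU guess; entries for both CPUs would point more toward the mobo.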
At least I think I know where to go from here. First I have to get the
services off the box onto another so I can take this thing down and work
on it.
Thanks for the help.
Rick