[KLUG Members] help with lockup problem on debian sarge

Richard Harding rick at ricksweb.info
Thu Sep 1 08:09:05 EDT 2005


On Mon, 2005-08-29 at 11:59 -0400, Adam Tauno Williams wrote:
> > > > The last two days at different times (3am Sun and 6am Mon) my sarge
> > > > email server has just gone unresponsive. No video, ssh, anything. For
> > > > all purposes the machine is off, but it is still powered. I cannot seem
> > > > to find any reason in the logs for it. All appears well and then nothing
> > > > reported until a hard reboot. 
> > > Do you have syslog marking enabled?  Do you see any messages after the
> > > last mark?
> > I have -- Mark -- items in /var/log/messages. There are messages in
> > syslog after the last mark. For today the last mark was 6:06 and the
> > last syslog item is 6:25.
> 
> Try increasing your mark frequency to once a minute until you have the
> problem worked out.  Then you will know how much time elapsed between
> the last entry and the lockup.
> 
> You should be able to hook a Linux PC to the server with a null modem
> cable and run minicom (a terminal emulator included in most Linux
> distros).  Then google for 'serial console howto' and it should walk
> you through the steps.
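
For checking the gap between the last mark and the last logged entry,
here is a rough Python sketch. It assumes classic syslog timestamps
("Aug 29 06:06:01 host ..."), an English locale for month names, and
that mark lines contain "-- MARK --" (matched case-insensitively). The
function names and the year default are made up for illustration;
syslog lines carry no year, so one has to be supplied.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Aug 29 06:06:01 host -- MARK --".
# Single-digit days are space-padded ("Sep  1"), hence the \s+.
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) ")


def parse_stamp(line, year=2005):
    """Return the line's timestamp, or None if it has no syslog prefix."""
    m = STAMP.match(line)
    if not m:
        return None
    month, day, clock = m.groups()
    # Assumption: month abbreviations are English (C locale).
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def gap_after_last_mark(lines, year=2005):
    """Time between the newest mark line and the newest entry of any kind.

    Returns None when there is no mark or no timestamped line at all.
    Lines may come from several files in any order; the newest stamps win.
    """
    last_mark = None
    last_entry = None
    for line in lines:
        ts = parse_stamp(line, year)
        if ts is None:
            continue
        if last_entry is None or ts > last_entry:
            last_entry = ts
        if "-- mark --" in line.lower():
            if last_mark is None or ts > last_mark:
                last_mark = ts
    if last_mark is None or last_entry is None:
        return None
    return last_entry - last_mark
```

Feeding it the lines of /var/log/messages and /var/log/syslog together
gives the time logged after the last mark. With the times quoted above
(mark at 6:06, last entry at 6:25) it would come out at about 19
minutes.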

I actually got messages at the terminal this time, since I left the
machine logged in. The error, repeated continuously on the machine's
screen, is:

APIC error on CPU1: 04 (04)

From what I've read, if this error shows up in dmesg on both CPUs, it
is typically a hardware problem with the mobo, and a firmware upgrade
can fix it. On my end the machine stays up for a day or two and then
bombs, and it had been running with no issues from last November until
now, so I'm assuming I have a problem with CPU1.
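
To check whether the error really shows up on one CPU only, something
like the Python sketch below could tally the messages per CPU. It
assumes the lines look like the one above ("APIC error on CPU1: 04
(04)", with or without the space before the parenthesis); the function
name is made up for illustration, and the text would come from dmesg
output or a saved log.

```python
import re
from collections import Counter

# Matches e.g. "APIC error on CPU1: 04 (04)" or "APIC error on CPU1: 04(04)".
# Assumption: both values are printed in hex, as in the message on screen.
APIC_LINE = re.compile(
    r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\s*\(([0-9a-fA-F]+)\)"
)


def apic_errors_by_cpu(text):
    """Count APIC error messages per CPU number found in the given text."""
    counts = Counter()
    for match in APIC_LINE.finditer(text):
        counts[int(match.group(1))] += 1
    return dict(counts)
```

If the result only ever has a key for CPU 1, that fits the single-CPU
theory; entries for both CPUs would point more toward the board.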

At least I think I know where to go from here. First I have to get the
services off the box onto another so I can take this thing down and work
on it. 

Thanks for the help. 

Rick 


