[KLUG Members] IBM ServeRAID problems

Mike Slack members@kalamazoolinux.org
Tue, 18 Dec 2001 08:48:14 -0800


This is a response to a response I got about 3 weeks ago.  I've had no time to look into the details until very recently.

I checked /proc/pci and /proc/interrupts and noticed that the onboard SCSI controller and the ServeRAID controller share an IRQ (this is after various fiddlings with the BIOS):

$ cat /proc/interrupts
           CPU0       
  0:    4718026          XT-PIC  timer
  1:       1821          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:          0          XT-PIC  es1370
  6:         43          XT-PIC  floppy
  8:     143799          XT-PIC  rtc
 10:      18285          XT-PIC  eth0
 11:      55523          XT-PIC  aic7xxx, ips
 12:       4969          XT-PIC  PS/2 Mouse
NMI:          0 
ERR:          0
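
For anyone who wants to check their own box the same way, here is a minimal sketch that picks shared IRQ lines out of /proc/interrupts text. It assumes the single-CPU layout shown above (one count column, then the controller type, then a comma-separated driver list); on SMP machines the layout has more columns and this would need adjusting. The function name shared_irqs and the SAMPLE variable are just names made up for the example.

```python
# Sketch only: assumes the single-CPU /proc/interrupts layout
# (IRQ: count, controller type, comma-separated driver names).
SAMPLE = """\
           CPU0
  0:    4718026          XT-PIC  timer
  1:       1821          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:          0          XT-PIC  es1370
  6:         43          XT-PIC  floppy
  8:     143799          XT-PIC  rtc
 10:      18285          XT-PIC  eth0
 11:      55523          XT-PIC  aic7xxx, ips
 12:       4969          XT-PIC  PS/2 Mouse
NMI:          0
ERR:          0
"""


def shared_irqs(text):
    """Return {irq: [drivers]} for IRQ lines that list more than one driver."""
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        irq = head.strip()
        if not sep or not irq.isdigit():
            continue  # skips the CPU header and the NMI/ERR rows
        fields = rest.split(None, 2)  # count, controller type, driver list
        if len(fields) < 3:
            continue
        drivers = [d.strip() for d in fields[2].split(",")]
        if len(drivers) > 1:
            shared[int(irq)] = drivers
    return shared


if __name__ == "__main__":
    for irq, drivers in sorted(shared_irqs(SAMPLE).items()):
        print("IRQ %d is shared by: %s" % (irq, ", ".join(drivers)))
```

To run it against a live system, read the text from /proc/interrupts instead of SAMPLE.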

It doesn't look like there are any memory address conflicts of any kind.  I read somewhere that a shared IRQ might cause hangs, even though in theory it shouldn't.  The problems I am having always occur during I/O-intensive operations such as tape backups or opening large files.

The root partition is mounted on a disk on the onboard controller, and all other mounts (including /home) are on the ips controller.  Most peripherals (including the tape drive and CD-ROM) are on the onboard controller.  Since the problems always occur when both controllers are doing intensive operations, the shared IRQ looks to me like a good candidate for the source.

Unfortunately, I haven't been able to test my theory, since I can't seem to set separate IRQs for aic7xxx and ips no matter what I do.  I can set the ips IRQ from the BIOS, but whatever I set it to, aic7xxx follows (or doesn't get loaded at all).  Am I missing something here?  Or is this even something I should be trying?  Any other suggestions?

Adam Williams (awilliam@whitemice.org) wrote:
> >I am getting some strange behavior from my IBM ServeRAID card.  Every once in a while the box locks up and filesystems on the RAID channels become essentially inaccessible.  It seems to happen most frequently when I am using the DAT drive (also SCSI, but on a different, on-board SCSI channel), or starting up VMware while other disk I/O-intensive things are happening.
> >Has anyone seen this kind of problem before? 
> 
> I have several IPS controllers.  
> 
> >Is this more likely a hardware or (ips) driver problem?  Or something else, 
> >like a SCSI termination problem?
> 
> Every IPS problem I've had has boiled down to hardware.  Are you using
> an IBM system?  I've noticed they conflict with other devices pretty
> frequently on non-IBM systems (most of the systems I have them in). 
> Otherwise they are great cards.
> 
> >Here is some system info:
> >RH 7.2, Kernel 2.4.9-13
> >$ cat /proc/scsi/ips/1
> >IBM ServeRAID General Information:
> >         Controller Type                   : ServeRAID
> >         IO region                         : 0xe400 (256 bytes)
> 
> Be absolutely certain no other device is camping in this address range
> (0xe400 through 0xe4ff, i.e. 0xe400 + 256 bytes).  Nose around in /proc/ioports:
> 
> e400-e4ff : IBM ServeRAID-3x
>   e400-e4ff : ips
> 
> This has accounted for almost all my IPS problems.  They take that
> address no matter what.
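
A rough sketch of that check, for what it's worth: it lists every /proc/ioports entry that overlaps a given port range, so anything other than the ServeRAID and its ips child stands out. The two sample lines are the ones quoted above; the function name entries_overlapping is made up for the example, and the two-spaces-per-level indentation is an assumption about how the file nests child entries.

```python
# Sketch only: lists /proc/ioports entries overlapping a port range.
# Assumes "start-end : name" lines with two spaces of indent per nesting level.
SAMPLE = """\
e400-e4ff : IBM ServeRAID-3x
  e400-e4ff : ips
"""


def entries_overlapping(text, lo, hi):
    """Return (depth, start, end, name) for each entry overlapping lo..hi."""
    hits = []
    for line in text.splitlines():
        span, sep, name = line.partition(" : ")
        if not sep:
            continue  # not a "range : name" line
        start_s, dash, end_s = span.strip().partition("-")
        if not dash:
            continue
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= hi and end >= lo:
            depth = (len(line) - len(line.lstrip())) // 2
            hits.append((depth, start, end, name.strip()))
    return hits


if __name__ == "__main__":
    base, size = 0xE400, 256  # IO region reported by /proc/scsi/ips/1
    for depth, start, end, name in entries_overlapping(SAMPLE, base, base + size - 1):
        print("%s%04x-%04x : %s" % ("  " * depth, start, end, name))
```

Indented entries are children of the entry above them (here the ips driver claiming the card's region), so only an unrelated name showing up in the range would be suspicious.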
> 
> >         Memory region                     : 0xde000 (8192 bytes)
> >         Shared memory address             : 0xc00de000
> >         IRQ number                        : 10
> 
> Same goes for this: make sure nothing else is using it.  If you have ISA
> devices and your BIOS supports resource reservation, use it so the IPS
> doesn't land on any of the IRQs used by ISA devices.  
> 
> >         BIOS Version                      : 4.80.26
> >         Firmware Version                  : 2.25.01
> 
> These both look pretty current.
> 
> > Host: scsi1 Channel: 00 Id: 02 Lun: 00
> >   Vendor: IBM      Model: SERVERAID        Rev: 1.00
> >   Type:   Direct-Access                    ANSI SCSI revision: 02
> > Host: scsi1 Channel: 00 Id: 15 Lun: 00
> >   Vendor: IBM      Model: SERVERAID        Rev: 1.00
> >   Type:   Processor                        ANSI SCSI revision: 02
> 
> Yep, this all looks about the same as mine.
>
> >An excerpt from /var/log/messages:
> > Nov 28 18:50:50 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> > Nov 28 18:50:50 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed
> > Nov 28 18:50:52 linus sshd(pam_unix)[2923]: session closed for user bhettiger
> > Nov 28 18:55:35 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> > Nov 28 18:55:35 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed
> > Nov 28 18:58:32 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> > Nov 28 18:58:32 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed
> 
> This would make me wonder if the SCSI bus the tape drive is on (not the IPS)
> is properly terminated or is having cabling problems.
> 
> >Nov 28 19:09:41 linus kernel: (ips0) ips_issue val [0x101a].
> >Nov 28 19:09:41 linus kernel: (ips0) ips_issue semaphore chk timeout.
> >Nov 28 19:09:41 linus kernel: (ips0) ips_issue val [0x101a].
> >Nov 28 19:09:41 linus kernel: (ips0) ips_issue semaphore chk timeout.
> 
> I've never seen these.  Is the IPS in a bus master capable slot?  They
> upchuck in odd ways if they aren't.  Did you upgrade the kernel
> recently?
> 
> > Nov 28 19:09:41 linus kernel: SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 70000
> > Nov 28 19:09:41 linus kernel:  I/O error: dev 08:21, sector 0
> 
> Do you have the ipssend utility installed?  Can you interrogate the
> array status?
> 

Mike

-- 
Mike Slack
mike@slacking.org
--
"If we knew what it was we were doing, it wouldn't
be called research, would it?" --Albert Einstein