[KLUG Members] IBM ServeRAID problems

Adam Williams members@kalamazoolinux.org
29 Nov 2001 05:39:20 -0500


>I am getting some strange behavior from my IBM ServeRAID card.  Every
>once in a while the box locks up and filesystems on the raid channels
>become essentially inaccessible.  It seems to happen most frequently
>when I am using the DAT drive (also SCSI, but on a different, on-board
>SCSI channel), or starting up VMWare while other disk I/O intensive
>things are happening.
>Has anyone seen this kind of problem before? 

I have several IPS controllers.

>Is this more likely a hardware or (ips) driver problem?  Or something else, 
>like a SCSI termination problem?

Every IPS problem I've had has boiled down to hardware.  Are you using
an IBM system?  I've noticed these cards conflict with other devices
pretty frequently on non-IBM systems (most of the systems I have them
in).  Otherwise they are great cards.

>Here is some system info:
>RH 7.2, Kernel 2.4.9-13
>$ cat /proc/scsi/ips/1
>IBM ServeRAID General Information:
>         Controller Type                   : ServeRAID
>         IO region                         : 0xe400 (256 bytes)

Be absolutely certain no other device is camping in this address range
(0xe400 through 0xe4ff, i.e. 256 bytes starting at 0xe400).  Nose around
in /proc/ioports:

e400-e4ff : IBM ServeRAID-3x
  e400-e4ff : ips

This has accounted for almost all my IPS problems.  They take that
address no matter what.
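
A rough way to automate that check is sketched below. It parses text in the
/proc/ioports layout and reports any entry that overlaps the controller's
window but is not owned by the ips driver. The sample text, the
"hypothetical-card" and "Hypothetical NIC" entries, and the function names
are made up for illustration; only the two ServeRAID lines come from above.

```python
def parse_ioports(text):
    """Return (start, end, owner) tuples from /proc/ioports-style text."""
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue
        span, owner = line.split(" : ", 1)
        start, end = (int(x, 16) for x in span.strip().split("-"))
        entries.append((start, end, owner.strip()))
    return entries


def conflicts(text, base, size, expected=("ips", "serveraid")):
    """List entries overlapping [base, base + size) that are not ours."""
    lo, hi = base, base + size - 1
    found = []
    for start, end, owner in parse_ioports(text):
        overlaps = start <= hi and end >= lo
        ours = any(tag in owner.lower() for tag in expected)
        if overlaps and not ours:
            found.append((start, end, owner))
    return found


# Illustrative input; the first and last lines are invented.
SAMPLE = """\
e000-e0ff : Hypothetical NIC
e400-e4ff : IBM ServeRAID-3x
  e400-e4ff : ips
e480-e49f : hypothetical-card
"""

for start, end, owner in conflicts(SAMPLE, 0xE400, 256):
    print("%04x-%04x claimed by %s" % (start, end, owner))
```

On a real box you would pass in open("/proc/ioports").read() instead of
SAMPLE.  An enclosing PCI bus window, if your kernel lists one, will also
be reported and can be ignored.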

>         Memory region                     : 0xde000 (8192 bytes)
>         Shared memory address             : 0xc00de000
>         IRQ number                        : 10

Same goes for the IRQ: make sure nothing else is using it.  If you have
ISA devices and your BIOS supports resource reservation, use it so the
IPS doesn't land on any of the IRQs used by ISA devices.
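
To see what is sitting on IRQ 10, /proc/interrupts is the place to look.
The sketch below pulls the device list for one IRQ out of text in that
layout.  It assumes the 2.4-style format (counts, one controller-type
column such as XT-PIC, then a comma-separated device list); the sample
lines and the "hypothetical-nic" entry are invented for illustration.

```python
def devices_on_irq(text, irq):
    """Return the device names registered on one IRQ line."""
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        fields = rest.split()
        # Drop the per-CPU interrupt counts (leading all-digit fields).
        while fields and fields[0].isdigit():
            fields.pop(0)
        # fields[0] is the controller type; the remainder is the device list.
        names = " ".join(fields[1:])
        return [n.strip() for n in names.split(",") if n.strip()]
    return []


# Illustrative input, not taken from the machine in question.
SAMPLE = """\
           CPU0
  0:    1234567          XT-PIC  timer
 10:      54321          XT-PIC  ips, hypothetical-nic
 14:       9876          XT-PIC  ide0
"""

sharing = devices_on_irq(SAMPLE, 10)
if len(sharing) > 1:
    print("IRQ 10 is shared by: " + ", ".join(sharing))
```

More than one name on the line means the IRQ is shared, which is what you
want to rule out here.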

>         BIOS Version                      : 4.80.26
>         Firmware Version                  : 2.25.01

These both look pretty current.

> Host: scsi1 Channel: 00 Id: 02 Lun: 00
>   Vendor: IBM      Model: SERVERAID        Rev: 1.00
>   Type:   Direct-Access                    ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 15 Lun: 00
>   Vendor: IBM      Model: SERVERAID        Rev: 1.00
>   Type:   Processor                        ANSI SCSI revision: 02

Yep, this all looks about the same as mine.

>An excerpt from /var/log/messages:
> Nov 28 18:50:50 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> Nov 28 18:50:50 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed
> Nov 28 18:50:52 linus sshd(pam_unix)[2923]: session closed for user bhettiger
> Nov 28 18:55:35 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> Nov 28 18:55:35 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed
> Nov 28 18:58:32 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> Nov 28 18:58:32 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed

That would make me wonder whether the SCSI bus the tape drive is on (not
the IPS) is properly terminated or has cabling problems.

>Nov 28 19:09:41 linus kernel: (ips0) ips_issue val [0x101a].
>Nov 28 19:09:41 linus kernel: (ips0) ips_issue semaphore chk timeout.
>Nov 28 19:09:41 linus kernel: (ips0) ips_issue val [0x101a].
>Nov 28 19:09:41 linus kernel: (ips0) ips_issue semaphore chk timeout.

I've never seen these.  Is the IPS in a bus master capable slot?  They
upchuck in odd ways if they aren't.  Did you upgrade the kernel
recently?

> Nov 28 19:09:41 linus kernel: SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 70000
> Nov 28 19:09:41 linus kernel:  I/O error: dev 08:21, sector 0

Do you have the ipssend utility installed?  Can you interrogate the
array status?
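
As an aside, the "dev 08:21" in that I/O error can be turned into a device
name.  The sketch below assumes the 2.4 kernel prints major:minor in hex
and that the usual SCSI disk numbering applies (major 8, 16 minors per
disk); the function name is made up for illustration.

```python
def decode_sd_dev(dev):
    """Map a hex 'major:minor' string from a kernel message to an sd name."""
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    # Each disk owns 16 minors: 0 is the whole disk, 1-15 are partitions.
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


print(decode_sd_dev("08:21"))
```

Under those assumptions 0x21 is minor 33, i.e. the first partition on the
third SCSI disk.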