[KLUG Members] IBM ServeRAID problems
Mike Slack
members@kalamazoolinux.org
Tue, 18 Dec 2001 08:48:14 -0800
This is a response to a response I got about 3 weeks ago. I've had no time to look into the details until very recently.
I checked /proc/pci and /proc/interrupts and noticed that the onboard SCSI controller and the ServeRAID controller share an IRQ (this is after various rounds of fiddling with the BIOS):
$ cat /proc/interrupts
           CPU0
  0:    4718026          XT-PIC  timer
  1:       1821          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:          0          XT-PIC  es1370
  6:         43          XT-PIC  floppy
  8:     143799          XT-PIC  rtc
 10:      18285          XT-PIC  eth0
 11:      55523          XT-PIC  aic7xxx, ips
 12:       4969          XT-PIC  PS/2 Mouse
NMI:          0
ERR:          0
It doesn't look like there are any memory address conflicts of any kind. I read somewhere that a shared IRQ might cause hangs, even though in theory it shouldn't. The problems I am having always occur during I/O-intensive operations like tape backups or opening large files.

The root partition is mounted on a disk on the onboard controller, and all other mounts (including /home) are on the ips controller. Most peripherals (including tape and CDROM) are on the onboard controller. So the shared IRQ looks to me like a good candidate for the source of the problems, since they always occur when both controllers are doing intensive operations.

Unfortunately, I haven't been able to test my theory, since I can't seem to set separate IRQs for aic7xxx and ips no matter what I do. I can set the ips IRQ from the BIOS, but whatever I set it to, aic7xxx follows (or doesn't get loaded at all). Am I missing something here? Or is this even something I should be trying? Any other suggestions?
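For anyone who wants to repeat the check without eyeballing the table, here is a rough Python sketch that picks out IRQ lines naming more than one driver. It assumes the single-CPU XT-PIC layout shown above (count, controller type, then a comma-separated driver list); the function name shared_irqs is made up for illustration, and other kernels may lay the file out differently.

```python
# Sketch only: assumes the /proc/interrupts layout quoted in this post.
SAMPLE = """\
CPU0
0: 4718026 XT-PIC timer
10: 18285 XT-PIC eth0
11: 55523 XT-PIC aic7xxx, ips
12: 4969 XT-PIC PS/2 Mouse
NMI: 0
ERR: 0
"""


def shared_irqs(text):
    """Return {irq_number: [driver, ...]} for IRQs claimed by several drivers."""
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skips the CPU header and the NMI/ERR lines
        tokens = rest.split()
        # Drop the per-CPU interrupt counts at the start of the line.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        # The next token is the controller type (XT-PIC here); the rest is
        # the driver list, which may contain spaces ("PS/2 Mouse").
        names = " ".join(tokens[1:])
        drivers = [n.strip() for n in names.split(",") if n.strip()]
        if len(drivers) > 1:
            shared[int(head.strip())] = drivers
    return shared


print(shared_irqs(SAMPLE))  # {11: ['aic7xxx', 'ips']}
```

On a live system the same function can be fed the contents of /proc/interrupts instead of SAMPLE.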
Adam Williams (awilliam@whitemice.org) wrote:
> >I am getting some strange behavior from my IBM ServeRAID card. Every once in a while the box locks up and filesystems on the RAID channels become essentially inaccessible. It seems to happen most frequently when I am using the DAT drive (also SCSI, but on a different, on-board SCSI channel), or starting up VMware while other disk-I/O-intensive things are happening.
> >Has anyone seen this kind of problem before?
>
> I have several IPS controllers.
>
> >Is this more likely a hardware or (ips) driver problem? Or something else,
> >like a SCSI termination problem?
>
> Every IPS problem I've had has boiled down to hardware. Are you using
> an IBM system? I've noticed they conflict with other devices pretty
> frequently on non-IBM systems (most of the systems I have them in).
> Otherwise they are great cards.
>
> >Here is some system info:
> >RH 7.2, Kernel 2.4.9-13
> >$ cat /proc/scsi/ips/1
> >IBM ServeRAID General Information:
> > Controller Type : ServeRAID
> > IO region : 0xe400 (256 bytes)
>
> Be absolutely certain no other device is camping in this address range
> (0xe400 through 0xe4ff, i.e. 0xe400 + 256 bytes). Nose around in /proc/ioports:
>
> e400-e4ff : IBM ServeRAID-3x
> e400-e4ff : ips
>
> This has accounted for almost all my IPS problems. They take that
> address no matter what.
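The /proc/ioports check suggested above can be roughed out in Python as follows. This is only a sketch: it assumes the "start-end : name" hex format quoted in this thread, and the function name claimants is made up for illustration. Nested entries (a driver listed inside its own PCI device's range, as ips is here) are normal; the thing to look for is an unrelated name turning up in the list.

```python
# Sketch only: assumes /proc/ioports lines look like "e400-e4ff : name".
SAMPLE = """\
e400-e4ff : IBM ServeRAID-3x
  e400-e4ff : ips
"""


def claimants(ioports_text, start, length):
    """List (low, high, name) entries that overlap [start, start + length)."""
    end = start + length - 1
    found = []
    for line in ioports_text.splitlines():
        span, sep, name = line.partition(" : ")
        if not sep:
            continue
        lo_text, dash, hi_text = span.strip().partition("-")
        if not dash:
            continue
        try:
            lo, hi = int(lo_text, 16), int(hi_text, 16)
        except ValueError:
            continue  # not a hex range; ignore the line
        if lo <= end and hi >= start:
            found.append((lo, hi, name.strip()))
    return found


# The ServeRAID IO region reported in /proc/scsi/ips/1: 0xe400, 256 bytes.
for lo, hi, name in claimants(SAMPLE, 0xE400, 256):
    print(f"{lo:04x}-{hi:04x} {name}")
```

Any name printed besides the ServeRAID PCI entry and the ips driver would be worth investigating.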
>
> > Memory region : 0xde000 (8192 bytes)
> > Shared memory address : 0xc00de000
> > IRQ number : 10
>
> Same goes for this: make sure nothing else is using it. If you have ISA
> devices and your BIOS supports resource reservation, use it so the IPS
> doesn't land on any of the IRQs used by ISA devices.
>
> > BIOS Version : 4.80.26
> > Firmware Version : 2.25.01
>
> These both look pretty current.
>
> > Host: scsi1 Channel: 00 Id: 02 Lun: 00
> > Vendor: IBM Model: SERVERAID Rev: 1.00
> > Type: Direct-Access ANSI SCSI revision: 02
> > Host: scsi1 Channel: 00 Id: 15 Lun: 00
> > Vendor: IBM Model: SERVERAID Rev: 1.00
> > Type: Processor ANSI SCSI revision: 02
>
> >Yep, this all looks about the same as mine.
> >An excerpt from /var/log/messages:
> > Nov 28 18:50:50 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> > Nov 28 18:50:50 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed
> > Nov 28 18:50:52 linus sshd(pam_unix)[2923]: session closed for user bhettiger
> > Nov 28 18:55:35 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> > Nov 28 18:55:35 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed
> > Nov 28 18:58:32 linus kernel: st0: Error with sense data: Current st09:00: sense key Unit Attention
> > Nov 28 18:58:32 linus kernel: Additional sense indicates Not ready to ready change,medium may have changed
>
> This would make me wonder if the SCSI bus the tape drive is on (not the
> IPS) is properly terminated or has cabling problems.
>
> >Nov 28 19:09:41 linus kernel: (ips0) ips_issue val [0x101a].
> >Nov 28 19:09:41 linus kernel: (ips0) ips_issue semaphore chk timeout.
> >Nov 28 19:09:41 linus kernel: (ips0) ips_issue val [0x101a].
> >Nov 28 19:09:41 linus kernel: (ips0) ips_issue semaphore chk timeout.
>
> I've never seen these. Is the IPS in a bus master capable slot? They
> upchuck in odd ways if they aren't. Did you upgrade the kernel
> recently?
>
> > Nov 28 19:09:41 linus kernel: SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 70000
> > Nov 28 19:09:41 linus kernel: I/O error: dev 08:21, sector 0
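The two numbers in these log lines can be unpacked a little further. The sketch below assumes that the kernel printed both values in hex, that the SCSI result word is laid out as status, message, host and driver bytes from low to high, and that SCSI disks use major 8 with 16 minors per disk. The host byte names are quoted from memory of the kernel's SCSI headers, and the helper names are made up, so treat the output as a hint rather than a diagnosis.

```python
# Sketch only: layout and names below are assumptions, see the note above.
HOST_BYTE_NAMES = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def split_result(code):
    """Split a 32-bit SCSI result word into its four bytes."""
    return {
        "status": code & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "driver": (code >> 24) & 0xFF,
    }


def sd_name(dev):
    """Map a hex 'major:minor' pair such as '08:21' to an sd device name."""
    major_text, _, minor_text = dev.partition(":")
    major, minor = int(major_text, 16), int(minor_text, 16)
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk, part = minor >> 4, minor & 0x0F  # 16 minors per disk
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


result = split_result(int("70000", 16))
print(result, HOST_BYTE_NAMES.get(result["host"], "unknown"))
print(sd_name("08:21"))
```

Under those assumptions, return code 70000 has a host byte of 0x07 (DID_ERROR, a generic host adapter error) with the other bytes zero, and dev 08:21 corresponds to sdc1.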
>
> Do you have the ipssend utility installed? Can you interrogate the
> array status?
>
Mike
--
Mike Slack
mike@slacking.org
--
"If we knew what it was we were doing, it wouldn't
be called research, would it?" --Albert Einstein