[KLUG Members] Any thoughts on network(WAN) problem?

Adam Tauno Williams members@kalamazoolinux.org
Wed, 21 Nov 2001 09:38:22 -0500 (EST)


Configuration A:
<HUB>---<Cisco 2511>---64k PtP---
---<Cisco 2511>---<Switch>

Configuration B:
<HUB>---<Nortel BCM>----1.54Mb Frame---
---<Nortel Passport 2430>---<Switch>

Both A & B are between the same two sites;  the HUB is at the remote site and
the switch is at the "host" site.  Both A & B use the OSPF routing protocol,  so
if I take down B,  the routing tables automatically update to circuit A.  On the
switch end are RS/6000 AIX 4.2.1, Red Hat Linux 7.0 and 7.1 servers, a
Linux-based firewall, and a Citrix Winframe server (NT 3.51).  Both ends have
Linux and Microsoft clients (Win9x and WinY2k, no WinNT).

The remote site has been running on circuit A for ***years*** with no problems
whatsoever and with roughly the same client mix.  The remote site uses HTTP,
HTTPS, LDAP, LPD, TELNET, SSH, SMB/CIFS, WINS, ICA, DNS, and NFS/NIS (portmap),
all provided at the host site.  While slow, circuit A is 99.999% reliable.

When I bring up circuit B (by doing an admin shutdown of A at the router port),
the routing tables propagate, etc., and connectivity is virtually uninterrupted.
However, on circuit B, while users can telnet and log in from WinXX clients and
NIS clients, basically no other protocol is operational.  HTTP requests receive
the initial ACK and no subsequent data;  SMB connections time out (resource no
longer available).  And while telnet works, it occasionally freezes during
screen repaints and, once frozen, never resumes.  The RS/6000 (as a host) seems
to fare somewhat better than either the Winframe server or the Linux hosts,  but
connectivity is still a little spotty.  With the Citrix Winframe server, enough
traffic passes to display the window but not its contents,  and the connection
(eventually) times out.  SSH connections suffer the same problem as telnet,
freezing up when bursts of data come down.

However, ping and traceroute (ICMP) claim that circuit B is near perfect
(1~2% packet loss,  which could just be packets lost when the ping command is
canceled).  I have run pings with various payload sizes (default, 512, 768,
1028, 1124) and all return near perfect, even when run for over an hour.  Ping
floods (send as fast as you can) also almost all return,  or as many as you
would expect to survive over a bandwidth-constrained circuit.
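
The payload sizes listed above stop at 1124 bytes, while a standard 1500-byte
Ethernet MTU leaves room for a 1472-byte ICMP payload (1500 minus a 20-byte IP
header and an 8-byte ICMP header).  What follows is only a minimal sketch of
how the sweep could be extended up to full-size packets.  It assumes a Linux
iputils ping that accepts -c, -s and -M do (forbid fragmentation);  the helper
names and the 192.0.2.1 address are hypothetical, not part of the setup
described here.

```python
import re
import subprocess

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_payload(mtu=1500):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def sweep_sizes(tested=(512, 768, 1028, 1124), mtu=1500, step=100):
    """Sizes already tested, plus steps from there up to the full-frame payload."""
    top = max_payload(mtu)
    extra = range(max(tested) + step, top, step)
    return sorted(set(tested) | set(extra) | {top})


def build_ping_command(host, payload, count=20):
    """Command line for iputils ping (assumed); -M do sets Don't Fragment."""
    return ["ping", "-c", str(count), "-s", str(payload), "-M", "do", host]


def parse_loss(ping_output):
    """Return the loss percentage from ping's summary line, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


def run_sweep(host):
    """Ping the host once per payload size and print size and loss percentage."""
    for size in sweep_sizes():
        result = subprocess.run(build_ping_command(host, size),
                                capture_output=True, text=True)
        print(size, parse_loss(result.stdout))
```

Calling run_sweep("192.0.2.1"), with the placeholder replaced by a host on the
far side of circuit B, would print one loss figure per payload size, so any
size at which loss changes sharply stands out.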

I have captured traffic (with Ethereal) on both subnets and don't see anything
unusual;  packets just seem to "get lost".
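
One way to make "get lost" more concrete is to line the two captures up and
see whether the missing packets share a size range.  The sketch below is
illustrative only:  it assumes each capture has been exported as a list of
(ip_id, length) tuples for a single flow, which is a hypothetical format and
not something the capture tool is claimed to produce directly, and it assumes
the capture is short enough that the 16-bit IP ID does not wrap.

```python
from collections import Counter


def lost_by_size(sent, received, bucket=256):
    """Count packets per length bucket that never show up on the far side.

    sent, received: iterables of (ip_id, length) tuples for one flow
    (hypothetical export format, one list per capture point).
    Returns {bucket_start: (sent_count, lost_count)}.
    """
    seen = Counter(ip_id for ip_id, _ in received)
    totals, lost = Counter(), Counter()
    for ip_id, length in sent:
        start = (length // bucket) * bucket
        totals[start] += 1
        if seen[ip_id] > 0:
            seen[ip_id] -= 1   # match each received packet at most once
        else:
            lost[start] += 1
    return {b: (totals[b], lost[b]) for b in sorted(totals)}
```

If the lost counts cluster in the largest buckets while small packets all
arrive, that would be consistent with telnet logins working while bulk
transfers stall;  an even spread would point elsewhere.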

Any thoughts or suggestions would be appreciated.

Systems and Network Administrator
Morrison Industries
1825 Monroe Ave NW
Grand Rapids, MI. 49505