[KLUG Members] Squid setup recommendation

Bruce Smith members@kalamazoolinux.org
Mon, 15 Dec 2003 10:21:19 -0500


> I'm putting together a new caching box for work. Squid is high on the list, and
> I've been charged with putting together a box for testing that, if the test goes
> well, will probably be put into production.
> 
> The hardware: HP ProLiant DL360, 2 GB RAM, dual 2.something gig Xeon CPU's, dual
> 72 GB 10k SCSI drives. I can hardware RAID the disks, but I'm not sure I want to
> given the massive amount of disk activity this box is destined for.
> 
> The people: anywhere from 500 to 2,000 concurrent users, with the potential for
> up to 5,000+ in the event of a news event like 9/11.

Wow.  You have 20+ TIMES as many users as my squid server, so my
answers may or may not apply.   (YMMV :)

> I'm planning to use SuSE 9 with squid transparently. I think I can handle squid
> and the other little packages that we intend to mix in with it (already tested
> on a smaller scale), but I'm not sure about sizing the partitions. Is one file
> system better than another for caching? How many partitions? How big? Should I
> mirror the drives? I need the best performance with just a dash of fault
> tolerance. :) 

For filesystems, I'd try ReiserFS first since it's good at handling
lots of small files, which is mostly what a squid cache is made of.

As for size and partitions, give the cache as much room as possible.
I can't help you with optimal sizing for something that large.
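
That said, here is a back-of-the-envelope sketch in Python for seeing
how much disk cache your RAM can support.  It assumes the Squid FAQ
rule of thumb of roughly 10 MB of RAM for squid's in-memory index per
GB of disk cache.  The function name, the 512 MB OS reserve and the
256 MB cache_mem figure are made up for illustration, not
recommendations.

```python
def max_disk_cache_gb(total_ram_mb, os_reserve_mb, cache_mem_mb, index_mb_per_gb=10):
    """Largest disk cache (GB) whose in-memory index fits in the leftover RAM.

    index_mb_per_gb is the assumed rule of thumb (about 10 MB of RAM per GB
    of cache); measure on your own box before trusting it.
    """
    ram_for_index_mb = total_ram_mb - os_reserve_mb - cache_mem_mb
    if ram_for_index_mb <= 0:
        return 0
    return ram_for_index_mb // index_mb_per_gb


# 2 GB box, 512 MB kept for the OS (assumed), 256 MB cache_mem (assumed)
print(max_disk_cache_gb(2048, 512, 256))  # prints 128
```

With those assumed numbers the index for about 128 GB of cache would
fit in RAM, so treat that as an upper bound and start smaller.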

> The config of the box will be backed up frequently in case it
> needs to be rebuilt.

I don't see why the config would change very often once you have it up
and running well.  Everything but the cache should stay fairly static
(and you don't need to back up the cache).  So I don't see a reason
for frequent backups.

> Are there any squid configuration parms that I should be aware of for a deployment
> of this size? Any "gotchas" to look out for? Any on-going administrative
> bummers? Cool tools for administration? I'd like to run the package that comes
> with SuSE and can be updated with the provided tools, but I can compile and
> install from source if necessary. Any arguments in favor of one over the other?

You definitely need to optimize the config.  Read through the squid
config file (it has lots of comments) and tune it to your hardware
(cache sizes, memory, etc.).  I'm sure you can do better manually than
a generic GUI config tool, though you could use the GUI's output as a
starting point.

Also tune your Linux box for optimal performance, for example by
mounting the cache partitions with the noatime option.  See Adam's
performance tuning presentation on the KLUG site for all kinds of
goodies like that.
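
To double-check the noatime part, a small Python sketch like this can
scan text in /proc/mounts format and list the cache partitions that are
missing the option.  The mount points /cache1 and /cache2 and the
sample text are hypothetical; substitute your own.

```python
def mounts_without_noatime(mounts_text, mount_points):
    """Return the given mount points whose options do not include noatime."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point in mount_points and "noatime" not in options:
            missing.append(mount_point)
    return missing


# Hypothetical sample in /proc/mounts format:
# device, mount point, fs type, options, dump, pass
sample = """\
/dev/sda1 / reiserfs rw 0 0
/dev/sda3 /cache1 reiserfs rw,noatime 0 0
/dev/sdb1 /cache2 reiserfs rw 0 0
"""
print(mounts_without_noatime(sample, {"/cache1", "/cache2"}))  # prints ['/cache2']
```

On the real box you would pass in open("/proc/mounts").read() instead
of the sample text.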

As for hardware RAID, if configured correctly it can speed up disk
access (for example, if you stripe your drives).  It might be worth
considering.

One thing to watch out for: your log files are going to grow REALLY
LARGE, REALLY FAST.  Make sure you have enough free disk space in that
partition, rotate/nuke them often, etc.
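
To put a rough number on that, here is a Python sketch that estimates
access log growth from the request rate.  Both inputs are assumptions:
the 200 requests per second is a made-up load figure, and 150 bytes per
log line is a guess at an average entry, so measure your own before
sizing the partition.

```python
def access_log_mb_per_day(requests_per_second, bytes_per_line=150):
    """Estimate daily access log growth in MB, one log line per request."""
    seconds_per_day = 24 * 60 * 60
    bytes_per_day = requests_per_second * bytes_per_line * seconds_per_day
    return bytes_per_day / (1024 * 1024)


# Assumed sustained load of 200 requests/second
print(round(access_log_mb_per_day(200)))
```

At those assumed figures the log grows by roughly 2.4 GB a day, which
is why frequent rotation matters.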

For that many users, you might want to look into multiple squid servers
(peers).  Something I've never had to worry about here!  :-)

It wouldn't hurt to ask your questions on the squid mailing list.  You
may run into someone who runs a huge site like yours who could be a lot
more help.

 - BS