[KLUG Members] alternate redundancy v. cost

Jamie McCarthy jamie at mccarthy.vg
Sun Aug 1 19:54:55 EDT 2004


What is your access pattern on this data?

If you need fast and frequent reads of a relatively small (< 10 GB)
portion of your dataset, one possibility is to run memcached and
write an API that reads from it first, falling back on your
existing storage method on a cache miss.  And of course your API has
to do a "write-through" to both memcached and your existing store
whenever data is changed or inserted.
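
To make that concrete, here is a rough sketch of such an API in
Python.  It is only an illustration under some assumptions: DictCache
is a made-up in-memory stand-in for a memcached client (anything with
get/set would do), the existing store is faked with a plain dict, and
the class and method names are hypothetical.

```python
class DictCache:
    """Hypothetical in-memory stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss, like typical memcached clients do.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class CachedStore:
    """Reads try the cache first; writes go to both the store and the cache."""

    def __init__(self, cache, backing):
        self.cache = cache      # fast, limited by available RAM
        self.backing = backing  # dict-like stand-in for the existing store

    def read(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                    # cache hit
        value = self.backing.get(key)       # cache miss: fall back
        if value is not None:
            self.cache.set(key, value)      # populate cache for next time
        return value

    def write(self, key, value):
        # Write-through: update the existing store first, then the cache.
        self.backing[key] = value
        self.cache.set(key, value)


store = CachedStore(DictCache(), {"user:1": "alice"})
first = store.read("user:1")    # miss, served from the backing store
second = store.read("user:1")   # hit, served from the cache
store.write("user:2", "bob")    # lands in both places
```

Writing to the existing store before the cache means a failed write
never leaves the cache holding data the store does not have.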

That allows you to expand pretty cheaply, offering very fast access
to a cache of however much RAM you can afford to devote to memcached.
That might mean buying a few cheap-ass boxes that you can stuff 2 GB
of RAM into, or maybe borrowing RAM and a few CPU cycles from a
number of existing boxes.
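
Spreading the cache over several boxes usually comes down to the
client deciding which server owns each key.  A minimal sketch of one
simple way to do that (hash the key, take it modulo the number of
servers) follows.  The host names and the pick_server helper are made
up for illustration, and real client libraries may use a different
scheme.

```python
import hashlib


def pick_server(key, servers):
    """Map a key to one server by hashing it (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Hypothetical hosts; 11211 is memcached's default port.
servers = ["box1:11211", "box2:11211", "box3:11211"]

owner = pick_server("user:42", servers)  # same key always maps to same box
```

One caveat with plain modulo hashing: adding or removing a box remaps
most keys, so the cache is mostly cold right afterward.  Consistent
hashing is the usual way to reduce that.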

What that does is take load off your existing storage method, so you
no longer need to buy two or three of some mega-expensive box.

Of course if your data is > 10 GB and your access pattern includes
frequent scans across all of it, then caching a fraction of it
is not going to help you at all.  And if you need this served over
a stock protocol like NFS, then this won't help you
either.  But if you're just talking about providing a few GB of data
or less, very quickly, to a lot of machines, and you control your
own client software, memcached is perfect...
-- 
  Jamie McCarthy
 http://mccarthy.vg/
  jamie at mccarthy.vg


