[KLUG Members] MySQL -vs- Postgresql - speed.

Adam Williams members@kalamazoolinux.org
27 Oct 2002 08:51:18 -0500


>A friend of mine (not on this list) has been playing with databases.
>Influenced by certain people on this list, I recommended Postgresql.
>He wasn't happy with the performance and compared it to MySQL.
>This is a quote from an email he sent me:
>>Straight out of the box MySQL inserted the 20,000 records
>>into the database via a PERL script in 10 hours.
>>Postgresql took 40 hours!  Yikes!!

PostgreSQL is ACID/transactional.  Loading 20,000 records in a single
transaction is going to be quite slow, and it is very much the wrong way
to do such a thing.  If for some reason the "upload" died at record
9,675, what would he do?  Was the "last" record loaded completely into
the table or not?  What he wants is a bulk loader that pops in the
records wrapped in transactions, about a hundred at a time.  I do this
(loading/unloading >100,000 records at a swing) *EVERY* day, and doing
it brute-force is like running a marathon with a loaded pistol in your
pocket.  Be prepared to spend an hour or so picking records out of your
database so you can 'try again', after you figure out exactly where it
failed.
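
A minimal sketch of that kind of loader in Perl/DBI, assuming a
hypothetical "widgets" table fed from tab-separated input (connection
string, table, and column names are all made up):

  use DBI;

  # connection details and table layout here are only illustrative
  my $dbh = DBI->connect("dbi:Pg:dbname=test", "someuser", "somepass",
                         { AutoCommit => 0, RaiseError => 1 });
  my $sth = $dbh->prepare("INSERT INTO widgets (name, qty) VALUES (?, ?)");

  my $count = 0;
  while (my $line = <STDIN>) {
      chomp $line;
      my ($name, $qty) = split /\t/, $line;
      $sth->execute($name, $qty);
      $dbh->commit if ++$count % 100 == 0;   # commit every ~100 records
  }
  $dbh->commit;                              # flush the final partial batch
  $dbh->disconnect;

If the script dies, everything since the last commit rolls back cleanly,
so you know to within a hundred records where to pick up again.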

One trick is to create a record_id serial field on the table you're
loading/unloading from.  Make note of the max(record_id) before you
start the load; then, **ASSUMING** you are the only person writing to
the table, you can reduce the table to its former state with a delete
from blah where record_id > previous_max.  But that's only a hack, and
assuming a unique writer is pretty dangerous on a production system.
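
Inside the same hypothetical Perl loader the bookkeeping is just this
("blah" and record_id as above, everything else illustrative):

  # remember where the table stood before the load
  my ($previous_max) =
      $dbh->selectrow_array("SELECT max(record_id) FROM blah");
  $previous_max = 0 unless defined $previous_max;   # table was empty

  # ... run the batched load ...

  # if the load went sour, put the table back the way it was
  $dbh->do("DELETE FROM blah WHERE record_id > ?", undef, $previous_max);
  $dbh->commit;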

>>BTW... this is on a Dual P-III 800 Mhz CPU's w/2 GB of RAM
>>and a 3ware controller running two IBM 60'GB's IDE's in RAID-1.
>I've read that Postgresql may be a little slower than MySQL, but 
>that's not "a little slower", that's a LOT SLOWER!
>From Adam's performance tuning presentation, I recommended he start
>Postgresql with "-F" and a larger "-B".  (he hasn't reported his
>findings on that yet)  Is there anything else he can do to improve 
>Postgresql performance???  

-F will make an enormous difference, and may almost negate the speed
difference.  But the fact that something is now "fast enough" still
doesn't mean it is being done the correct way.
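
For reference, those switches go on the postmaster command line at
startup; on a 7.x installation that looks something like this (the data
directory and buffer count are just examples):

  postmaster -F -B 2048 -D /usr/local/pgsql/data

Keep in mind that -F turns fsync off, so a crash mid-load can cost you
the tail end of your data; -B just raises the number of shared buffers.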

>Also, a question came up about Postgresql's "vacuum".  From the docs:
>"VACUUM reclaims storage occupied by deleted tuples. In normal
>PostgreSQL operation, tuples that are DELETED or obsoleted by UPDATE are
>not physically removed from their table; they remain present until a
>VACUUM is done. Therefore it's necessary to do VACUUM  periodically,
>especially on frequently-updated tables."
>Does that mean _ALL_ updates to a row cause a deleted tuple?

No.  Updates of primary key fields, particularly on tables with
clustered indexes, can produce deleted tuples.  This is true of EVERY
database I've ever met, including $$,$$$ ones like Informix.  Otherwise
such operations are just too expensive.

"update statistics" serves the same purpose on other database,  although
to clean up empty records one often has to do a "rebuild index" or
adjust "distribution" depending upon the syntax of the particular
engine.
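
The periodic VACUUM itself can be as simple as a nightly cron entry for
the postgres user (the database name is just an example):

  0 3 * * * vacuumdb --analyze mydb

The --analyze refreshes the planner statistics at the same time, which
is the closest analog to "update statistics".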

>Or only updates that meet a certain condition, like changing a 
> primary key (or something)?

Yep.