[KLUG Members] Data transformation

Adam Tauno Williams awilliam at whitemice.org
Sat Sep 24 12:42:54 EDT 2005


>> This will be used for both Postgres databases and MS SQL
>> databases.. my main question is.. Can this be done with a SQL
>> query or will the data have to be exported Long and then run
>> through a program/script
>> there are some fields with Nulls..
> It can be done with an SQL query but it's a little grungy and might
> run extremely slowly on your system.

If you run this on both PostgreSQL and M$-SQL it will be interesting to 
see the
time results - assuming similiar hardware and size-of-dataset.  
PostgreSQL join
performance isn't bad,  and PostgreSQL 8 is better than 7,  but 
compared to both
Informix and DB2 it is pretty darn slow (both of which seem more-or-less
unphased by even the wierdest join).  So I'd be curious how M$-SQL does.

> mysql> select f.user_id, f1.field_value as id_1_value, f2.field_value 
> as id_2_value, f3.field_value as id_3_value, f4.field_value as 
> id_4_value from foo as f left join foo as f1 on (f.user_id=f1.user_id 
> and f1.field_id=1) left join foo as f2 on (f.user_id=f2.user_id and 
> f2.field_id=2) left join foo as f3 on (f.user_id=f3.user_id and 
> f3.field_id=3) left join foo as f4 on (f.user_id=f4.user_id and 
> f4.field_id=4) group by f.user_id;

Another thing that might be useful is to "CREATE TEMP TABLE {foo} AS 
..." with a
SELECT statement like the above.  Then you can noodle around inside foo 
all you
want, and the table will automatically be dropped when the session disconnects
(or you can drop it youself during the session).  At least in earlier versions
of PostgreSQL writes to temporary tables were not included in thw WAL
(write-ahead-log) so performance creating temp tables was pretty good (faster
than loading data into a "normal" table).  In Informix you can do a "WITH NO
LOG" when creating a temp table to get the same behaviour.  M$-SQL almost
certainly provides something equivalent.  Another thing that might work with
M$-SQL is to do a "IN {dbspace)" or the equivalent,  to generate the temporary
table on another drive or dataspace as this will let the I/O of the
search/reads and the writes go in parrallel (I have no idea how big your
dataset is,  but this is very helpful if it is approaching huge).

-- 
Adam Tauno Williams - http://www.whitemice.org



More information about the Members mailing list