[KLUG Members] Data transformation

Adam Tauno Williams awilliam at whitemice.org
Sun Sep 25 21:15:09 EDT 2005


>>> ....
>>> >> I am taking data from a SQL database that is long...
>>> >> User_id Field_id Field_value
>>> >> 1        1         data1
>>> >> 1        2         data2
>>> ....
>>> >> 2        2         data2
>>> >> 2        3         data3
>>> ....
>>> >> What I need to do is export it but into a wide file so that it 
>>> looks like
>>> >> the following
>>> >> 1, data1, data2, data3, data4
>>> >> 2, data1, data2, data3, data4
>> If this db were normalized it would be much easier.
> Sure, and maybe (with respect to the initial design and intentions) it is.

Right, I don't think we see enough of the problem to determine if this 
database
is normalized or not.  The 'User_id,Field_id,Field_value' structure seems like
a pretty common and adequetly normalized structure to me (assuming the primary
key is user_id+field_id and user_id and field_id are foreign keys to other
tables).

>>> I have solved this problem in a number of environments, inclusing SQL-based
>>> data systems....
>> I'd agree with Bob here; which bugs me to no end :-).

Mostly likely true.  But sometimes it is either (a) too much bother to 
introduce
a third agent to get a solution or (b) just doesn't work out to the solutions
advantage.  If the host end is dramatically more powerful than the client end
it just makes sense to do as much work there,  or the dataset is very 
large and
the transfer time to the client would be unreasonable (especially true if WANs
like frame-relay is involved).

My experience is that it is simply a fact that managing the data in the 
RDBMS is
almost always faster than doing so on the client side if the dataset is of any
significant size;  PHP is really slow for large arrays and you slam 
into memory
session limits that you want to have in effect for security reasons, 
and writing
good fast code in C requires someone who really knows what they are doing.

> Well, yeah, except there are some useful generalities to be observed here.
> I see the initial table as a transaction file, and now Andrew is 
> being asked to
> produce columnwise (or feildwise) classificationss for some reason. 
> Actually, it would seem that the out is not normalized,

Yea, it is cwazy what those pesky users ask for!  :)

> I can more easily believe  the input may be normalized for the 
> initial purpose of the table/database. However, SQL doesn't handle 
> this very well, but a program (and it could be PHP,  but doesn't have 
> to be) that calls a RDBMS/SQL interface layer may very well be the 
> best
> way to go. Other tasks for the non-SQL code would be SQL code generation and
> managed execution. There are some data lanaguages that essentially 
> abstract the problem into higher-level commands, and that's really 
> helpful when such generality is nessesary.

BIE does some neat stuff since all selects from the database are in XML 
and you
can then do anything an XML transform can do.  And XML transform tools keep
getter faster and faster.  It lets you get the data and mangle it with
essentially no programming.

-- 
Adam Tauno Williams - http://www.whitemice.org



More information about the Members mailing list