[KLUG Members] Data transformation

Robert G. Brown bob at whizdomsoft.com
Mon Sep 26 05:47:33 EDT 2005


On Sun, 25 Sep 2005 21:15:09 -0400, Adam Tauno Williams <awilliam at whitemice.org> wrote:

>>>> ....
>>>> >> I am taking data from a SQL database that is long...
>>>> >> User_id Field_id Field_value
>>>> >> 1        1         data1
>>>> >> 1        2         data2
>>>> ....
>>>> >> ...export it but into a wide file so that it looks like
>>>> >> the following
>>>> >> 1, data1, data2, data3, data4
>>>> >> 2, data1, data2, data3, data4
>>> If this db were normalized it would be much easier.
>> Sure, and maybe (with respect to the initial design and intentions) it is.
>
>Right, I don't think we see enough of the problem to determine if this
>database is normalized or not.  The 'User_id,Field_id,Field_value'
>structure seems like a pretty common and adequately normalized structure
>to me (assuming the primary key is user_id+field_id and user_id and
>field_id are foreign keys to other tables).
It is similar to a lot of transaction files... {some key},transaction type,data
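
For the archives: if the set of field_ids is known up front, the fold can
be done entirely on the server with conditional aggregation. A minimal
sketch (the user_data table and column names are my guesses from the
original post):

    -- Long-to-wide pivot with conditional aggregation.
    -- Assumes a table user_data(user_id, field_id, field_value)
    -- and that field_ids 1 through 4 are the complete set.
    SELECT user_id,
           MAX(CASE WHEN field_id = 1 THEN field_value END) AS field1,
           MAX(CASE WHEN field_id = 2 THEN field_value END) AS field2,
           MAX(CASE WHEN field_id = 3 THEN field_value END) AS field3,
           MAX(CASE WHEN field_id = 4 THEN field_value END) AS field4
    FROM user_data
    GROUP BY user_id
    ORDER BY user_id;

One SELECT into a file and you have the wide layout; the GROUP BY collapses
the four long rows per user into one wide row.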

>>>>I have solved this problem in a number of environments, including SQL-based
>>>> data systems....
>>> I'd agree with Bob here; which bugs me to no end :-).
>Most likely true.  But sometimes it is either (a) too much bother to
>introduce a third agent to get a solution or (b) it just doesn't work out
>to the solution's advantage.
I don't understand what you're saying, exactly. Perhaps you're arguing in
favor of an "All-SQL" or an "All-in-the-DBMS" solution? Please clarify.

>If the host end is dramatically more powerful
>than the client end it just makes sense to do as much work there; likewise
>if the dataset is very large and the transfer time to the client would be
>unreasonable (especially true if WANs like frame relay are involved).
Almost always the case, as it turns out. Generally the users only want to
see a representation of the result that is vanishingly small compared to
the data involved, and it is very, very common to process it all on the
server, or even on some dedicated background machine.

>My experience is that it is simply a fact that managing the data in the 
>RDBMS is almost always faster than doing so on the client side if the
>dataset is of any significant size;
Right.

>PHP is really slow for large arrays, and you slam into the per-session
>memory limits that you want to have in effect for security reasons,
I wouldn't know what the performance limits on PHP are regarding array
size; the seriously large-scale data mungers I've written or been in
close contact with are written in other things.

>and writing good fast code in C requires someone who really knows what
>they are doing.
One of the reasons I earn the big bucks! :)

>> Well, yeah, except there are some useful generalities to be observed here.
>> I see the initial table as a transaction file, and now Andrew is
>> being asked to
>> produce columnwise (or fieldwise) classifications for some reason.
>> Actually, it would seem that the output is not normalized,
>Yea, it is cwazy what those pesky users ask for!  :)
Well, not SO cwazy! Those sorts of classifications and crosstabs are a lot
more readable for people, result in better output from the POV of layout,
and may be needed by legacy systems.

>> I can more easily believe the input may be normalized for the
>> initial purpose of the table/database. However, SQL doesn't handle
>> this very well; a program (and it could be PHP, but doesn't have
>> to be) that calls an RDBMS/SQL interface layer may very well be the
>> best way to go. Other tasks for the non-SQL code would be SQL code
>> generation and managed execution. There are some data languages that
>> essentially abstract the problem into higher-level commands, and that's
>> really helpful when such generality is necessary.
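
To make the "SQL code generation" point concrete: you can even have the
database enumerate the pivot columns for you and let the host program
splice the fragments into the final query. A rough sketch, using the same
hypothetical user_data table as above (string-concatenation syntax here is
PostgreSQL-flavored):

    -- Emit one SELECT-list fragment per distinct field_id; the host
    -- program collects these rows and pastes them into the pivot query.
    SELECT 'MAX(CASE WHEN field_id = ' || field_id ||
           ' THEN field_value END) AS field' || field_id AS fragment
    FROM (SELECT DISTINCT field_id FROM user_data) AS f
    ORDER BY field_id;

That keeps the wide layout in step with the data even when new field_ids
show up, which is the kind of managed execution I had in mind.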

>BIE does some neat stuff since all selects from the database are in XML 
>and you can then do anything an XML transform can do.
Actually, I wasn't thinking of BIE; there are older tools and a variety
of (quasi-proprietary) database and data manipulation tools that reduce
this sort of problem to single commands, since the implementors understood
that these kinds of mappings were clearly needed by their audiences.
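
PostgreSQL's contrib/tablefunc module is one example that stays inside the
database: its crosstab() function does the whole fold in one call. Roughly,
against the same hypothetical table (the contrib module has to be installed,
and the output columns must be declared):

    -- crosstab() takes a 3-column (row, category, value) query and
    -- returns one wide row per distinct row value.
    SELECT *
    FROM crosstab(
        'SELECT user_id, field_id, field_value
         FROM user_data ORDER BY 1, 2')
    AS t(user_id integer, field1 text, field2 text,
         field3 text, field4 text);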

>And XML transform tools keep getting faster and faster.  It lets you get
>the data and mangle it with essentially no programming.
Yes, they do, but they have a long way to go to catch well-designed
compiler- or optimized interpreter-based solutions that do intelligent
stuff while using the RDBMS as a meta-filesystem. The targeting and
profiling of the data in the host language, coupled with the optimization
work that has been invested in the good RDBMSs, is a tough combination to
beat for serious data-handling apps. The economics at this level also tend
to make the "no need for programming" argument moot.

							Regards,
							---> RGB <---

