[KLUG Advocacy] Regarding data handling/programming effort

Robert G. Brown bob at whizdomsoft.com
Sat Oct 1 15:58:20 EDT 2005


On Mon, 26 Sep 2005 at almost 7:34 AM, Adam Tauno Williams wrote:

>> I don't understand what you're saying, exactly. Perhaps you're arguing in
>> favor of an "All-SQL" or an "All-in the DBMS" solution? Please clarify.

>No,  I'm not arguing for an All-* solution.  I'm saying that the
>appropriate solution for where to process data depends upon a lot of
>variables (some external to the technology itself).
Sure. I recall being HANDED the technology "solution" and having to work in
that framework, like it or not. That is fairly typical, I'm afraid. Getting
past this is one of the benefits of open standards and portable code.

>Things like sub-selects and outer joins exist in SQL for a reason. I process
>data using a variety of methods -
Yes, I think everyone does. No one wants to use a sledgehammer to swat flies,
and a somewhat harder-to-use tool that is "better" in other ways (like
performance, flexibility, lower costs over the lifecycle of the software) may
represent a good choice.
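
To make that concrete, here is a throwaway sketch (schema and data invented
on the spot, nothing to do with the original thread) of an outer join and a
sub-select doing the classification work inside the database rather than in
application code:

    # Toy sketch: a LEFT OUTER JOIN plus a sub-select doing the work
    # in the database.  The customers/orders schema is made up purely
    # for illustration.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             amount REAL);
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Bravo'), (3, 'Carol');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 250.0), (3, 2, 75.0);
    """)

    # The outer join keeps customers with no orders at all; the sub-select
    # flags anyone whose total is above the overall average order amount.
    rows = db.execute("""
        SELECT c.name,
               COALESCE(SUM(o.amount), 0) AS total,
               CASE WHEN COALESCE(SUM(o.amount), 0) >
                         (SELECT AVG(amount) FROM orders)
                    THEN 'big' ELSE 'small' END AS bucket
          FROM customers c
          LEFT OUTER JOIN orders o ON o.customer_id = c.id
         GROUP BY c.id, c.name
         ORDER BY total DESC
    """).fetchall()

    for name, total, bucket in rows:
        print("%-8s %8.2f %s" % (name, total, bucket))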


Anyway, let's go over yours....
>1.) PHP (aka LAMP, although with a proprietary database).  This is great
>for interactive access to small sets of data;  it is terrible for data
>processing.
More or less. However, a LOT of activity falls into that category.

>2.) "In SQL" - when you want to integrate with something like a report
>writer it is easiest to just make an 'insane' SQL as making a middleware
>layer would require a pile of code and weeks of man hours for
>development and testing.
Yeah, maybe I'm getting along in years (and experience), but the reporting
requirements have to be pretty complex before I resort to a report writer out
of choice. Maybe it is my mix of clients; when a report gets complicated
enough, they usually want/need some sort of customized processing at the block/
subtotal levels that has to be coded anyway, and that introduces another
level of complexity. Sometimes it is just better to use one development tool
for both, rather than one for the "vanilla" base reporting, and then another
to write low-level API calls.
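
Here is roughly what I mean by block/subtotal-level processing done in the
same general-purpose tool; the rows are hard-coded stand-ins for whatever the
base query would actually return:

    # Rough sketch of "block/subtotal" report logic done in the same tool
    # that does everything else, rather than in a separate report writer.
    # The rows would normally come from a query (already ORDERed BY region);
    # they are hard-coded here only for illustration.
    from itertools import groupby

    rows = [                       # (region, customer, amount)
        ("East", "Acme",  100.0),
        ("East", "Bravo",  75.0),
        ("West", "Carol", 250.0),
        ("West", "Dale",   50.0),
    ]

    grand_total = 0.0
    for region, block in groupby(rows, key=lambda r: r[0]):
        subtotal = 0.0
        print(region)
        for _, customer, amount in block:
            # This is where the customized per-line logic tends to creep in.
            print("  %-10s %8.2f" % (customer, amount))
            subtotal += amount
        print("  %-10s %8.2f" % ("subtotal", subtotal))
        grand_total += subtotal
    print("%-12s %8.2f" % ("TOTAL", grand_total))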

>And in the end it would be less flexible.
Especially when portability and scalability are part of "flexible".

>"In SQL" is also best when multiple clients of various types developed in
>different languages all need to process the same bit of information or
>when the dataset is very large and the clients are remote.
Right, we just "hoist" the stuff that was developed (or might have been 
developed) in each host language environment and make it part of the DBMS 
environment accesabile to each program at run-time. This is probably one of 
the main dirivng forces behind going to a DBMS in the first place.
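
A hypothetical example of that "hoisting": push the shared computation into a
view, and the PHP page, the .NET component and the reporting job all read the
same answer at run-time. The invoice table and the business rule here are
invented for illustration:

    # Hypothetical illustration of "hoisting" shared logic into the DBMS:
    # the net-amount rule lives in a view, so clients written in any
    # language pick up the same computation at run-time.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE invoices (id INTEGER PRIMARY KEY, gross REAL,
                               discount REAL, tax_rate REAL);
        INSERT INTO invoices VALUES (1, 100.0, 10.0, 0.06),
                                    (2, 250.0,  0.0, 0.06);

        -- The business rule lives in ONE place, inside the database.
        CREATE VIEW invoice_net AS
            SELECT id,
                   (gross - discount) * (1.0 + tax_rate) AS net
              FROM invoices;
    """)

    for inv_id, net in db.execute("SELECT id, net FROM invoice_net"):
        print(inv_id, round(net, 2))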

>No matter how rigorous the documentation, different developers are going to
>introduce different subtleties in how they go about doing something -
Yes, and it's often not economical to explicitly combat this. Some places try,
by having standards for EVERYTHING (they usually have other motivations for 
doing that, but this is often one reason given)... it either:

	a) fails
	b) reduces programmer productivity to about 1 statement a day
	c) both

Most often, c.

>which equals some suit having a piss fit and waving a report in your
>face.
The effective cure for this is communication, most often as directly between 
information consumer and software developer as possible. I get less of this 
when I can talk to the people who USE the stuff the software is grinding out, 
as there's less assuming to do... something most software developers (myself 
included) aren't very good at.

Well, a .400 average will get you into the Hall of Fame... if you're a 
baseball player. The same average as a programmer will get you into the shoe 
store... as an employee... if you're lucky; more probably onto the unemployment 
line.

>3.) BIE.  This is good at data processing,  and doesn't do interactive
>much at all.  You can locate it near, or on, the database server and it
>does very well up to horrendously large datasets.
It SOUNDS like a DB server! :)

>The XML tools start to get wonky when datasets are really really large,...
Like almost EVERYTHING, 'cept maybe DB2 and Postgres, and the properly 
engineered application code above it.

>but they are better than even six months or a year ago, and easily multitudes
>faster than they were a couple of years ago. It also provides sort of a
>database-aware crond.
Still an "emerging" or "maturing" technology. The leading edge implementors 
have a learning curve to climb as well, and it may never be AS efficient as a 
binary RDBMS that is engineered, but as we've seen it doesn't have to be in 
order to be useful.

>4.) .NET - If the data transform is really complicated or just enormous
>I'll code a component in .NET [Mono].  As far as performance is
>concerned this leaves both BIE and PHP staggering around choking on
>its dust and wondering what just flew by.

Because most .NET stuff is older tools and technologies stuck together with 
more or less system-independent, lightweight communications protocols. So you 
get the efficiency and performance of compiled, engineered code in the 
components underneath, plus ease of integration and speed.

>But this means code with dependencies and code that has to be maintained
>which is overhead in man-hours,...
True of every production app, more so as the environment changes. This is 
where improved portability pays off. The problem probably doesn't change a 
lot, but the world in which it's solved does.

>which are way more expensive than more-CPU.
Yes, and that's the driving force behind EVERY "high-level" language.

>And this doesn't take into account other variables...If I thought about 
>it long enough I'm certain I could come up with a dozen more.
Sure. Most of them are distractions.

>I agree with you and in a perfect world would almost always advocate a
>three tier model - {Data store}:{Data Processing}:{Data Presentation}.
I don't care about a "perfect world", but I do care when the distance between 
where I'm sitting and what I need to deliver the project on-time and on- or 
under-budget. When lack of the above (or anything else) stands in my way, I am 
not shy about letting interested parties know, and that we are taking some 
risks.

>But reality introduces a large amount of impediments to implementing a
>technologically perfect solution;  often times the software packages
>themselves introduce the impediments (bad performance, statelessness,
>limited protocols supported, etc...) even without the help of the
>suits.  
See almost any set of "requirements".

>So one needs to select the solution appropriate to the particulars of
>the situation and problem;  our friend who started this thread may have
>perfectly legitimate reasons for wanting to solve the problem in the
>manner he proposed.
He might, but we don't know. Lack of knowledge leads to assumption, which 
often takes us down a path leading AWAY from where the original poster 
wants us to go. Again, I'll repeat the assertion that many of the command
sequences specified in answers would best be GENERATED by an application and
executed all at once, rather than being the script itself. Such more general 
code would solve a whole family of problems, of which the original one was a 
special case.

But it's an assertion; the approach has been very effective before, and I 
don't know if it applies here.
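
Since we don't know the poster's actual tables or commands, everything below
is invented, but it shows the shape of what I mean: generate the statements
from the data that drives them, then run them as one batch, instead of
hand-writing one script per case:

    # Invented example only -- we don't know the original poster's actual
    # problem.  The point is the shape: generate the command sequence from
    # the data that drives it, rather than hand-writing the script itself.
    import sqlite3

    categories = ["red", "green", "blue"]   # made-up classification values

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, color TEXT)")
    db.executemany("INSERT INTO items (color) VALUES (?)",
                   [(c,) for c in ["red", "blue", "red", "green"]])

    # Generate one statement per category rather than typing them all out
    # (categories are a fixed, trusted list here, so plain substitution
    # is fine for the illustration)...
    statements = [
        "CREATE TABLE items_%s AS SELECT * FROM items WHERE color = '%s'"
        % (c, c)
        for c in categories
    ]

    # ...and execute the whole batch at once.
    for stmt in statements:
        db.execute(stmt)

    for c in categories:
        count = db.execute("SELECT COUNT(*) FROM items_%s" % c).fetchone()[0]
        print(c, count)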


>> >and writing good fast code in C requires someone who really knows what
>> >they are doing.
>> One of the reasons I earn the big bucks! :)
>Exactly!  Did the dealership get your Mazerati in yet?
Maserati...when I filled out all the forms, I got to spell it right! :)
Actually, all of 'em! :)

>> >> Well, yeah, except there are some useful generalities to be observed here.
>> >> I see the initial table as a transaction file, and now Andrew is
>> >> being asked to
>> >> produce columnwise (or fieldwise) classifications for some reason.
>> >> Actually, it would seem that the output is not normalized,
>> >Yea, it is cwazy what those pesky users ask for!  :)
>> Well, not SO cwazy! those sorts of classifications and crosstabs are a lot
>> more readable for people, result in better output from the POV of layout,
>> and may be needed by legacy systems.
>Gack.  Add "legacy systems" to all my points above.  I try to repress.
Probably a good impulse, but some legacy systems go on longer than we might 
like. I know of legacy systems that have had exit strategies for almost 20 
years now. They can't kill 'em!
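
And for what it's worth, this is roughly what I mean by those columnwise
classifications/crosstabs: the normalized rows get pivoted out into one
column per category with CASE expressions under a GROUP BY. The sales table
and its data are invented for illustration:

    # Roughly what a "columnwise classification" / crosstab looks like:
    # pivot normalized (region, quarter, amount) rows into one column per
    # quarter.  Table and data invented purely for illustration.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL);
        INSERT INTO sales VALUES ('East', 'Q1', 100), ('East', 'Q2', 150),
                                 ('West', 'Q1',  80), ('West', 'Q2', 120);
    """)

    rows = db.execute("""
        SELECT region,
               SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
               SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
          FROM sales
         GROUP BY region
         ORDER BY region
    """).fetchall()

    print("%-6s %8s %8s" % ("region", "Q1", "Q2"))
    for region, q1, q2 in rows:
        print("%-6s %8.2f %8.2f" % (region, q1, q2))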

>Sure.  But in most smallish businesses you aren't going to find these
>tools, or the expertise required to use them;  which may be unfortunate,
>but is still true.
This is why specialists like me exist. I come in, supply (either directly or 
through collaboration) the high-powered stuff needed over a (relatively 
short) development cycle, and then leave. It's high income over a short time, 
but not much incremental expense over any longer period, since I'm not there 
most of the time.

On that bit of SHAMELESS self-promotion, I conclude these responses.

							Regards,
							---> RGB <---


