[KLUG Advocacy] Regarding data handling/programming effort

Adam Tauno Williams awilliam at whitemice.org
Sun Nov 13 12:39:27 EST 2005


> >> I don't understand what you're saying, exactly. Perhaps you're arguing in
> >> favor of an "All-SQL" or an "All-in the DBMS" solution? Please clarify.
> >No,  I'm not arguing for an All-* solution.  I'm saying that the
> >appropriate solution for where to process data depends upon a lot of
> >variables (some external to the technology itself).
> Sure. I recall being HANDED the technology "solution", and had to work in
> that framework, like it, or not. That is fairly typical, I'm afraid. Getting
> past this is one of the benefits of open standards, and portable code.

Yep, Open Standards make everything much easier.

> Anyway, let's go over yours....
> >1.) PHP (aka LAMP, although with a proprietary database).  This is great
> >for interactive access to small sets of data;  it is terrible for data
> >processing.
> More or less. However, a LOT of activity falls into the category.

Absolutely;  but I've seen some attempts to LAMPify some pretty
horrendous stuff.  AJAX does make interactive access to large datasets a
great deal easier.  <aside> For instance, a simple case:  We have
~1,400 suppliers (we call them "vendors"), and we want a user to select
the vendor as part of a web form.  You give them a drop-down box of
1,400+ entries? No.  You have them page back and forth through the list
of vendors (say 50 at a time) to find the one they want?  Ick.  You
have them "search" for the vendor by name?  This requires round-tripping
and reloading the page several times, which is (a) slow, (b) a terrible
user experience, and (c) prone to some nasty logic in the page (Did
the user just come into me?  Or did they come back as part of a search
for the vendor?  Or did they actually attempt to submit the form with the
vendor selected?....).  And multiply (c) by hundreds or thousands
of pages. All these solutions stink. </aside>
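The AJAX approach boils down to a small server-side lookup the page can call as the user types, instead of reloading anything. A minimal sketch in Python (the `search_vendors` function and the vendor names are made up for illustration, not from any real system):

```python
def search_vendors(vendors, query, limit=10):
    """Return up to `limit` vendor names containing `query`,
    case-insensitively -- the kind of call an AJAX widget would
    make on each keystroke instead of round-tripping the page."""
    q = query.lower()
    return [v for v in vendors if q in v.lower()][:limit]

vendors = ["Acme Supply", "Acme Tooling", "Baker Industrial",
           "Consolidated Freight"]
print(search_vendors(vendors, "acme"))
```

The page keeps none of the "did they just come in or come back from a search?" state, because the search never leaves the form.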

> >2.) "In SQL" - when you want to integrate with something like a report
> >writer it is easiest to just make an 'insane' SQL as making a middleware
> >layer would require a pile of code and weeks of man hours for
> >development and testing.
> Yeah, maybe I'm getting along in years (and experience), but the reporting 
> requirements have to be pretty complex before I resort to a report writer out 
> of choice. 

The drive to a report writer for us has been a desire to produce
high-quality printed reports, including things like logos, etc...  That
is a nuisance to program.

> Maybe it is my mix of clients; when a report gets complicated 
> enough, they usually want/need some sort of customized processing at the block/
> subtotal levels that has to be coded anyway, and that introduces another 
> level of complexity. 

Actually I've been really impressed with modern report generators;
they pretty much let you insert code just for the complicated bits.
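The block/subtotal processing mentioned above can often be pushed into the SQL itself rather than a middleware layer. A small sketch of that idea using Python's sqlite3 module (the `sales` table and its columns are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (vendor TEXT, amount REAL);
INSERT INTO sales VALUES ('Acme', 100), ('Acme', 50), ('Baker', 75);
""")

# Detail rows plus per-vendor subtotals in a single query -- the
# sort of 'insane' SQL that saves writing a middleware layer.
rows = conn.execute("""
    SELECT vendor, amount, 0 AS is_subtotal FROM sales
    UNION ALL
    SELECT vendor, SUM(amount), 1 FROM sales GROUP BY vendor
    ORDER BY vendor, is_subtotal
""").fetchall()

for vendor, amount, is_subtotal in rows:
    print(vendor, amount, "subtotal" if is_subtotal else "detail")
```

The report writer then just renders the rows in order; the "block level" logic lives entirely in the query.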

> >"In SQL" is also best when multiple clients of various types developed in
> >different languages all need to process the same bit of information or
> >when the dataset is very large and the clients are remote.
> Right, we just "hoist" the stuff that was developed (or might have been 
> developed) in each host language environment and make it part of the DBMS 
> environment, accessible to each program at run-time. This is probably one of 
> the main driving forces behind going to a DBMS in the first place.

Yep.

> >No matter how rigorous the documentation, different developers are going to
> >introduce different subtleties in how they go about doing something -
> Yes and it's often not economical to explicitly combat this. 

And it is terribly exhausting to even try.

> Some places try,
> by having standards for EVERYTHING (they usually have other motivations for 
> doing that, but this is often one reason given)... it either:
> 	a) fails
> 	b) reduces programmer productivity to about 1 statement a day
> 	c) both

Yep.

> Most often, c.

Yep.

> >which equals some suit having a piss fit and waving a report in your
> >face.
> The effective cure for this is communication, most often as directly between 
> information consumer and software developer as possible. I get less of this 
> when I can talk to the people who USE the stuff the software is grinding out; 
> as there's less assuming to do... something most software developers (myself 
> included) aren't very good at.

Possibly because most of my experience is within one enterprise over
about a decade, I'm more prone to say that PEOPLE generally stink at
communication.  Fortunately we have some department heads who "get it"
and are very precise and clear about requirements and desired
end-results.  But a great many people just don't do details;  they'll
tell you about those dozen exceptions and edge-condition cases only
after they've seen the results and rejected them - although you asked
them many times.

> Well, a .400 average will get you into the Hall of Fame... if you're a 
> baseball player. The same average as a programmer will get you into the shoe 
> store... as an employee... if you're lucky; more probably onto the unemployment 
> line.

Yep.

> >3.) BIE.  This is good at data processing,  and doesn't do interactive
> >much at all.  You can locate it near, or on, the database server and it
> >does very well up to horrendously large datasets.
> It SOUNDS like a DB server! :)

Pretty much, except that it adds the concept of a "workflow".
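That "workflow" notion is essentially a database-aware scheduler: poll a table for pending work, run a step, and record where each job stands. A crude sketch of the idea in Python (the `jobs` table layout and step names are invented for illustration, not BIE's actual model):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, step TEXT, status TEXT)")
conn.execute("INSERT INTO jobs (step, status) VALUES ('extract', 'pending')")

# Each step names its successor; None means the workflow is finished.
STEPS = {"extract": "transform", "transform": "load", "load": None}

def run_pending(conn):
    """One polling pass: for each pending job, 'run' its current step,
    then advance it to the next step or mark it done."""
    for job_id, step in conn.execute(
            "SELECT id, step FROM jobs WHERE status = 'pending'").fetchall():
        next_step = STEPS[step]          # the real work would happen here
        if next_step is None:
            conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?",
                         (job_id,))
        else:
            conn.execute("UPDATE jobs SET step = ? WHERE id = ?",
                         (next_step, job_id))

for _ in range(3):                       # a real daemon would loop on a timer
    run_pending(conn)

print(conn.execute("SELECT step, status FROM jobs").fetchall())
```

Since the state lives in the database, any client that can see the table can see how far each job has progressed.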

> >The XML tools start to get wonky when datasets are really really large,...
> Like almost EVERYTHING, 'cept maybe DB2 and Postgres, and the properly 
> engineered application code above it.

The rapid evolution of things is interesting.  With the release of
PostgreSQL 8.1, other than the lack of real management tools, I can't
now come up with any features off the top of my head that it lacks and
that Informix/DB2 provides.  They've added IN/OUT parameters, roles, and
bitmapped scanning of indexes;  after only about eight months from the
last release.  Really amazing.

> >but they are better than even six months or a year ago, and easily multitudes
> >faster than they were a couple of years ago. It also provides sort of a
> >database-aware crond.
> Still an "emerging" or "maturing" technology. The leading edge implementors 
> have a learning curve to climb as well, and it may never be AS efficient as a 
> binary RDBMS that is engineered, but as we've seen it doesn't have to be in 
> order to be useful.

Yep, processor power is so stinkin' cheap now.  It is certainly cheaper
than the man-hours required for a 'code' solution.
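The "XML tools get wonky on really large datasets" problem is largely a DOM-versus-streaming issue: building the whole tree in memory falls over, while a streaming parser stays flat. A sketch using Python's xml.etree.ElementTree iterparse (the sample document is a stand-in for a file that might really be gigabytes):

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for a huge export file.
data = io.BytesIO(b"<vendors>"
                  b"<vendor><name>Acme</name></vendor>"
                  b"<vendor><name>Baker</name></vendor>"
                  b"</vendors>")

names = []
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "vendor":
        names.append(elem.findtext("name"))
        elem.clear()   # discard the processed subtree so memory stays flat

print(names)
```

The `elem.clear()` call is the whole trick: without it, iterparse still accumulates the tree and you are back to DOM-sized memory use.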

> >4.) .NET - If the data transform is really complicated or just enormous
> >I'll code a component in .NET [Mono].  As far as performance is
> >concerned this leaves both BIE and PHP staggering around choking on
> >its dust and wondering what just flew by.
> Because most .NET stuff is older tools and technologies with more or less 
> system independent, lightweight communications protocols to stick it all 
> together. 

Yep.  XML processing in Mono-.NET is libxml2; which has been getting the
snot kicked out of its tires for a long time.

> So you get the efficiencies and performance of compiled and 
> engineered code and the components below each, and ease of integration 
> and speed.

"Awesome"

> >But this means code with dependencies and code that has to be maintained
> >which is overhead in man-hours,...
> True of every production app, more so as the environment changes. This is why 
> improved portability is well leveraged. The problem probably doesn't change a 
> lot, but the world in which it's solved does.
> >which are way more expensive than more-CPU.
> Yes, and that's the driving force behind EVERY "high-level" language.

And yet our software supplier still couldn't make an ERP system written
in VB fly! :)

> >And this doesn't take into account other variables...If I thought about 
> >it long enough I'm certain I could come up with a dozen more.
> Sure. Most of them are distractions.
> >I agree with you and in a perfect world would almost always advocate a
> >three tier model - {Data store}:{Data Processing}:{Data Presentation}.
> I don't care about a "perfect world", but I do care about the distance between 
> where I'm sitting and what I need to deliver the project on-time and on- or 
> under-budget. When lack of the above (or anything else) stands in my way, I am 
> not shy about letting interested parties know, and that we are taking some 
> risks.

And the anticipated life-span of a solution matters in that trade-off
calculation too.  Although my experience has been that solutions
continue to roll on long past their expected end-of-life.

> >> >> Well, yeah, except there are some useful generalities to be observed here.
> >> >> I see the initial table as a transaction file, and now Andrew is 
> >> >> being asked to
> >> >> produce columnwise (or fieldwise) classifications for some reason. 
> >> >> Actually, it would seem that the output is not normalized,
> >> >Yea, it is cwazy what those pesky users ask for!  :)
> >> Well, not SO cwazy! those sorts of classifications and crosstabs are a lot
> >> more readable for people, result in better output from the POV of layout,
> >> and may be needed by legacy systems.
> >Gack.  Add "legacy systems" to all my points above.  I try to repress.
> Probably a good impulse, but some legacy systems go on longer than we might 
> like. I know of legacy systems that have had exit strategies for almost 20 
> years now. They can't kill 'em!

So true.


