[KLUG Members] Re: Dynamic HTML/HTTP question

Robert G. Brown members@kalamazoolinux.org
Thu, 05 Dec 2002 10:42:20 -0500


John Holland <john@zoner.org> writes:
>What is the target language or environment?
I don't care. I'll write it in any language and worry about integration
later. Actually, I can glue things together with BASH scripts, piping,
putting things in intermediate files, etc. It doesn't have to be one
language, or even super fast (although I've written a lot of it in C
so far, and it's pretty damned fast right now).

>What are you trying to do with the data once retrieved?
There is no "try" here. There's only success. :)

I'm going to rip the result to shreds and look for some data. Probably 
sections of pages that are tables will get processed into entry items for 
another database, or will go into XML files, or will just get munched on by the 
next thing in the pipe, or maybe the next function, for computation and 
what-all.
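For the table-shredding part, Python's standard library alone gets surprisingly far. Here's a rough sketch using the stdlib `html.parser` module; the sample HTML and its contents are made up for illustration, since the real input would be whatever the fetched page contains:

```python
from html.parser import HTMLParser

class TableGrabber(HTMLParser):
    """Collect the text of every <td> cell, grouped row by row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows
        self.row = None       # cells of the row currently being built
        self.in_cell = False  # are we inside a <td>?

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.row.append(data.strip())

# Made-up sample page standing in for a fetched response body.
sample = "<table><tr><td>fugue</td><td>g minor</td></tr></table>"
grabber = TableGrabber()
grabber.feed(sample)
print(grabber.rows)
```

Each row comes out as a list of cell strings, which is about the right shape to pipe into a database loader or an XML writer downstream.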

I'm working on something that harvests data from a number of sources, does 
all kinds of analysis, and sends the results out somewhere. I have most of the 
data locally, but occasionally I'll need to reach out over the net and query 
THAT database, or fetch stuff from yonder ftp site, or (here's where this 
started) ask a web-based app a question, and then parse the response, and keep
on truckin'.

>One solution is python:
>
>>>> import urllib
>>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>>> print f.read()
This is about the right level of abstraction. I don't know much python;
that can change quickly. 

Does this handle the difference between GET and POST?
In general, I would expect a GET to look like...
http://www.musi-cal.com/cgi-bin/query?spam=1&eggs=2&bacon=0

Which is what the above "looks" like it's doing (and since I'm not encumbered
with any knowledge of Python, I can say whatever I like! :). Can you write
a little about this?
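For what it's worth, the distinction can be sketched as follows. (This uses the modern `urllib.parse`/`urllib.request` split rather than the old flat `urllib` module quoted above, but the rule is the same in both: passing the encoded pairs as the second/`data` argument is what flips the request from GET to POST. The musi-cal URL is just the example from above.)

```python
from urllib.parse import urlencode

# Encode the form fields once; GET and POST use the same encoding.
params = urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

# GET: the encoded pairs are appended to the URL after a '?'.
get_url = "http://www.musi-cal.com/cgi-bin/query?" + params

# POST: the URL stays bare and the encoded pairs travel in the
# request body instead. With urllib.request.urlopen, supplying the
# encoded bytes as the `data` argument switches the method to POST:
#
#   from urllib.request import urlopen
#   f = urlopen("http://www.musi-cal.com/cgi-bin/query",
#               data=params.encode("ascii"))

print(get_url)
```

So the two-argument `urlopen(url, params)` in the quoted snippet is doing a POST, not the GET that the `?spam=1&...` URL above would do.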

>It can also be made to deal with persistent cookies.
That's nice. I don't think I'll need that now; of course, in two 
weeks, that'll be wrong, and I'll be happy to have this done.

curl provides better isolation; this provides more compactness and speed. I'm 
looking at curl now, since it provides a "safe" way to do this, and it is 
isolated at a step (process) level, which is nice sometimes, even at the 
expense of speed and compactness.
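On the curl side, the same GET/POST split falls out of where you put the pairs: in the URL for GET, or behind `-d` (`--data`) for POST, which also switches the method. A sketch, with the commands echoed rather than executed since the example host is just the one borrowed from above:

```shell
#!/bin/sh
# Same encoded form data as the Python example.
params='spam=1&eggs=2&bacon=0'
url='http://www.musi-cal.com/cgi-bin/query'

# GET: parameters ride along in the URL after a '?'.
echo curl "$url?$params"

# POST: -d moves the same pairs into the request body, which is
# also what makes curl issue a POST instead of a GET.
echo curl -d "$params" "$url"
```

From a pipeline point of view, each curl run is its own process, which is the step-level isolation mentioned above: a hung or crashed fetch takes down one command, not the whole harvester.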

							Regards,
							---> RGB <---