[KLUG Members] reading pdf

Bruce Smith members@kalamazoolinux.org
Mon, 29 Mar 2004 16:03:51 -0500


> Is it possible to use ghostscript or something similar to read pdfs to 
> look for certain text?
> 
> The pdf files I'm working with are PDF-1.2 created elsewhere with GNU 
> Ghostscript 6.51, but the text in the file is not readable.
> 
> To clarify, I'm not looking to view the file, or have it read by human 
> eyes, I'm looking to have something that can look for text, or extract 
> the text so I can look for it with an external program.
> 
> Ideas, suggestions and pointers are welcome.

I don't know if this will help with what you're trying to do (whatever
that is :) ), but look at the web search engine "htdig" (www.htdig.org).
It had a few methods to extract text from PDF files so it could index
them for its search engine database.  I remember one method used
Adobe's Acrobat Reader 3.0 with some fancy command-line options.  It
also had alternative ways to index PDFs.

I can't go look to be more specific because I no longer run htdig, 
so I'll leave the exercise of researching it to the reader.  :-)
http://www.htdig.org/
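
For what it's worth, the general idea (convert the PDF to plain text
with an external tool, then search the text) can be sketched roughly
as below.  This is not htdig's actual method, just an illustration.
It assumes the "pdftotext" utility from xpdf is installed and on your
PATH; the function names extract_text and find_lines are made up for
the example.

```python
import shutil
import subprocess


def extract_text(pdf_path):
    """Return the text of a PDF by running the external pdftotext tool.

    Assumption: pdftotext (from xpdf or a compatible package) is installed.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_lines(text, needle):
    """Return (line_number, line) pairs that contain needle, ignoring case."""
    needle = needle.lower()
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]
```

Something like find_lines(extract_text("some.pdf"), "some text") would
then give you the matching lines.  Whether any text comes out at all
depends on how the PDF was produced, so treat it as a starting point.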

 - BS