[KLUG Members] reading pdf
Bruce Smith
members@kalamazoolinux.org
Mon, 29 Mar 2004 16:03:51 -0500
> Is it possible to use ghostscript or something similar to read pdfs to
> look for certain text?
>
> The pdf files I'm working with are PDF-1.2 created elsewhere with GNU
> Ghostscript 6.51, but the text in the file is not readable.
>
> To clarify, I'm not looking to view the file, or have it read by human
> eyes, I'm looking to have something that can look for text, or extract
> the text so I can look for it with an external program.
>
> Ideas, suggestions and pointers are welcome.
I don't know if this will help with what you're trying to do (whatever
that is :) but look at the web search engine "htdig" (www.htdig.org).
It had a few methods to extract text from PDF files so it could index
them for its search engine database. I remember one method used
Adobe's Acrobat Reader 3.0 with some fancy command line options. It
also had other alternative ways to index PDFs.
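For the "extract the text so I can look for it with an external program"
part, here is a rough sketch of the general idea: run a converter, then
search its output. It assumes the "pdftotext" utility (from xpdf) is
installed; that tool is my assumption for illustration, not something I
can confirm htdig used, and the function names extract_text and
find_lines are made up. Whether it yields usable text depends on how the
PDF was generated.

```python
import shutil
import subprocess


def extract_text(pdf_path, tool="pdftotext"):
    """Return the text of a PDF by running an external converter.

    Assumes `tool` accepts "FILE -" to write text to standard output,
    which is how pdftotext behaves.
    """
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    result = subprocess.run([tool, pdf_path, "-"],
                            capture_output=True, check=True)
    # Replace undecodable bytes rather than failing on odd encodings.
    return result.stdout.decode("utf-8", errors="replace")


def find_lines(text, needle):
    """Return (line_number, line) pairs whose line contains needle,
    ignoring case. Line numbers start at 1."""
    needle = needle.lower()
    return [(n, line)
            for n, line in enumerate(text.splitlines(), 1)
            if needle in line.lower()]
```

Usage would be something like find_lines(extract_text("report.pdf"),
"invoice"), where report.pdf is a placeholder file name. Keeping the
search separate from the extraction means a different converter can be
swapped in without touching the matching code.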
I can't go look to be more specific because I no longer run htdig,
so I'll leave the exercise of researching it to the reader. :-)
http://www.htdig.org/
- BS