xpdf comes with a utility called pdftotext. Perhaps that is what you are looking for? Also, searching Google for 'pdftotext', I found another application that appears to do the same thing on Windows.
xpdf can be found at: http://www.foolabs.com/xpdf/
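If you want to drive it from inside a script rather than by hand, something along these lines might do. This is only a sketch, written in Python for brevity (in Perl, backticks or a piped open would play the same role). It assumes pdftotext is on your PATH. The helper names `pdftotext_command` and `pdf_to_text` and the file name `report.pdf` are made up for illustration. `-layout` asks pdftotext to keep the physical layout as best it can, `-enc UTF-8` sets the output encoding, and `-` as the output name sends the text to stdout.

```python
import subprocess


def pdftotext_command(pdf_path, keep_layout=True):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout: try to preserve the physical layout of the page
        cmd.append("-layout")
    # "-" as the output file name means "write the text to stdout"
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, keep_layout=True):
    """Run pdftotext and return the extracted text as a string.

    Assumes the pdftotext binary is installed and on PATH.
    Raises subprocess.CalledProcessError if pdftotext exits non-zero.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `pdf_to_text("report.pdf")` would then hand you the text as one string to pick apart however you like.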
Hope that helps.
Brad
| [reply] |
It looks like the best solution; thanks. I was hoping for something Perlish that I could use deep in my Perl script, but sometimes you just have to play the hand you're dealt.
--
tbone1
Ain't enough 'O's in 'stoopid' to describe that guy.
- Dave "the King" Wilson
| [reply] |
While it may not be the perfect solution, you can always convert the PDF to HTML with the mentioned program and then convert the HTML to text. | [reply] |
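For the second half of that route (HTML to text), a rough sketch using only Python's standard library might look like the following. The class name `TextExtractor`, the function `html_to_text`, and the choice of which tags start a new line are assumptions made for illustration, not anything the converter prescribes. It simply drops the tags, so a table comes out as one line per row with cells separated by spaces; it makes no attempt to rebuild columns.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document, dropping all tags."""

    # Tags that start a new output line (an illustrative choice)
    BLOCK_TAGS = {"p", "br", "div", "tr", "li", "h1", "h2", "h3"}
    # Table cells are separated by a space
    CELL_TAGS = {"td", "th"}
    # Content of these elements is not visible text
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skipping = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skipping = True
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")
        elif tag in self.CELL_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skipping = False

    def handle_data(self, data):
        if not self.skipping:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop empty lines
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```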
I'd tried that, but some of the .pdf files have ugly tables in them, and the tables created by pdftohtml are, um, unpretty. In fact, predicting the output and formats was a royal mess.
--
tbone1
Ain't enough 'O's in 'stoopid' to describe that guy.
- Dave "the King" Wilson
| [reply] |
You have a very difficult job in front of you. PDF isn't a format that translates back nicely into ASCII.
I know for certain that if a long paragraph is visually wrapped into several lines in a PDF, the text that makes up the paragraph is broken into several strings (well, as many strings as there are lines). This presents problems when you want to sensibly save simple ASCII back out.
There are other issues as well, having to do primarily with getting the text in the correct order in the ASCII file.
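To make the wrapped-line problem concrete, here is a small Python sketch of the kind of stitching you end up doing on extracted text. It rests on one big assumption that real PDFs often violate: that paragraphs in the extracted text are separated by blank lines. The function name `join_wrapped_lines` is made up for illustration, and words hyphenated across a line break are left alone.

```python
def join_wrapped_lines(text):
    """Merge visually wrapped lines back into paragraphs.

    Assumes blank lines separate paragraphs in the extracted text;
    every other line break is treated as a wrap and becomes a space.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line closes the paragraph collected so far
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(stripped)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

When the extraction gives you no blank lines at all, or hands back the strings out of order, a heuristic like this has nothing to work with.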
Unless you are "cherry picking" a string or two, you'll be happier if you can redefine your problem in another way....
Cheers
-------------------------------------
Nothing is too wonderful to be true
-- Michael Faraday
| [reply] |