PerlMonks
Re^2: Extracting content text from PDFs
by pat_mc (Pilgrim)
on Sep 12, 2008 at 14:07 UTC ( [id://710904]=note )
marto -
Thanks for your extremely helpful post, and apologies for not responding to it earlier. My experience was exactly the one clinton describes in the thread you reference: modules like CAM-PDF produce only mildly helpful output. I am very grateful for the pointer to the Linux tool pdftotext. With the option -htmlmeta it produces extremely useful, tagged output from a given PDF. This is precisely what I have been looking for for a long time. I will concentrate my efforts on this utility from now on. Thanks again! Pat