PerlMonks  

Re^2: Extracting content text from PDFs

by pat_mc (Pilgrim)
on Sep 12, 2008 at 14:07 UTC ([id://710904])


in reply to Re: Extracting content text from PDFs
in thread Extracting content text from PDFs

marto -

Thanks for your extremely helpful post, and apologies for not having responded to it sooner. My experience was exactly the one clinton describes in the thread you reference: modules like CAM::PDF only produce mildly helpful output. I am very grateful for the pointer to the command-line tool pdftotext. With the -htmlmeta option it produces extremely useful, tagged output from a given PDF. This is precisely what I have been looking for for a long time, and from now on I will concentrate my efforts on this utility.
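For anyone who wants to script around this, here is a minimal sketch of reading the -htmlmeta output. It is written in Python, although the same approach works in Perl with HTML::Parser. It assumes what the pdftotext man page describes: -htmlmeta wraps the extracted text in <pre> and </pre> and prepends the document's meta headers (<title> and <meta name=... content=...> tags). The helper names (run_pdftotext, parse_htmlmeta, HtmlMetaParser) are hypothetical, and you should check the output format against your own pdftotext version.

```python
import subprocess
from html.parser import HTMLParser


class HtmlMetaParser(HTMLParser):
    """Collects the <title>, <meta> name/content pairs and <pre> text
    from the (assumed) structure of `pdftotext -htmlmeta` output."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &lt; etc. arrive decoded
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._in_title = False
        self._in_pre = False

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by the
        # default handle_startendtag implementation.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "pre":
            self._in_pre = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_pre:
            self.text_parts.append(data)


def parse_htmlmeta(html):
    """Return (title, meta dict, body text) from -htmlmeta style HTML."""
    parser = HtmlMetaParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.text_parts)
    # HTML ignores a newline directly after <pre>; drop it the same way.
    if text.startswith("\n"):
        text = text[1:]
    return parser.title.strip(), parser.meta, text


def run_pdftotext(pdf_path):
    """Hypothetical helper: run `pdftotext -htmlmeta FILE -`.
    An output file name of '-' makes pdftotext write to stdout.
    Assumes UTF-8 output; pass -enc to pdftotext if yours differs."""
    result = subprocess.run(
        ["pdftotext", "-htmlmeta", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout
```

Typical use would be `title, meta, text = parse_htmlmeta(run_pdftotext("some.pdf"))`, after which `meta.get("Author")` gives the author if the PDF records one. Parsing with a real HTML parser rather than regexes keeps entity decoding (for example `&lt;` in the extracted text) correct.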

Thanks again!

Pat
