PerlMonks  

Re: PDF Modules Seeking Recommendations

by toma (Vicar)
on Nov 25, 2006 at 19:24 UTC (#586040)


in reply to PDF Modules Seeking Recommendations

I have used another, non-module approach: pdftohtml (http://pdftohtml.sourceforge.net). It translates PDF to XML or HTML. The XML it produces isn't valid, but it is not difficult to fix. This code is also based on xpdf.

I like this approach because it gives me a bunch of text box strings with their bounding box coordinates, which I then sort by location. This is important for me because the documents that I parse tend to have an irregular 'document order.'
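As a rough illustration of the sort-by-location idea, here is a minimal sketch in Python. It assumes the XML has already been repaired so that it parses, and that each text box is a `text` element with integer `top` and `left` attributes. The sample data, the function name `sorted_text_boxes`, and the `line_tolerance` parameter are made up for the example and are not part of pdftohtml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-repaired sample; the element and attribute names
# (text, top, left) are assumptions about the pdftohtml XML output.
SAMPLE = """<page number="1">
<text top="200" left="50" width="80" height="12">second line</text>
<text top="100" left="300" width="60" height="12">right</text>
<text top="102" left="50" width="60" height="12">left</text>
</page>"""


def sorted_text_boxes(xml_text, line_tolerance=5):
    """Return the text strings ordered top-to-bottom, then left-to-right.

    Boxes whose 'top' values lie within line_tolerance of the first box
    of a line are treated as belonging to that same line.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for el in root.iter("text"):
        top = int(el.get("top"))
        left = int(el.get("left"))
        boxes.append((top, left, "".join(el.itertext())))

    boxes.sort()  # order by top, then left

    # Group boxes into lines, comparing against the first box of each line.
    lines = []
    for box in boxes:
        if lines and box[0] - lines[-1][0][0] <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])

    # Within each line, read left to right.
    return [text for line in lines
            for (_, _, text) in sorted(line, key=lambda b: b[1])]


if __name__ == "__main__":
    for s in sorted_text_boxes(SAMPLE):
        print(s)
```

The tolerance is there because boxes on the same visual line often differ by a pixel or two in their top coordinate; a suitable value depends on the documents being parsed.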

I have also found PDF tips and tricks on the mostly commercial http://www.pdfzone.com site.

It should work perfectly the first time! - toma

