PerlMonks  

Re: PDF Modules Seeking Recommendations

by toma (Vicar)
on Nov 25, 2006 at 19:24 UTC (#586040)


in reply to PDF Modules Seeking Recommendations

I have used another non-module approach: pdftohtml (http://pdftohtml.sourceforge.net). It translates PDF to XML or HTML. The XML it emits isn't valid, but it is not difficult to fix. This code is also based on xpdf.

I like this approach because it gives me a bunch of text box strings with their bounding box coordinates, which I then sort by location. This is important for me because the documents that I parse tend to have an irregular 'document order.'
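
A minimal sketch of the sort-by-location step, written in Python for illustration. It assumes the XML has already been repaired and that each text box is a `<text>` element with `top` and `left` attributes inside a `<page number="...">` element. The sample data, the function name `sorted_text_boxes`, and the `line_tolerance` parameter are hypothetical; check the element and attribute names against what your copy of pdftohtml actually emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the XML output described above.
# Note the boxes are deliberately out of reading order.
SAMPLE_XML = """<pdf2xml>
  <page number="1" height="792" width="612">
    <text top="100" left="300" width="80" height="12">right column</text>
    <text top="50" left="40" width="120" height="14">Title</text>
    <text top="101" left="40" width="90" height="12">left column</text>
  </page>
</pdf2xml>"""


def sorted_text_boxes(xml_string, line_tolerance=3):
    """Return text boxes ordered by page, then top-to-bottom, then left-to-right."""
    root = ET.fromstring(xml_string)
    boxes = []
    for page in root.iter("page"):
        page_no = int(page.get("number", "0"))
        for node in page.iter("text"):
            boxes.append({
                "page": page_no,
                "top": int(node.get("top")),
                "left": int(node.get("left")),
                # itertext() also picks up text nested in <b>, <i>, <a>, etc.
                "text": "".join(node.itertext()).strip(),
            })
    # Crude line grouping: integer-divide 'top' so boxes whose vertical
    # positions fall in the same bucket are treated as one line, then
    # order within the line by 'left'.
    boxes.sort(key=lambda b: (b["page"], b["top"] // line_tolerance, b["left"]))
    return boxes


if __name__ == "__main__":
    for box in sorted_text_boxes(SAMPLE_XML):
        print(box["page"], box["top"], box["left"], box["text"])
```

The bucketing is only a rough heuristic: two boxes a pixel apart can still land in different buckets, so documents with tight or skewed line spacing may need a smarter grouping step.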

I have also found PDF tips and tricks on the mostly commercial http://www.pdfzone.com site.

It should work perfectly the first time! - toma

