Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re: PDF Modules Seeking Recommendations

by toma (Vicar)
on Nov 25, 2006 at 19:24 UTC ( #586040=note: print w/ replies, xml ) Need Help??


in reply to PDF Modules Seeking Recommendations

I have used another non-module approach: http://pdftohtml.sourceforge.net . It translates pdf to XML or HTML. The XML isn't valid, but it is not difficult to fix. This code is also based on xpdf.

I like this approach because it gives me a bunch of text box strings with their bounding box coordinates, which I then sort by location. This is important for me because the documents that I parse tend to have an irregular 'document order.'

I have also found pdf tips and tricks on the mostly commercial http://www.pdfzone.com site.

It should work perfectly the first time! - toma


Comment on Re: PDF Modules Seeking Recommendations

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://586040]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (3)
As of 2015-07-05 22:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (68 votes), past polls