http://www.perlmonks.org?node_id=586040


in reply to PDF Modules Seeking Recommendations

I have used another, non-module approach: pdftohtml (http://pdftohtml.sourceforge.net). It translates PDF to XML or HTML. The XML it produces isn't valid, but it is not difficult to fix. pdftohtml is also based on xpdf.

I like this approach because it gives me a bunch of text box strings with their bounding box coordinates, which I then sort by location. This is important for me because the documents that I parse tend to have an irregular 'document order.'
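A minimal sketch of the sort-by-location step, assuming the XML output holds one <text> element per text box with top and left attributes. The sample data and the sub names text_boxes and sort_by_location are hypothetical, made up for illustration. Because the XML may not be well-formed, the sketch pulls the boxes out with a regex instead of an XML parser:

```perl
use strict;
use warnings;

# Hypothetical sample in the style of pdftohtml's XML output (an assumption,
# not real converter output). The boxes are deliberately out of reading order.
my $xml = <<'END';
<page number="1">
<text top="200" left="50" width="80" height="12" font="0">second line</text>
<text top="100" left="300" width="60" height="12" font="0">right</text>
<text top="100" left="50" width="40" height="12" font="0">left</text>
</page>
END

# Extract every <text> element as a hash of its attributes plus its string.
sub text_boxes {
    my ($input) = @_;
    my @boxes;
    while ( $input =~ m{<text\b([^>]*)>(.*?)</text>}gs ) {
        my ( $attrs, $string ) = ( $1, $2 );
        # Default the coordinates so a box without them still sorts cleanly.
        my %box = ( string => $string, top => 0, left => 0 );
        $box{$1} = $2 while $attrs =~ /(\w+)="([^"]*)"/g;
        push @boxes, \%box;
    }
    return @boxes;
}

# Order boxes top to bottom, then left to right within the same top value.
sub sort_by_location {
    my @sorted = sort {
        $a->{top} <=> $b->{top} || $a->{left} <=> $b->{left}
    } @_;
    return @sorted;
}

print "$_->{top},$_->{left}: $_->{string}\n"
    for sort_by_location( text_boxes($xml) );
```

With the sample above, the strings come out in the order left, right, second line. Comparing top values exactly is a simplification: boxes on the same visual line can differ by a pixel or two, so real documents may need the top values rounded into rows before sorting.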

I have also found PDF tips and tricks on the mostly commercial http://www.pdfzone.com site.

It should work perfectly the first time! - toma