in reply to PDF Modules Seeking Recommendations
I have used another, non-module approach: pdftohtml (http://pdftohtml.sourceforge.net). It translates PDF to XML or HTML. The XML it produces isn't valid, but it is not difficult to fix. Like the other tools mentioned, it is based on xpdf.
I like this approach because it gives me a bunch of text box strings with their bounding box coordinates, which I then sort by location. This is important for me because the documents that I parse tend to have an irregular 'document order.'
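Here is a minimal sketch of the sort-by-location step. It assumes the XML output contains `<text>` elements with `top` and `left` attributes, in that order; the sample data and the `sorted_text_boxes` helper are made up for illustration, and page boundaries are ignored. Because the XML isn't valid, the sketch uses a regex rather than an XML parser.

```perl
use strict;
use warnings;

# Hypothetical sample resembling the XML output; the attribute
# names and their order are assumptions, not taken from real output.
my $xml = <<'END';
<page number="1">
<text top="200" left="50" width="80" height="12" font="0">second line</text>
<text top="100" left="300" width="60" height="12" font="0">right</text>
<text top="100" left="50" width="60" height="12" font="0">left</text>
</page>
END

# Collect each text box with its coordinates, then sort
# top-to-bottom and, within the same row, left-to-right.
sub sorted_text_boxes {
    my ($xml) = @_;
    my @boxes;
    while ($xml =~ m{<text\s+top="(\d+)"\s+left="(\d+)"[^>]*>(.*?)</text>}g) {
        push @boxes, { top => $1, left => $2, text => $3 };
    }
    my @sorted = sort {
        $a->{top} <=> $b->{top} or $a->{left} <=> $b->{left}
    } @boxes;
    return @sorted;
}

print "$_->{text}\n" for sorted_text_boxes($xml);
```

Real documents rarely have text boxes that line up on exactly the same `top` value, so in practice you would probably want to round or bucket the coordinates before comparing them.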
I have also found PDF tips and tricks on the mostly commercial http://www.pdfzone.com site.
It should work perfectly the first time! - toma