State of the art of PDFs in Perl
by mcdave (Beadle) on Oct 27, 2012 at 14:57 UTC
I've been contemplating the state of the art of manipulating PDFs in Perl. The field is littered with the corpses of CPAN modules that try to make it easy to work with PDFs, but I settled on two as being the most useful: PDF::API2 and CAM::PDF. I welcome anyone's comments pointing out things I've missed or other useful tidbits.
My original motivation was a project in which I needed to input an existing PDF (generated by some unknown method) and prepend a coversheet containing a barcode derived from some metadata (passed in as separate arguments; not from the file itself). The barcodes are so people can fax them back to me and I can route the documents, but that's a different story.
If you like counting pixels and keeping track of text's baseline and things like that, you'll love PDF::API2. It's meant to be a low-level tool, and if you want very fine-grained control of your layouts, it's the tool for you. The best examples I found are [...] wkhtmltopdf, but it's not Perl. If you're not above system calls, though, it's not bad.
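To give a flavor of what "keeping track of the baseline" means in practice, here is a minimal sketch of placing a few lines of text with PDF::API2. The helper name write_lines, its option names, and the margin and leading defaults are all made up for illustration; only the PDF::API2 calls themselves (new, page, mediabox, corefont, text, font, translate, saveas) come from the module.

```perl
use strict;
use warnings;
use PDF::API2;

# Hypothetical helper: write each string in @$lines on its own line of a
# US Letter page and save the result to $file. Returns the baseline
# (y coordinate, in points from the bottom of the page) used for each line.
sub write_lines {
    my ($file, $lines, %opt) = @_;
    my $size    = $opt{size}    || 14;            # font size in points
    my $left    = $opt{left}    || 72;            # one-inch left margin
    my $top     = $opt{top}     || 720;           # baseline of the first line
    my $leading = $opt{leading} || $size * 1.2;   # distance between baselines

    my $pdf  = PDF::API2->new();
    my $page = $pdf->page();
    $page->mediabox('Letter');                    # 612 x 792 points

    my $font = $pdf->corefont('Helvetica');
    my $text = $page->text();
    $text->font($font, $size);

    my @baselines;
    my $y = $top;
    for my $line (@$lines) {
        $text->translate($left, $y);              # move to the start of this baseline
        $text->text($line);
        push @baselines, $y;
        $y -= $leading;                           # PDF coordinates grow upward, so step down
    }

    $pdf->saveas($file);
    return @baselines;
}
```

Calling write_lines('out.pdf', ['Cover sheet', 'Route to: accounting']) would put the first line on a baseline 720 points up the page and the second one 16.8 points below it. Nothing wraps or flows for you; every position is yours to compute, which is exactly the appeal (or the pain) of the module.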
As a low-level tool for creating PDFs, PDF::API2 is everything I want. For reading PDFs, my experience is a bit more mixed. There is a known issue with some features of PDF 1.5 and up. That is a problem for my project, because I consume PDFs people make and "please go back and save this as version 1.4" isn't an option.
To manipulate existing PDFs, CAM::PDF works fine. As of version 1.58, it doesn't claim broad support for PDF versions beyond 1.5, but my experience is that it can read any PDF I've thrown at it. It bills itself as a PDF manipulation library, and it can do all the helpful things like rearrange pages, import pages from another document, and even clever tricks like swapping out one image for another. So if you have a document and want to tweak it or learn about it, CAM::PDF is a good choice.
In our particular case, we combined the two. We use PDF::API2 to create a one-page coversheet document, then use CAM::PDF to prepend it to the original. It's early days, and nobody is trying to mess me up with complicated PDFs yet, but so far it seems to be working out nicely.
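For the curious, the combination can be sketched roughly as follows. This is not the production code: the function names make_coversheet and prepend_coversheet are invented for the example, the routing code is printed as plain text where the real coversheet draws a barcode, and the layout numbers are arbitrary. It assumes CAM::PDF can parse whatever PDF::API2 emits, which has held for the simple pages I've tried.

```perl
use strict;
use warnings;
use PDF::API2;
use CAM::PDF;

# Hypothetical helper: build a one-page coversheet with PDF::API2 and
# return it as a string of PDF bytes. The barcode is left out of this
# sketch; the routing code is just printed as text.
sub make_coversheet {
    my ($title, $routing_code) = @_;

    my $pdf  = PDF::API2->new();
    my $page = $pdf->page();
    $page->mediabox('Letter');

    my $text = $page->text();
    $text->font($pdf->corefont('Helvetica-Bold'), 24);
    $text->translate(72, 700);
    $text->text($title);

    $text->font($pdf->corefont('Courier'), 14);
    $text->translate(72, 660);
    $text->text("Routing code: $routing_code");

    return $pdf->stringify();
}

# Hypothetical helper: put the coversheet in front of the pages of
# $in_file and write the combined document to $out_file.
sub prepend_coversheet {
    my ($cover_bytes, $in_file, $out_file) = @_;

    my $doc = CAM::PDF->new($in_file)
        or die "Cannot read $in_file: $CAM::PDF::errstr\n";
    my $cover = CAM::PDF->new($cover_bytes)
        or die "Cannot parse coversheet: $CAM::PDF::errstr\n";

    $doc->prependPDF($cover);       # coversheet pages go first
    $doc->cleanoutput($out_file);   # write the merged document
    return $out_file;
}
```

Usage would be along the lines of prepend_coversheet(make_coversheet('Fax cover', 'ABC-123'), 'incoming.pdf', 'with_cover.pdf'). Passing the coversheet around as a string avoids a temporary file, since CAM::PDF->new accepts either a filename or the PDF content itself.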