http://www.perlmonks.org?node_id=871973


in reply to How to extract image captions from a PDF file using perl

Perhaps you could convert your PDF files to SVG using Inkscape, and then parse the resulting SVG with one of the standard XML processing modules (XML::LibXML or XML::Twig, for example).

Inkscape has a command-line mode that can do almost anything you can do with the GUI. For example, to convert a PDF to plain SVG:

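inkscape -f Input_file.pdf -l Output_file.svg

(Those are the options for the 0.x releases. Inkscape 1.0 and later dropped -f and take the output name via --export-filename, so there the equivalent should be: inkscape Input_file.pdf --export-plain-svg --export-filename=Output_file.svg)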
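
Once you have the SVG, the parsing step could look something like the following. This is only a rough sketch: it assumes XML::LibXML is installed and that Inkscape wrote the PDF text out as <text> elements in the standard SVG namespace (text that was converted to paths will not show up). The sub name svg_text_strings is made up for illustration. It just lists every text string in the file; deciding which of those strings are image captions (by position, font, a "Figure N" prefix, or whatever suits your documents) is left to you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the text of every <text> element in an
# SVG document passed in as a string.
sub svg_text_strings {
    my ($svg_source) = @_;

    my $doc = XML::LibXML->load_xml( string => $svg_source );

    # SVG elements live in a namespace, so XPath needs a registered prefix.
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( svg => 'http://www.w3.org/2000/svg' );

    my @strings;
    for my $node ( $xpc->findnodes('//svg:text') ) {
        # textContent joins the text of any nested <tspan> children.
        my $text = $node->textContent;
        $text =~ s/\s+/ /g;     # collapse runs of whitespace
        $text =~ s/^ | $//g;    # trim leading and trailing space
        push @strings, $text if length $text;
    }
    return @strings;
}

# When given a file name, print one text string per line.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $svg = do { local $/; <$fh> };
    close $fh;
    print "$_\n" for svg_text_strings($svg);
}
```

Save it under any name you like and pass it Output_file.svg as its only argument to see what text Inkscape managed to recover.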