PerlMonks
Re: PDF Parsing
by toma (Vicar) on Nov 30, 2007 at 07:39 UTC (#654057)
I have tried this a few different ways, and here is my favorite:

Use pdftohtml with the -xml option:

pdftohtml -xml file.pdf

In pdftohtml-0.36, this creates invalid XML output. But it is easy to fix up this XML with a few regular expressions to create valid XML.

Then use your favorite XML parser to process the XML. My favorite XML parser is Twig.

It should work perfectly the first time!

- toma
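For anyone who wants a starting point, here is a rough sketch of that pipeline in Perl. The post does not say which defects pdftohtml-0.36 produces, so the two repairs in `fix_xml` (bare ampersands and stray control characters) are assumptions for illustration only; check your own output and adjust the regular expressions. The helper names `fix_xml` and `extract_text` are made up for this example, and the sketch assumes that pdftohtml writes `file.xml` next to `file.pdf` and wraps each run of text in a `<text>` element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: repairs two defects ASSUMED for illustration.
# Inspect your own pdftohtml output to see what actually needs fixing.
sub fix_xml {
    my ($xml) = @_;
    # Escape ampersands that do not already start an entity reference.
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    # Drop control characters that XML 1.0 forbids (tab, LF, CR are allowed).
    $xml =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;
    return $xml;
}

# Hypothetical helper: collect the text of every <text> element.
# The element name is an assumption about the pdftohtml -xml format.
sub extract_text {
    my ($xml) = @_;
    require XML::Twig;    # CPAN module, loaded only when parsing
    my @lines;
    my $twig = XML::Twig->new(
        twig_handlers => {
            text => sub { my ($t, $elt) = @_; push @lines, $elt->text; },
        },
    );
    $twig->parse($xml);
    return @lines;
}

# Run the whole pipeline only when a PDF path is given on the command line.
if (@ARGV) {
    my $pdf = shift @ARGV;
    system('pdftohtml', '-xml', $pdf) == 0
        or die "pdftohtml failed: $?\n";

    # Assumption: the output file is the input name with .xml instead of .pdf.
    (my $xml_file = $pdf) =~ s/\.pdf$/.xml/i;
    open my $fh, '<', $xml_file or die "cannot open $xml_file: $!\n";
    my $raw = do { local $/; <$fh> };    # slurp the whole file
    close $fh;

    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for extract_text(fix_xml($raw));
}
```

Save it as, say, `pdftext.pl` and run `perl pdftext.pl file.pdf`. Keeping the fix-up in its own sub makes it easy to add another substitution each time the parser complains about something new.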