PerlMonks  

Re: Read table data from PDF

by Ratazong (Monsignor)
on May 11, 2016 at 10:44 UTC (#1162740)


in reply to Read table data from PDF

Hi perlmad,

I am afraid that is a non-trivial task. To know why, please read the following node by almut: Re: CAM::PDF did't extract all pdf's content

I have had good experiences using an external PDF-to-text converter and then parsing the output, but this of course depends on your input document.
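A minimal sketch of that approach, written in Python only for illustration (the same idea carries over to Perl). It assumes the external converter is poppler's `pdftotext`, called with `-layout` so that columns stay visually aligned, and that table cells are separated by runs of two or more spaces. The function names `pdf_to_text`, `split_row` and `parse_table` are hypothetical, and the splitting heuristic will not suit every document.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run the external `pdftotext` tool (assumed to be installed) and return its output."""
    # "-layout" keeps the physical layout; "-" writes the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def split_row(line):
    """Split one line of layout-preserved text into cells on runs of 2+ whitespace characters."""
    return re.split(r"\s{2,}", line.strip())


def parse_table(text):
    """Return a list of rows (lists of cell strings), skipping blank lines."""
    return [split_row(line) for line in text.splitlines() if line.strip()]
```

Typical use would be `parse_table(pdf_to_text("report.pdf"))`, where `report.pdf` is a placeholder file name. A single space inside a cell (as in "Widget A") is kept, but an empty cell in the middle of a row silently shifts the remaining cells to the left.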

HTH, Rata

Replies are listed 'Best First'.
Re^2: Read table data from PDF
by ateague (Monk) on May 11, 2016 at 13:59 UTC
    I have had good experiences using an external PDF-to-text converter and then parsing the output, but this of course depends on your input document.

    As a side note, if you go down this route, make absolutely certain that your external program will extract the text with some sort of X/Y position.

    Unless you have full and complete control over the PDF and its generation, parsing PDF text by fixed row/column position is pretty much guaranteed to end in failure, frustration, and an absolutely massive nest of exceptions and special parsing cases.
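    To illustrate why the X/Y positions matter, here is a rough sketch (in Python, purely as an illustration) of rebuilding a table from positioned words. It assumes the extractor has already produced `(x, y, text)` tuples, for example derived from the output of `pdftotext -bbox`, and that the approximate left edge of each column is known. The names `build_table`, `column_xs` and `y_tolerance` are hypothetical.

```python
def build_table(words, column_xs, y_tolerance=2.0):
    """Rebuild table rows from (x, y, text) word tuples.

    Words whose y lies within y_tolerance of the first word of the current
    row are put on that row; each word goes to the column whose x is nearest.
    """
    # Step 1: cluster words into rows by vertical position.
    rows = []  # each entry: (reference_y, [(x, text), ...])
    for x, y, text in sorted(words, key=lambda w: w[1]):
        if not rows or abs(y - rows[-1][0]) > y_tolerance:
            rows.append((y, []))
        rows[-1][1].append((x, text))

    # Step 2: within each row, place words left to right into the nearest column.
    table = []
    for _, row_words in rows:
        cells = [""] * len(column_xs)
        for x, text in sorted(row_words):
            col = min(range(len(column_xs)), key=lambda i: abs(x - column_xs[i]))
            cells[col] = (cells[col] + " " + text).strip()
        table.append(cells)
    return table
```

    Because each word is assigned by position rather than by its order in the text stream, an empty cell stays empty instead of shifting its neighbours. The tolerance and the column positions still have to be tuned for each document.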
