
Re: Example Of Using CAM::PDF Like HTML::TokeParser

by pvaldes (Chaplain)
on Oct 08, 2011 at 16:12 UTC (#930369)


in reply to Example Of Using CAM::PDF Like HTML::TokeParser

If the PDF layout is the problem, you may want to consider using pdftotext and experimenting a little with the -layout option:

`pdftotext -layout file.pdf file.txt`; `pdftotext file.pdf second_file.txt`;

You can also extract only the pages you need from the PDF instead of the whole file, which makes the search easier.
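For the page-range idea, something along these lines might work. This is only a sketch: `-f` and `-l` are pdftotext's first-page and last-page options, while the `extract_pages` helper, the `PDFTOTEXT` override variable and the file names are made up here for illustration.

```shell
# Hypothetical helper: convert only pages FIRST..LAST of a PDF to text,
# keeping the physical layout.
# PDFTOTEXT is an assumed override for the binary name (defaults to pdftotext).
extract_pages() {
    first=$1   # first page to convert (pdftotext -f)
    last=$2    # last page to convert  (pdftotext -l)
    pdf=$3     # input PDF
    out=$4     # output text file
    ${PDFTOTEXT:-pdftotext} -f "$first" -l "$last" -layout "$pdf" "$out"
}

# Example call (file names are placeholders): pages 3 through 5 only
# extract_pages 3 5 file.pdf pages_3-5.txt
```

Comparing the output with and without `-layout` on just those pages should show quickly whether either form is regular enough to parse.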


Re^2: Example Of Using CAM::PDF Like HTML::TokeParser
by Limbic~Region (Chancellor) on Oct 08, 2011 at 19:31 UTC
    pvaldes,
    As I indicated in my original post, extracting the text didn't work. What I didn't indicate is that I tried every tool and variation I could think of, including commercial products. None of the text extractions produces a consistent enough format for me to get at what I need. I understand that what I want to do is neither ideal nor easy and may be futile. However, I would like to try for myself.

    Cheers - L~R
