
Re: Comparison word against pdf

by rpnoble419 (Pilgrim)
on Apr 16, 2013 at 19:18 UTC

in reply to Comparison word against pdf

Because of how text is generated in a PDF file, this will be a next-to-impossible task. What looks like a complete word on the page may actually be stored as several separate letters or groups of letters. Also, text does not flow in the same manner as it does in Word.
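To illustrate the fragment problem, here is a minimal sketch in Python (not tested against real PDF output). It assumes you have already extracted the PDF text as a list of fragments with some tool of your choice; the names `normalize`, `similarity`, and the sample data are made up for illustration. It joins the fragments, strips everything except ASCII letters and digits from both sides, and scores how similar the two strings are, so word breaks and reflowed lines no longer count as differences.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and drop everything except ASCII letters and digits.

    Assumption: ignoring spacing and punctuation is acceptable, because PDF
    fragments rarely preserve them reliably.
    """
    return re.sub(r"[^0-9a-z]+", "", text.lower())


def similarity(word_text, pdf_fragments):
    """Return a ratio between 0 and 1 for the Word text vs. the PDF fragments."""
    pdf_text = "".join(pdf_fragments)
    matcher = difflib.SequenceMatcher(
        None, normalize(word_text), normalize(pdf_text), autojunk=False
    )
    return matcher.ratio()


# Hypothetical data: one phrase from the Word file, and the same phrase
# as it might come out of a PDF in arbitrary pieces.
word_text = "Comparison of documents"
pdf_fragments = ["Com", "par", "ison ", "of doc", "uments"]

print(similarity(word_text, pdf_fragments))
```

This only tells you whether the same characters appear in the same order. It cannot help if the extractor returns the fragments out of reading order, which is the harder part of the problem.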

You can improve your chances of success if you know exactly how the PDF files were created and by which application. If you have access to Adobe Illustrator, you can open the PDF files in it and see how each page is constructed, which may give you insight into how to read the PDF objects to extract the text.
