PerlMonks  

Search within a PDF file

by user123 (Initiate)
on Jan 27, 2005 at 15:14 UTC ( [id://425746]=perlquestion )

user123 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am a new member here and am hoping to get some good help from the wise monks. Does anyone know of a Perl module that I can use to search for a string within a PDF file? For example, I would like to search for "Perl is cool" in a PDF file. If the string is present in the PDF, the search should return true; otherwise, false. Thanks, Sagar

Replies are listed 'Best First'.
Re: Search within a PDF file
by friedo (Prior) on Jan 27, 2005 at 15:16 UTC
      You'll note that those two modules are linked from CPAN.org, the Comprehensive Perl Archive Network.

      Are you familiar with CPAN? Its search is very useful for finding modules and documentation (another good source is ActiveState).

      Ardemus - "This, right now, is our lives."
      I haven't tried them yet, but the PDF and PDF::Parse APIs do not seem to have the capability to search within a PDF file: the documentation shows that only the PDF document properties can be retrieved. I plan to try them out tonight anyway. Thanks for your help. -Sagar
Re: Search within a PDF file
by dragonchild (Archbishop) on Jan 27, 2005 at 15:27 UTC
    Have you tried grep? Strings, if I remember correctly, are stored as plain text within the PDF format ...

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      Much of what I see in PDF files is enclosed in 'stream' blocks, which appear to use a compression encoding, so grep won't do it. When I am forced to do this myself, I sure hope one of the above-mentioned modules, or some other one, will take care of pulling out the text I need to look at. (Oh, I'm not looking forward to this!)
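
A minimal sketch of why a plain grep can miss text that sits inside 'stream' blocks, assuming the streams use zlib (FlateDecode) compression. The function name pdf_bytes_contain and the in-memory fragment are hypothetical, made up for illustration; only the core Compress::Zlib module is used. This is not a real PDF parser. It ignores the /Length entry, other stream filters, hex strings, and font encodings, and real PDFs often split a phrase across several string operators, so a false result does not prove the text is absent.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

# Hypothetical helper: return 1 if $needle occurs in the raw bytes or in
# any stream body that inflates cleanly with zlib; otherwise return 0.
sub pdf_bytes_contain {
    my ($pdf_bytes, $needle) = @_;

    # Uncompressed parts of the file can be checked directly (what grep sees).
    return 1 if index($pdf_bytes, $needle) >= 0;

    # Assumption: each stream body sits between "stream" + EOL and "endstream".
    # A careful reader would use the /Length entry instead of a regex.
    while ($pdf_bytes =~ /stream\r?\n(.*?)endstream/sg) {
        my $body     = $1;                 # copy before handing to zlib
        my $inflated = uncompress($body);  # undef if this is not zlib data
        next unless defined $inflated;
        return 1 if index($inflated, $needle) >= 0;
    }
    return 0;
}

# In-memory fragment shaped like one PDF content-stream object
# (not a complete, valid PDF file).
my $content  = "BT /F1 12 Tf 72 720 Td (Perl is cool) Tj ET";
my $fragment = "4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
             . compress($content)
             . "\nendstream\nendobj\n";

print pdf_bytes_contain($fragment, "Perl is cool") ? "true\n" : "false\n";
```

To try this on a file, read it in binary mode (for example with open my $fh, '<:raw', $path) and pass the slurped contents as the first argument. The point of the sketch is only that the search has to run on the inflated stream contents rather than on the bytes grep sees.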
Re: Search within a PDF file
by neilwatson (Priest) on Mar 08, 2005 at 14:55 UTC
