
Re: search a pdf file

by Samy_rio (Vicar)
on Jun 27, 2007 at 08:49 UTC (#623555=note)

in reply to search a pdf file

Hi karana, I have tried the CAM::PDF module, and I think it will help you.

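    use strict;
    use warnings;
    use CAM::PDF;
    use CAM::PDF::PageText;

    # Usage: <script> file.pdf keyword
    my ($file, $search) = @ARGV;
    die "Usage: $0 file.pdf keyword\n"
        unless defined $file && defined $search;

    my $doc = CAM::PDF->new($file) || die "$CAM::PDF::errstr\n";

    for my $pg (1 .. $doc->numPages()) {
        # Text content of this page; skip it if nothing could be extracted
        my $text = $doc->getPageText($pg);
        next unless defined $text;

        # \Q...\E matches the keyword literally; capture the number after it.
        # Only pages that actually match are reported (no undef warnings).
        if (my ($data) = $text =~ m/\Q$search\E\s*(\d+)/i) {
            print "In $pg page: $search Value is $data\n";
        }
    }

Output is:

    In 1 page: def Value is 20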
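To check the matching step without a PDF, here is a minimal sketch that works on a plain string, assuming the page text has already been extracted (for example with getPageText). The helper find_values and the sample text are hypothetical. It uses the /g modifier to collect every number that follows the keyword on a page rather than only the first one, and wraps the keyword in \Q...\E so characters such as '(' are matched literally. Extracted PDF text does not always keep the visual layout, so it is worth printing a page's text once before tuning the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: return every number that directly follows
# $keyword in $text (case-insensitive, keyword matched literally).
sub find_values {
    my ($text, $keyword) = @_;
    return unless defined $text && defined $keyword && length $keyword;
    my @values = $text =~ m/\Q$keyword\E\s*(\d+)/gi;
    return @values;
}

# Made-up page text standing in for $doc->getPageText($pg)
my $page_text = "abc 10\ndef 20\nDEF 35\nTotal (EUR) 99\n";

my @def = find_values($page_text, 'def');
print "def: @def\n";              # def: 20 35

my @total = find_values($page_text, 'Total (EUR)');
print "Total (EUR): @total\n";    # Total (EUR): 99
```

Without \Q...\E, the keyword 'Total (EUR)' would be read as a regex with a capture group and would not match the literal parentheses in the text.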

Velusamy R.

eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

Re^2: search a pdf file
by dpavlin (Friar) on Jun 27, 2007 at 15:09 UTC
    I must add another vote for CAM::PDF. My problem was parsing orders that arrived as PDF files, and this module (with a few lines of code, just like the example above) made that hard task easy.

    I examined all the other PDF modules on CPAN and concluded that there is some great code if you want to remix PDFs, but for extracting content, CAM::PDF is the clear winner for me.
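dpavlin doesn't post his order-parsing code, so the following is only a minimal sketch of the general approach, assuming each order's page text has already been extracted with getPageText as in the script above. The helper parse_order, the field labels (Order No, Date, Total) and the sample text are all hypothetical; real documents need patterns matched to their actual layout.

```perl
use strict;
use warnings;

# Hypothetical helper: pull "Label: value" fields out of the text
# of one page. The labels below are invented for illustration;
# real orders will need their own patterns.
sub parse_order {
    my ($text) = @_;
    my %order;
    ($order{number}) = $text =~ m/Order\s+No\.?:?\s*(\w+)/i;
    ($order{date})   = $text =~ m/Date:?\s*(\d{4}-\d{2}-\d{2})/i;
    ($order{total})  = $text =~ m/Total:?\s*([\d.]+)/i;
    return \%order;
}

# Made-up text standing in for $doc->getPageText($pg)
my $text  = "ACME Ltd\nOrder No: A1234\nDate: 2007-06-27\nTotal: 149.90\n";
my $order = parse_order($text);
printf "%s dated %s, total %s\n", @{$order}{qw(number date total)};
# A1234 dated 2007-06-27, total 149.90
```

Returning a hash reference keeps each order's fields together, so a caller could collect one per page and write them out afterwards; a field that is not found is simply left undef.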

