Re: PDF extract

by jms53 (Monk)
on Mar 31, 2013 at 10:40 UTC

in reply to PDF extract

Line 9,
my $pdf = PDF::API2->new(-file => "$0.pdf");

$0 contains the script's name, so this line makes a PDF named after the script itself, with .pdf appended. While not wrong, it reduces the usefulness of your script, as you would have to rename the script each time you want to use it on a different file.
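
One way around that, as a minimal sketch: take the PDF name from the command line and only fall back to the "$0.pdf" behaviour when no argument is given. The helper name pdf_name is made up for illustration and is not part of PDF::API2 or the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): pick the PDF name
# from the first command-line argument, falling back to the script name
# plus ".pdf", which is what "$0.pdf" produces.
sub pdf_name {
    my ($script, @args) = @_;
    return @args ? $args[0] : "$script.pdf";
}

# $0 is the script's name as invoked; @ARGV holds the arguments.
print pdf_name($0, @ARGV), "\n";
```

The returned name could then be passed as the -file value on line 9 in place of "$0.pdf".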

I also can't help but notice you only open one pdf file.

J -

Re^2: PDF extract
by PerlSufi (Friar) on Mar 31, 2013 at 12:57 UTC
    Thanks J, I meant to change that. I'll keep trying to figure out how to extract PDF text.
      Here is what I have so far. When I tried to run it, I got the error message "Can't call method "getRootDict" on an undefined value...":
      use CAM::PDF;
      use PDF::API2;

      my $file_name = shift;
      my $pdfone = CAM::PDF->new('pdfone.pdf');
      for my $page (1 .. $pdfone->numPages()) {
          my $text = $pdfone->getPageText($page);
          @lines = split (/\n/, $text);
          foreach (@lines) {
              my $pdf = CAM::PDF->new('new.pdf');
              $pdfone->appendPDF($pdf);
          }
      }