PDF extract

by PerlSufi (Friar)
on Mar 31, 2013 at 04:02 UTC ( #1026328=perlquestion )
PerlSufi has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I have used Super Search to try to find an example of how to extract text from a PDF file and add it to another PDF file, but I haven't found any good examples. Any help would be greatly appreciated! Here is my code so far:

use PDF::API2;
use PDF::Extract;
use CAM::PDF;

use constant mm => 25.4 / 72;
use constant in => 1 / 72;
use constant pt => 1;

#create pdf file and page
my $pdf = PDF::API2->new(-file => "$0.pdf");
my $page = $pdf->page;
$font = $pdf->corefont('Helvetica-Bold');

#create a header
my $header_text = $page->text;
$header_text->font($font, 24);
$header_text->translate(105/mm, 250/mm);
$header_text->fillcolor('black');
$header_text->text_center('Thank you for your business!');

$pdf->save();

I added the CAM::PDF and PDF::Extract modules because those are what I found; I just haven't figured out how to use them yet. I want to open a file in the working folder called pdfone.pdf and write its text to the file I created in the code I posted.

Replies are listed 'Best First'.
Re: PDF extract
by tangent (Priest) on Apr 01, 2013 at 02:52 UTC
    You are getting the error "Can't call method "getRootDict" on an undefined value" because 'new.pdf' doesn't exist.
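
This kind of failure can be caught at the point where the file is opened, rather than later when a method is called on the result. A minimal sketch, assuming that CAM::PDF->new returns undef on failure and sets $CAM::PDF::errstr; open_pdf is a hypothetical helper name, not part of either module:

```perl
use strict;
use warnings;
use CAM::PDF;

# Hypothetical helper: open a PDF for reading, or die with a clear message
# instead of failing later with "Can't call method ... on an undefined value".
sub open_pdf {
    my ($file) = @_;

    # Catch the missing-file case before CAM::PDF is asked to parse anything.
    die "No such file: $file\n" unless -e $file;

    # CAM::PDF->new returns undef when it cannot read or parse the file.
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open '$file': $CAM::PDF::errstr\n";

    return $pdf;
}

# Usage: the script would stop here if new.pdf has not been created yet.
# my $pdf = open_pdf('new.pdf');
```

With a guard like this, the message names the missing file directly instead of pointing at getRootDict.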

    Manipulating PDF files is quite complex. I found that I had to use both modules to achieve what you are trying to do, as PDF::API2 doesn't seem to have any way of extracting text and CAM::PDF doesn't seem to have any way of adding an empty page.

    Hopefully this will help you on your way:

    use strict;
    use warnings;
    use CAM::PDF;
    use PDF::API2;

    my $pdfone = CAM::PDF->new('input.pdf');
    my $pdftwo = PDF::API2->open('output.pdf');
    my $font = $pdftwo->corefont('Helvetica-Bold');

    for my $pagenum (1 .. $pdfone->numPages() ) {
        my $text = $pdfone->getPageText($pagenum) or next;
        my $page = $pdftwo->page(); # add a new page
        my $pdf_text = $page->text();
        $pdf_text->font($font,12);
        my @lines = split("\n",$text);
        my ($x,$y) = (50,700);
        for my $line (@lines) {
            $pdf_text->translate($x,$y);
            $pdf_text->text($line);
            $y = $y - 20;
        }
    }
    $pdftwo->saveas('output.pdf');

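
The inner loop above starts at y = 700 and moves down 20 points per line with no lower bound, so a source page with many lines would run off the bottom of the new page. One possible way to handle that is to split the lines into page-sized chunks first. A minimal sketch in plain Perl; paginate is a hypothetical helper, and the bottom margin of 50 points is an assumption rather than a value from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of text lines into page-sized chunks.
# $top and $leading default to the 700 and 20 used in the loop above;
# $bottom (50) is an assumed bottom margin.
sub paginate {
    my ($lines, $top, $bottom, $leading) = @_;
    $top     //= 700;
    $bottom  //= 50;
    $leading //= 20;

    # Number of baselines that fit between $top and $bottom, inclusive.
    my $per_page = int(($top - $bottom) / $leading) + 1;

    my @pages;
    my @rest = @$lines;    # copy, so the caller's array is left untouched
    push @pages, [ splice @rest, 0, $per_page ] while @rest;
    return @pages;
}

my @lines = map { "line $_" } 1 .. 70;
my @pages = paginate(\@lines);
printf "%d pages, %d lines on the last\n", scalar @pages, scalar @{ $pages[-1] };
```

Each chunk would then get its own call to $pdftwo->page(), with $y reset to the top before its lines are written.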
      Thank you tangent!! You're a life saver! I was banging my head against the wall all day yesterday on how to do this. I saw that CAM::PDF couldn't create a new page so I knew I had to combine them, I just didn't see how yet. Is there a GIVE MONEY$$ option on here for monks?! :P
        Glad to be of help.
Re: PDF extract
by jms53 (Monk) on Mar 31, 2013 at 10:40 UTC
    Line 9,
    my $pdf = PDF::API2->new(-file => "$0.pdf");

    Whatever your script is called, the PDF it creates will be named after it with .pdf appended, because $0 contains the script's name. While not wrong, it reduces the usefulness of your script, as you would have to rename the script each time you want a differently named output file.
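
One way around this is to take the output name from the command line and fall back to a name derived from $0 only when none is given. A minimal sketch; output_name is a hypothetical helper, and the fallback rule (strip the script's extension, then append .pdf) is an assumption, not something from the thread:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: pick the output filename.
# Prefer an explicit argument; otherwise derive a name from the script path.
sub output_name {
    my ($script, $arg) = @_;
    return $arg if defined $arg && length $arg;

    # Drop the directory part and the final extension, e.g. ".pl".
    (my $base = basename($script)) =~ s/\.[^.]+\z//;
    return "$base.pdf";
}

my $out = output_name($0, shift @ARGV);
print "Writing to $out\n";
```

The result could then be passed as the -file argument in place of "$0.pdf".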

    I also can't help but notice you only open one pdf file.

    J -
      Thanks J, I meant to change that. I'll continue to try and figure out extracting PDF text..
        Here is what I have so far. When I tried to run it, I got the error message "Can't call method "getRootDict" on an undefined value...":
        use CAM::PDF;
        use PDF::API2;

        my $file_name = shift;
        my $pdfone = CAM::PDF->new('pdfone.pdf');

        for my $page (1 .. $pdfone->numPages()) {
            my $text = $pdfone->getPageText($page);
            @lines = split (/\n/, $text);
            foreach (@lines) {
                my $pdf = CAM::PDF->new('new.pdf');
                $pdfone->appendPDF($pdf);
            }
        }

Node Type: perlquestion [id://1026328]
Approved by davido