PDF extract

by PerlSufi (Friar)
on Mar 31, 2013 at 04:02 UTC (#1026328=perlquestion)
PerlSufi has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I have used Super Search to try to find an example of how to extract text from a PDF file and add it to another PDF file, but I haven't found any good examples. Any help would be greatly appreciated! Here is my code so far:
use PDF::API2;
use PDF::Extract;
use CAM::PDF;
use constant mm => 25.4 / 72;
use constant in => 1 / 72;
use constant pt => 1;

#create pdf file and page
my $pdf = PDF::API2->new(-file => "$0.pdf");
my $page = $pdf->page;
my $font = $pdf->corefont('Helvetica-Bold');

#create a header
my $header_text = $page->text;
$header_text->font($font, 24);
$header_text->translate(105/mm, 250/mm);
$header_text->fillcolor('black');
$header_text->text_center('Thank you for your business!');
$pdf->save();
I added the CAM::PDF and PDF::Extract modules because that is what I found; I just haven't figured out how to use them yet. I want to open a file called pdfone.pdf in the working folder and write its text to the file I created in the code I posted.
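For reference, the mm and in constants in the code above work as divisors: PDF coordinates are measured in points (1/72 inch) and an inch is 25.4 mm, so dividing a length by mm or in converts it to points. A small stand-alone check of the values used for the header position (pure Perl, no PDF modules needed):

```perl
use strict;
use warnings;

# The question's unit constants: a PDF point is 1/72 inch and
# an inch is 25.4 mm, so these are "units per point".
use constant mm => 25.4 / 72;   # millimetres per point
use constant in => 1 / 72;      # inches per point

# Dividing a length by its unit constant gives the length in points.
my $x_pt    = 105 / mm;   # 105 mm from the left edge
my $y_pt    = 250 / mm;   # 250 mm up from the bottom edge
my $inch_pt = 1 / in;     # 1 inch

printf "x = %.1f pt, y = %.1f pt, 1 in = %.0f pt\n", $x_pt, $y_pt, $inch_pt;
# prints: x = 297.6 pt, y = 708.7 pt, 1 in = 72 pt
```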

Replies are listed 'Best First'.
Re: PDF extract
by tangent (Priest) on Apr 01, 2013 at 02:52 UTC
    You are getting the error "Can't call method "getRootDict" on an undefined value" because 'new.pdf' doesn't exist.
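    One way to get a clearer message than the getRootDict error is to check that the file exists before handing it to CAM::PDF. This is a minimal sketch: the helper require_readable_file is hypothetical (it is not part of CAM::PDF), and the CAM::PDF call is left as a comment so the snippet runs on its own:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not part of CAM::PDF): die with a clear message
    # if the input file is missing or unreadable, instead of letting
    # CAM::PDF->new return undef and a later method call fail with
    # "Can't call method ... on an undefined value".
    sub require_readable_file {
        my ($path) = @_;
        die "Input file '$path' does not exist\n"  unless -e $path;
        die "Input file '$path' is not readable\n" unless -r $path;
        return $path;
    }

    # Intended use with CAM::PDF (commented out so this sketch runs
    # without the module installed):
    #   my $doc = CAM::PDF->new(require_readable_file('new.pdf'))
    #       or die "CAM::PDF could not open new.pdf: $CAM::PDF::errstr\n";

    my $found = eval { require_readable_file('new.pdf'); 1 };
    print $found ? "new.pdf is present\n" : "Error: $@";
    ```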

    Manipulating PDF files is quite complex, and I found that I had to use both modules to achieve what you are trying to do: PDF::API2 doesn't seem to have any way of extracting text, and CAM::PDF doesn't seem to have any way of adding an empty page.

    Hopefully this will help you on your way:

    use strict;
    use warnings;
    use CAM::PDF;
    use PDF::API2;

    # input.pdf is read with CAM::PDF; output.pdf must already exist
    my $pdfone = CAM::PDF->new('input.pdf')
        or die "Cannot read input.pdf: $CAM::PDF::errstr\n";
    my $pdftwo = PDF::API2->open('output.pdf');
    my $font = $pdftwo->corefont('Helvetica-Bold');

    for my $pagenum (1 .. $pdfone->numPages()) {
        # skip pages with no extractable text
        my $text = $pdfone->getPageText($pagenum) or next;
        my $page = $pdftwo->page();    # add a new page
        my $pdf_text = $page->text();
        $pdf_text->font($font, 12);
        my @lines = split("\n", $text);
        my ($x, $y) = (50, 700);
        for my $line (@lines) {
            $pdf_text->translate($x, $y);
            $pdf_text->text($line);
            $y = $y - 20;    # move down one line
        }
    }
    $pdftwo->saveas('output.pdf');
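    The loop above starts each extracted page at y = 700 and moves down 20 points per line without ever checking the bottom of the page, so a long page of extracted text would run off the bottom. A minimal sketch of the positioning logic with page breaks added; layout_lines is a hypothetical helper and the 50-point bottom margin is an assumed value, not something from this thread:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: lay out lines of extracted text the way the loop
    # above does (x = 50, start at y = 700, 20 points per line), but begin a
    # new output page once the next line would drop below a bottom margin.
    # The 50-point bottom margin is an assumed value.
    sub layout_lines {
        my ($text, %opt) = @_;
        my $x       = $opt{x}       // 50;
        my $top     = $opt{top}     // 700;
        my $leading = $opt{leading} // 20;
        my $bottom  = $opt{bottom}  // 50;

        my @pages = ([]);    # each page: list of [x, y, line]
        my $y = $top;
        for my $line (split /\n/, $text) {
            if ($y < $bottom) {    # no room left on this page
                push @pages, [];
                $y = $top;
            }
            push @{ $pages[-1] }, [ $x, $y, $line ];
            $y -= $leading;
        }
        return @pages;
    }

    # With PDF::API2, each returned page would become one $pdftwo->page(),
    # with translate($x, $y) and text($line) for every placement.

    my @pages = layout_lines(join "\n", map "line $_", 1 .. 40);
    printf "%d pages; page 2 starts with '%s' at y = %d\n",
        scalar @pages, $pages[1][0][2], $pages[1][0][1];
    # prints: 2 pages; page 2 starts with 'line 34' at y = 700
    ```

    Keeping the layout in a plain function means the page-break arithmetic can be checked without creating any PDF files.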
      Thank you tangent!! You're a life saver! I was banging my head against the wall all day yesterday on how to do this. I saw that CAM::PDF couldn't create a new page so I knew I had to combine them, I just didn't see how yet. Is there a GIVE MONEY$$ option on here for monks?! :P
        Glad to be of help.
Re: PDF extract
by jms53 (Monk) on Mar 31, 2013 at 10:40 UTC
    Line 9,
    my $pdf = PDF::API2->new(-file => "$0.pdf");

    $0 contains the script's name, so the PDF you create will be named after the script with .pdf appended. While not wrong, this reduces the usefulness of your script, as you would have to rename the script each time you want a different output file.
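    One way to act on this, sketched under the assumption that the output name is passed as the first command-line argument (for example `perl yourscript.pl report.pdf`, where both names are hypothetical); the helper output_file_name and the default output.pdf are also hypothetical:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: use the first command-line argument as the output
    # file name, falling back to a default when none is given.
    sub output_file_name {
        my ($args, $default) = @_;
        return @$args ? $args->[0] : $default;
    }

    # "$0.pdf" simply appends ".pdf" to the script's own name.
    print "Name derived from \$0: $0.pdf\n";

    my $out = output_file_name(\@ARGV, 'output.pdf');
    print "Output file: $out\n";
    # The PDF::API2 call from the question would then become:
    #   my $pdf = PDF::API2->new(-file => $out);
    ```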

    I also can't help but notice you only open one pdf file.

    J -
      Thanks J, I meant to change that. I'll keep trying to figure out how to extract the PDF text.
        Here is what I have so far. When I tried to run it, I got the error message: Can't call method "getRootDict" on an undefined value...
        use CAM::PDF;
        use PDF::API2;

        my $file_name = shift;
        my $pdfone = CAM::PDF->new('pdfone.pdf');
        for my $page (1 .. $pdfone->numPages()) {
            my $text = $pdfone->getPageText($page);
            @lines = split (/\n/, $text);
            foreach (@lines) {
                my $pdf = CAM::PDF->new('new.pdf');
                $pdfone->appendPDF($pdf);
            }
        }

Node Type: perlquestion [id://1026328]
Approved by davido