Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

merging pdf-files: pdf-merger.pl

by holli (Monsignor)
on Feb 06, 2005 at 15:07 UTC ( #428482=sourcecode: print w/ replies, xml ) Need Help??

Category: PDF
Author/Contact Info holli
Description: This script allows to merge multiple pdf-files to a new pdf-file. Despite from some CPAN-modules it needs File::Log::Shortcut which made it not to CPAN yet. Thus it is included here.


NAME

pdfmerger.pl


DESCRIPTION:

pdfmerger.pl merges multiple Adobe PDF-files to a single document. it supports multiple ``engines'' to produce pdf, namly

PDF::API2
 due to a bug in PDF::Api2, it seems to be impossible to merge files
 that were merged before.
PDF::Reuse
  I encountered a bug with Reuse and documents that were produced by
  MS Word/Acrobat PDF::Writer.
Win32 Acrobat OLE-Automation
  No bookmarks, due to the limitations of the Acrobat-Object-Model.

When the -order option is used only the mentioned files are merged.

@optionsfile is a file that contains options for the script, one per line. This is especially convenient for storing long -order - lists.


SYNOPSIS

$perl pdfmerger.pl -outfile <file> [-inpath <directory>] [-inmask <regex>] [-logfile <file>] [-engine <pdf-engine>] [-order <file 1>] [-order <file 2>] [-order <file n> +] [-bookmark] [-help] [@optionsfile]


ARGUMENTS

-outfile : req. : none : file to create -inpath : opt. : ./ : path to source-files -inmask : opt. : /./ : regex to match files (no glob! +) -logfile : opt. : pdfmerger.log : name of the logfile -bookmark : opt. : false : add bookmarks -help : opt. : no : show help -@optionsfile : opt. : none : file with options -order : opt. : none : sort merged files -engine : opt. : reuse : choose between api2|acro|reuse -backup : opt. : false : backup existing -outfile


PREREQUISITIES:

 Getopt::ArgvFile
 Getopt::Attribute
 POD::Usage
 File::Copy
 File::Basename
 PDF::API2;
 PDF::Reuse;
 Win32::OLE
 File::Log::Shortcut


HISTORY:

version 0.1


AUTOR

holli

The code of pdfmerger.pl
=pod

=head1 NAME

pdfmerger.pl

=head1 SYNOPSIS

 $perl pdfmerger.pl -outfile <file>
                  [-inpath <directory>]
                  [-inmask <regex>]
                  [-logfile <file>]
                  [-engine <pdf-engine>]
                  [-order <file 1>] [-order <file 2>] [-order <file n>
+]
                  [-bookmark]
                  [-help]
                  [@optionsfile]

=head1 ARGUMENTS

 -outfile      : req. : none          : file to create
 -inpath       : opt. : ./            : path to source-files
 -inmask       : opt. : /./           : regex to match files (no glob!
+)
 -logfile      : opt. : pdfmerger.log : name of the logfile
 -bookmark     : opt. : false         : add bookmarks
 -help         : opt. : no            : show help
 -@optionsfile : opt. : none          : file with options
 -order        : opt. : none          : sort merged files
 -engine       : opt. : reuse         : choose between api2|acro|reuse
 -backup       : opt. : false         : backup existing -outfile

=head1 DESCRIPTION:

pdfmerger.pl merges multiple Adobe PDF-files to a single document.
it supports multiple "engines" to produce pdf, namly

=over 3

=item PDF::API2

 due to a bug in PDF::Api2, it seems to be impossible to merge files
 that were merged before.

=item PDF::Reuse

  I encountered a bug with Reuse and documents that were produced by
  MS Word/Acrobat PDF::Writer.

=item Win32 Acrobat OLE-Automation

  No bookmarks, due to the limitations of the Acrobat-Object-Model.

=back

When the -order option is used only the mentioned files are merged.

@optionsfile is a file that contains options for the script, one per l
+ine. This is especially
convenient for storing long -order - lists.


=head1 AUTOR

<a href="http://holli.perlmonk.org/">holli</a>


=head1 PREREQUISITIES:

 Getopt::ArgvFile
 Getopt::Attribute
 POD::Usage
 File::Log::Shortcut
 File::Copy
 File::Spec
 File::Basename
 PDF::API2;
 PDF::Reuse;
 Win32::OLE

=head1 HISTORY:

version 0.1

=cut
# version
our $VERSION = 0.1;

# modules
use strict;
use warnings;

use Getopt::ArgvFile;
use Getopt::Attribute;

use Pod::Usage;

use File::Log::Shortcut;
use File::Spec qw(rel2abs);

use File::Copy;
use File::Basename qw (fileparse);

use PDF::API2;
use PDF::Reuse;


# arguments
our $inpath   : Getopt(inpath=s ./);
our $inmask   : Getopt(inmask=s .);
our $outfile  : Getopt(outfile=s);
our $logfile  : Getopt(logfile=s pdfmerger.rpt);
our $bookmark : Getopt(bookmark);
our $help     : Getopt(help);
our $engine   : Getopt(engine=s api2);
our @order    : Getopt(order=s);
our $backup   : Getopt(backup);

# Help?
pod2usage(-exitval=>0, -verbose=>1)
    if $help;

# inpath?
pod2usage(-message => "'$inpath' is no valid directory!\n", -exitval=>
+1, -verbose=>1)
    unless -e $inpath && -d $inpath;
$inpath .= "/" unless $inpath =~ m:/$:;

# outfile?
pod2usage(-message => "-outfile missing!\n", -exitval=>1, -verbose=>1)
    unless $outfile;

pod2usage(-message => "-outfile is a directory!\n", -exitval=>1, -verb
+ose=>1)
    if -e $outfile && -f $outfile;

# build file list
my $i=0;
my %files = map { $_ => ++$i } grep { /$inmask/i } glob ( "$inpath*.pd
+f" );

$i=0;
my %order = @order ? map { $inpath.$_ => ++$i } @order : %files;

my @dateien = grep { defined $files{$_} } sort { $order{$a} <=> $order
+{$b} } keys %order;

die "no files to merge!\n" unless @dateien;

# create log
my $log = File::Log::Shortcut->new
(
    {
        storeExpText    => 1,                   # Store internally all
+ exp text
        authorName      => "Holli",
        logFileName     => $logfile,
        versionFrom     => "20050206",
    }
);

$log->pProgramHeader(2);

# create pdf
eval
{
    $log->msg(2, "using Engine: $engine\n");
    write_pdfs ($outfile, @dateien);
};

# catch errors
if ( $@ )
{
    $log->msg(2,"\n!**ABORT**!\n$@\n$!\n");
    exit 1;
}

$log->pProgramFooter(2);


#engine-dispatcher
sub write_pdfs
{
    my ($outputpdf, $outdir) = fileparse ($_[0]);


    if ( $backup )
    {
        if ( -e "$outdir$outputpdf")
        {
            # backup existing file
            my $i=0;
            while ( -e "$outdir$outputpdf.$i" ) { $i++ };

            move ("$outdir$outputpdf", "$outdir$outputpdf.$i")
                or die "File '$outputpdf' cannot be renamed\n";

            $log->msg(2, "'$outputpdf' renamed to '$outputpdf.$i'\n");
        }
    }
    else
    {
        unlink "$outdir$outputpdf" if -e "$outdir$outputpdf";
    }



    if ( $engine eq "acrobat" )
    {
        write_pdf_acro (@_);
    }
    elsif ( $engine eq "api2" )
    {
        write_pdf_api2 (@_);
    }
    elsif ( $engine eq "reuse" )
    {
        write_pdf_reuse (@_);
    }
    else
    {
        die "Unknow PDF-Engine!\n";
    }

    -e "$outdir$outputpdf" ?
        $log->pFileInfo(2, "$outdir$outputpdf", "File written $outdir$
+outputpdf : ") :
        die "File '$outputpdf' not written!\n";
}


sub write_pdf_api2
{

    my ($outputpdf, $outdir) = fileparse (shift @_);

    my @pdfFiles = @_;

    my
    (
        $file,
        $pdf,
        $root,
    );

    $pdf = PDF::API2->new( -file => $outdir."\\".$outputpdf );

    #default mediabox to A4
    $pdf->mediabox (0,0,594.9,841.3597);
    $root = $pdf->outlines;

    my $import_page = 0;
    my $document_page = 0;

    foreach $file ( @pdfFiles )
    {
        my ($inputpdf, $inputdir) = fileparse (shift @_);

        my $input = PDF::API2->open( $file );

        my @pages = 1 .. $input->pages;

        if ( scalar @pages > 0 )
        {
            my $outline;
            $outline = $root->outline
                if $bookmark;

            foreach ( @pages )
            {
                ++$import_page;
                ++$document_page;

                my $page = $pdf->importpage($input, $_, $import_page);

                if ( $bookmark )
                {
                    # create bookmark
                    my ($bmtext) = ($inputpdf =~ /([^\.]+)/ );
                    $outline->title($bmtext);

                    my $bm = $outline->outline;
                    $bm->title("page $document_page");

                    $bm->dest($page);
                    $outline->dest($page) if $document_page == 1;
                    $outline->closed;
                }
            }

        }
        $log->pFileInfo(2, "$file", "Processsing $file : ");

    }


    $pdf->preferences( -outlines => 1 )
        if $bookmark;

    $pdf->update;
    $pdf->end;
}


sub write_pdf_acro
{

    eval "use Win32::OLE;";

    my ($outputpdf, $outdir) = fileparse (shift @_);
    my @pdfFiles = map { File::Spec->rel2abs($_) } @_;

    my (
        $_empty_pdf,
        $file,
        $bigpdf,
        $singlepdf,
        $fillpdf,
        $acro,
    );

    my $pages=0;

    $singlepdf  = Win32::OLE->new('AcroExch.AVDoc');
    $bigpdf     = Win32::OLE->new('AcroExch.AVDoc');

    prFile ($outdir."_empty_tmp_.pdf");
    prPage;
    prEnd;

    $bigpdf->open (File::Spec->rel2abs($outdir."_empty_tmp_.pdf"), "")
+;

    foreach $file (@pdfFiles)
    {
        $log->pFileInfo(2, $file, "Processsing $file : ");

        $singlepdf->open($file, "");

        $bigpdf->GetPDDoc->InsertPages
        (
            $bigpdf->GetPDDoc->GetNumPages-1,
            $singlepdf->GetPDDoc(),
            0,
            $singlepdf->GetPDDoc->GetNumPages,
            1
        );

        $pages += $singlepdf->GetPDDoc->GetNumPages;

        $singlepdf->close(1);
    }

    $bigpdf->GetPDDoc->DeletePages ( 0, 0 );

    unless ( $bigpdf->GetPDDoc->GetNumPages == $pages )
    {
        my $p = $bigpdf->GetPDDoc->GetNumPages;

        $bigpdf->close(1),
        #bug in acrobat? some files just donīt get merged
        die "not all pages inserted. shoud be: $pages, is: $p!\n";
    }

    $bigpdf->GetPDDoc->Save(1, File::Spec->rel2abs("$outdir$outputpdf"
+));

    $bigpdf->close(1);

    unlink $outdir."_empty_.pdf";
}


sub write_pdf_reuse
{
    my ($outputpdf, $outdir) = fileparse (shift @_);
    my @pdfFiles = @_;

    prFile ("$outdir$outputpdf");

    my $pages=0;

    for ( @pdfFiles )
    {
        my $page=prDoc($_);
        if ( $bookmark )
        {
            my ($inputpdf, $indir) = fileparse ($_, /\.pdf$/i);
            prBookmark({ text  => $inputpdf, act   => "this.pageNum = 
+$pages;"});

        }
        $log->pFileInfo(2, $_, "Processsing $_ : ");
        $pages += $page;
    }

    prEnd;
}


__END__

The code of File::Log::Shortcut
package File::Log::Shortcut;

use strict;
use warnings;

use POSIX qw(strftime);

our $VERSION = "0.1";

use base qw(File::Log);

sub new
{
    my $proto  = shift;
    my $class  = ref($proto) || $proto;
    my %args   =  %{$_[0]};
    %args = map { lc($_) => $args{$_} } keys %args ;



    my $self = $class->SUPER::new(@_);

    my %defaults =
    (
        appname         => ($main::0 =~ /(?:^|[\/\\])([^\/\\]+)$/)[-1]
+,
        appversion      => $main::VERSION || "",
        authorname      => "",
        _startedepoc    => time,
        versionfrom     => strftime( "%Y%m%d-%X", localtime(  (stat($m
+ain::0))[9]  )   ) ,
        startedat       => strftime( "%Y%m%d-%X", localtime() ),
    );

    for ( keys %defaults )
    {
        $self->{$_} = defined $args{$_} ? $args{$_} : $defaults{$_};
    }

    return $self;
}

sub pProgramHeader
{
    my $self = shift;
    $self->msg(shift, $self->programHeader);
}

sub programHeader
{
    my $self  = shift;

    my $msg = "";
    $msg .= $self->{appname};
    $msg .= " Version $self->{appversion}"    if $self->{appversion};
    $msg .= " from $self->{versionfrom}"      if $self->{versionfrom};
    $msg .= "\n";
    $msg .= "Author: $self->{authorname}\n"   if $self->{authorname};
    $msg .= "Started at $self->{startedat}\n" if $self->{startedat};

    return $msg;
}

sub pProgramFooter
{
    my $self = shift;
    my $debug = shift;
    $self->msg($debug, $self->programFooter);
}

sub programFooter
{
    my $self = shift;

    return "Ended at " . strftime( "%Y%m%d-%X", localtime() ) . " afte
+r " . (time-$self->{_startedepoc}) . "\n";
}

sub pFileInfo
{
    my $self = shift;
    my $debug = shift;
    my $file  = shift;
    my $msg   = shift || "";
    $self->msg($debug, $self->fileInfo ($file, $msg));
}

sub fileInfo
{
    my $self  = shift;
    my $file  = shift;
    my $msg   = shift || "";

    return $msg . strftime( "%Y%m%d-%X", localtime(  (stat($file))[9] 
+ )  ) . "\n";
}

1;

__END__

=head1 NAME

File::Log::Shortcut

=head1 SYNOPSIS

 my $log = File::Log::Shortcut->new
 (
     {
         # the author of the program, defaults to ""
         authorName      => "Holli",

         # date of release, defaults to "last changed" of the script-f
+ile
         versionFrom      => "someday",

         # the scripts version, defaults to $main::VERSION or ""
         appVersion       => "0.1",
     }
 );


 $log->pProgramHeader(2);

 sleep(2);

 open OUT , ">", "testout";
 close OUT;

 sleep(2);

 $log->pFileInfo(2, "testout","File created: ");

 $log->pProgramFooter(2);

 $log->close();

=head1 DESCRIPTION

This is a subclass of File::Log, that offers some convenience. So ever
+ything in the File::Log-documentation
is true for File::Log::Shortcut

=head2 methods

=over 6

=item pProgramHeader ($debug);

calls msg() of the parent class with $debug and a string like the foll
+owing:

 test.pl Version 0.1 from someday
 Author: Holli
 Started at 20050206-04:57:49

=item programHeader

returns the input for pProgramHeader()

=item pProgramFooter ($debug);

calls msg() of the parent class with $debug and a string like the foll
+owing:

 Ended at 20050206-04:57:53 after 4

=item programFooter

returns the input for pProgramFooter()

=item pFileInfo ($debug, $filename, $message);

calls msg() of the parent class with $debug and a string like the foll
+owing:

 File created: 20050205-21:47:16

=item fileInfo ($filename, $message)

 returns the input for pFileInfo

=back

So,
 $log->pFileInfo (2, "file", "message");
is equivalent to
 $log->msg(2, $log->fileInfo ("file", "message"));

=head1 AUTHOR

holli

Comment on merging pdf-files: pdf-merger.pl
Select or Download Code
Re: merging pdf-files: pdf-merger.pl
by jimbojones (Friar) on Apr 08, 2005 at 20:37 UTC
    Hi

    I couldn't get the options to parse -- it kept returning me to the pod usage. I would suggest reworking this with Getopt::Long.

    Also, you shouldn't 'use' both PDF::API2 and PDF::Reuse if you are giving a choice of the PDF engine, ie. I shouldn't have to install both just to use one in this script. You don't 'use' Win32::OLE -- the same can be applied to PDF::API2 and PDF::Reuse.

    However, for me these are minor issues. I'm grateful just for the code to see how to call PDF::API2.

    thanks, Jim

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://428482]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (7)
As of 2015-07-04 08:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (58 votes), past polls