
PDF Concatenation and Extraction Tool

by rob_au (Abbot)
on Jan 14, 2004 at 21:59 UTC ( #321391=sourcecode )
Category: Utility Scripts
Author/Contact Info /msg rob_au
Description: This is a PDF concatenation tool that merges PDF files, or selected pages from them, into a single output PDF file. The command line arguments for this tool take the form:

pdfcat.perl [input files ...] [options] [output file]

-i|--input [filename]

Specify an input file for concatenation into the output file. If a single file is specified with the --page parameter, this script can also be used for extracting specific page ranges.

-o|--output [filename]

Specify the output file for concatenated PDF output.


-p|--page|--pages [page list]

This argument, which follows an input file argument, defines the pages to be extracted for concatenation from a given input file. If this argument is not defined, all pages from the input file are concatenated. The pages specified for extraction may be separated by commas or designated by ranges.

For example, the arguments --input input.pdf --pages 1,4-6 would result in pages 1, 4, 5 and 6 inclusively being extracted for concatenation.
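
The page-list syntax described above can be tried out on its own. The following is a minimal sketch and not code from the script: expand_pages is a hypothetical helper name, and, as an assumption made for the example, it rejects anything that is not a plain number or an N-M range, whereas the script passes such entries through unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): expand a page
# specification such as "1,4-6" into an explicit list of page numbers.
sub expand_pages {
    my ($spec) = @_;
    my @pages;
    foreach my $part ( split /,/, $spec ) {
        if ( $part =~ /^(\d+)-(\d+)$/ ) {
            # A range "N-M" expands to every page from N to M inclusive.
            push @pages, $1 .. $2;
        }
        elsif ( $part =~ /^\d+$/ ) {
            # A single page number is kept as it is.
            push @pages, $part;
        }
        else {
            # Assumption for this sketch: fail loudly on malformed input.
            die "Unrecognised page specification: '$part'\n";
        }
    }
    return @pages;
}

print join( ', ', expand_pages('1,4-6') ), "\n";    # prints "1, 4, 5, 6"
```

A reversed range such as 6-4 expands to nothing, because Perl's range operator returns an empty list when the left operand is greater than the right one.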


use File::Basename;
use Getopt::Long;
use PDF::API2;

use strict;

#   Process command line arguments and populate corresponding variables.

my %pages = ();
my @input = ();

GetOptions(
    'i|input=s'         =>  \@input,
    'o|output=s'        =>  \( my $output = '' ),
    'p|page|pages=s'    =>  sub {

        if (scalar @input > 0) {

            #   If an input file name has previously been defined, associate the given page
            #   ranges to be extracted with the last input file name supplied.

            my @files = split /,/, $input[-1];
            push @{ $pages{ $_ } }, $_[1] foreach @files;
        }
    },
);

exit 1 unless scalar @input > 0 and length $output > 0;

#   Split the input files specified on any comma characters present. This
#   allows for multiple input files to be specified either by multiple --input
#   arguments or by a single argument in a comma delimited fashion.

@input = map { split /,/ } @input;

#   Open the PDF file for output (via the PDF::API2 object constructor).

my $pdf = PDF::API2->new( -file => $output );
my $root = $pdf->outlines;

#   Step through each of the input files specified and extract the document
#   pages with the options specified.

my $import_page = 0;

foreach my $file ( @input ) {

    my $input = PDF::API2->open( $file );

    #   Expand the page list and range definitions passed with the --page argument
    #   associated with the given input file.  By default, all pages of the input
    #   file are included in the output.

    my @pages = ();
    if ( exists $pages{ $file } ) {

        @pages = map { split /,/ } @{ $pages{ $file } };
        @pages = map { /^(\d+)-(\d+)$/ ? $1 .. $2 : $_ } @pages;
    }
    else {

        @pages = 1 .. $input->pages;
    }

    #   Import the pages from the input file into the output PDF file being
    #   constructed.

    if (scalar @pages > 0) {

        #   Extract the filename of the input file without the file extension for
        #   incorporation into the document outline.

        my ($name, undef, undef) = fileparse($file, '\.[^\.]*');

        my $outline = $root->outline;
        $outline->title( $name );

        #   Step through each of the pages to be imported, import the page and add an
        #   entry to the document outline.

        my $document_page = 0;
        foreach (@pages) {

            ++$import_page;
            ++$document_page;

            my $page = $pdf->importpage($input, $_, $import_page);
            my $bookmark = $outline->outline;
            $bookmark->title("Page $document_page");
            $bookmark->dest($page);
            $outline->dest($page) if $document_page == 1;
        }
    }
}

#   Show the document outline when the file is opened, then write the output file.

$pdf->preferences( -outlines => 1 );
$pdf->save;

exit 0;
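
The way a --pages argument attaches to the --input argument before it depends on Getopt::Long handling options from left to right. The sketch below isolates that behaviour so it can be run without PDF::API2. It is an illustration under assumptions rather than part of the script: parse_args is a hypothetical wrapper, the file names are made up, and GetOptionsFromArray is used in place of GetOptions so that a fixed argument list can be parsed.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper for illustration: parse an argument list and return
# the input files, the output file and the per-file page specifications.
sub parse_args {
    my @args = @_;
    my ( @input, %pages );
    my $output = '';

    GetOptionsFromArray(
        \@args,
        'i|input=s'      => \@input,
        'o|output=s'     => \$output,
        'p|page|pages=s' => sub {
            my ( undef, $spec ) = @_;

            # Options are handled left to right, so $input[-1] is the most
            # recent --input seen before this --pages argument.
            return unless @input;
            push @{ $pages{$_} }, $spec foreach split /,/, $input[-1];
        },
    ) or die "Invalid arguments\n";

    return ( \@input, $output, \%pages );
}

# Made-up file names: pages are given for a.pdf only.
my ( $input, $output, $pages ) = parse_args(
    '--input', 'a.pdf', '--pages', '1,4-6',
    '--input', 'b.pdf', '--output', 'out.pdf',
);

foreach my $file (@$input) {
    my $spec = join ',', @{ $pages->{$file} || [] };
    printf "%s: %s\n", $file, length($spec) ? $spec : 'all pages';
}
```

With these arguments the page list 1,4-6 is recorded against a.pdf only, and b.pdf falls back to all of its pages. A --pages argument given before any --input is silently ignored, which matches the check on @input in the script.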
