PerlMonks  

Generating multi page PDF with bitmap images, vector graphics and text.

by chrestomanci (Priest)
on Nov 21, 2010 at 22:29 UTC ( [id://872827]=perlquestion )

chrestomanci has asked for the wisdom of the Perl Monks concerning the following question:

Greetings fellow monks.

I am attempting to write a Perl script that will generate a PDF file of about 100 pages. Each page needs to contain over 100 bitmap images, plus some text labels and vector graphics. The images are to be laid out in a regular grid, and the overall layout of every page in the document will be the same. (Different images and labels will be used on each page.)

I have tried a number of different approaches, and none provide a complete solution, so I would like the advice of the monastery on how I might solve the problem.

My basic algorithm looks like this: (pseudocode)

my $pdf = PDF::Report->new(
    PageSize        => 'A4',
    PageOrientation => 'Landscape',
);

for my $page_num ( 0 .. $pages - 1 ) {
    for my $srcY ( 0 .. 9 ) {
        SQUARE:
        for my $srcX ( 0 .. 9 ) {
            # One source image per grid square.
            my $imgFile = calculate_src_filename( $page_num, $srcY, $srcX );
            $pdf->addImgScaled(
                $imgFile,
                $offsetX + ( 1 + $srcX ) * $squareSize,
                $offsetY + ( 1 + $srcY ) * $squareSize,
                $squareScale,
            );
        }
    }
    # Code here to add text labels.
    # Code here to add arrows & symbols as vector graphics.
    $pdf->newpage();
}

# Write the finished document in binary mode.
open( my $out, '>', $outFileName ) or die "Error opening $outFileName: $!\n";
binmode $out;
print {$out} $pdf->Finish;
close($out);
return 0;
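The grid arithmetic in the pseudocode can be pulled out and checked on its own, without any PDF library. This is a minimal sketch in core Perl; the helper names `cell_origin` and `grid_positions` are hypothetical, and the offsets and square size are whatever units the PDF library expects.

```perl
use strict;
use warnings;

# Hypothetical helper: position of grid cell ($col, $row), using the same
# arithmetic as the addImgScaled() call in the pseudocode.
sub cell_origin {
    my ( $col, $row, $offset_x, $offset_y, $square_size ) = @_;
    my $x = $offset_x + ( 1 + $col ) * $square_size;
    my $y = $offset_y + ( 1 + $row ) * $square_size;
    return ( $x, $y );
}

# Hypothetical helper: all positions for one page of a $cols x $rows grid,
# listed row by row.
sub grid_positions {
    my ( $cols, $rows, $offset_x, $offset_y, $square_size ) = @_;
    my @cells;
    for my $row ( 0 .. $rows - 1 ) {
        for my $col ( 0 .. $cols - 1 ) {
            my ( $x, $y ) =
              cell_origin( $col, $row, $offset_x, $offset_y, $square_size );
            push @cells, { col => $col, row => $row, x => $x, y => $y };
        }
    }
    return @cells;
}
```

Computing the positions up front makes it possible to feed the same list to whichever backend ends up drawing the page.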

I first tried PDF::Report (which you can see in my code example). However, I found that its vector graphics support does not work properly: when I attempted to draw shapes (simple polygons), they did not appear in the output. There also appears to be no support for rotated text.

I then took a look at the Perl bindings for Cairo. While the support for rotated text and vector graphics is excellent, I can't figure out whether there is support for bitmap images (if there is, it is not documented well). There also does not appear to be support for multi page PDF output.

I then thought I could go via SVG as an intermediate format, and convert it to PDF using Inkscape's svg2pdf tool. The source SVG could be generated either using GD::SVG, or by preparing an SVG template by hand and then using Template Toolkit to fill it in with the different images and labels for each page. (The vector graphics are the same on each page.) I have not so far written any code to try this approach.

I am aware of pdftk, so I know that I can join many single page PDF files into one multi page one fairly easily. However, it would be nice to be able to use a library that supports multi page PDF natively.
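For reference, joining single page files with pdftk uses its `cat` operation: `pdftk in1.pdf in2.pdf cat output out.pdf`. A minimal sketch of driving that from Perl follows; the helper names and the `page_001.pdf` naming scheme are assumptions made for illustration, not anything from the original script.

```perl
use strict;
use warnings;

# Build the argument list for: pdftk <pages...> cat output <outfile>
sub pdftk_join_command {
    my ( $out_file, @page_files ) = @_;
    die "No input pages given\n" unless @page_files;
    return ( 'pdftk', @page_files, 'cat', 'output', $out_file );
}

# Hypothetical per-page file names: page_001.pdf, page_002.pdf, ...
sub page_file_names {
    my ($page_count) = @_;
    return map { sprintf 'page_%03d.pdf', $_ } 1 .. $page_count;
}

# Run the join. The list form of system() bypasses the shell, so file
# names with spaces need no extra quoting.
sub join_pages {
    my ( $out_file, @page_files ) = @_;
    my @cmd = pdftk_join_command( $out_file, @page_files );
    system(@cmd) == 0 or die "pdftk failed: exit status $?\n";
    return $out_file;
}
```

A call such as `join_pages( 'book.pdf', page_file_names(100) )` would then produce the combined document, provided pdftk is installed and on the PATH.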

Given my lengthy description of the problem, can anyone offer suggestions on the best way to proceed from here?


Replies are listed 'Best First'.
Re: Generating multi page PDF with bitmap images, vector graphics and text.
by tod222 (Pilgrim) on Nov 21, 2010 at 23:14 UTC

    PDF? Unless you're producing a paper for submission to a site that requires PDFs, this would be better done as separate HTML pages.

    But given the PDF requirement, you need to be aware of the issue that arises when each page is constructed from hundreds of components that get encapsulated within the PDF. You can end up with a PDF containing millions of entities, which tends to slow rendering to a crawl.

    If you must use PDF, you may find that readers handle the resulting file better if small images and vector graphics are combined into larger images, reducing the number of items to be rendered.

    It's not a good idea to let the requirement for multi-page PDF support eliminate otherwise good methods, since you can produce the multi-page document by combining single pages as the final processing step.

      Thanks for your warning about generating a PDF with millions of elements; I did not know about that issue. For my problem, I will have around 160_000 elements in the total document: not a million, but still a substantial number. It could probably be reduced by merging images that are meant to appear adjacent to each other into a single image.

      There is not an absolute requirement for PDF; the requirement is to be able to print the final document onto paper, with precise control over the placement of elements. The document will be printed double sided, and elements on opposite sides of the sheet need to line up to within a millimetre or so. I know that PDF will meet that requirement, but I don't know of any other format that will. Is there a multi page extension to SVG?
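To make the millimetre requirement concrete: PDF coordinates are measured in points (1 pt = 1/72 inch, 1 inch = 25.4 mm), so positions given in millimetres have to be converted. The sketch below also shows one way an element on the back of a sheet might be lined up with one on the front. It assumes the sheet is turned over about its vertical axis, so left and right swap; the function names are hypothetical, and real printers add their own registration error on top.

```perl
use strict;
use warnings;

use constant POINTS_PER_MM => 72 / 25.4;    # 1 pt = 1/72 in, 1 in = 25.4 mm

# Convert a length in millimetres to PDF points.
sub mm_to_pt {
    my ($mm) = @_;
    return $mm * POINTS_PER_MM;
}

# Left edge (in mm) of an element on the back of the sheet so that it sits
# directly behind an element on the front. Assumes the sheet is turned
# over about its vertical axis, which mirrors the x coordinate.
sub mirrored_x_mm {
    my ( $page_width_mm, $x_mm, $element_width_mm ) = @_;
    return $page_width_mm - $x_mm - $element_width_mm;
}
```

For an A4 landscape sheet (297 mm wide), a 25 mm square placed 20 mm from the left on the front would go 252 mm from the left on the back.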

        chrestomanci:

        If you want to keep the PDF small, you may want to scale and/or render all images into a common format at the size you want in the final document. Otherwise, some PDF tools may just put in the full-size graphic and scale it down internally. (That's fine, too, if you want them to be able to pull out the full-size graphic from the PDF file. But if you're captioning the graphics with a URL to give them access to them, then you probably don't want to distribute all of them in your PDF, too.)

        ...roboticus
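As a rough illustration of scaling images to their final size before embedding them, the arithmetic below works out how many pixels an image needs for a given printed size. The 300 dpi figure in the usage note is an assumption (a common print resolution), not something stated in the thread, and the function names are hypothetical.

```perl
use strict;
use warnings;

# Pixels needed along one edge to print at $size_mm with $dpi dots per inch.
sub target_pixels {
    my ( $size_mm, $dpi ) = @_;
    return int( $size_mm / 25.4 * $dpi + 0.5 );    # round to nearest pixel
}

# Factor by which a source image edge of $source_px pixels would be
# resized to reach the target pixel count.
sub scale_factor {
    my ( $source_px, $size_mm, $dpi ) = @_;
    return target_pixels( $size_mm, $dpi ) / $source_px;
}
```

For example, a 20 mm square printed at 300 dpi needs 236 pixels per side, so a 1000 pixel source would carry far more data than the page can show.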

        ...the requirement is to be able to print the final document onto paper, with precise control over the placement of elements. The document will be printed double sided, and elements on opposite sides of the sheet need to line up to within a millimetre or so. I know that PDF will meet that requirement, but I don't know of any other format that will

        Yes, that strict print requirement precludes using HTML.

        You looked into Cairo and SVG as intermediate formats but not any of the venerable print document formats? I guess it's an indication of how much the web has eclipsed print that its document formats are being lost to obscurity:

        Format                        Modules
        PostScript                    CPAN (287 found)
        TeX                           CPAN (156 found)
        ODF (Open Document Format)    CPAN (28 found)
        DVI (Device independent)      CPAN (14 found)

        PostScript and TeX go way back, while ODF is new.

Re: Generating multi page PDF with bitmap images, vector graphics and text.
by chrestomanci (Priest) on Nov 26, 2010 at 14:49 UTC

    Thank you tod222 and roboticus for your input.

    After careful consideration, I decided to draw my SVG templates in Inkscape, add template tags to the XML by hand, and then use Text::Template to fill in the template and generate the SVG for each page. Each page is then converted to PDF, and the pages are joined into one big PDF with pdftk.
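A minimal sketch of the template step, assuming the Text::Template module from CPAN is installed. The SVG fragment, the variable names `$label` and `$image`, and the `[% %]` delimiters are all made up for illustration; Text::Template's default delimiters are braces, and alternate ones are chosen here only to keep the tags easy to spot inside XML.

```perl
use strict;
use warnings;
use Text::Template;

# Hypothetical one-image page template; a real one would be drawn in
# Inkscape and have the tags added by hand.
my $svg_source = <<'SVG';
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="297mm" height="210mm" viewBox="0 0 297 210">
  <text x="10" y="10" font-size="4">[% $label %]</text>
  <image x="20" y="20" width="25" height="25" xlink:href="[% $image %]"/>
</svg>
SVG

my $template = Text::Template->new(
    TYPE       => 'STRING',
    SOURCE     => $svg_source,
    DELIMITERS => [ '[%', '%]' ],
) or die "Could not build template: $Text::Template::ERROR";

# Fill in the template for one page and return the SVG text.
# Values are inserted as-is, so they must already be XML-safe.
sub render_page {
    my (%vars) = @_;
    my $svg = $template->fill_in( HASH => \%vars );
    die "Template error: $Text::Template::ERROR" unless defined $svg;
    return $svg;
}
```

Calling `render_page( label => 'Page 1', image => 'tiles/p001.png' )` returns the filled-in SVG as a string, ready to be written to a file and converted.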

    The final PDF file weighs in at about 300 megs, takes a while to render and probably contains a fair bit of waste, but it does what I need it to do. Disc space is cheap and I don't expect to email the thing any time soon.

    One small glitch with using a templating system is that on occasion I would like some elements of the SVG not to appear at all. It does not appear to be possible to hide an element by template, nor is there a 'visible' property in SVG (at least I can't find one) that would let me hide things. Instead I will probably end up controlling the colour of objects by the template, and change their colours to transparent if I want to hide them.
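On hiding elements: SVG 1.1 does define `display` and `visibility` presentation attributes, so one option may be to have the template emit `display="none"` on the elements that should not appear. The sketch below is core Perl only; the helper names and the example `<path>` element are hypothetical, and how a given SVG to PDF converter treats hidden elements is worth checking before relying on it.

```perl
use strict;
use warnings;

# Attribute text to splice into an element's start tag: empty when the
# element should be shown, display="none" when it should be hidden.
sub display_attr {
    my ($visible) = @_;
    return $visible ? '' : ' display="none"';
}

# Hypothetical example element: an arrow drawn as a path.
sub arrow_element {
    my (%opt) = @_;
    return sprintf '<path id="%s" d="%s"%s/>',
      $opt{id}, $opt{d}, display_attr( $opt{visible} );
}
```

The same string returned by `display_attr` could be passed as a template variable and placed inside the start tag of any element in a hand-drawn template.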

    tod222: You suggested I might consider TeX, PostScript, DVI or ODF. The reason I did not consider them (apart from familiarity) is that while I am developing, I want to be able to preview what the script has created, and PDF or SVG are very good for that. Also, while there are a great number of modules on CPAN for those formats, not many of them looked that useful. The first page of the CPAN search for PostScript contains such gems as PostScript::Graph::Paper, PostScript::MailLabels and PostScript::CDCover. All very useful modules I am sure, but only if you are trying to solve those specific problems. Sometimes a large number of search results from CPAN is a bad thing, if you are trying to solve a problem quickly and you have too many choices to evaluate.
