PerlMonks  

PDF::API2 processing time

by ndnalibi (Acolyte)
on Oct 21, 2008 at 18:31 UTC (#718557=perlquestion)

ndnalibi has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I have a Perl script that uses PDF::API2 to create a variable-data PDF (2 pages per record). The end result will be a printable PDF. My problem is that the processing time is about 1 minute per record. That's long for a 20-record file, REALLY long for a 1000-record file, and TAKE A WEEK'S VACATION for a 5000-record file! All of the images need to be high-res, so I'm thinking it takes so long because each picture is resampled. My question is: has anyone been successful in caching frequently used images so that they don't need to be resampled each time? Or does anyone have another brainstorm for lowering the processing time (besides buying a new server, though that is an option)? It currently runs on a dual P3 800 IBM eServer with 1.5 GB RAM, Fedora Core 6 and Perl v5.8.8. I don't mind posting the code, but it's a long one, so I figured I'd post it only if needed... Thanks!

Re: PDF::API2 processing time
by jethro (Monsignor) on Oct 21, 2008 at 19:15 UTC

    You might use profiling to really check where the time is spent. Maybe you do something very inefficient. See Profiling your code.

    You might post the code of what you are doing. How can we improve code we don't know?

    There is also a mailing list for PDF::API2 users (check the wiki on the sourceforge page of the project). If PDF::API2 is the culprit they might have the inside knowledge you need.

Re: PDF::API2 processing time
by SilasTheMonk (Chaplain) on Oct 21, 2008 at 19:13 UTC
    Have you tried profiling the code? Also, I would have thought that files that size would require a lot of memory, which might lead to a lot of swapping. So I would look at options that reduce memory usage. If that is the problem, the profiling might be rather misleading, so getting some low-level performance data on your machine during the run could be useful. Also, does the package itself have any verbose logging options with timestamps?
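    One cheap way to get some of that low-level data from inside the script is to log the process's memory use as it runs. A minimal sketch, assuming Linux (as on the Fedora box above), where /proc/self/status reports the resident set size as a VmRSS line; the helper name current_rss_kb is hypothetical:

```perl
use strict;
use warnings;

# Returns the resident set size (kB) of the current process by reading
# /proc/self/status. Linux-only assumption: returns undef if /proc is
# missing or has no VmRSS line.
sub current_rss_kb {
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>) {
        if ($line =~ /^VmRSS:\s+(\d+)\s+kB/) {
            close($fh);
            return $1;
        }
    }
    close($fh);
    return;
}

my $rss = current_rss_kb();
printf STDERR "RSS now: %s kB\n", defined $rss ? $rss : 'n/a';
```

    Inside the while(<DATAFILE>) loop, printing current_rss_kb() together with $. (the current input line number) every few dozen records would show whether memory keeps growing per record, which is the pattern that leads to swapping.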
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Oct 21, 2008 at 20:00 UTC
    Thanks for the pointers. I will check out profiling (I didn't know it was out there). I just added RAM to the server, with no noticeable difference. top shows one processor running at 100% and the other at 0% (or close to it), about 50% overall, so it's not taking advantage of the dual processors... maybe it's not supposed to. Now for the code (be kind):
    #!/usr/bin/perl --
    #
    #
    #
    #
    ####################################################################
    #
    #
    # Perl script base for dynamic photo font printing
    # Outputs a print ready PDF
    # Command line version - need to pass the file name only as
    # argument
    # Liberty Photo-Phonts
    #
    #
    # Requirements: Perl5 UNIX
    #
    # Author: Bill O'Connell
    # Version: 1.0
    # Date: 6/12/08
    # Modified: 9/17/08
    # Modified: 10/21/08 switched to
    # simplified command line use
    #
    #
    ####################################################################
    # START USER EDITS
    ####################################################################
    $dir = "/var/www/html/font_test/data/";
    ####################################################################
    # END USER EDITS
    ####################################################################
    use PDF::API2;
    use PDF::API2::Lite;
    ####################################################################
    ####################################################################
    my $fn = $ARGV[0]; # file name passed as variable
    # if no error message is returned, the upload was successful
    my ($fNames, $aa, $bb, @current, @currentfiles );
    #$tmpl = $FORM{template};
    # Used for defining contant point to in conversions
    use constant mm => 25.4 / 150;
    use constant in => 1 / 150;
    use constant pt => 1;
    my $unixwc = "wc -l " . $dir . $fn;
    my $full = `$unixwc`;
    $full =~ s/($dir)//;
    $full =~ s/($fn)//;
    $full =~ s/(\n)//;
    my $pdf = PDF::API2->new(-file => "/tmp/pic_font.pdf"); #defines output file Will need to be dynamic in future to avoid dups
    open(DATAFILE, $dir . $fn) || die "Open Failed"; #Opens datafile for reading
    while(<DATAFILE>){ #Reads in one line at a time
        chomp;
        ($sa,$se,$op,$comp,$fname,$lname,$ad1,$addr,$city,$state,$zip,$post,$vs,$sales,$cell,$slsemail,$dmp) = split /,/; #splits current line into variables
        my $opt = $sa;
        $opt .= "\/";
        $opt .= $se;
        $opt .= " ";
        $opt .= $op;
        my $res = 72/150; #sets resolution variable - first number is always 72, second is resolution
        my $picw = 238.17/mm; #sets image width and height
        my $pich = 158.75/mm;
        my $bkhdra = "It\'s all about picking the best"; #these pre-define all of the text - will be changing
        my $bkhdrb = "marketing solution for your need."; # this to an auto word wrap function later
        my $btxt1a = "Liberty Creative Solutions has the solutions you need -";
        my $btxt1b = "all under one roof. Our teams of professionals are experts";
        my $btxt1c = "at design, print production and fulfillment. As a single-";
        my $btxt1d = "source provider, we manage the entire process. And that";
        my $btxt1e = "means more time for you.";
        my $btxt2 = "We offer:";
        my $btxt3a = $fname .", contact ".$sales." today to discuss how";
        my $btxt3b = "LCS can help ".$comp." choose";
        my $btxt3c = "the best marketing solution for your need.";
        my $btxt3d = "V: \(708\) 555-5555";
        my $btxt3e = "C: ".$cell; # note the variables from the datafile sprinkled in
        my $bkbp1 = "Quick Turnaround";
        my $bkbp2 = "Customized, variable marketing solutions";
        my $bkbp3 = "Responsive customer service";
        my $bkbp4 = "Flexible coordination with your vendors";
        my $bkbp5 = "Planet coding to track your mail";
        my $bkbp6 = "Print personalize and mail all under one roof";
        my $page = $pdf->page; # Adds front page
        $page->mediabox (685,469 ); # Sets page sizes
        $page->bleedbox( 5, 5, 675, 459);
        $page->cropbox ( 5, 5, 675, 459);
        $page->trimbox ( 9, 9, 666, 450);
        my $fnt = $pdf->corefont('Arial',-encode => 'latin1'); # Embeding Fonts - note difference between corefonts and True Type
        my $tgfnt = $pdf->ttfont('/var/www/cgi-bin/UNIVB___.TTF',-encode => 'latin1');
        my $tradebf = $pdf->ttfont('/var/www/cgi-bin/TRADGBCT.TTF', -encode => 'latin1');
        my $traderf = $pdf->ttfont('/var/www/cgi-bin/TRADGC18.TTF', -encode => 'latin1');
        my $postnet = $pdf->ttfont('/var/www/cgi-bin/uspsbarcode.ttf', -encode => 'latin1');
        my $pdb = PDF::API2->open('/var/www/html/font_test/Pumpkins/back.pdf'); #opens PDF for back - below is commented for the jpg
        #my $bk = $pdf->image_jpeg('/var/www/html/font_test/Pumpkins/back.jpg');
        my $jpeg = $pdf->image_jpeg('/var/www/html/font_test/Pumpkins/background.jpg'); #defines and opens images
        my $png = $pdf->image_png('/var/www/html/font_test/Pumpkins/foreground.png');
        my $tag = $pdf->image_jpeg('/var/www/html/font_test/Pumpkins/tagline.jpg');
        my $vname = $fname; # copies variable for first name for disemmination
        my $vr = $page->gfx(); # I think these are redundant, but defines a graphics element for the first page
        my $bg = $page->gfx();
        my $fi = $page->gfx();
        my $tl = $page->gfx();
        my $txy = 370; #defines x & y coordinates for use later
        my $txx = 180;
        my $pngx = 200;
        my $pngy = 108;
        $vname =~ tr/[a-z]/[A-Z]/; # Transforms characters to all caps - comment this line for multi-case templates
        @chars = split(//, $vname); #splits name into individual letters
        $elem = scalar(@chars); # counts letters in name
        if ($elem > 6 ) { #sets starting point -wid- and starting height -pngy- by number of characters
            $wid = ((650 / 2) - (($elem * 76) / 2));
            $pngy = 109;
        } elsif ($elem > 4) {
            $wid = ((580 / 2) - (($elem * 74) / 2));
            $pngy = 109;
        } else {
            $wid = ((480 / 2) - (($elem * 73) / 2));
            $pngy = 108;
        }
        $vr->image($jpeg,5,15,1.09); # writes background image (defined image, x coord, y coord, scale)
        foreach(@chars) {
            if(!$_ == "") {
                $img=$pdf->image_png("/var/www/html/font_test/Pumpkins/$_.png");
                $vr->image($img,$wid,$pngy,(5/18));
                @now = localtime;
                if ($elem > 6 ) {
                    $rnd = ((rand(7) - 3) + 78);
                    $wid = ($wid + $rnd);
                    $pngy += ($pngy + ($rnd - 80));
                } elsif ($elem > 4) {
                    $rnd = ((rand(7) - 3) + 81);
                    $wid = ($wid + $rnd);
                    $pngy = ($pngy + ($rnd - 82));
                } else {
                    $rnd = ((rand(7) - 3) + 83);
                    $wid = ($wid + $rnd);
                    $pngy += ($pngy + ($rnd - 84));
                }
            }
        }
        my $txt = $page->text;
        $txt->fillcolor('white');
        $txt->font($traderf,22); ## set font
        $txt->translate($txx,$txy); ## set insert location
        $txt->text_center("It's all about picking the best one."); ## insert text
        $txt->fillcolor('white');
        $txt->font($tgfnt,11); ## set font
        $txt->translate(560,24); ## set insert location
        $txt->text_center("www.libertycreativesolutions.com"); ## insert text
        $fi->image($png,5,15,(3/5));
        $tl->image($tag,5,5,(1/2));
        $page2 = $pdf->importpage($pdb); #page
        #$page2->mediabox (704,488 ); # Sets page sizes
        #$page2->bleedbox( 5, 5, 699, 483);
        #$page2->cropbox ( 5, 5, 699, 483);
        #$page2->trimbox ( 19, 19, 685, 469);
        my $bullet = $page2->gfx();
        #my $bki = $page2->gfx();
        #$bki->image($bk,0,0,1.1);
        my $txt2 = $page2->text;
        $txt2->fillcolor('white');
        $bullet->fillcolor('white');
        $bullet->strokecolor('white');
        $bullet->circle(62,169,2);
        $bullet->circle(62,184,2);
        $bullet->circle(62,199,2);
        $bullet->circle(62,214,2);
        $bullet->circle(62,229,2);
        $bullet->circle(62,244,2);
        $bullet->fill;
        $txt2->textstart;
        $txt2->font($tradebf,14);
        $txt2->translate(55,380);
        $txt2->text($bkhdra);
        $txt2->font($tradebf,14);
        $txt2->translate(55,360);
        $txt2->text($bkhdrb);
        $txt2->font($traderf,12);
        $txt2->translate(55,340);
        $txt2->text($btxt1a);
        $txt2->font($traderf,12);
        $txt2->translate(55,325);
        $txt2->text($btxt1b);
        $txt2->font($traderf,12);
        $txt2->translate(55,310);
        $txt2->text($btxt1c);
        $txt2->font($traderf,12);
        $txt2->translate(55,295);
        $txt2->text($btxt1d);
        $txt2->font($traderf,12);
        $txt2->translate(55,280);
        $txt2->text($btxt1e);
        $txt2->font($tradebf,12);
        $txt2->translate(55,255);
        $txt2->text($btxt2);
        $txt2->font($traderf,12);
        $txt2->translate(72,240);
        $txt2->text($bkbp1);
        $txt2->font($traderf,12);
        $txt2->translate(72,225);
        $txt2->text($bkbp2);
        $txt2->font($traderf,12);
        $txt2->translate(72,210);
        $txt2->text($bkbp3);
        $txt2->font($traderf,12);
        $txt2->translate(72,195);
        $txt2->text($bkbp4);
        $txt2->font($traderf,12);
        $txt2->translate(72,180);
        $txt2->text($bkbp5);
        $txt2->font($traderf,12);
        $txt2->translate(72,165);
        $txt2->text($bkbp6);
        $txt2->font($tradebf,12);
        $txt2->translate(55,145);
        $txt2->text($btxt3a);
        $txt2->font($tradebf,12);
        $txt2->translate(55,130);
        $txt2->text($btxt3b);
        $txt2->font($tradebf,12);
        $txt2->translate(55,115);
        $txt2->text($btxt3c);
        $txt2->font($tradebf,12);
        $txt2->translate(55,100);
        $txt2->text($btxt3d);
        $txt2->font($tradebf,12);
        $txt2->translate(55,85);
        $txt2->text($btxt3e);
        $txt2->font($tradebf,12);
        $txt2->translate(55,60);
        $txt2->text($slsemail);
        $txt2->fillcolor('black');
        $txt2->font($fnt, 12);
        $txt2->translate(423,135);
        $txt2->text("$opt");
        $txt2->translate(423,115);
        $txt2->text("$fname" . " " . "$lname");
        $txt2->translate(423,100);
        $txt2->text("$addr");
        $txt2->translate(423,85);
        $txt2->text("$city, " . "$state " . "$zip");
        $txt2->fillcolor('black');
        $txt2->font($postnet, 16);
        $txt2->translate(423,50);
        $txt2->text("\/$post\/");
        $txt2->textend;
    }
    $pdf->save;
    $pdf->end( );
    close(DATAFILE);

      Is this your normal indentation? Looks horrible. Check out perltidy, a script that can repair this.

      I would also suggest using

      use strict;
      use warnings;

      I see you have practically just one really big loop where you process each record. It would have been easy to do some really basic profiling by just putting  print "Doing xyz now\n" every few lines and observing whether that minute for each record is wasted in one place or distributed evenly.
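      A slightly more systematic version of the print-statement approach is to record elapsed wall-clock time per step and print a summary at the end. This is a minimal sketch using the core Time::HiRes module; the helper names checkpoint and report and the step labels are hypothetical, and the sleep calls only stand in for the real font, image and text work:

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Wall-clock seconds accumulated per named step across all records.
my %elapsed;
my $last = time;

# Charge the time since the previous checkpoint to $label.
sub checkpoint {
    my ($label) = @_;
    my $now = time;
    $elapsed{$label} += $now - $last;
    $last = $now;
}

# Print the steps, slowest first.
sub report {
    for my $label (sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed) {
        printf STDERR "%-20s %8.3f s\n", $label, $elapsed{$label};
    }
}

for my $record (1 .. 3) {
    checkpoint('between records');   # time since the previous record ended
    sleep(0.01);                     # stand-in for creating fonts/images
    checkpoint('load fonts+images');
    sleep(0.005);                    # stand-in for placing text and images
    checkpoint('layout text');
}
report();
```

      Because the totals are summed over all records, a step that is cheap once but repeated per record still shows up at the top of the report.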

      Please read How do I post a question effectively?. You could have done the profiling before posting the code and then would have known which part of the code is important to post. You could have provided sample data (a few records) so that other monks could run your code.

      I don't have any experience with PDF::API2, but I'm guessing that a lot of the stuff you have in your loop could be moved before the loop. Definitely all the constant strings you create, all the font assignments, probably also the assignments of $pdb, $jpeg, $png, $tag. Just try it and see if it works.
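      For the back-page text, the strings and coordinates that never change could be built once as a table before the loop, leaving only the data-dependent lines to be created per record. A sketch under those assumptions: the strings and coordinates are copied from the script above, while the font keys 'bold'/'regular', the sub back_lines_for_record and the hash-ref record layout are hypothetical:

```perl
use strict;
use warnings;

# Built ONCE, before while(<DATAFILE>): [font key, size, x, y, text].
# 'bold'/'regular' are hypothetical labels for $tradebf/$traderf, which
# would also be created once, outside the loop.
my @static_back_lines = (
    [ 'bold',    14, 55, 380, "It's all about picking the best" ],
    [ 'bold',    14, 55, 360, 'marketing solution for your need.' ],
    [ 'regular', 12, 55, 340, 'Liberty Creative Solutions has the solutions you need -' ],
    [ 'regular', 12, 55, 325, 'all under one roof. Our teams of professionals are experts' ],
    [ 'regular', 12, 55, 310, 'at design, print production and fulfillment. As a single-' ],
    [ 'regular', 12, 55, 295, 'source provider, we manage the entire process. And that' ],
    [ 'regular', 12, 55, 280, 'means more time for you.' ],
    [ 'bold',    12, 55, 255, 'We offer:' ],
    [ 'regular', 12, 72, 240, 'Quick Turnaround' ],
    [ 'regular', 12, 72, 225, 'Customized, variable marketing solutions' ],
    [ 'regular', 12, 72, 210, 'Responsive customer service' ],
    [ 'regular', 12, 72, 195, 'Flexible coordination with your vendors' ],
    [ 'regular', 12, 72, 180, 'Planet coding to track your mail' ],
    [ 'regular', 12, 72, 165, 'Print personalize and mail all under one roof' ],
    [ 'bold',    12, 55, 115, 'the best marketing solution for your need.' ],
    [ 'bold',    12, 55, 100, 'V: (708) 555-5555' ],
);

# Per record, only the lines that depend on the data file are built.
# $rec is a hash ref with fname, sales, comp, cell and slsemail keys.
sub back_lines_for_record {
    my ($rec) = @_;
    return (
        @static_back_lines,
        [ 'bold', 12, 55, 145, "$rec->{fname}, contact $rec->{sales} today to discuss how" ],
        [ 'bold', 12, 55, 130, "LCS can help $rec->{comp} choose" ],
        [ 'bold', 12, 55,  85, "C: $rec->{cell}" ],
        [ 'bold', 12, 55,  60, $rec->{slsemail} ],
    );
}

# In the real script each line would be drawn with the same PDF::API2
# calls the original already uses:
#   my %font = (bold => $tradebf, regular => $traderf);
#   for my $line (back_lines_for_record(\%rec)) {
#       my ($face, $size, $x, $y, $str) = @$line;
#       $txt2->font($font{$face}, $size);
#       $txt2->translate($x, $y);
#       $txt2->text($str);
#   }
```

      This also replaces the long run of near-identical font/translate/text calls with one loop, which makes it easier to see what actually varies per record.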

      The same could be done with the images of the chars. Your code probably can be changed to:

      my %char_pics = ();    # before the big loop
      while (<DATAFILE>) {
          ...
          foreach (@chars) {
              if (!$_ == "") {
                  if (exists $char_pics{$_}) {
                      $img = $char_pics{$_};
                  }
                  else {
                      $img = $pdf->image_png("/var/www/html/font_test/Pumpkins/$_.png");
                      $char_pics{$_} = $img;
                  }
                  $vr->image(...
                  ...

      If this works, it could be quite a speedup, since each char has to be read from disk only once instead of every time it is printed.
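      The same caching idea can be wrapped in a small closure so the lookup logic isn't spread through the loop. This is a sketch, not part of PDF::API2: make_cache is a hypothetical helper, and the counting loader only stands in for something like sub { $pdf->image_png("/var/www/html/font_test/Pumpkins/$_[0].png") }. As with %char_pics in the snippet above, the cache has to be created before the while(<DATAFILE>) loop:

```perl
use strict;
use warnings;

# Returns a lookup function that calls $loader only the first time a key
# is seen and hands back the stored result on every later call.
sub make_cache {
    my ($loader) = @_;
    my %cache;
    return sub {
        my ($key) = @_;
        $cache{$key} = $loader->($key) unless exists $cache{$key};
        return $cache{$key};
    };
}

# Stand-in loader that just counts how often it is called.
my $loads = 0;
my $letter_image = make_cache(sub { $loads++; return { letter => $_[0] } });

for my $name ('PUMPKINS', 'MIKE') {
    for my $ch (split //, $name) {
        my $img = $letter_image->($ch);   # $vr->image($img, ...) would go here
    }
}
print "distinct loads: $loads\n";   # prints "distinct loads: 8"
```

      Twelve letters are placed, but only the eight distinct ones are loaded.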

Re: PDF::API2 processing time
by busunsl (Vicar) on Oct 22, 2008 at 08:43 UTC
    I had the same problem a while ago.

    PDF::API2 is a very good module for everything PDF, but it is slow.
    I had to create thousands of PDFs in one case and one PDF with thousands of pages in the other. Not feasible with PDF::API2.

    I went for Inline::C and HARU. It is fast but debugging is a PITA.

    Next version will be C++.

Re: PDF::API2 processing time
by roboticus (Chancellor) on Oct 22, 2008 at 11:19 UTC
    ndnalibi:

    I didn't slog through all the code, but if you're just putting text overlays on a bit of artwork, perhaps you should first make a PDF that has everything but the changeable text bits on it, and use that as a base. Then you could avoid the reformatting of jpeg images, etc.

    ...roboticus
Re: PDF::API2 processing time
by ruzam (Curate) on Oct 30, 2008 at 02:02 UTC

    I've had some experience with PDF::API2 and formatting images. The PDF::API2 code rips apart the source image, decompressing it pixel by pixel, then scales and reformats it into the appropriate PDF image content using nothing more than pure Perl. It's a slow process, sometimes painfully slow. I thought I was being smart working with large high-resolution image sources to preserve the quality of the final PDF, but that just brought the render process to its knees.

    The most effective thing you can do to speed up render is reduce the size/quality of your source images in a graphic editor first. Reduce it to the smallest image size you think you can use without sacrificing the final result. This will make a huge difference. Any images you can generate in advance and include as a pre-generated background (take out of the loop) would also be a good idea.
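    To put rough numbers on why source size matters so much: if each pixel is handled individually, as described above, the work grows with the pixel count, which grows with the square of the resolution. A small sketch of that arithmetic, using the 238.17 x 158.75 mm dimensions from the $picw/$pich lines in the original script (pixels_for is a hypothetical helper name):

```perl
use strict;
use warnings;

# Pixel dimensions and total pixel count of an image printed at a given
# physical size (mm) and resolution (dpi); 25.4 mm per inch.
sub pixels_for {
    my ($width_mm, $height_mm, $dpi) = @_;
    my $w = int($width_mm  / 25.4 * $dpi + 0.5);
    my $h = int($height_mm / 25.4 * $dpi + 0.5);
    return ($w, $h, $w * $h);
}

for my $dpi (150, 300) {
    my ($w, $h, $n) = pixels_for(238.17, 158.75, $dpi);
    printf "%3d dpi: %d x %d = %d pixels\n", $dpi, $w, $h, $n;
}
```

    Going from 150 to 300 dpi roughly quadruples the pixel count (about 1.3 million to about 5.3 million), so for per-pixel processing the cost of each image goes up by roughly the same factor.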

Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Oct 22, 2008 at 15:42 UTC
    Thanks for the input.

    I had to run a 1000-record file, which should be done tomorrow (had to go out for Halloween). Then I have a month to work on this before the Winter Holiday project.

    Sorry for the ugly code - it looks better on my console.

    I had already thought about adding print statements to help profile my code. I always do this to debug issues; I've just never had a program I needed to profile before. I will do this first in the future before posting. I like the idea of pre-loading the images into memory; I agree, that could be a huge savings.

    Another thing I am thinking of is writing a VDX file instead of one large PDF. VDX includes each image only once, in serialized form, along with PPML, an XML-type markup language that references the images rather than including them as PDFs. I've got 400 pages of documentation to read on this, though, and only recently began understanding the concepts of serializing images.

    FYI- this script adds variable images, not just text, which is part of the problem.

    I'll keep you posted on my findings.
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Oct 29, 2008 at 14:35 UTC
    jethro - you are a genius!

    I moved the variables outside the loop - my bad - obviously not a very experienced move on my part

    "Caching" the images made an incredible difference.

    Now I'll work on my indentation...

    FYI - once the images are fully loaded, it takes about 1/2 second per record versus over a minute!
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Oct 29, 2008 at 16:26 UTC
    ...and another big plus - it reduced the final 500 page pdf size from 394MB to 8 - YES 8! -MB

    That's awesome!
Re: PDF::API2 processing time
by ndnalibi (Acolyte) on Nov 04, 2008 at 15:44 UTC
    Hi ruzam - When I originally started this project, I did so with low-res files, and the speed and output size were manageable. Unfortunately, the end result did not look very professional on the printed piece.

    In most cases of standard screen readable pdfs I agree with you, but in my case we need high resolution images for the output. Lower res wouldn't work.

    However, the changes jethro suggested solved all of the problems I was having and saved me from rewriting the whole project using PPML. Apparently, by moving the image declarations out of the loop and holding the variable images in a hash, PDF::API2 builds the PDF with reused image links instead of writing copies of the images over and over. In my case 1000 records will be average, so the time and size savings were huge.

Node Type: perlquestion [id://718557]
Approved by Corion