PDF::API2 external objects

by dd-b (Monk)
on May 08, 2021 at 03:40 UTC

dd-b has asked for the wisdom of the Perl Monks concerning the following question:

It appears that external objects (specifically, image files placed on the various pages) are not actually read until the $pdf->save() call at the end. This may be a problem, because I'm playing with a script that may generate a rather LARGE PDF containing more than 10,000 images.

I'm wondering if anything like the $pdf->finishobjects(@obj) call might help; but the documentation doesn't really tell me very much about it!

It does look like this module lets you insert pages in any order, and write on pages in any order, so maybe nothing really is done until the end. It might not matter; I'm not that worried about temp directory capacity on modern systems, and a back-of-the-envelope calculation suggests the sum of the image files, at the size I'm currently prototyping at, is a very few gigabytes. (Temp files are written out as a necessary step in rescaling the images to sizes suitable for their appearance in the PDF. Yes, using GD you can import an image file, rescale it in memory, and put that directly into PDF::API2 without going through any temporary files, and it even works... but I was kind of shocked to discover that this was TWO ORDERS OF MAGNITUDE slower than using Image::Imlib2 to read, rescale, and write the image to a temp file and then reading that file with the PDF::API2 method.)
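For readers following along, the temp-file workflow described above might look roughly like the sketch below. The helper names (scale_to_temp_jpeg, add_image_page), the target width, and the JPEG quality of 85 are all assumptions made for illustration; this is not the poster's actual script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Image::Imlib2;
use PDF::API2;

# Hypothetical helper: scale $src to $width pixels wide (a height of 0 asks
# Imlib2 to keep the aspect ratio) and write the result to a new temp JPEG
# in $tmpdir. Returns the name of the temp file.
sub scale_to_temp_jpeg {
    my ($src, $width, $tmpdir) = @_;
    my $image  = Image::Imlib2->load($src);
    my $scaled = $image->create_scaled_image($width, 0);
    $scaled->set_quality(85);    # arbitrary quality, an assumption
    my ($fh, $tmp) = tempfile('scaledXXXXXX', DIR => $tmpdir, SUFFIX => '.jpg');
    close $fh;                   # Imlib2 writes by file name, not by handle
    $scaled->save($tmp);         # output format is taken from the .jpg suffix
    return $tmp;
}

# Hypothetical helper: add a page sized to the image and draw the image on it.
sub add_image_page {
    my ($pdf, $jpeg_file) = @_;
    my $img  = $pdf->image_jpeg($jpeg_file);    # imported by file name
    my $page = $pdf->page();
    $page->mediabox(0, 0, $img->width, $img->height);
    $page->gfx->image($img, 0, 0, $img->width, $img->height);
    return $page;
}
```

A caller would loop over the source images, call scale_to_temp_jpeg and then add_image_page for each, and finish with $pdf->saveas($file). If the JPEG data really is read only at save time, as suspected above, the temp files have to stay on disk until the save has completed.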

(As a side note--I'm kind of shocked how far I've gotten with this since starting to commit code this morning. PDF::API2, despite some documentation issues, which are frequently commented on, seems to be fairly easy to get somewhere with.)

Replies are listed 'Best First'.
Re: PDF::API2 external objects
by vr (Curate) on May 08, 2021 at 12:31 UTC
    It appears that external objects (specifically, image files placed on the various pages) are not actually read until the $pdf->save() call at the end.

    I think PDF::Reuse was designed for fast PDF generation, with data consumed as it arrives and parts of the PDF written out (and the data disposed of) as soon as they are ready. My impression from reading the PDF::API2 sources was that it also had the goal of writing out and forgetting parts that are no longer required, but this wasn't implemented in earnest. If your concern is that "a very few gigabytes" of file would be kept in memory until finally saved -- then yes, it would (i.e. if you succeed).

    If, OTOH, the only issue is 10,000 temp files scattered around -- that's only true for JPEG images, which are imported by filename. See this line and thereabouts. The private ' streamfile' key is nowhere to be found in the modules that import other image formats. Reading the same source file, you can pass a filehandle instead of a filename, but note (line 42) the tiny 512-byte buffer, as opposed to the 4096 bytes that would be used in your original test, where all JPEG files are read "at the end".
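A minimal sketch of the filehandle variant, assuming the behaviour described in this reply (the JPEG bytes are copied into the image object at import time when a filehandle is passed). The helper name import_jpeg_and_unlink is made up for illustration, and the details may differ between PDF::API2 versions.

```perl
use strict;
use warnings;
use PDF::API2;

# Hypothetical helper: import a JPEG through a filehandle, so that (per the
# reading of the source above) its data is pulled into memory right away,
# then delete the file instead of keeping it around until save time.
sub import_jpeg_and_unlink {
    my ($pdf, $jpeg_file) = @_;
    open my $fh, '<:raw', $jpeg_file
        or die "Cannot open $jpeg_file: $!";
    my $img = $pdf->image_jpeg($fh);    # filehandle instead of filename
    close $fh;
    unlink $jpeg_file
        or warn "Cannot remove $jpeg_file: $!";
    return $img;
}
```

This trades disk for memory: the temp files disappear as you go, but every image's data then stays in the process until the PDF is saved, which is the "kept in memory" case mentioned above.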

    Without an SSCCE, I can't comment on GD being that slow, but I'm surprised "decode jpeg -- scale -- encode jpeg (and that into memory, not a disk file!)" can be 100 times faster/slower.
