Merging images into the content of a PDF document

by Arik123 (Sexton)
on May 22, 2018 at 10:09 UTC (#1215030)

Arik123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all!

Is there any way, using CAM::PDF or any other PDF module, to make the images part of the content of the PDF document? That way, the user won't be able to select them (i.e. to set the focus on them).

Thanks a lot!

Re: Merging images into the content of a PDF document
by vr (Curate) on May 22, 2018 at 16:21 UTC
    use strict;
    use warnings;
    use feature 'say';
    use CAM::PDF;

    my $fn = 'inlineimage.pdf';
    my $pdf = CAM::PDF-> new( $fn ) or die;
    my $pagenum = 1;

    my $content = $pdf-> getPageContent( $pagenum );
    # say $content;
    # exit;

    $content =~ s{
        (?<= \s ) ( /\S+ ) \s+ Do (?= \s )
    }{
        my $obj = $pdf-> dereference( $1, $pagenum );
        delete $obj-> { value }{ value }{ Length };
        $pdf-> writeInlineImage( $obj );
    }gxse;

    delete $pdf-> getPage( $pagenum )
        -> { Resources }{ value }{ XObject };
    $pdf-> cleanse;
    $pdf-> setPageContent( 1, $content );

    $fn =~ s/\.pdf$/+$&/i;
    $pdf-> cleanoutput( $fn );

    I didn't know inlining images prevents them from being selected in e.g. Reader, thanks. This protection won't help much, though, because any tool that claims to optimize a PDF will attempt to un-inline them.

    But, yes, there is a way, with quite a few traps. And you won't get anywhere without consulting the manuals.

    The test subject is part of the CAM::PDF test suite. Uncomment the "say" and "exit" lines and examine the output first.

    The 1st image is shown with the "Do" operator; its argument is the name of a resource. The 2nd image is inline: whatever is bracketed between the "BI" and "EI" keywords. The "writeInlineImage" method doesn't write anything anywhere but, given the image object that a resource name refers to, returns a string to be inserted into the content as the inline version.
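    To make the two forms concrete, here is a minimal sketch in plain Perl. The content stream is a hand-written, hypothetical example (the image data is shortened to "..."), not the content of the actual test file. It only shows how a "Do" operand and a "BI" ... "EI" span can be told apart.

```perl
use strict;
use warnings;

# Hypothetical content stream: one XObject image painted with "Do",
# one inline image between BI and EI (image data shortened to "...").
my $content = <<'END';
q 100 0 0 50 10 10 cm /Im1 Do Q
q 20 0 0 20 200 10 cm
BI /W 2 /H 2 /BPC 8 /CS /G ID ... EI
Q
END

# Resource names used as operands of "Do" (same pattern as in the reply).
my @xobjects = $content =~ m{ (?<= \s ) ( /\S+ ) \s+ Do (?= \s ) }gx;

# Number of inline images, counted by their opening "BI" keyword.
my $inline_count = () = $content =~ m{ (?<! \S ) BI (?! \S ) }gx;

print "XObject images: @xobjects\n";
print "Inline images:  $inline_count\n";
```

    In a real file the data after "ID" is usually binary, which is why dumping the content and reading it first is worthwhile.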

    String replacement is a very crude approach (parsing into a content tree is advised instead): the sequence may happen to be part of actual text content or of binary data (another inline image, whatever).
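    One way to reduce that risk a little, short of a real parser, is to run the substitution only on the parts of the content that lie outside existing "BI" ... "EI" spans. The sketch below is still a heuristic: the helper name replace_do_outside_inline is hypothetical, binary image data can itself contain " EI ", and text strings are not protected at all.

```perl
use strict;
use warnings;

# Replace "/Name Do" only outside BI ... EI spans.
# $replace is a callback that receives the resource name and returns
# the replacement string (hypothetical helper, not part of CAM::PDF).
sub replace_do_outside_inline {
    my ( $content, $replace ) = @_;

    # split with a capturing group keeps the inline-image spans as
    # the odd-numbered elements of the resulting list.
    my @parts = split
        m{ ( (?<! \S ) BI \s .*? \s ID \s .*? \s EI (?! \S ) ) }xs,
        $content;

    for my $i ( 0 .. $#parts ) {
        next if $i % 2;    # odd index: an inline image, leave untouched
        $parts[$i] =~ s{ (?<= \s ) ( /\S+ ) \s+ Do (?= \s ) }
                       { $replace->( $1 ) }gxe;
    }
    return join '', @parts;
}

# Made-up content: "/X Do" sits inside inline image data and must survive.
my $content = "q /Im1 Do Q\n"
            . "BI /W 1 /H 1 /BPC 8 /CS /G ID /X Do EI\n"
            . "q /Im2 Do Q\n";

print replace_do_outside_inline( $content, sub { "[inlined $_[0]]" } );
```

    In the reply's script the callback would be the block that dereferences the name and calls writeInlineImage.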

    Unsupported inline image dictionary entries are supposed to be ignored. So why do I care to remove the "Length"? In this very file it happens to be not just a number (as in /Length 45) but an indirect object (as in /Length 10 0 R). So what? Indirect objects are not allowed in content (unknown syntax, kind of): the "/Length 10" key-value pair is ignored, but "0 R" is unknown to the parser and Reader just stops rendering the page. Arguably, "writeInlineImage" should have taken care of that.
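    Rather than deleting "Length" by hand, one could first list every dictionary entry that holds an indirect reference and then decide per key whether to drop or resolve it. The sketch below runs on a hand-built stand-in for the hash reached via $obj->{value}{value} in the script. It assumes each entry is a hash with a {type} field and that indirect references have the type 'reference'; check that layout against your CAM::PDF version before relying on it.

```perl
use strict;
use warnings;

# Return the names of dictionary entries whose value is an indirect
# reference. ASSUMPTION: entries are hashes with a {type} field and
# indirect references carry the type 'reference'.
sub indirect_keys {
    my ( $dict ) = @_;
    my @keys = grep {
        ref( $dict->{$_} ) && ( $dict->{$_}{type} // '' ) eq 'reference'
    } keys %$dict;
    return sort @keys;
}

# Hand-built stand-in for an image dictionary (values are made up).
my %dict = (
    Width  => { type => 'number',    value => 2 },
    Height => { type => 'number',    value => 2 },
    Length => { type => 'reference', value => 10 },
    SMask  => { type => 'reference', value => 11 },
);

print join( ', ', indirect_keys( \%dict ) ), "\n";
```

    Any key reported this way would end up as "N G R" inside the inline dictionary and is worth handling before calling writeInlineImage.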

    Which leads to a further quick and dirty fix: removing the "XObject" entry from the resources, as otherwise CAM::PDF issues a warning about a missing "Length" in the stream dictionary. It was a good thing to do anyway, filesize-wise, as the image resources are no longer required.

    Now both images are un-selectable, as required. See the Reference for further limitations of inlining; you will no doubt encounter them, considering that even an extremely simple test case already hit some.

      Thanks a lot... this solution works... to an extent.

      The images really seem to be inlined. That is, say $content really outputs the binary stream as it should. However, when the PDF is viewed in Reader, no images are shown. It's as if they're drawn in white color on white background... completely invisible, but I'm sure they're there.

      Any idea what could cause the problem?

        It's not "white color on white background", it's emptiness: a syntax error in the content makes Reader abort rendering (silently, because users are not to be alarmed, no-no). The solution works to the extent of the file it was tested with (since you didn't provide any). And, like I said: which entries are allowed in an inline image description, and what values are they allowed to have? Did you check the manual? SMask is definitely not allowed. Rather, it would be ignored if it weren't an indirect object. I suspect the same as above: "0 R" between the BI and ID keywords, but maybe a forbidden colorspace or compression, whatever. Arbitrarily deleting the soft mask will probably change the appearance.
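        The "0 R" suspicion can be checked on the generated content before opening the file in Reader. The following is a rough diagnostic sketch in plain Perl, with a hypothetical helper name and made-up sample content. It reports every "N G R" token found between a "BI" keyword and the "ID" that follows it, and it does not check colorspaces or filters.

```perl
use strict;
use warnings;

# Return every "N G R" token found in an inline image dictionary,
# i.e. between a BI keyword and the ID keyword that follows it.
sub indirect_refs_in_inline_dicts {
    my ( $content ) = @_;
    my @found;
    while ( $content =~ m{ (?<! \S ) BI \s (.*?) \s ID (?! \S ) }gxs ) {
        my $dict = $1;    # copy before the inner match overwrites $1
        push @found,
            $dict =~ m{ (?<! \S ) ( \d+ \s+ \d+ \s+ R ) (?! \S ) }gx;
    }
    return @found;
}

# Made-up content with an SMask that is still an indirect reference.
my $content = "q BI /W 2 /H 2 /BPC 8 /CS /G /SMask 11 0 R ID xx EI Q\n";

print "suspicious: $_\n" for indirect_refs_in_inline_dicts( $content );
```

        An empty result does not prove the content is valid; it only rules out this one cause.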

        And it's definitely not Perl anymore :)

      A small update: I played with it a bit and deleted the SMask (right after the line where you delete Length). Now the Chrome plugin shows the images, but Reader still doesn't. That's unfortunate, since I must use Reader...

Re: Merging images into the content of a PDF document
by thanos1983 (Parson) on May 22, 2018 at 13:14 UTC

    Hello Arik123,

    I have not dealt with this problem before, but I found this previously asked question in the forum: How to export all images from pdf via perl?.

    Give the solutions proposed there a try; I think they will be enough to solve your problem.

    Sorry, I misread your question. My answer is not applicable to your problem.

    Best Regards, Thanos

    Seeking for Perl wisdom...on the process of learning...not there...yet!
