PerlMonks  

Re^10: blank pdf generated using PDF::API2 (Updated)

by lennelei (Acolyte)
on Jul 21, 2017 at 14:42 UTC (id://1195724)


in reply to Re^9: blank pdf generated using PDF::API2 (Updated)
in thread blank pdf generated using PDF::API2

No: it worked out of the box. I'm sorry I didn't notice earlier that the file was protected. It didn't occur to me because no password was requested and no message was displayed (and obviously no script errors occurred) when I handled the file manually, whether in Acrobat Reader, with sejda-console or sejda desktop, or with Perl scripts. I'm not a PDF expert, but I presume the password is more of an authentication mechanism than a protection, since it doesn't prevent anyone from reading the file. But in that case, how is the content decrypted automatically?

Anyway, for CAM::PDF, the script I gave in my first message works exactly as written, without anything password-related:

    use strict;
    use warnings;
    use CAM::PDF;

    my $file   = 'file.pdf';
    my $oldpdf = CAM::PDF->new($file) or die "$CAM::PDF::errstr\n";
    if ($oldpdf->numPages() > 100) {
        printf " (%d pages)\n", $oldpdf->numPages();
        # Keep only pages 1-100, then write the result to a new file
        $oldpdf->extractPages(1..100);
        $oldpdf->cleanoutput("split_$file");
    }

Still with CAM::PDF, the getPageText method works correctly and returns the real text of the file. I also managed to modify some of the content with getPageContent and setPageContent, but not all of it (I tried to obfuscate the file this way, but the resulting PDF was corrupted).

With PDF::API2, on the other hand, the xmpMetadata method, for example, returns unreadable data for that file (I can't paste the result here because it doesn't render correctly on the site).

I'm now looking for a way to use PDF::API2 the way CAM::PDF works, i.e. by copying the PDF file and then removing the unwanted pages, but I'm not sure this is possible.

Thanks again for your help. It's almost the end of my work week, so I'll come back to this on Monday. Have a nice weekend, folks.

Re^11: blank pdf generated using PDF::API2 (Updated)
by lennelei (Acolyte) on Jul 25, 2017 at 07:57 UTC

    Hi all,

    just a quick update: I didn't find any way to handle the protected PDF correctly with PDF::API2 :(

    In the end I chose to simply check whether the PDF has more than 100 pages and, if so, move it to a dedicated folder. I then split all the big PDFs with sejda-console. Since 99% of the PDFs we receive are smaller, sejda doesn't need to run very often and the whole process isn't significantly slower.
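The triage step described above could look roughly like the following Perl sketch. It assumes CAM::PDF is used only to count pages (with the same `new`/`numPages` calls as the script earlier in this thread) and File::Copy to move files. The 100-page threshold comes from the post; the `triage` and `page_count` names and the optional page-counter argument (handy for trying the logic without real PDFs) are hypothetical. The actual splitting is still left to sejda-console.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(move);
use File::Spec;

# Threshold from the post: PDFs with more than 100 pages go to the "big" folder.
my $MAX_PAGES = 100;

# Count pages with CAM::PDF (loaded only when actually needed).
sub page_count {
    my ($file) = @_;
    require CAM::PDF;
    no warnings 'once';    # $CAM::PDF::errstr is only referenced here
    my $pdf = CAM::PDF->new($file) or die "$CAM::PDF::errstr\n";
    return $pdf->numPages();
}

# Hypothetical helper: move $file into $big_dir if it exceeds $MAX_PAGES.
# Returns 1 if the file was moved, 0 if it was left in place.
# $counter is optional and defaults to page_count(); pass a stub to test.
sub triage {
    my ($file, $big_dir, $counter) = @_;
    $counter ||= \&page_count;
    return 0 if $counter->($file) <= $MAX_PAGES;
    my $dest = File::Spec->catfile($big_dir, basename($file));
    move($file, $dest) or die "Cannot move $file to $dest: $!\n";
    return 1;
}
```

For example, `triage($_, 'big') for glob('incoming/*.pdf');` would move every oversized file from a hypothetical `incoming` folder into a hypothetical `big` folder, after which only those files need to go through sejda-console.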

    Thank you all for your help; hopefully PDF::API2 will someday be able to handle these PDFs as well :)

    Best regards.
