PerlMonks  

Anyone have an example that checks the integrity of a pdf file?

by Anonymous Monk
on Jul 26, 2007 at 18:22 UTC (node #628977)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone have an efficient batch process that will validate the integrity of PDF files?

Re: Anyone have an example that checks the integrity of a pdf file?
by andreas1234567 (Vicar) on Jul 26, 2007 at 19:21 UTC
    Digital signatures allow you to check:
    • Whether a message/file has been altered since it was completed.
    • Whether it was actually sent by the person/entity claimed to be the sender.

    It's not clear (to me) what your primary objective is.

    --
    Andreas
Re: Anyone have an example that checks the integrity of a pdf file?
by radiantmatrix (Parson) on Jul 26, 2007 at 20:12 UTC

    This is what message digests are designed for. One creates a digest from a known-good copy of the data and publishes that digest. Then anyone wishing to determine whether their copy of the data is the same as the original (i.e., was not corrupted) computes the digest of their own copy and compares it with the published one.

    If the digests match, one can be highly confident that the file has not been corrupted.

    It would be fairly easy to perform either part of this process as a batch job in Perl, using one of the many Digest modules. Digest::MD5 is one of the most commonly used for this purpose.
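    A minimal sketch of both halves with Digest::MD5, assuming the published digests live in a plain-text manifest with one "digest  filename" line per file (a layout similar to md5sum output). The helper names md5_of_file, write_manifest and check_manifest are hypothetical, not part of any module; only the new/addfile/hexdigest calls come from Digest::MD5 itself.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: hex MD5 digest of one file, read in binary mode.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    # addfile() returns the Digest::MD5 object, so the calls can be chained.
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Part 1 (publisher, hypothetical helper): write "digest  filename"
# lines for the known-good copies.
sub write_manifest {
    my ($manifest, @files) = @_;
    open my $out, '>', $manifest or die "Cannot write $manifest: $!";
    printf {$out} "%s  %s\n", md5_of_file($_), $_ for @files;
    close $out;
}

# Part 2 (receiver, hypothetical helper): recompute each digest and
# compare it with the published one. Returns the files that are missing
# or whose digests do not match.
sub check_manifest {
    my ($manifest) = @_;
    open my $in, '<', $manifest or die "Cannot read $manifest: $!";
    my @bad;
    while (my $line = <$in>) {
        chomp $line;
        my ($want, $path) = split /  /, $line, 2;
        next unless defined $path;
        push @bad, $path if !-e $path || md5_of_file($path) ne $want;
    }
    close $in;
    return @bad;
}
```

    For example, run write_manifest('good.md5', glob '*.pdf') against the known-good copies, then check_manifest('good.md5') on the receiving side to get the list of files that differ or are missing. A matching digest only shows that the copy is identical to the original, not that the original was a well-formed PDF.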

    <radiant.matrix>
    Ramblings and references
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: Anyone have an example that checks the integrity of a pdf file?
by waswas-fng (Curate) on Jul 26, 2007 at 21:26 UTC
    If by "validate" you don't mean checking for tampering, but rather whether a PDF looks like it will parse correctly, the short answer is no. The longer answer is that there are many PDF engines out there:

    • 9 or 10 variants of Adobe engines in products or OEM libraries
    • open source engines such as libpdf, Ghostscript, etc.
    • other third-party closed-source engines

    Each of these variants parses and handles PDFs in slightly different ways (some very strict, some loose), so the only way to actually verify that a PDF will parse is to run it through the engine you are concerned about. Even if a file parses without any errors or warnings, nothing can be said about whether the visual output matches what is expected.
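    As a rough illustration of how little a cheap batch check can tell you, here is a sketch that only looks for a "%PDF-" header near the start of a file and an "%%EOF" marker near the end. It catches truncated files and files that are not PDFs at all, but by the reasoning above it cannot confirm that any particular engine will parse or render the document. The function name looks_like_complete_pdf and the 1024-byte search windows are assumptions for illustration (a tolerance some readers are known to apply), not a standard validation API.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Hypothetical helper, heuristic only: true if the file has a "%PDF-x.y"
# header in its first 1024 bytes and an "%%EOF" marker in its last 1024
# bytes. It says nothing about whether a given PDF engine can parse it.
sub looks_like_complete_pdf {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;
    binmode $fh;
    my $size = -s $fh;
    return 0 unless $size;    # empty file

    # Start of file: look for the header.
    read($fh, my $head, 1024);

    # End of file: look for the end-of-file marker.
    my $tail_len = $size < 1024 ? $size : 1024;
    seek($fh, -$tail_len, SEEK_END) or return 0;
    read($fh, my $tail, $tail_len);
    close $fh;

    return 0 unless $head =~ /%PDF-\d\.\d/;
    return 0 unless $tail =~ /%%EOF/;
    return 1;
}
```

    In a batch, something like print "$_\n" for grep { !looks_like_complete_pdf($_) } glob '*.pdf'; lists the obviously broken files; anything that passes still has to be opened with the engine you actually care about.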


    -Waswas

Node Type: perlquestion [id://628977]
Approved by ikegami