http://www.perlmonks.org?node_id=986798

mnooning has asked for the wisdom of the Perl Monks concerning the following question:

I need a way to get at the code of a subroutine. Not to execute it. Rather, to independently generate the checksum of the subroutine code. I thought it might be easy using a sub's code ref, but a coderef is only good for executing code, not for seeing the code itself. The end goal is to check each of the subs to tell if any subs have been tampered with by a hacker, independently of an overall file checksum.

Any ideas?

Thanks

Replies are listed 'Best First'.
Re: checksum of subroutine
by chromatic (Archbishop) on Aug 10, 2012 at 20:18 UTC
    ... but a coderef is only good for executing code, not for seeing the code itself.

    It's enough, with the core module B::Deparse:

    use B::Deparse;
    my $deparse = B::Deparse->new( '-p', '-sC' );
    my $source  = $deparse->coderef2text( \&some_func );
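
    To go from the deparsed text to the per-subroutine checksum the question asks for, something along these lines might work. It is only a sketch: sub_digest, other_func and the %expected table are names made up for illustration, and SHA-256 via the core Digest::SHA module is an assumed choice of digest. The deparsed text can differ between perl and B::Deparse versions and with the pragmas in scope, so the baseline digests would have to be recorded with the same perl that later does the checking.

```perl
use strict;
use warnings;
use B::Deparse;
use Digest::SHA qw(sha256_hex);
use Encode qw(encode_utf8);

# Hypothetical subs standing in for the code to be protected.
sub some_func  { my ($x) = @_; return $x * 2 }
sub other_func { my ($x) = @_; return $x + 1 }

# Deparse a coderef back to source text and return its SHA-256 hex digest.
sub sub_digest {
    my ($coderef) = @_;
    my $deparse = B::Deparse->new( '-p', '-sC' );
    my $source  = $deparse->coderef2text($coderef);
    # Digest::SHA works on bytes, so encode in case the text has wide characters.
    return sha256_hex( encode_utf8($source) );
}

# Baseline digests. In real use these would be recorded before shipping;
# they are computed here only so that the sketch is self-contained.
my %expected = (
    some_func  => sub_digest( \&some_func ),
    other_func => sub_digest( \&other_func ),
);

# Re-check every sub against its recorded digest.
for my $name ( sort keys %expected ) {
    my $coderef = main->can($name) or die "no such sub: $name";
    my $status  = sub_digest($coderef) eq $expected{$name} ? 'ok' : 'CHANGED';
    print "$name: $status\n";
}
```

    Note that this digests what the compiled sub deparses to, not the bytes in the source file, so comments and whitespace changes do not affect the result.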

      This looks like it will serve the case where the software modules are wrapped up in a single Perl PAR executable, wherein I cannot get at the individual files.

      This begs the question, "Why would this situation ever arise?" I can only tell you there are reasons.

      I love CPAN. Thanks!

Re: checksum of subroutine
by jeffa (Bishop) on Aug 10, 2012 at 18:34 UTC

    Why would a finer grained inspection of the subroutines be any more suspect than any other part of the code? I would think, once the current state of the file has been blessed, that an overall checksum of the file would be more than sufficient to show that ANY changes have been made when NONE were expected. Once you have a corrupted file identified then you can use something like diff to see what changed.

    Otherwise, there happens to be this dynamic language called Perl that is very good at parsing text. ;) Anything from a simple regex to Parse::RecDescent can be used to extract the bits of text that make up a Perl subroutine.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      On second thought, your suggestion contains the answer. Simply use RecDescent to do the parsing just prior to shipping the software, etc.

      Thanks!

      A hacker can replace needed bits in a file, then add other bits so that the overall file checksum stays valid. Doing that gets a quantum leap harder if you have to hack both the subroutine checksums and the file checksum.

      As for RecDescent, if I could get at the text of a sub I could parse the sub and checksum it myself. The trick is to get at the text of the subroutine. That is where the question lies. Parse::RecDescent needs something like "$text", where $text is the text of the subroutine. You cannot hand it just a coderef. :-)

        Use Digest::SHA to calculate a digest over the whole file. Although it is not impossible to make two totally different files with the same digest, it is extremely unlikely that both files will have the same length and both will be working programs. It is not as simple as changing a few instructions and adding a few meaningless bytes at the end to "make up" the checksum.
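
        A minimal sketch of the whole-file digest, assuming SHA-256 is an acceptable algorithm. file_digest is a made-up helper name, and the temporary file only stands in for the script that would actually be shipped.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempfile);

# Return the SHA-256 hex digest of a file, read in binary mode.
sub file_digest {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile( $path, 'b' );    # 'b' = binary mode
    return $sha->hexdigest;
}

# Demo on a throwaway file; in practice $path would be the shipped program.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} "hello\n";
close $fh or die "close failed: $!";

print file_digest($path), "\n";
```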

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics

        Rather than literal checksums (e.g., sum of all bytes) or even CRCs, maybe investigate some modern 'digital signature' technology. Perhaps start with the Cryptographic hash function discussion.

        "A hacker can replace needed bits in a file, then add other bits so that the overall file checksum stays valid."

        True, but it's very, very hard. And would be made exponentially harder -- effectively impossible -- by taking two checksums of the file using different algorithms. For example taking both a SHA-512 and Whirlpool hash of the file, then concatenating them.
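
        The concatenation idea could look roughly like this. Whirlpool is not in core (Digest::Whirlpool is a CPAN module), so to keep the sketch runnable on a stock perl it pairs SHA-512 with MD5 from the core Digest::MD5 module instead. That pairing is purely a stand-in to show the shape of the technique, not a recommendation of MD5; combined_digest is a made-up name.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha512_hex);
use Digest::MD5 qw(md5_hex);

# Digest the same bytes with two unrelated algorithms and join the results.
# A tampered file would then have to collide under both at once.
sub combined_digest {
    my ($bytes) = @_;
    return sha512_hex($bytes) . md5_hex($bytes);
}

# For a real file, read its contents in binary mode first and pass them in.
my $combined = combined_digest("example file contents\n");
print length($combined), " hex chars: $combined\n";
```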

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'