PerlMonks  

high speed checksum for video finger printing?

by faber (Novice)
on Feb 04, 2012 at 21:52 UTC (#951861=perlquestion)
faber has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,

I'm wondering if anyone here has thought about a high-speed checksum for video fingerprinting. If possible, I'd like to avoid reading an entire video file to determine its checksum; instead, I would like to read only segments of the data.

My first thought was to run crc32 against selected segments of each file (say, 1 megabyte out of every 2 megabytes of data), or something like that.
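A minimal sketch of that segment idea, assuming the core Compress::Zlib module for crc32. The function name segment_crc32 and its parameters are hypothetical, chosen only for illustration: it reads a fixed-length segment at every stride and folds each segment into one running CRC-32.

```perl
use strict;
use warnings;
use Compress::Zlib qw(crc32);

# Hypothetical helper: read $read_len bytes out of every $stride bytes
# and fold all sampled segments into a single running CRC-32.
sub segment_crc32 {
    my ($file, $read_len, $stride) = @_;
    $read_len //= 1024 * 1024;        # 1 MB sampled ...
    $stride   //= 2 * 1024 * 1024;    # ... out of every 2 MB

    open my $fh, '<:raw', $file or die "Cannot open '$file': $!";
    my $size   = -s $fh;
    my $crc    = crc32('');           # initial CRC value
    my $offset = 0;

    while ($offset < $size) {
        seek $fh, $offset, 0 or die "seek failed: $!";
        my $got = read($fh, my $buf, $read_len);
        die "read failed: $!" unless defined $got;
        last if $got == 0;
        $crc = crc32($buf, $crc);     # continue the running checksum
        $offset += $stride;
    }
    close $fh;
    return $crc;
}
```

Note that this still reads half of the file with the default values, so the saving over a full checksum is at most a factor of two; a wider stride trades more speed for less coverage.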

I understand that without checksumming the entire file it's very hard to guarantee uniqueness; however, I'm more concerned with speed.

Any thoughts?

---

Alright guys, the first generation of File::Fingerprint::Huge is up on CPAN. I'll update it with some further refinements as I move forward. Thanks for all of your help!

Re: high speed checksum for video finger printing?
by BrowserUk (Pope) on Feb 04, 2012 at 22:01 UTC

    Use the filesize to seed a random number generator, and then read 100 random 4- or 8-byte chunks from the file, stick'em together and checksum them.

    The odds of duplicates are billions to 1.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Ah yes, this is a great idea and could be very useful for the right kinds of data management cases. I think I'm going to do this, and will likely call it File::Fingerprint::Huge as suggested, if no one has anything similar to this already.
        if no one has anything similar to this already.

        Nothing I've seen, so go for it.

        My suggestion would be to use Math::Random::MT as the PRNG. It is portable and reproducible cross-platform.

        Then something like:

        use Math::Random::MT qw[ rand srand ];
        use Digest::CRC qw[ crc64 ];

        sub fingerPrintFile {
            my $file = shift;
            my $filesize = -s( $file );

            ## Seed the PRNG with the file size, so the same size
            ## always yields the same sample positions.
            srand $filesize;

            open my $fh, '<', $file or die $!;
            binmode $fh;    ## video data is binary

            ## assuming CRC-64
            my $chunks = int( $filesize / 8 ) - 1;

            ## Added sort per RichardK's suggestion below.
            my @posns = sort{ $a <=> $b } map 8*int( rand $chunks ), 1 .. 100;

            ## Read 8 bytes at each position and stick them together.
            my $rawSample = join '', map{
                seek $fh, $_, 0;
                read( $fh, my $chunk, 8 );
                $chunk
            } @posns;
            close $fh;

            return crc64( $rawSample );
        }
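One way such a fingerprint might be used is to group files into duplicate candidates. The sketch below is an assumption-laden variant, not the code above: it uses only core modules (Digest::MD5 and Perl's built-in srand/rand instead of Digest::CRC and Math::Random::MT), mixes the file size into the digest, and reads small files whole. The names sampled_fingerprint and duplicate_candidates are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical core-only variant: sample up to $count 8-byte chunks at
# offsets derived from the file size, then hash the size plus the sample.
sub sampled_fingerprint {
    my ($file, $count) = @_;
    $count //= 100;
    my $size = -s $file;
    die "Cannot stat '$file'" unless defined $size;

    open my $fh, '<:raw', $file or die "Cannot open '$file': $!";
    my $chunks = int($size / 8);
    my $sample = '';
    if ($chunks <= $count) {
        # Small file: cheaper to read all of it than to sample.
        local $/;
        $sample = <$fh> // '';
    }
    else {
        # Built-in PRNG: the same seed gives the same offsets on the same
        # perl build, but this reseeds the global generator as a side effect.
        srand($size);
        my @offsets = sort { $a <=> $b } map { 8 * int(rand($chunks)) } 1 .. $count;
        for my $offset (@offsets) {
            seek $fh, $offset, 0 or die "seek failed: $!";
            defined read($fh, my $chunk, 8) or die "read failed: $!";
            $sample .= $chunk;
        }
    }
    close $fh;
    return md5_hex("$size:$sample");
}

# Group files by fingerprint; only groups with two or more members
# are duplicate candidates.
sub duplicate_candidates {
    my @files = @_;
    my %by_print;
    push @{ $by_print{ sampled_fingerprint($_) } }, $_ for @files;
    return grep { @$_ > 1 } values %by_print;
}
```

Matching fingerprints only mark files as candidates. Because most of each file is never read, a full comparison would still be needed to confirm that two files are identical.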


Re: high speed checksum for video finger printing?
by InfiniteSilence (Curate) on Feb 04, 2012 at 23:14 UTC

    I was going to recommend File::Fingerprint, but I realized that your files are likely to be HUGE, so it will not work efficiently. Instead, I would take BrowserUk's recommendation, build a new module called File::Fingerprint::Enormous or perhaps File::Fingerprint::BigVideo, package it, and post it to CPAN.

    Celebrate Intellectual Diversity
