
If you can slurp the file, you can cut the time by about 75%. The comments after the __END__ block show the steps that got me there:

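#! perl -sw
use 5.010;
use strict;
use Time::HiRes qw[ time ];

my $start = time;

# The checksum works on 32-bit words, so the file size must be a multiple of 4.
my $size = -s $ARGV[ 0 ];
die("File not a multiple of 4 bytes")
    unless ( $size % 4 ) == 0;

# Slurp the whole file into memory with a single read...
open my $fh, "<:raw", $ARGV[ 0 ]  or die "Can't open $ARGV[ 0 ]: $!";
my $data;
{
    local $/;
    $data = <$fh>;
}
close $fh;

# ...then reopen the in-memory copy as a filehandle (a "RAM file"),
# so the read-4-bytes-at-a-time loop below stays unchanged.
open $fh, '<', \$data or die "Can't open in-memory file: $!";

my $check_value = 0;
my $buf;

while( read( $fh, $buf, 4 ) ) {
    # XOR in the next native-endian 32-bit word...
    $check_value ^= unpack 'L', $buf;
    # ...then rotate the 32-bit sum left by one bit.
    $check_value = ( ( $check_value & 0x7fffffff ) << 1 )
        | ( $check_value >> 31 );
}

say $check_value;

printf "Took: %f seconds\n", time() - $start;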

__END__
## Original code
C:\test>767001.pl 767001-small.dat
2779316821
Took: 13.011000 seconds

## Eliminate in-loop conditions
C:\test>767001.pl 767001-small.dat
2779316821
Took: 11.577000 seconds

## Use Ikegami's re-write
C:\test>767001.pl 767001-small.dat
2779316821
Took: 10.453000 seconds

## Use RAM-file
C:\test>767001.pl 767001-small.dat
2779316821
Took: 3.148000 seconds
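Every version here does the same per-word step: XOR the next 32-bit word into the running sum, then rotate the sum left by one bit so the top bit wraps round to bit 0. Here is a minimal sketch of that step as standalone functions. It can be handy for checking results against a tiny hand-made input. The names `rotl1` and `checksum_words` are hypothetical and do not come from the code above:

```perl
use strict;
use warnings;

# Rotate a 32-bit value left by one bit: bit 31 wraps round to bit 0.
# Masking with 0x7fffffff first keeps the shifted result within 32 bits.
sub rotl1 {
    my ($v) = @_;
    return ( ( $v & 0x7fffffff ) << 1 ) | ( $v >> 31 );
}

# XOR each word into the running sum, then rotate, as the loops above do.
sub checksum_words {
    my $sum = 0;
    for my $w (@_) {
        $sum = rotl1( $sum ^ $w );
    }
    return $sum;
}

# 0x80000000 rotates to 1; 1 ^ 0x80000000 = 0x80000001 rotates to 3.
printf "%u\n", checksum_words( 0x80000000, 0x80000000 );    # prints 3
```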

However, I then tried a different tack: reading and processing the file in larger chunks. That nearly halved the 3.1-second time again, and it also removed the need to slurp the whole file into memory. Again, see the comments. Chunk sizes from 4K to 256K all performed about the same on my system, and I settled on 64K:

#! perl -sw
use 5.010;
use strict;
use Time::HiRes qw[ time ];
# Chunk size in bytes; a multiple of 4, so no word is split across two reads.
use constant BUFSIZ => 64 * 1024;

my $start = time;

my $size = -s $ARGV[ 0 ];
die("File not a multiple of 4 bytes")
    unless ( $size % 4 ) == 0;

open my $fh, "<:raw", $ARGV[ 0 ]  or die "Can't open $ARGV[ 0 ]: $!";

my $check_value = 0;
my $buf;

# One read() and one unpack() per chunk, instead of one of each per word.
while( read( $fh, $buf, BUFSIZ ) ) {

    for ( unpack 'L*', $buf ) {
        $check_value ^= $_;
        $check_value = ( ( $check_value & 0x7fffffff ) << 1 )
            | ( $check_value >> 31 );
    }
}

say $check_value;

printf "Took: %f seconds\n", time() - $start;

__END__
## Process 4K chunks
C:\test>767001-buk 767001-small.dat
2779316821
Took: 1.771000 seconds

## Process 16K chunks
C:\test>767001-buk 767001-small.dat
2779316821
Took: 1.750000 seconds

## Process 64K chunks
C:\test>767001-buk 767001-small.dat
2779316821
Took: 1.775000 seconds

...
## Process 256K chunks
C:\test>767001-buk 767001-small.dat
2779316821
Took: 1.804000 seconds
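The chunked loop gives the same answer as the word-at-a-time loop only because the chunk size is a multiple of 4. Together with the file-size check, that means no 32-bit word is ever split across two reads. Here is a small sketch that checks this on an in-memory buffer. The sub `checksum_fh` and the generated test data are hypothetical:

```perl
use strict;
use warnings;

# Checksum everything readable from $fh, reading $chunk bytes at a time.
# $chunk must be a multiple of 4 so that no 32-bit word straddles two reads.
sub checksum_fh {
    my ( $fh, $chunk ) = @_;
    die "chunk size must be a multiple of 4" if $chunk % 4;
    my $sum = 0;
    my $buf;
    while ( read( $fh, $buf, $chunk ) ) {
        for ( unpack 'L*', $buf ) {
            $sum ^= $_;
            $sum = ( ( $sum & 0x7fffffff ) << 1 ) | ( $sum >> 31 );
        }
    }
    return $sum;
}

# 1000 made-up 32-bit words, packed in native byte order.
my $data = pack 'L*', map { ( $_ << 20 ) ^ ( $_ * 7919 ) } 1 .. 1000;

# A 4-byte chunk is the word-at-a-time case; the others read larger chunks.
my %results;
for my $chunk ( 4, 16, 4096, 65536 ) {
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    $results{$chunk} = checksum_fh( $fh, $chunk );
    close $fh;
}
print "$_ bytes: $results{$_}\n" for sort { $a <=> $b } keys %results;
```

All four lines should print the same sum. A chunk size such as 6 is rejected, because it would feed `unpack 'L*'` a partial word at each chunk boundary.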

But for ultimate speed, combine the chunked reads with a little Inline::C. That cuts the time to roughly 1/900th of your original (from 13.0 seconds to under 0.015 seconds):

#! perl -sw
use 5.010;
use strict;
use Inline C => Config => BUILD_NOISY => 1;
use Inline C => <<'END_C',  NAME => '_767001', CLEAN_AFTER_BUILD => 0;

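U32 checksum( U32 sum, SV *buf ) {
    int i;
    int n = SvCUR( buf ) >> 2;          /* number of whole 32-bit words */
    U32 *p = (U32 *)SvPVX( buf );       /* view the string buffer as words */

    for( i = 0; i < n; ++i ) {
        sum ^= p[ i ];                  /* XOR in the next native-endian word */
        sum = ( ( sum & 0x7fffffff ) << 1 ) | ( sum >> 31 );  /* rotate left 1 */
    }
    return sum;                         /* passed back in for the next chunk */
}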

END_C

use Time::HiRes qw[ time ];
use constant BUFSIZ => 64 * 1024;

my $start = time;

my $size = -s $ARGV[ 0 ];
die("File not a multiple of 4 bytes")
    unless ( $size % 4 ) == 0;

open my $fh, "<:raw", $ARGV[ 0 ]  or die "Can't open $ARGV[ 0 ]: $!";

my $sum = 0;
my $buf;

while( read( $fh, $buf, BUFSIZ ) ) {
    $sum = checksum( $sum, $buf );
}

say $sum;

printf "Took: %f seconds\n", time() -$start;

__END__
C:\test>767001-IC 767001-small.dat
2779316821
Took: 0.014622 seconds
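Because the C `checksum()` takes the running sum and returns the updated one, it can be fed one chunk at a time. The same interface also works in pure Perl, which could serve as a fallback where no C compiler is available. Here is a sketch under that assumption. `checksum_pp` is a hypothetical name, not part of the code above. It shows that carrying the sum across a split gives the same result as one call over the whole buffer:

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in with the same interface as the Inline::C
# checksum(): takes the running sum and a buffer, returns the updated sum.
sub checksum_pp {
    my ( $sum, $buf ) = @_;
    for ( unpack 'L*', $buf ) {
        $sum ^= $_;
        $sum = ( ( $sum & 0x7fffffff ) << 1 ) | ( $sum >> 31 );
    }
    return $sum;
}

my $data = pack 'L*', 1 .. 8;    # 32 bytes of test data

# One call over the whole buffer...
my $whole = checksum_pp( 0, $data );

# ...versus two calls that carry the sum across a 16-byte split.
my $split = checksum_pp( checksum_pp( 0, substr( $data, 0, 16 ) ),
    substr( $data, 16 ) );

print "whole: $whole  split: $split\n";
```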


In reply to Re: Improving performance of checksum calculation by BrowserUk
in thread Improving performance of checksum calculation by Crackers2


