PerlMonks  

Re^3: Tallying co-occurence of numbers

by BrowserUk (Patriarch)
on Jun 17, 2016 at 20:28 UTC


in reply to Re^2: Tallying co-occurence of numbers
in thread Tallying co-occurence of numbers

What are the ranges of your 3 numbers?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.

Re^4: Tallying co-occurence of numbers
by K_Edw (Beadle) on Jun 18, 2016 at 09:38 UTC

    Quite large.

    1st Number: 1-20

    2nd + 3rd Number: 1-1,200,000

    Running it on a real sample, I get around 3,000,000 unique lines when printing the hash out (all 3 numbers + frequency per line).

      You could try packing your numbers into a 64-bit int; it might save some space:

      ++$hash{ pack 'Q', $n_1to20 * 1.2e6**2 + $a_1to1_2e6 * 1.2e6 + $b_1to1_2e6 };

      It depends on the mix of sizes of the larger numbers. (I'll think on it some more.)

      Also, try pre-extending your hash to 3 million: keys %hash = 3e6;
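A minimal sketch of the packing idea, in case a round trip helps when printing the hash back out. The helper names `encode_key` and `decode_key` and the `BASE` constant are hypothetical, not from the post. `BASE` is set to 1,200,001 rather than 1.2e6 on the assumption that the 2nd and 3rd numbers are 1-based and can reach 1,200,000 exactly. The `'Q'` format needs a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Hypothetical helpers. BASE is one more than the largest 2nd/3rd value so
# that 1-based values up to 1,200,000 decode unambiguously (an assumption).
use constant BASE => 1_200_001;

# Combine the three numbers into one integer and pack it into 8 bytes.
sub encode_key {
    my( $n, $p, $q ) = @_;
    return pack 'Q', ( $n * BASE + $p ) * BASE + $q;
}

# Reverse of encode_key: peel the numbers off again, last one first.
sub decode_key {
    my( $key ) = @_;
    my $v = unpack 'Q', $key;
    my $q = $v % BASE;
    $v = ( $v - $q ) / BASE;
    my $p = $v % BASE;
    my $n = ( $v - $p ) / BASE;
    return ( $n, $p, $q );
}

my %hash;
keys( %hash ) = 3e6;    # pre-extend the hash for roughly 3 million keys

# Tally the same triplet three times.
++$hash{ encode_key( 20, 1_200_000, 7 ) } for 1 .. 3;

# Print each triplet followed by its frequency.
for my $key ( keys %hash ) {
    print join( ' ', decode_key( $key ), $hash{ $key } ), "\n";
}
```

Decoding only at output time keeps the hash keys at 8 bytes each while the tally is being built.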



      You can gain a tad more by truncating the 64-bit int to 6 bytes, but you're into a world of diminishing returns:

      #! perl -slw
      use strict;
      #use Math::Random::MT qw[ rand ];
      use Devel::Size qw[ total_size ];

      our $S //= 1;
      srand( $S );

      my %hash;
      for( 1 .. 1e6 ) {
          my( $x, $y, $z ) = ( int( rand 20 ), int( rand 1.2e6 ), int( rand 1.2e6 ) );
      #    ++$hash{ $x }{ $y }{ $z };
      #    ++$hash{ join $;, $x, $y, $z };
      #    ++$hash{ pack 'Q', $x * 1.2e6**2 + $y * 1.2e6 + $z };
          ++$hash{ unpack 'A6', pack 'Q', $x * 1.2e6**2 + $y * 1.2e6 + $z };
      }

      print total_size( \%hash ), ' ', scalar keys %hash;

      __END__
      ++$hash{ $x }{ $y }{ $z };                                            269 897 378
      ++$hash{ join $;, $x, $y, $z };                                       106 036 953    40%
      ++$hash{ pack 'Q', $x * 1.2e6**2 + $y * 1.2e6 + $z };                  98 388 672    36.5%
      ++$hash{ unpack 'A6', pack 'Q', $x * 1.2e6**2 + $y * 1.2e6 + $z };     96 193 539    35.6%
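The truncation is possible because the largest combined value here stays below 2**48, so on a little-endian machine the last two bytes of the packed 64-bit integer are always zero. One caveat: on unpack, `'A'` strips trailing nulls and spaces, so some keys come out shorter than six bytes and, in rare cases, two different values could end up as the same key. The sketch below is one way to avoid that, not the code that was measured above. It keeps all six bytes with `substr` and forces the byte order with `'Q<'`. The names `key6` and `unkey6` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers. 'Q<' forces little-endian byte order (needs a perl
# built with 64-bit integers), so the two high-order bytes come last.
sub key6 {
    my( $x, $y, $z ) = @_;
    # 0-based values, as produced by int( rand ... ) in the benchmark.
    my $v = ( $x * 1_200_000 + $y ) * 1_200_000 + $z;
    return substr pack( 'Q<', $v ), 0, 6;    # keep the low 6 bytes, no stripping
}

# Restore the two dropped zero bytes and unpack the combined integer.
sub unkey6 {
    my( $key ) = @_;
    return unpack 'Q<', $key . "\0\0";
}

my %hash;
++$hash{ key6( 0, 0, 1 ) };
++$hash{ key6( 0, 0, 8193 ) };    # bytes 01 20 00 ...; 'A6' would reduce both to "\x01"
print scalar( keys %hash ), "\n";
```

Fixed-length keys also keep the mapping reversible, which matters if the triplets have to be printed out again afterwards.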

      Beyond that, if it is still a problem, print the triplets to stdout and pipe the results through your system sort and then into another perl script that counts them:

      C:\test>perl -E"say join ' ', int( rand 20 ), sort{ $a<=>$b } int( rand 1.2e6 ), int( rand 1.2e6 ) for 1 .. 1e6" | sort | perl -nle"if($last eq $_){ ++$n }else{ print qq[$last : $n];$n=1} $last=$_" | wc -l
      999978
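If the counting stage is going to live in a script rather than a one-liner, something along these lines might do. It is a sketch only: `count_runs` is a hypothetical name, and it assumes the input is already sorted so that identical lines are adjacent. Unlike the one-liner, it handles the first and final runs explicitly, so no empty record is printed at the start and the last triplet is not dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: reads already-sorted lines from a filehandle and
# returns a list of [ line, count ] pairs, one per run of identical lines.
sub count_runs {
    my( $fh ) = @_;
    my( @runs, $last, $n );
    while( my $line = <$fh> ) {
        chomp $line;
        if( defined $last and $line eq $last ) {
            ++$n;                                   # same triplet as before
        }
        else {
            push @runs, [ $last, $n ] if defined $last;
            ( $last, $n ) = ( $line, 1 );           # start a new run
        }
    }
    push @runs, [ $last, $n ] if defined $last;     # flush the final run
    return @runs;
}

# Demo on an in-memory "file"; in the pipeline you would pass \*STDIN instead.
open my $fh, '<', \"1 2 3\n1 2 3\n4 5 6\n" or die $!;
print "$_->[0] : $_->[1]\n" for count_runs( $fh );
```

For millions of lines you would probably print each run as it completes rather than collect them in an array; the array is only there to keep the sketch easy to check.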

