Speeding permutation counting

albert has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Speeding permutation counting by blokhead (Monsignor) on Jul 18, 2007 at 14:26 UTC
I think the previous replies have misunderstood your question. If I understand correctly, you want to take a pair of strings, put them on top of each other, and read down the columns: `string A: 0 1 0 1 string B: 1 1 0 1 \| \| \| \| \| \| \| +--> 11 \| \| +----> 00 \| +------> 11 +--------> 01` [download] .. i.e, not just look at all the pairs of characters in an individual string.. Splitting each string into an array seems wasteful, as arrays are much more memory-heavy than strings. You can use substr to index into any position in the string to get individual characters. Also, instead of 4 different cases of logic involving 4 similarly-named variables $c00 through $c11, you can use a hash to really simplify the code: `my @data = qw[ 111011001010011110100010100001 111010000010010110000010000001 000101011100001000110101110000 000101111101001001111101111110 111011001010111110100010100001 000100010100000000010001010000 ]; use Data::Dumper; for my $i (0 .. $#data) { for my $j ($i+1 .. $#data) { my %counts; $counts{ substr($data[$i],$_,1) . substr($data[$j],$_,1) }++ for 0 .. length($data[$i]) - 1; print Dumper \%counts; } }` [download] Another cute solution is to use chop to trim off characters at a time from both strings. It generates these character-pairs in reverse order, but since you only care about the final count, it's ok. Since it modifies the string, you have to do it on a copy. `for my $i (0 .. $#data) { for my $j ($i+1 .. $#data) { my %counts; my ($str1, $str2) = @data[$i,$j]; $counts{ chop($str1) . chop($str2) }++ while length($str1) and length($str2); print Dumper \%counts; } }` [download] blokhead	[reply] [d/l] [select]
Re^2: Speeding permutation counting by albert (Monk) on Jul 18, 2007 at 14:57 UTC
Yes, this is the correct interpretation of my question, and thanks for your suggestions. I benchmarked your two suggestions relative to my original, and things are definitely improved: `Rate orig substring blokhead blokchop orig 1199/s -- -45% -62% -69% substring 2162/s 80% -- -31% -44% blokhead 3144/s 162% 45% -- -18% blokchop 3840/s 220% 78% 22% --` [download] "Substring" is closer to my original (too verbose) style, so some of the gains are also coming from the more succinct, easier to read code. Best so far is the "chop" based suggestion. Still wondering if there might be other improvements, or some vastly different way to solve this. Thanks for these suggestions, they really have helped. -a	[reply] [d/l]
Re^3: Speeding permutation counting by ikegami (Patriarch) on Jul 18, 2007 at 15:44 UTC
Do you have the benchmark code? I consider benchmark results useless without the accompanying benchmark code. It's also stopping us from comparing our own solutions without redoing all your work. If you're worried about the space it takes, use `<readmore>` or `<spoiler>` tags.	[reply] [d/l] [select]
Re^4: Speeding permutation counting by albert (Monk) on Jul 18, 2007 at 16:23 UTC
Re^5: Speeding permutation counting by ikegami (Patriarch) on Jul 18, 2007 at 17:37 UTC
Re: Speeding permutation counting by BrowserUk (Patriarch) on Jul 18, 2007 at 15:04 UTC
Instead of breaking your strings up and then having to iterate the characters, you can use bitwise-string operations to process the characters in parallel and then use tr/// to count the results. This results in a 10x speed up over your original: #! perl -slw use strict; use Benchmark::Timer; my $T = new Benchmark::Timer; use Math::Random::MT qw[ rand srand ]; our $S \|\|= 1; our $B \|\|= 32; our $N \|\|= 1000; srand( 1 ); my @strings = map { unpack 'b'. $B, rand( 2**32 ) } 1 .. $N; my $label = "$N strings of $B bits (srand:$S)"; $T->start( $label ); for my $i ( 0 .. $#strings ) { for my $j ( $i+1 .. $#strings ) { print join "\t", $i, $j, ( $strings[ $i ] \| $strings[ $j ] ) =~ tr[0][0], ## 0 +0 ( ~$strings[ $i ] & $strings[ $j ] ) =~ tr[\1][\1], ## 0 +1 ( $strings[ $i ] & ~$strings[ $j ] ) =~ tr[\1][\1], ## 1 +0 ( $strings[ $i ] & $strings[ $j ] ) =~ tr[1][1]; ## 1 +1 } } $T->stop( $label ); $T->report; __END__ ## Original foreach my $string (@strings) { my @items = split //, $string; $string = \@items; } for ( my $i = 0 ; $i < @strings ; $i++ ) { for ( my $j = $i + 1 ; $j < @strings ; $j++ ) { my ( $c00, $c01, $c10, $c11 ) = ( 0, 0, 0, 0 ); for ( my $k = 0 ; $k < @{ $strings[$i] } ; $k++ ) { $c00++ if ${$strings[$i]}[$k] == 0 && ${$strings[$j]}[$k] == 0; $c01++ if ${$strings[$i]}[$k] == 0 && ${$strings[$j]}[$k] == 1; $c10++ if ${$strings[$i]}[$k] == 1 && ${$strings[$j]}[$k] == 0; $c11++ if ${$strings[$i]}[$k] == 1 && ${$strings[$j]}[$k] == 1; } print join( "\t", $i, $j, $c00, $c01, $c10, $c11 ), "\n"; } } c:\test>627253 >nul 1 trial of 1000 strings of 32 bits (srand:1) (40.336s total) c:\test>627253 >nul 1 trial of 1000 strings of 32 bits (srand:1) (40.477s total) ## Bitwise + tr/// c:\test>627253 >nul 1 trial of 1000 strings of 32 bits (srand:1) (4.711s total) c:\test>627253 >nul 1 trial of 1000 strings of 32 bits (srand:1) (4.696s total) [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^2: Speeding permutation counting by albert (Monk) on Jul 18, 2007 at 15:40 UTC
This is a significant improvement. I knew there would be something to do with bitwise operators, but I was not versed enough to come up with such a solution. This is giving the following benchmarks: `Rate orig blokchop browserUK orig 1178/s -- -69% -94% blokchop 3829/s 225% -- -81% browserUK 20097/s 1606% 425% --` [download] In my testing, this is improving about 17x on my original, and about 5x on blokhead's chop based solution. Thanks for the excellent suggestion, -a	[reply] [d/l]
Re^2: Speeding permutation counting by Limbic~Region (Chancellor) on Jul 20, 2007 at 15:02 UTC
BrowserUk, Once I understood the problem, I came up with a solution very similar to yours. When I logged in this morning, I noticed that you had already came up with the idea. Why does work have to get in the way of fun? I doubt there would be a speed improvement but I am pretty sure you only have to count '01' since '10' is just a reflection. Additionally, you should only need to calculate ('11' and '00') or ('11' and '10') or ('00' and '10'). The remaining two can be obtained with math. Cheers - L~R	[reply]
Re^3: Speeding permutation counting by albert (Monk) on Jul 21, 2007 at 23:38 UTC
In my problem '01' is different than '10', and I need to count them separately. Still, your point of not needing to count the last is well taken, as I can count 3, and thereby know the 4th. However, by doing this, I'm not seeing any significant performance gains, and it may be a bit slower. `Rate browser+LR browserUK browser+LR 19055/s -- -2% browserUK 19509/s 2% --` [download] Code for benchmarking... <Reveal this spoiler or all in this thread>	[reply] [d/l] [select]
Re: Speeding permutation counting (/../g) by tye (Sage) on Jul 18, 2007 at 14:19 UTC
`$count{$_}++ for $string =~ /../g;` [download] Update: With the clarifications of the problem statement: `while( @strings ) { my( $x, $y )= splice @strings, 0, 2; my %count; while( $x =~ /(.)/g ) { my $x1= $1; $y =~ /(.)/g or die "..."; $count{ $x1.$1 }++; # Or replace all of the above with: #$count{ "$1" . ( $y =~ /(.)/g ? "$1" : '' ) }++; } print join " ", %count, $/; }` [download] with the (simple) cleaning up of the output left as an excercise. - tye	[reply] [d/l] [select]
Re: Speeding permutation counting by dsheroh (Monsignor) on Jul 18, 2007 at 15:22 UTC
Other monks have already provided excellent optimizations for this particular case. I'd like to add a more modest optimization which applies to your original code and is more generally useful: Don't make unnecessary comparisons. In the original implementation, you're making eight comparisons on each pair of characters when you only need two. `if (${$strings[$i]}[$k]) { if (${$strings[$j]}[$k]) { $c11++; } else { $c10++; } } else { if (${$strings[$j]}[$k]) { $c01++; } else { $c00++; } }` [download]	[reply] [d/l]
Re^2: Speeding permutation counting by ikegami (Patriarch) on Jul 18, 2007 at 16:20 UTC
I thougth a hash might be faster, but no. `Rate hash if hash 9329/s -- -11% if 10473/s 12% -- Rate hash if hash 9436/s -- -11% if 10599/s 12% -- Rate hash if hash 9341/s -- -12% if 10662/s 14% --` [download] Benchmark code: <Reveal this spoiler or all in this thread> A hash is definitely cleaner, though.	[reply] [d/l] [select]
Re^3: Speeding permutation counting by dsheroh (Monsignor) on Jul 18, 2007 at 18:38 UTC
Cool... Guess I should've benchmarked my suggestion instead of just assuming it would be slower than the hash-based ideas. Actually, unless I've severely mis-benchmarked it, it appears that my modest optimization (combined with blokhead's suggestion of using `chop`) even outdoes BrowserUK's bitwise version and still more so with fewer lines (based on BrowserUK's test code, I'm comparing every pair of lines): `100 lines/100 iterations: Rate bitwise ifchop bitwise 33.1/s -- -41% ifchop 56.5/s 71% -- 10 lines/10000 iterations: Rate bitwise ifchop bitwise 3226/s -- -57% ifchop 7463/s 131% -- 1000 lines/5 iterations: s/iter bitwise ifchop bitwise 3.06 -- -23% ifchop 2.37 29% --` [download] (~~I actually suspect I'm doing something wrong in my test because I don't see why the number of lines would affect the relative performance at all.~~ Update: I was doing something wrong. I forgot to pull the strings out of the array and was just comparing indexes in my version. With my version fixed to actually do the comparisons, the bitwise version is much faster, as I had initially expected: `Rate ifchop bitwise ifchop 4.22/s -- -87% bitwise 33.3/s 690% --` [download] ) Benchmark code: <Reveal this spoiler or all in this thread>	[reply] [d/l] [select]
Re: Speeding permutation counting by ikegami (Patriarch) on Jul 18, 2007 at 15:56 UTC
use strict; use warnings; my @strings = qw/ 111011001010011110100010100001 111010000010010110000010000001 000101011100001000110101110000 000101111101001001111101111110 111011001010111110100010100001 000100010100000000010001010000 /; s/(?<=.)/\0/g for my @strings1 = @strings; # "abc" => "a\0b\0c\0" s/(?=.)/\0/g for my @strings2 = @strings; # "abc" => "\0a\0b\0c" for my $i (0..$#strings) { for my $j ($i+1..$#strings) { my %c; ++$c{$_} for ($strings1[$i] \| $strings2[$j]) =~ /../g; print(join("\t", $i, $j, $c{'00'}\|\|0, $c{'01'}\|\|0, $c{'10'}\|\|0, +$c{'11'}\|\|0), "\n"); } } [download]	[reply] [d/l]
Re^2: Speeding permutation counting by ikegami (Patriarch) on Jul 18, 2007 at 16:55 UTC
Nevermind. It was an interesting idea, but it sucks. `Rate ikegami albert browseruk ikegami 5.76/s -- -26% -89% albert 7.84/s 36% -- -86% browseruk 54.8/s 851% 599% --` [download] Both ikegami and albert's version will be faster per string when processing 1000 strings instead of 100 (since they both have a big overhead), but not enough to even approach BrowserUk's version. Benchmark code: <Reveal this spoiler or all in this thread>	[reply] [d/l] [select]
Re: Speeding permutation counting by leocharre (Priest) on Jul 18, 2007 at 14:36 UTC
Is this what you want, as tye pointed out, `#!/usr/bin/perl -w use strict; use Smart::Comments '###'; my $report ={}; my @strings = qw( 0101 0001 ); for (@strings){ my $string = $_; $report->{$_}++ for $string=~/../g; } ### $report ### should be ### 01 3 ### 00 1` [download] Or should the result be 01=3x, 10=1x, 00=2x ?	[reply] [d/l]
Re: Speeding permutation counting by moritz (Cardinal) on Jul 18, 2007 at 14:21 UTC
I haven't tried it, but I'd suggest the following for each string: Use unpack to split your string into pairs, and then unpack again, this time discarding the first character (otherwise you would get only pairs where the first index is even). Then use either your counting technique, or a hash. And benchmark it! Update: This posting is irrelevant since I interpreted the original question wrongly. Perl 6 in German	[reply]
Re: Speeding permutation counting by sgt (Deacon) on Jul 18, 2007 at 14:58 UTC
Have you tried representing your strings as bit-strings. Maybe vec() and related are not the fastest, but Bit::Vector::Minimal or Bit::Vector could be useful. You talk about unique pairs. If you don't have duplicated strings (which can be avoided via implicit normalization in your choice of representation or a cure such as 'sort -u') then any two strings make a unique pair. You have n(n+1)/2 pairs or n(n-1)/2 if you don't want the "diagonal" i.e pairs like (s, s). cheers --stephan	[reply]
Re: Speeding permutation counting by NiJo (Friar) on Jul 19, 2007 at 21:16 UTC
Most solutions as of yet are working on a char by char basis. Treating them as numbers seems to be more natural. Using binary does not handle the 1+1 carry over. Switching to decimal provides the required carry over space: Basically you do this: `1 * 1010 + 2 * 1100 = ---------- 3210` [download] I'll leave it to others to count 0s 1s 2s and 3s, put some loops around and to benchmark it. This drastically reduces the required operations. I expect it to be some orders of magnitude faster. Math::BigInt also provides a C speedup.	[reply] [d/l]
Re: Speeding permutation counting by Limbic~Region (Chancellor) on Jul 19, 2007 at 15:49 UTC
albert, I found this a neat an interesting problem but I haven't had a chance to try out my idea. I do have a couple of questions first: Do you need the count breakout per position or just total over all strings? With 010 and 101, should the count of "01" and "10" be 2 per position for a total of 6? In other words, do each strings need to be compared twice taking turns being on top? I don't think any answer to the questions would invalidate my idea but it might change the implementation a bit. Since I do not have time to try it myself, I am laying it out here. I believe you should be able to mathematically determine the counts without doing the comparisons. I believe all you need to do is count how many 0s and 1s there are for each position. To further speed up the process, I might even consider doing the counting using Inline::C. Cheers - L~R	[reply]
Re^2: Speeding permutation counting by BrowserUk (Patriarch) on Jul 19, 2007 at 20:20 UTC
If you just need to count the number of set bits there is no need to drop into Inline C. This example from perlfunc unpack will do that quite efficiently: `my $setBits = unpack("%32b*", $strings[ $i ];` [download] but I don't see how you get from that to the 4 numbers the OP wants? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^3: Speeding permutation counting by Limbic~Region (Chancellor) on Jul 19, 2007 at 20:46 UTC
BrowserUk, What I am suggesting is that if you know the number of 1s and 0s in all strings in position 1, you should be able to calculate the number of '11', '00', '10', and '01' at position 1 without comparing any of the strings. This would still have to be done for all strings at all positions - I am just trying to eliminate the comparisons. Cheers - L~R	[reply]
Re^4: Speeding permutation counting by BrowserUk (Patriarch) on Jul 19, 2007 at 21:15 UTC
Re^5: Speeding permutation counting by Limbic~Region (Chancellor) on Jul 20, 2007 at 00:08 UTC
Some notes below your chosen depth have not been shown here


Your skill will accomplish what the force of many cannot
	PerlMonks