http://www.perlmonks.org?node_id=400340

replicant4 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks, I have a serious problem and I would really appreciate and help that you can give me I have created two arrays (@site1 @site2) each of which contains string variables (called $site1, and $site2). an example of a $site is: $site = 'AATKKM' I need a script that will compare $site1 of @site1 with every $site variable in @site2. It should give me for example the following pairs (if $site2 = 'GGGGG') AG AG TG KG KG MG these will be my pairwise combinations for every combination that i find I need to have a counter that will increment by 1 e.g AG -> count_AG++; I don't expect anyone to give me a full working script but any ideas on the overall structure that it should have would be more than helpfull. Thanks in advance for your help

Replies are listed 'Best First'.
Re: Comparing two arrays and counting pairwise comparisons
by ikegami (Patriarch) on Oct 18, 2004 at 23:11 UTC

    I'm not sure if I understood correctly. Does this do the trick? I made the slight modification of putting the count of "AG"s (for example) in $count{AG}, not $count_AG.

    use strict; use warnings; use Algorithm::Loops qw( MapCar ); my @site1 = qw( AATKKM 123456 ); my @site2 = qw( GGGGGG !@#$%^ ); my $site1; my $site2; my @pairs; my %counts; foreach $site1 (@site1) { my @site1_parts = split(//, $site1); foreach $site2 (@site2) { my @site2_parts = split(//, $site2); MapCar { $counts{$_[0].$_[1]}++ } \@site1_parts, \@site2_parts; } } print($_, ': ', $counts{$_}, $/) foreach (sort keys %counts); __END__ output ====== 1!: 1 1G: 1 2@: 1 2G: 1 3#: 1 3G: 1 4$: 1 4G: 1 5%: 1 5G: 1 6G: 1 6^: 1 A!: 1 A@: 1 AG: 2 K$: 1 K%: 1 KG: 2 MG: 1 M^: 1 T#: 1 TG: 1
Re: Comparing two arrays and counting pairwise comparisons
by BrowserUk (Patriarch) on Oct 19, 2004 at 00:12 UTC
    #! perl -slw use strict; use List::Util qw[ min ]; use Data::Dumper; my @site1 = qw[ abcde cdefg efghi ]; my @site2 = qw[ zyxwv xwvut vutrs ]; my %counts; for my $site1 ( @site1 ){ for my $site2 ( @site2 ) { $counts{ substr( $site1, $_, 1 ) . substr( $site2, $_, 1 ) }++ + for 0 .. min( length( $site1 )-1, length( $site2 )-1 ); } } print Dumper \%counts; __END__ P:\test>400340.pl $VAR1 = { 'bu' => 1, 'cv' => 2, 'fu' => 2, 'av' => 1, 'hw' => 1, 'ax' => 1, 'du' => 2, 'gv' => 2, 'es' => 1, 'is' => 1, 'ct' => 1, 'iv' => 1, 'gx' => 1, 'fr' => 1, 'cx' => 2, 'dy' => 1, 'az' => 1, 'ev' => 3, 'ez' => 1, 'et' => 2, 'it' => 1, 'dw' => 2, 'hu' => 1, 'dr' => 1, 'ex' => 2, 'fy' => 1, 'cz' => 1, 'by' => 1, 'fw' => 2, 'bw' => 1, 'gs' => 1, 'hr' => 1, 'gt' => 2 };

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

      I agree that the description is suboptimal but this does not generate all the permutations. Having the second string as all the same char does make it ambiguous.

        Hmmm. Changed your tune I see...


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Comparing two arrays and counting pairwise comparisons
by borisz (Canon) on Oct 18, 2004 at 23:13 UTC
    I read the description twice and did not understand, what you want to do.
    Boris
Re: Comparing two arrays and counting pairwise comparisons
by tachyon (Chancellor) on Oct 19, 2004 at 00:19 UTC

    Your description is ambiguous. With the second string all the same char we have to infer either you want

    1. char1 str1 against char1 str2, char2 str1 against char2 str2 or you want all the permutaions ie
    2. char1 str1 against char(1..n) str2, char2 str1 against char(1..n) str2....

    I am assuming you want all the permutations. There are two general approaches. Brute force and the more clever approach. Brute force won't scale well as it is O(n^2). The clever approach is 0(n) plus an O(n^2) whisker that represents the number of possible tokens. As you can see both approaches yield the same results. Assuming those are the results you wanted ;-)

    use Data::Dumper; my $site1 = 'AATKKM' x 20; my $site2 = 'GGGGGA' x 20; # brute force my (%hash, $loops); for my $base1( split //, $site1 ) { for my $base2( split //, $site2 ) { $hash{"${base1}_$base2"}++; $loops++; } } # precount incidence of tokens my (%s1, %s2, %hash_e, $loops_e); do{ $s1{$_}++; $loops_e++ } for split //, $site1; do{ $s2{$_}++; $loops_e++ } for split //, $site2; # still loop within loop but the are now far fewer loops to do # as we only do one per token pair and calculate the total pairs # from our precount data for my $base1( keys %s1 ) { for my $base2( keys %s2 ) { $hash_e{"${base1}_$base2"} = $s1{$base1}*$s2{$base2}; $loops_e++; } } print "Brute force loops $loops\nEfficeient Loops $loops_e\n\n"; print Data::Dumper->Dump([\%hash, \%s1, \%s2, \%hash_e], [qw( hash s1 +s2 hash_e)] ); __DATA__ Brute force loops 14400 Efficeient Loops 248 $hash = { 'T_G' => '2000', 'A_A' => '800', 'T_A' => '400', 'M_G' => '2000', 'M_A' => '400', 'K_G' => '4000', 'K_A' => '800', 'A_G' => '4000' }; $s1 = { 'A' => '40', 'K' => '40', 'T' => '20', 'M' => '20' }; $s2 = { 'G' => '100', 'A' => '20' }; $hash_e = { 'T_G' => '2000', 'A_A' => '800', 'T_A' => '400', 'M_G' => '2000', 'M_A' => '400', 'K_G' => '4000', 'K_A' => '800', 'A_G' => '4000' };

    cheers

    tachyon

      (From memory!)

      Your algorithm is O(N^2) but mine is O(N)

      True enough, but which N?

      I don't think your code comes close to answering the question. Your treating all the strings in each array as a single concatenated string. I'm not, and I don't believe that's what the OP wants.

      Using your algorithm on my test data produces this:

      Which is solving a totally different problem to the one I belive the OP wants to solve.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon