http://www.perlmonks.org?node_id=731537

duggles has asked for the wisdom of the Perl Monks concerning the following question:

(apologies in advance for this title - I couldn't think of a more explicit way to summarize the question.... ;-)

This code is used in a cryptogram solving script. After a pattern match for a word is pulled from a dictionary, I can usually find an exclusive single match, which happens often enough to make me happy, but many times there is a group of matches found that I've just ignored. But then it dawned on me that in some cases, especially those with just a few results, some of the letters DID match and I could use just those letters in my translation. I started working on a way of building a mask of just the matching letters.

The code below works, giving me a string mask (e.g. '-ou--e--' for 'yourself' and 'southern') and a hash containing the letters and positions that match, but I'm certain there must be a more efficient, faster, more elegant way to do this. I thought about a regex but that's a very weak area for me so I thought I'd throw this out there and see if I could learn something.

Any help towards my education is sincerely appreciated!

FYI and if it matters - I'm running on Windows XP with perl-5.10

#!/usr/bin/perl -w use strict; my %groups = ( group1 => ['cooling', 'rooting', 'hooting', 'looking', 'doormat', +'cooking', 'cookies', 'noodles'], # desired result "-oo----" group2 => ['yourself', 'southern'], # desired result "-ou--e--" group3 => ['arden', 'arlen', 'asked'] # desired result "a--e-" ); print " = = = = = = = = = = = = = = = = = = = \n\n"; foreach my $group (keys %groups) { my $mask = ''; my %matches = (); my @group = @{$groups{$group}}; foreach (@{$groups{$group}}) { print "\t$_\n"; } my $group_count = $#group; my $group_length = length $group[0]; print "group word count:$group_count group string length:$group_le +ngth\n"; for (my $i = 0; $i < $group_count +1; $i++) { for (my $j = 0; $j < $group_length; $j++) { if ($i) { if (substr ($group[$i],$j,1) eq substr ($group[$i-1],$ +j,1)) { print substr ($group[$i],$j,1); if (substr ($mask,$j,1) ne "-") { substr ($mask +,$j,1) = substr ($group[$i],$j,1); } $matches{$j}++ } else { print "-"; substr ($mask,$j,1) = "-"; } } } if ($i) { print "($group[$i-1])"; print " ($group[$i]) "; } print " $mask\n"; } print "\n"; print "letter\t\tposition\n"; foreach my $key (sort (keys(%matches))) { if ($matches{$key} == $group_count) { print substr($group[0],$key,1), " \t\t$key \n"; } } print "\n"; print " = = = = = = = = = = = = = = = = = = = \n\n"; }
OUTPUT: c:\mycode>perlmonk-code-example.pl = = = = = = = = = = = = = = = = = = = yourself southern group word count:1 group string length:8 -ou--e--(yourself) (southern) -ou--e-- letter position o 1 u 2 e 5 = = = = = = = = = = = = = = = = = = = cooling rooting hooting looking doormat cooking cookies noodles group word count:7 group string length:7 -oo-ing(cooling) (rooting) -oo-ing -ooting(rooting) (hooting) -oo-ing -oo-ing(hooting) (looking) -oo-ing -oo----(looking) (doormat) -oo---- -oo----(doormat) (cooking) -oo---- cooki--(cooking) (cookies) -oo---- -oo--es(cookies) (noodles) -oo---- letter position o 1 o 2 = = = = = = = = = = = = = = = = = = = arden arlen asked group word count:2 group string length:5 ar-en(arden) (arlen) ar-en a--e-(arlen) (asked) a--e- letter position a 0 e 3 = = = = = = = = = = = = = = = = = = =
Life is short, but it's wide -- Chuck Pyle

Replies are listed 'Best First'.
Re: Matching or masking specific characters in an array of strings
by almut (Canon) on Dec 19, 2008 at 14:25 UTC

    Here's a bit-logic based approach:

    #!/usr/bin/perl -w use strict; my %groups = ( group1 => ['cooling', 'rooting', 'hooting', 'looking', 'doormat', +'cooking', 'cookies', 'noodles'], # desired result "-oo----" group2 => ['yourself', 'southern'], # desired result "-ou--e--" group3 => ['arden', 'arlen', 'asked'] # desired result "a--e-" ); sub show_bits { #print unpack("B*", shift), "\n"; } for my $g (values %groups) { my $w1 = $g->[0]; my $and = "\xff" x length($w1); my $or = "\0" x length($w1); for my $w (@$g) { print "$w\n"; #print join(" " x 7, split //, $w), "\n"; show_bits($w); $and &= $w; show_bits($and); $or |= $w; show_bits($or); } my $xor = $and ^ $or; show_bits($xor); $xor =~ tr/\0/\xff/c; show_bits($xor); my $mask = ~$xor; show_bits($mask); my $common = $w1 & $mask; $common =~ tr/\0/-/; print "$common\n"; my $i = -1; print map {$i++; $_ ne "-" ? "$_ : $i\n" : ()} split //, $common; print "\n"; } __END__ yourself southern -ou--e-- o : 1 u : 2 e : 5 cooling rooting hooting looking doormat cooking cookies noodles -oo---- o : 1 o : 2 arden arlen asked a--e- a : 0 e : 3

    Uncomment the commented out prints to see how the bit strings are being transformed/combined...

    (Update: fixed cut-n-paste error $and ^= $or — thanks johngg)

      Thank you very much almut! We can consider this question closed!

      I knew I could count on the monks! In addition to my regex skills being way below what they should be, my bit manipulation and use of packing is much worse...

      The code is elegant, effective, and I'm certain much faster than what I had. I will be studying this code to see if I can use this sort of logic to help me in future coding efforts.

      Thanks again!

      If the knowledge and innovative thinking of the perlmonks could only be harnessed for a single purpose - I'm sure world peace would be a piece of cake!!!!

      --- duggles ;-)

      Life is short, but it's wide -- Chuck Pyle
      Can someone please explain what is happening here? I'm wondering why a solution like this isn't mentioned in this thread - $w =~ s/^aeiou/-/g;

        They're not talking about "masking away" characters from a given character set in a single string. They're talking about masking multiple strings together in order find out which characters those strings have in common. Hence:

        
          southern
        - yourself
        ----------
          -ou--e--
        
        

        (where - (minus) is the masking operation.)

        I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.
Re: Matching or masking specific characters in an array of strings
by jeanluca (Deacon) on Dec 19, 2008 at 13:53 UTC
    I think I understand what you're looking for. Here is an example which I think will help you
    my $x1 = "hello" ; my $x2 = "balln" ; $x1 =~ s/[^$x2]/-/g ; print $x1 ;
    Produces: --ll-

    If not, ignore this post :)

    Cheers LuCa

    UPDATE: I think I have to admit that my approach doesn't work at all, sorry

      thanks! you're right it doesn't quite cover the complete problem, but it furthers my regex education!

      Life is short, but it's wide -- Chuck Pyle
Re: Matching or masking specific characters in an array of strings
by oko1 (Deacon) on Dec 19, 2008 at 14:39 UTC

    Here's a successive character-by-character approach that might work for you:

    #!/usr/bin/perl -w use strict; my %groups = ( group1 => ['cooling', 'rooting', 'hooting', 'looking', 'doormat', +'cooking', 'cookies', 'noodles'], # desired result "-oo----" group2 => ['yourself', 'southern'], # desired result "-ou--e--" group3 => ['arden', 'arlen', 'asked'] # desired result "a--e-" ); my %matches; for my $group (keys %groups){ my $last; for my $word (@{$groups{$group}}){ if (!defined $last){ $last = $word; next; } else { die "Length mismatch ('$last'/'$word')\n" unless length($last) == length($word); my @chars = split //, $word; for (0..$#chars){ substr $last, $_, 1, "-" unless $chars[$_] eq substr $last, $_, 1; } } } $matches{$group} = $last; } print "$_: $matches{$_}\n" for sort keys %matches;

    Output:

    group1: -oo---- group2: -ou--e-- group3: a--e-

    --
    "Language shapes the way we think, and determines what we can think about."
    -- B. L. Whorf
      This is also a good method, but I think the bit logic code from almut is probably going to be faster, especially if I have a longer list to process.

      Thanks anyway!! There are some ideas here that will help me along!

      Life is short, but it's wide -- Chuck Pyle
Re: Matching or masking specific characters in an array of strings
by pat_mc (Pilgrim) on Dec 19, 2008 at 22:16 UTC
    Now, I realise that my post is not going to make any ground-breaking contribution anymore this late in the process ... and after the neat bit-wise operation solution suggested by almut there is little to be added. Still, I would like to assert that this is a neat problem and offer my humble solution which may at least win a prize for conciseness (and if not - might at least get considered in the Perl obfuscation section ;-). The noteworthy feature about this implementation is that it accomplishes the entire analysis process in a single regex!

    Here it goes:
    #! /usr/bin/perl/ -w use strict; my @strings = qw ( cooling rooting hooting looking doormat ); our @common_letters; my $reference = shift @strings; () = $reference =~ /(.)(?{ my $letter = $1; my $position = $-[0]; my $bolean = 1; for ( @strings ) { if ( substr( $_, $position, 1 ) ne $letter ) { $bolean = 0; last } } $common_letters[ $position ] = $letter if ( $bolean ); + })/gx; print ( $common_letters[ $_ ] || "-" ) for ( 0 .. length $reference );

    P. S.: It certainly will challenge on the regex-side as you had intended originally ;-) Lemme know what you think!

      You are going to get one extra hyphen because

      print ( $common_letters[ $_ ] || "-" ) for ( 0 .. length $reference );

      should be

      print ( $common_letters[ $_ ] || "-" ) for ( 0 .. length( $reference ) + - 1 );

      Cheers,

      JohnGG

        Thanks for pointing this out, johngg!

        You are, of course, perfectly right ... I apologise for overlooking this detail. As regards the overall approach of my code, however, I am glad to report that this correction changes nothing.