Matching or masking specific characters in an array of strings

duggles has asked for the wisdom of the Perl Monks concerning the following question:

(apologies in advance for this title - I couldn't think of a more explicit way to summarize the question.... ;-)

This code is used in a cryptogram solving script. After a pattern match for a word is pulled from a dictionary, I can usually find an exclusive single match, which happens often enough to make me happy, but many times there is a group of matches found that I've just ignored. But then it dawned on me that in some cases, especially those with just a few results, some of the letters DID match and I could use just those letters in my translation. I started working on a way of building a mask of just the matching letters.

The code below works, giving me a string mask (e.g. '-ou--e--' for 'yourself' and 'southern') and a hash containing the letters and positions that match, but I'm certain there must be a more efficient, faster, more elegant way to do this. I thought about a regex but that's a very weak area for me so I thought I'd throw this out there and see if I could learn something.

Any help towards my education is sincerely appreciated!

FYI and if it matters - I'm running on Windows XP with perl-5.10

#!/usr/bin/perl -w
use strict;

my %groups = ( 
    group1 => ['cooling', 'rooting', 'hooting', 'looking', 'doormat', 
+'cooking', 'cookies', 'noodles'], # desired result "-oo----"
    group2 => ['yourself', 'southern'], # desired result "-ou--e--"
    group3 => ['arden', 'arlen', 'asked'] # desired result "a--e-"
);

print " = = = = = = = = = = = = = = = = = = = \n\n";

foreach my $group (keys %groups) {
    my $mask = '';
    my %matches = ();
    my @group = @{$groups{$group}};
    foreach (@{$groups{$group}}) {
        print "\t$_\n";
    }

    my $group_count = $#group;
    my $group_length = length $group[0];
    print "group word count:$group_count group string length:$group_le
+ngth\n";
    for (my $i = 0; $i < $group_count +1; $i++) {
        for (my $j = 0; $j < $group_length; $j++) {
            if ($i) { 
                if (substr ($group[$i],$j,1) eq substr ($group[$i-1],$
+j,1)) { 
                    print substr ($group[$i],$j,1);
                    if (substr ($mask,$j,1) ne "-") {    substr ($mask
+,$j,1) = substr ($group[$i],$j,1); }
                    $matches{$j}++
                }
                else { 
                    print "-"; 
                    substr ($mask,$j,1) = "-";
                }
            }
        }
        if ($i) { 
            print "($group[$i-1])"; 
            print " ($group[$i])  ";
        }
        print " $mask\n";
    }
    print "\n";
    
    print "letter\t\tposition\n";
    foreach my $key (sort (keys(%matches))) {
        if ($matches{$key} == $group_count) {
            print  substr($group[0],$key,1), " \t\t$key \n";
        }
    }
    print "\n";
    print " = = = = = = = = = = = = = = = = = = = \n\n";
}
[download]

OUTPUT:
c:\mycode>perlmonk-code-example.pl
 = = = = = = = = = = = = = = = = = = =

        yourself
        southern
group word count:1 group string length:8

-ou--e--(yourself) (southern)   -ou--e--

letter          position
o               1
u               2
e               5

 = = = = = = = = = = = = = = = = = = =

        cooling
        rooting
        hooting
        looking
        doormat
        cooking
        cookies
        noodles
group word count:7 group string length:7

-oo-ing(cooling) (rooting)   -oo-ing
-ooting(rooting) (hooting)   -oo-ing
-oo-ing(hooting) (looking)   -oo-ing
-oo----(looking) (doormat)   -oo----
-oo----(doormat) (cooking)   -oo----
cooki--(cooking) (cookies)   -oo----
-oo--es(cookies) (noodles)   -oo----

letter          position
o               1
o               2

 = = = = = = = = = = = = = = = = = = =

        arden
        arlen
        asked
group word count:2 group string length:5

ar-en(arden) (arlen)   ar-en
a--e-(arlen) (asked)   a--e-

letter          position
a               0
e               3

 = = = = = = = = = = = = = = = = = = =
[download]

Life is short, but it's wide -- Chuck Pyle

Comment on Matching or masking specific characters in an array of strings Select or Download Code

Replies are listed 'Best First'.
Re: Matching or masking specific characters in an array of strings by almut (Canon) on Dec 19, 2008 at 14:25 UTC
Here's a bit-logic based approach: #!/usr/bin/perl -w use strict; my %groups = ( group1 => ['cooling', 'rooting', 'hooting', 'looking', 'doormat', +'cooking', 'cookies', 'noodles'], # desired result "-oo----" group2 => ['yourself', 'southern'], # desired result "-ou--e--" group3 => ['arden', 'arlen', 'asked'] # desired result "a--e-" ); sub show_bits { #print unpack("B*", shift), "\n"; } for my $g (values %groups) { my $w1 = $g->[0]; my $and = "\xff" x length($w1); my $or = "\0" x length($w1); for my $w (@$g) { print "$w\n"; #print join(" " x 7, split //, $w), "\n"; show_bits($w); $and &= $w; show_bits($and); $or \|= $w; show_bits($or); } my $xor = $and ^ $or; show_bits($xor); $xor =~ tr/\0/\xff/c; show_bits($xor); my $mask = ~$xor; show_bits($mask); my $common = $w1 & $mask; $common =~ tr/\0/-/; print "$common\n"; my $i = -1; print map {$i++; $_ ne "-" ? "$_ : $i\n" : ()} split //, $common; print "\n"; } __END__ yourself southern -ou--e-- o : 1 u : 2 e : 5 cooling rooting hooting looking doormat cooking cookies noodles -oo---- o : 1 o : 2 arden arlen asked a--e- a : 0 e : 3 [download] Uncomment the commented out `print`s to see how the bit strings are being transformed/combined... (Update: fixed cut-n-paste error `$and ^= $or` — thanks johngg)	[reply] [d/l] [select]
Re^2: Matching or masking specific characters in an array of strings by duggles (Acolyte) on Dec 19, 2008 at 14:50 UTC
Thank you very much almut! We can consider this question closed! I knew I could count on the monks! In addition to my regex skills being way below what they should be, my bit manipulation and use of packing is much worse... The code is elegant, effective, and I'm certain much faster than what I had. I will be studying this code to see if I can use this sort of logic to help me in future coding efforts. Thanks again! If the knowledge and innovative thinking of the perlmonks could only be harnessed for a single purpose - I'm sure world peace would be a piece of cake!!!! --- duggles ;-) Life is short, but it's wide -- Chuck Pyle	[reply]
Re^2: Matching or masking specific characters in an array of strings by nbezzala (Scribe) on Apr 12, 2012 at 16:07 UTC
Can someone please explain what is happening here? I'm wondering why a solution like this isn't mentioned in this thread - $w =~ s/^aeiou/-/g;	[reply]
Re^3: Matching or masking specific characters in an array of strings by jdporter (Paladin) on Apr 12, 2012 at 16:46 UTC
They're not talking about "masking away" characters from a given character set in a single string. They're talking about masking multiple strings together in order find out which characters those strings have in common. Hence: southern - yourself ---------- -ou--e-- (where `-` (minus) is the masking operation.) I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.	[reply] [d/l]
Re: Matching or masking specific characters in an array of strings by jeanluca (Deacon) on Dec 19, 2008 at 13:53 UTC
I think I understand what you're looking for. Here is an example which I think will help you `my $x1 = "hello" ; my $x2 = "balln" ; $x1 =~ s/[^$x2]/-/g ; print $x1 ;` [download] Produces: --ll- If not, ignore this post :) Cheers LuCa UPDATE: I think I have to admit that my approach doesn't work at all, sorry	[reply] [d/l]
Re^2: Matching or masking specific characters in an array of strings by duggles (Acolyte) on Dec 19, 2008 at 14:38 UTC
thanks! you're right it doesn't quite cover the complete problem, but it furthers my regex education! Life is short, but it's wide -- Chuck Pyle	[reply]
Re: Matching or masking specific characters in an array of strings by oko1 (Deacon) on Dec 19, 2008 at 14:39 UTC
Here's a successive character-by-character approach that might work for you: #!/usr/bin/perl -w use strict; my %groups = ( group1 => ['cooling', 'rooting', 'hooting', 'looking', 'doormat', +'cooking', 'cookies', 'noodles'], # desired result "-oo----" group2 => ['yourself', 'southern'], # desired result "-ou--e--" group3 => ['arden', 'arlen', 'asked'] # desired result "a--e-" ); my %matches; for my $group (keys %groups){ my $last; for my $word (@{$groups{$group}}){ if (!defined $last){ $last = $word; next; } else { die "Length mismatch ('$last'/'$word')\n" unless length($last) == length($word); my @chars = split //, $word; for (0..$#chars){ substr $last, $_, 1, "-" unless $chars[$_] eq substr $last, $_, 1; } } } $matches{$group} = $last; } print "$_: $matches{$_}\n" for sort keys %matches; [download] Output: `group1: -oo---- group2: -ou--e-- group3: a--e-` [download] -- "Language shapes the way we think, and determines what we can think about." -- B. L. Whorf	[reply] [d/l] [select]
Re^2: Matching or masking specific characters in an array of strings by duggles (Acolyte) on Dec 19, 2008 at 14:57 UTC
This is also a good method, but I think the bit logic code from almut is probably going to be faster, especially if I have a longer list to process. Thanks anyway!! There are some ideas here that will help me along! Life is short, but it's wide -- Chuck Pyle	[reply]
Re: Matching or masking specific characters in an array of strings by pat_mc (Pilgrim) on Dec 19, 2008 at 22:16 UTC
Now, I realise that my post is not going to make any ground-breaking contribution anymore this late in the process ... and after the neat bit-wise operation solution suggested by almut there is little to be added. Still, I would like to assert that this is a neat problem and offer my humble solution which may at least win a prize for conciseness (and if not - might at least get considered in the Perl obfuscation section ;-). The noteworthy feature about this implementation is that it accomplishes the entire analysis process in a single regex! Here it goes: `#! /usr/bin/perl/ -w use strict; my @strings = qw ( cooling rooting hooting looking doormat ); our @common_letters; my $reference = shift @strings; () = $reference =~ /(.)(?{ my $letter = $1; my $position = $-[0]; my $bolean = 1; for ( @strings ) { if ( substr( $_, $position, 1 ) ne $letter ) { $bolean = 0; last } } $common_letters[ $position ] = $letter if ( $bolean ); + })/gx; print ( $common_letters[ $_ ] \|\| "-" ) for ( 0 .. length $reference );` [download] P. S.: It certainly will challenge on the regex-side as you had intended originally ;-) Lemme know what you think!	[reply] [d/l]
Re^2: Matching or masking specific characters in an array of strings by johngg (Canon) on Dec 20, 2008 at 00:21 UTC
You are going to get one extra hyphen because `print ( $common_letters[ $_ ] \|\| "-" ) for ( 0 .. length $reference );` [download] should be `print ( $common_letters[ $_ ] \|\| "-" ) for ( 0 .. length( $reference ) + - 1 );` [download] Cheers, JohnGG	[reply] [d/l] [select]
Re^3: Matching or masking specific characters in an array of strings by pat_mc (Pilgrim) on Dec 25, 2008 at 22:13 UTC
Thanks for pointing this out, johngg! You are, of course, perfectly right ... I apologise for overlooking this detail. As regards the overall approach of my code, however, I am glad to report that this correction changes nothing.	[reply]

Back to Seekers of Perl Wisdom