Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

How to use regular expression for this?

by davidz2f (Initiate)
on Aug 12, 2005 at 22:02 UTC ( [id://483426]=perlquestion: print w/replies, xml ) Need Help??

davidz2f has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering how to do this in Perl: Say a pattern is "ABCDEFGHI", how to determine a string matchs any three letters in the right order: for example, the matched strings are: ABCXXXXXX; AXCDXXXXX; AXXDXFXXX; .... AXXXXXXHI; ........ XXXXXXGHI; How to use regular expression for this? (In the real program, the patern could be >70 letters long and may require matching 35 letters in the right order).

Replies are listed 'Best First'.
Re: How to use regular expression for this?
by AReed (Pilgrim) on Aug 12, 2005 at 22:19 UTC
    Must you use a regular expression? If not, this works:
    #!/usr/bin/perl use warnings; use strict; my @teststrings = qw(ABCXXXXXX AXCXXXXXX AXXDXFXXX AXXXXXXHI XXXXXXGHI + XXXXXXXXX AXXXXXXXI); my $pattern = "ABCDEFGHI"; my $desired_matches = 3; for (@teststrings) { my $result = $pattern ^ $_; my $matches = $result =~ tr/\0/ /; print "$_\n" if $matches == $desired_matches; }
      I don't have to use regular expression. Can you explain the meaning of: my $result = $pattern ^ $_; my $matches = $result =~ tr/\0/ /; Thanks!!
        To elaborate on QM's solution explanation of AReed's solution, the "^" (exclusive-OR) operator, applied to two strings, will treat them as bit vectors, and return a string whose characters each represent 8 bits, in proper order, from the XOR result.

        If the two strings have the same character at the same offset, the XOR result will be zero (null byte); the "tr/\0/ /" operation will not only convert the nulls to spaces, but will also return the number of nulls found and converted.

        It's okay if the two strings are not the same length, because the shorter one will be padded with nulls as needed (and any value XOR 0 gives back that same value).

        my $result = $pattern ^ $_;
        This XORs $pattern and $_ together, and stores it in $result.
        my $matches = $result =~ tr/\0//;
        This counts the number of matches, and puts that into $matches.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: How to use regular expression for this?
by planetscape (Chancellor) on Aug 12, 2005 at 23:05 UTC
Re: How to use regular expression for this? (LCS)
by tye (Sage) on Aug 12, 2005 at 23:53 UTC

    This looks like what Algorithm::Diff's LCS (longest common subsequence) does. It even has an optimized LCS_length().

    - tye        

      I assumed from the example strings given that the positions of the characters within the strings was important. That is, "ABCXXXXXX" would be a good match, but "XABCXXXXX" would not.

      If position is not important then my solution doesn't work at all. If my assumption was correct however, then using Algorithm::Diff::LCSidx to compare the pattern against the target string will work, but requires the additional step of checking to see if the contents of the two arrays of indices pointed to by the returned references are equivalent. Unless I missed something when skimming through the documentation that is.

      Thanks, tye. Hanging out around here I definitely learn something new (usually several somethings) every day.

        Yes, I chose to interpret the problem as described (N characters "in order" with no mention of position) and to interpret the lining up of the positions as merely a typographic choice to make the correspondence more visually obvious.

        There certainly are other ways to interpret the problem and I don't even assert that the interpetation that I chose is the most likely to be correct. (:

        If my assumption was correct however, then using Algorithm::Diff::LCSidx to compare the pattern against the target string will work, but requires the additional step of checking to see if the contents of the two arrays of indices pointed to by the returned references are equivalent.

        That'd work for the examples given, but could fail for a case like "CCCDDFFIII".

        - tye        

        Thanks a lot. AReed. Actually in my case the string and the pattern are usually not the same length, generally strings are longer than the length of the pattern. ( so string XXXXXABCXXXXXXXXXXXXXXXXXXX would be a match too.) That is why I initially think about regular expression. I guess I can still use your method by loop through the string and compare the substring with the same length as the pattern.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://483426]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (2)
As of 2024-03-19 06:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found