How to use regular expression for this?

by davidz2f (Initiate)
 on Aug 12, 2005 at 22:02 UTC Need Help??
davidz2f has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering how to do this in Perl: Say a pattern is "ABCDEFGHI", how to determine a string matchs any three letters in the right order: for example, the matched strings are: ABCXXXXXX; AXCDXXXXX; AXXDXFXXX; .... AXXXXXXHI; ........ XXXXXXGHI; How to use regular expression for this? (In the real program, the patern could be >70 letters long and may require matching 35 letters in the right order).
• Comment on How to use regular expression for this?

Replies are listed 'Best First'.
Re: How to use regular expression for this?
by AReed (Pilgrim) on Aug 12, 2005 at 22:19 UTC
Must you use a regular expression? If not, this works:
```#!/usr/bin/perl
use warnings;
use strict;

my @teststrings = qw(ABCXXXXXX AXCXXXXXX AXXDXFXXX AXXXXXXHI XXXXXXGHI
+ XXXXXXXXX AXXXXXXXI);

my \$pattern = "ABCDEFGHI";
my \$desired_matches = 3;

for (@teststrings) {
my \$result = \$pattern ^ \$_;
my \$matches = \$result =~ tr/\0/ /;
print "\$_\n" if \$matches == \$desired_matches;
}
I don't have to use regular expression. Can you explain the meaning of: my \$result = \$pattern ^ \$_; my \$matches = \$result =~ tr/\0/ /; Thanks!!
To elaborate on QM's solution explanation of AReed's solution, the "^" (exclusive-OR) operator, applied to two strings, will treat them as bit vectors, and return a string whose characters each represent 8 bits, in proper order, from the XOR result.

If the two strings have the same character at the same offset, the XOR result will be zero (null byte); the "tr/\0/ /" operation will not only convert the nulls to spaces, but will also return the number of nulls found and converted.

It's okay if the two strings are not the same length, because the shorter one will be padded with nulls as needed (and any value XOR 0 gives back that same value).

```my \$result = \$pattern ^ \$_;
This XORs \$pattern and \$_ together, and stores it in \$result.
```my \$matches = \$result =~ tr/\0//;
This counts the number of matches, and puts that into \$matches.

-QM
--
Quantum Mechanics: The dreams stuff is made of

Re: How to use regular expression for this?
by planetscape (Chancellor) on Aug 12, 2005 at 23:05 UTC
Re: How to use regular expression for this? (LCS)
by tye (Sage) on Aug 12, 2005 at 23:53 UTC

This looks like what Algorithm::Diff's LCS (longest common subsequence) does. It even has an optimized LCS_length().

- tye

I assumed from the example strings given that the positions of the characters within the strings was important. That is, "ABCXXXXXX" would be a good match, but "XABCXXXXX" would not.

If position is not important then my solution doesn't work at all. If my assumption was correct however, then using Algorithm::Diff::LCSidx to compare the pattern against the target string will work, but requires the additional step of checking to see if the contents of the two arrays of indices pointed to by the returned references are equivalent. Unless I missed something when skimming through the documentation that is.

Thanks, tye. Hanging out around here I definitely learn something new (usually several somethings) every day.

Yes, I chose to interpret the problem as described (N characters "in order" with no mention of position) and to interpret the lining up of the positions as merely a typographic choice to make the correspondence more visually obvious.

There certainly are other ways to interpret the problem and I don't even assert that the interpetation that I chose is the most likely to be correct. (:

If my assumption was correct however, then using Algorithm::Diff::LCSidx to compare the pattern against the target string will work, but requires the additional step of checking to see if the contents of the two arrays of indices pointed to by the returned references are equivalent.

That'd work for the examples given, but could fail for a case like "CCCDDFFIII".

- tye

Thanks a lot. AReed. Actually in my case the string and the pattern are usually not the same length, generally strings are longer than the length of the pattern. ( so string XXXXXABCXXXXXXXXXXXXXXXXXXX would be a match too.) That is why I initially think about regular expression. I guess I can still use your method by loop through the string and compare the substring with the same length as the pattern.

Create A New User
Node Status?
node history
Node Type: perlquestion [id://483426]
Approved by Corion
help
Chatterbox?
 [Corion]: (that module has been deprecated by its author already, so that's fair. Although I wonder why the backtracking can't be fixed to handle the formfeeds gracefully) [choroba]: not enough tuits? [Corion]: choroba: Yeah, maybe. I'm also unaware of who uses Email:: modules, but that's more my limited horizon of things ;) [Corion]: Ah - there even is the replacement of Email::Address::XS , by the bug reporter, which hopefully fixes this bug already ;)

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (9)
As of 2018-06-20 11:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Should cpanminus be part of the standard Perl release?

Results (116 votes). Check out past polls.

Notices?