osfameron has asked for the wisdom of the Perl Monks concerning the following question:
I've recently got very interested in Ascii art and animation.
I've stumbled across an interesting problem: how to search
for one Ascii picture within another.
e.g. does the picture:
MM
oo
<
-/ contain /^\ ?
/^\ C
: C :
8===8
|^|
|||
-~-~
In pseudocode the best algorithm I've found is:
Data
- Target list
- Match list (the picture to search for)
- Results list
Main sub
- Escape the match list for regex special characters, e.g. .+*?()[]{}
- For each line of target,
- for each match found against first line of match, run submatch specifying the position of the match.
- If submatch returned true, add the line and position to the results list.
- Return all results.
submatch sub
- Taking next lines of target & match, return
false if there is no match at specified position.
- Return a true value (like the position of the match)
if there are no more lines left of the match.
- Return the value of the *next* submatch (e.g.
recurse)
This is of course just a search for a 2 dimensional literal
string, but it would be as easy to search for a list of
regexes. Is there any practical use for that?
Is there already a module for doing this type of match?
Or can anyone suggest a better algorithm? (I think that
recursing makes sense for this problem, but I'm open to
suggestions!)
Cheerio!
Osfameron
http://osfameron.perlmonk.org/chickenman/
Re: Two-dimensional match/regex?
by rob_au (Abbot) on May 22, 2002 at 10:31 UTC
|
I'm sure I must be missing the problem at hand to a certain extent, but have you looked at the m option for regular expression matches which allows for multi-line matches. For example, the regular expression match for the sample match given may look something like the following - \/\^\\\n.+C - to match:
- the character sequence /^\,
- a new-line character (\n),
- one or more characters (.+), and,
- the character, C.
This regular expression can be used in conjunction with the multi-line match option for regular expressions as follows, similar to how one would use the case-insensitive option for regular expressions:
if ($target =~ /\/\^\\\n.+C/m) {
}
For further information see perlman:perlre
Update - This regular expression has been tested and does work. The effectiveness of a multi-line regular expression match can be tailored within the regular expression itself - That is, the more specific which you make your regular expression, the greater the specificity of the match, but as with all targeted solutions, there is a trade-off between sensitivity and specificity.
| [reply] [d/l] [select] |
|
I think the above won't work as such, because it might
find false matches (e.g. where the second line isn't
aligned with the first.)
But, I could find the first line and then generate
a multi-line regex with the rest of the search string, using /m as you suggest: something like
/^.{$pos}$string1.*\n^.{$pos}$string2.*/m
to make it search only at the right position...
(Apologies for dodgy regex, not tested and probably wrong).
thanks for the tip, I should have
thought of that already!
Cheerio!
Osfameron
http://osfameron.perlmonk.org/chickenman/
| [reply] [d/l] [select] |
|
You could always surround the terms which lie between the
stuff you want to match with (), and check that $1, $2,
etc. are the same length as the first, second, etc. line
matched.
| [reply] |
Re: Two-dimensional match/regex?
by Aristotle (Chancellor) on May 22, 2002 at 14:26 UTC
|
| [reply] [d/l] |
Re: Two-dimensional match/regex?
by educated_foo (Vicar) on May 22, 2002 at 15:59 UTC
|
You might be able to use the (??{ ... }) construct to construct a regex with each line anchored at the right place. I haven't worked through the details, but it seems possible. Multi-dimensional pattern matching is actually an area of active research, and there are some interesting (and complex) algorithms out there to do it. Depending on time and level of interest, you might want to take a look on citeseer for "two dimensional pattern matching".
/s
Update: So, I actually tried this, and found you probably don't want to go this way:
my $pat = qr{^
(?{$pre0 = pos})
(.*)
(?{$x = pos() - $pre0; $pre = qr/.{$x}/; print STDERR "$pre\n
+"})
aa
(?{$post0 = pos})
(.*)
(?{$x =pos() - $post0; $post = qr/.{$x}/; print STDERR "$post
+\n"})
$
^(??{ $pre }) bb (??{ $post }) $
^(??{ $pre }) cc (??{ $post }) $
}xm;
is intended to match
aa
bb
cc
but in fact yields:
perl% perl -w 2dpm.pl
(?-xism:.{8})
zsh: 4283 segmentation fault (core dumped)
perl% perl -v
This is perl, v5.6.1 built for ppc-linux-thread-multi
Not sure if this is my fault or not.
| [reply] [d/l] [select] |
Re: Two-dimensional match/regex?
by robin (Chaplain) on May 23, 2002 at 16:24 UTC
|
As far as I know, the best algorithm for doing exact two-dimensional matching is still the one that was
published in 1993 by Ricardo Baeza-Yates and
Mireille Régnier. You can download
their paper about it.
I don't know much about two-dimensional regular expressions, but this
looks like a good place to start researching the subject if you're interested. | [reply] |
Re: Two-dimensional match/regex?
by Anonymous Monk on May 23, 2002 at 13:20 UTC
|
Here's a subroutine that solves the problem if you want
an *exact* match - no regex supported. The solution can
be extended to regexes, but you have to be careful,
iterating with m//g over a line to "find all matches"
will fail, as m//g will never find overlapping matches.
sub multi_match {
my ($large, $small) = @_;
my @large = split /\n/ => $large;
my @small = split /\n/ => $small;
my $first = $small [0];
foreach my $i (0 .. @large - @small) {
my $pos = 0;
LOOP:
while ((my $new_pos = index ($large [$i], $first, $pos)) != -1
+) {
$pos = $new_pos + 1;
foreach my $j (1 .. $#small) {
next LOOP unless
length $large [$j] >= $new_pos + length $small [$
+j] &&
$small [$j] eq substr $large [$i + $j], $new_pos,
length $small [$j];
}
return ($i, $new_pos);
}
}
return;
}
| [reply] [d/l] |
Re: Two-dimensional match/regex?
by abstracts (Hermit) on May 23, 2002 at 19:03 UTC
|
Here is my stab at the problem. Basically we pad the image with spaces to make a rectangular image. We do the same with the subimage but keeping the width of the subimage the same as the width of the image (padding with spaces). Create a regex by matching anything in space or the same character otherwise. Print the row and column of where the match occured
#!/usr/bin/perl -w
use strict;
my $image = <<'EOF';
MM
oo
<
-/
/^\
: C :
8===8
|^|
|||
-~-~
EOF
my $subimage = <<'EOF';
/^\
C
EOF
matches($image, $subimage);
sub matches{
my ($img, $simg) = @_;
my $len = 0;
my $normalize = sub {
my @arr = split /\n/, shift;
for(map length, @arr){
$len = $_ if $len < $_;
}
join "", map {$_ . ' ' x ($len - length)} @arr;
};
$img = $normalize->($img);
$simg = quotemeta $normalize->($simg);
$simg =~ s/\\ /./g;
if($img =~ /(.*?)$simg/){
my $l = length $1;
my $row = int($l / $len);
my $col = $l % $len;
print "Matches at $row x $col\n";
} else {
print "No match\n";
}
}
# Matches at 4 x 1
The row and column start with zero.
Hope this helps,,, | [reply] [d/l] |
A reply falls below the community's threshold of quality. You may see it by logging in. |
|
|