OK, I'll try my best at explaining the "unrolling the loop" mechanism (I don't have MRE at hand, so feel free to correct me if I am blatantly wrong!):
I use this technique in 2 cases:
- When I want to match a string between 2 1-char delimiters, but the end delimiter can be included in the string through an escape mechanism. For example a double-quoted string that can include the double-quote if backslashed. In this case the "naive" m/"[^"]*"/ would not work.
- When I want to match a string between 2 delimiters but the end delimiter is several characters long, such as in C comments (/*comment*/). In this case /*(.*?)*/ would work but it might be slow (I think, I have not benchmarked it).
In the first case here is how you want to match:
- the start delimiter,
- then an "easy" string (the normal* part): anything that does not include the escape character,
- if you find the escape character then you want to skip the next character (the special part), whatever it is, and then match the following "easy" string (the second normal*), until you get to the next escape character or the end delimiter,
- finally you match the end delimiter
Now if you want to match a string with a multi-character end delimiter here is how to do it:
- match the start delimiter,
- match anything until you get to the first character of the end delimiter,
- if that character is not followed by the rest of the end delimiter then you can keep on matching "regular" characters until the next time you find that character,
- match the end delimiter
A potential pitfall is that you want to make sure you don't consume the characters just after the first character of the end delimiter, or things like **/ (the first character of the end delimiter is there twice in a row, once as a regular character and once as the start of the end delimiter) would not be processed properly.
I guess a couple of examples might be appropriate.
First matching a double-quoted string, double quotes can be escaped using \":
#!/bin/perl -w
use strict;
while( <DATA>)
{ next if(/^\s*#/); # skip comments in DATA
chomp;
# split data into string to match and expected result(s)
my( $string, @expected)= split /\s*=>\s*/;
while( $string=~ m{" # the start deli
+miter
([^\\"]* # anything but t
+he end of the string or the escape char
(?:\\. # the escape
+ char preceeding an escaped char (any char)
[^\\"]* # anything b
+ut the end of the string or the escape char
)*) # repeat
"}gx) # the end delimi
+ter
{ my $match= $1;
my $expected= shift @expected;
unless( $match eq $expected)
{ print "unexpected result line $.: found /$match/, expectin
+g /$expected/\n"; }
}
}
__DATA__
# string to match => expected results(s)
toto
"a string" tata => a string
toto "a string" => a string
toto "a string" tata => a string
toto "a \" string" tata => a \" string
toto "\" string" tata => \" string
toto "a\"" tata => a\"
toto "\"" tata => \"
toto "\"\"" tata => \"\"
toto "string 1" "string 2" tata => string 1 => string 2
toto "string 1 =>
toto "string 1" "string => string 1
toto "tata\\" tutu => tata\\
toto "tata\\\"" tutu => tata\\\"
And now how to match C-like comments:
#!/bin/perl -w
use strict;
while( <DATA>)
{ chomp;
next if(^\s*#); # skip comments
# split the data into the string to match and the expected result(
+s)
my( $string, @expected)= split /\s*=>\s*/;
while( $string=~ m{/\* # the delimite
+r
([^*]* # anything but
+ the beginning of the delimiter
(?:\*(?!>/) # the beginn
+ing of the delimiter, not preceeding the rest of the delimiter
# (?!>/)
+means "not before /, do not use the next char)
[^/]* # anything b
+ut the beginning of the delimiter
)*) # repeat
\*/}gx) # the end of t
+he delimiter
{ my $match= $1;
my $expected= shift @expected || '';
unless( $match eq $expected)
{ print "unexpected result line $.: found /$match/, expectin
+g /$expected/\n"; }
}
}
__DATA__
# string => result(s)
toto =>
toto /*foo*/ tata => foo
/*foo*/ tata => foo
toto /*foo*/ => foo
toto /*foo*bar*/ => foo*bar
toto /*foo**/ => foo*
toto /**/ =>
/***/ => *
/*/*/ => /
|