Dear Monks,
I'm using Perl to process ancient texts (which is cool!), but I'm having trouble with pattern matching in regular expressions. In more detail: I have a list of exemplars of Caesar's "De bello gallico", some of them partly damaged. The first line of each exemplar is loaded into an array as follows:
$var[0] = "Gallia est omnis divisa in partes tres";
$var[1] = "Gallia est omnis divisa in ...";
$var[2] = "Gallia est omnis ...";
$var[3] = "Gallia";
$var[4] = "... omnis divisa in ...";
$var[5] = "Gallia est ... tres";
$var[6] = "Gallia ... partes tres";
$var[7] = "Gallia est ... partes tres";
$var[8] = "Gallia ... divisa ... tres";
$var[9] = "... tres";
$var[10] = "quattuor";
The missing words are marked by "...", which means "one or more words are missing". I would like to compare these strings so that Perl prints a message when they do not match. This is useful for spotting unexpected variations. In this case, only the last string should fail to match the others. My idea is to substitute "..." with ".+", and so far I have this:
for ($i=0;$i<=$#var;$i++) {
    $var[$i] =~ s/\.\.\./\.\+/g;
    for ($j=$i+1;$j<=$#var;$j++) {
        $var[$j] =~ s/\.\.\./\.\+/g;
        if ($var[$i] !~ m/$var[$j]/) {
            print "$i-$j:\t[$var[$i]] and [$var[$j]] DO NOT MATCH!\n";
        }
    }
}
which prints:
0 - 10 DO NOT MATCH!
1 - 5 DO NOT MATCH!
1 - 6 DO NOT MATCH!
1 - 7 DO NOT MATCH!
1 - 8 DO NOT MATCH!
1 - 9 DO NOT MATCH!
1 - 10 DO NOT MATCH!
2 - 4 DO NOT MATCH!
2 - 5 DO NOT MATCH!
2 - 6 DO NOT MATCH!
2 - 7 DO NOT MATCH!
2 - 8 DO NOT MATCH!
2 - 9 DO NOT MATCH!
2 - 10 DO NOT MATCH!
3 - 4 DO NOT MATCH!
3 - 5 DO NOT MATCH!
3 - 6 DO NOT MATCH!
3 - 7 DO NOT MATCH!
3 - 8 DO NOT MATCH!
3 - 9 DO NOT MATCH!
3 - 10 DO NOT MATCH!
4 - 5 DO NOT MATCH!
4 - 6 DO NOT MATCH!
4 - 7 DO NOT MATCH!
4 - 8 DO NOT MATCH!
4 - 9 DO NOT MATCH!
4 - 10 DO NOT MATCH!
5 - 6 DO NOT MATCH!
5 - 7 DO NOT MATCH!
5 - 8 DO NOT MATCH!
5 - 10 DO NOT MATCH!
6 - 7 DO NOT MATCH!
6 - 8 DO NOT MATCH!
6 - 10 DO NOT MATCH!
7 - 8 DO NOT MATCH!
7 - 10 DO NOT MATCH!
8 - 10 DO NOT MATCH!
9 - 10 DO NOT MATCH!
Is there a way to use special characters on the left-hand side of the match, that is, in the string being tested rather than in the pattern? I think this is where my idea goes wrong, but I can't find a solution. Thank you so much for your help!
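For what it's worth, here is a minimal sketch of a one-directional check that avoids the problem, under one big assumption: that a complete exemplar ($var[0] here) can serve as the reference, so that each damaged string is only ever used as the pattern and never as the string being tested. The helper name fragment_to_regex is made up for illustration. It is not a true pairwise comparison of two damaged strings, which is a different and harder problem.

```perl
use strict;
use warnings;

my @var = (
    "Gallia est omnis divisa in partes tres",
    "Gallia est omnis divisa in ...",
    "Gallia est omnis ...",
    "Gallia",
    "... omnis divisa in ...",
    "Gallia est ... tres",
    "Gallia ... partes tres",
    "Gallia est ... partes tres",
    "Gallia ... divisa ... tres",
    "... tres",
    "quattuor",
);

# Hypothetical helper: turn a damaged string into a compiled regex.
# The text between gaps is escaped with quotemeta so it matches literally,
# and every "..." becomes ".+" (the same substitution idea as above).
# The -1 limit keeps a trailing empty field, so a trailing "..." still
# requires at least one more character.
sub fragment_to_regex {
    my ($fragment) = @_;
    my @pieces  = split /\.\.\./, $fragment, -1;
    my $pattern = join '.+', map { quotemeta($_) } @pieces;
    return qr/$pattern/;
}

# Assumption: $var[0] is undamaged and acts as the reference text.
my $reference = $var[0];

# Collect the indices of the strings that the reference does not satisfy.
my @mismatched = grep { $reference !~ fragment_to_regex($var[$_]) } 1 .. $#var;

for my $j (@mismatched) {
    print "0-$j:\t[$reference] and [$var[$j]] DO NOT MATCH!\n";
}
```

With the eleven strings above, this reports only 0-10, since the damaged string always sits on the pattern side of the match. Note that ".+" matches any characters, not strictly whole words, so this is looser than "one or more words".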