in reply to
Using special characters in left part of a regex match?
G'day shamat,
Firstly, my comments (some of which have already been mentioned in earlier responses):
- Do you really want to compare all fragments with each other? I can envisage a situation where you're attempting to decide whether "... est ..." matches "... in ...". Perhaps you'd want to filter badly damaged fragments from any sort of matching whatsoever.
- I think you'd be better off comparing the fragments with a single reference string. You wrote "... some of them being partly damaged.", so presumably some of them are complete.
- You wrote "... only the last string should not match ..." (that would be "quattuor"). If that's the case, "Gallia" should probably be "Gallia ..."
- The output you show does not match the code that creates it. From the code you posted, I'd be expecting output like:
N-M: [string1] and [string2] DO NOT MATCH!
Here's a solution that takes all of the above into account:
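#!/usr/bin/env perl
use strict;
use warnings;

# Lines are deliberately not chomped: the trailing newline anchors each
# pattern at the end of the reference string.
my @exemplars = <DATA>;
my $reference = shift @exemplars;    # first DATA line is the complete text
print "Reference string: $reference";

for (@exemplars) {
    my $exemplar = $_;               # keep the original text for reporting
    s/[.]{3}/.+?/g;                  # turn each literal "..." into a regex gap
    if ($reference !~ /$_/) {
        print "NO MATCH: $exemplar";
    }
}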
__DATA__
Gallia est omnis divisa in partes tres
Gallia est omnis divisa in ...
Gallia est omnis ...
Gallia
... omnis divisa in ...
Gallia est ... tres
Gallia ... partes tres
Gallia est ... partes tres
Gallia ... divisa ... tres
... tres
quattuor
Gallia ...
Output:
$ pm_latin_fragments.pl
Reference string: Gallia est omnis divisa in partes tres
NO MATCH: Gallia
NO MATCH: quattuor
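
Update: given the thread title, it's worth noting that the code above interpolates each fragment into the regex as-is, so a fragment containing regex metacharacters (a lone "(" or "[", say) would be treated as pattern syntax and could even fail to compile. Here's a rough sketch of one way around that, which also has a go at the filtering idea from my first comment. The names fragment_to_regex, known_word_count and $min_words are made up for illustration, the threshold of 2 is an arbitrary assumption, and the fragment "in partes (tres)" is my own test case, not from your data. Note that, unlike the code above, this version chomps nothing but anchors each pattern at both ends, so a fragment has to account for the whole reference string.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build an anchored regex from a fragment, treating
# "..." as a gap and every other character as literal text.
sub fragment_to_regex {
    my ($fragment) = @_;
    # Limit -1 keeps trailing empty fields, so a fragment that ends in
    # "..." still gets its final gap.
    my @pieces  = split /[.]{3}/, $fragment, -1;
    my $pattern = join '.+?', map { quotemeta($_) } @pieces;
    return qr/\A$pattern\z/;
}

# Hypothetical filter: count the words that survive outside the gaps.
sub known_word_count {
    my ($fragment) = @_;
    my @words = grep { /\w/ } split ' ', $fragment;
    return scalar @words;
}

my $reference = 'Gallia est omnis divisa in partes tres';
my $min_words = 2;    # assumed threshold; tune to taste

for my $fragment ('Gallia est ... tres', '... in ...', 'in partes (tres)', 'quattuor') {
    if (known_word_count($fragment) < $min_words) {
        print "SKIPPED (too damaged): $fragment\n";
    }
    elsif ($reference !~ fragment_to_regex($fragment)) {
        print "NO MATCH: $fragment\n";
    }
}
```

Splitting on the gap marker first and applying quotemeta to each piece means only the "..." ever becomes regex syntax. Whether a word-count threshold is the right measure of "badly damaged" depends on your data; a threshold of 2 would also skip short fragments such as "Gallia ..." and "... tres", which may or may not be what you want.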