Re: Longest String Matching
by kcott (Archbishop) on Apr 08, 2013 at 12:14 UTC
|
G'day Dr Manhattan,
I think you decided that splitting the word into individual characters was a good approach and then got rather bogged down attempting to wring a solution from this technique. A regex using alternation would probably be a better way to do this.
$ perl -Mstrict -Mwarnings -E '
my %lexicon = map { $_ => 1 } qw{own known};
my $string = q{unknown};
my $alt = join q{|} => sort { length $b <=> length $a } keys %lexi
+con;
my $re = qr{ ( (?> $alt ) ) $ }x;
$string =~ $re;
say $1;
'
known
Also, unless the hash %lexicon already exists for some other purpose, you'd probably be better off with an array:
$ perl -Mstrict -Mwarnings -E '
my @lexicon = qw{own known};
my $string = q{unknown};
my $alt = join q{|} => sort { length $b <=> length $a } @lexicon;
my $re = qr{ ( (?> $alt ) ) $ }x;
$string =~ $re;
say $1;
'
known
Note how sort puts the longest string first, i.e. to test for "known" before testing for "own". See perlreftut for an introduction to alternation; and perlre for more information as well as a description of the (?>pattern) construct (in the Extended Patterns section).
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Longest String Matching
by space_monk (Chaplain) on Apr 08, 2013 at 11:56 UTC
|
I'm open to better ideas.... :-)
while (length($word) && !exists $lookup{$word}) {
substr($word, 0, 1) = ""; # remove 1st char
}
# $word will be empty or the matched string
Update: This probably works well if the lexicon contains a lot of words as the execution time of this is proportional only to the length of the word. hdb has contributed an improvement to reduced the complexity a little, using the common trick of guaranteeing a match for zero length. There is a regex solution below this which is interesting too...
A Monk aims to give answers to those who have none, and to learn from those who know more.
| [reply] [Watch: Dir/Any] [d/l] |
|
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
|
Re: Longest String Matching
by choroba (Cardinal) on Apr 08, 2013 at 11:57 UTC
|
You can use rindex to find the last occurrence of a substring in a string:
#!/usr/bin/perl
use warnings;
use strict;
my @lexikon = sort { length $b <=> length $a } qw/ own known /;
my $word = "unknown";
print rightside($word), "\n";
sub rightside {
my $word = shift;
my $length = length $word;
for my $l (@lexikon) {
return $l if $length - length $l == rindex $word, $l;
}
return q();
}
| [reply] [Watch: Dir/Any] [d/l] |
Re: Longest String Matching
by hdb (Monsignor) on Apr 08, 2013 at 11:49 UTC
|
use strict; use warnings;
my @lexikon = qw/ own known /;
my $word = "unknown";
my $longestmatch = "";
foreach my $w (@lexikon) {
if( $word =~ /$w$/ ) {
$longestmatch=$w if length($w)>length($longestmatch);
}
}
# $longestmatch now has the longest match
$word =~ /(.*)($longestmatch$)/;
print "$1+$2\n";
| [reply] [Watch: Dir/Any] [d/l] |
Re: Longest String Matching
by Rahul6990 (Beadle) on Apr 08, 2013 at 12:03 UTC
|
%wordlist= ("unknown", -1);
%lexicon = ("own", 1, "known", 1);
foreach my $q (keys %wordlist)
{
push (@array, split(//, $q));
$end = $#array;
$mover = 1;
print "\nend : $end \nmove : $mover\n";
&rightside();
print "$right\n";
@array = ();
}
sub rightside
{
last if($mover == 0 or $move == $end);
if (exists $lexicon{join q(), @array[$mover..$end]})
{
$right = join q(), @array[$mover..$end];
print "\nright:$right\n";
}
else
{
$mover++;
&rightside();
}
}
the problem in your code was, you were starting with the smallest string ($mover = 2 and $end = 6)which always return 'own' thats why the output was showing 'own'. Now we are starting with the largest possible string and keeps on reducing the string size till we find the matched string in lexicon or the string length becomes 0.
| [reply] [Watch: Dir/Any] [d/l] |
Re: Longest String Matching
by AnomalousMonk (Archbishop) on Apr 09, 2013 at 01:40 UTC
|
I don't know if this will be as fast as pemungkah's approach, but the Perl regex engine builds a trie for alternations, so even quite large alternations (354K words in Moby file) are very fast. The big time sink is reading all the words and building the alternation string in the first place, but of course this could be done in a separate step and the resulting joined string saved to a file for later quick access. (Also note that the dictionary file I'm using has oddities such as 'words' like "d's" and "e's", and apparently every letter of the alphabet appears as a word.)
>perl -wMstrict -le
"my $wordlist = shift // die 'no wordlist filename given';
open my $fh_wordlist, '<', $wordlist
or die qq{opening '$wordlist': $!};
my $words =
join '|',
sort { length($b) <=> length($a) }
map { chomp; $_; }
<$fh_wordlist>
;
close $fh_wordlist or die qq{closing '$wordlist': $!};
;;
my @lexicon = qw(
no now know known unknown antidisestablishment
husband wife husband's wife's xyzzy a
);
for my $word (@lexicon) {
my ($longest_at_end) = $word =~ m{ \B ($words) \z }xms;
print qq{'$word' -> '$longest_at_end'};
}
" ..\..\moby\mwords\354984si.ngl
'no' -> 'o'
'now' -> 'ow'
'know' -> 'now'
'known' -> 'own'
'unknown' -> 'known'
'antidisestablishment' -> 'disestablishment'
'husband' -> 'band'
'wife' -> 'ife'
'husband's' -> 'd's'
'wife's' -> 'e's'
'xyzzy' -> 'y'
Use of uninitialized value $longest_at_end in concatenation (.) or str
+ing at -e line 1.
'a' -> ''
| [reply] [Watch: Dir/Any] [d/l] |
|
| [reply] [Watch: Dir/Any] |
Re: Longest String Matching
by grizzley (Chaplain) on Apr 08, 2013 at 14:19 UTC
|
As you know exactly where you want to match (at the end of string), you can first get length n of a word you are searching for, then using substr function get substring of length n of the word you are searching in and just do eq comparison of two strings:
#!perl -l
@array = ('own', 'known', 'owl');
$q = 'unknown';
$longestmatching = '';
$longestn = 0;
for $testword(@array)
{
$n = length($testword);
next if $n <= $longestn;
$substr = substr($q, length($q)-$n, $n);
if($testword eq $substr)
{
$longestmatching = $testword;
$longestn = $n;
}
}
print $longestmatching;
Additionally you can do sort of @array from longest to shortest string as others mentioned. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Longest String Matching
by pemungkah (Priest) on Apr 08, 2013 at 20:35 UTC
|
Reverse the string you're searching, and add the items you're searching for (reversed too) to a Tree::Trie, which is optimized for fast prefix searches. You're not using a has but it's still O(1).
| [reply] [Watch: Dir/Any] |