Re: Find common prefix from a list of strings (tye)
by tye (Sage) on Jul 14, 2003 at 19:55 UTC
|
sub FindCommonPrefix {
my $model= pop @_;
my $len= length($model);
for my $item ( @_ ) {
my $dif= $model ^ substr($item,0,$len);
$len= length( ( $dif =~ /^(\0+)/ )[0] );
substr( $model, $len )= "";
}
return $model;
}
- tye | [reply] [d/l] |
|
Nice solution; ++tye.
Changing
$len= length( ( $dif =~ /^(\0+)/ )[0] );
to
$len= length( ( $dif =~ /^(\0+)/ )[0] || '' );
will make it run quiet under warnings when the common prefix is the empty string.
-sauoq
"My two cents aren't worth a dime.";
| [reply] [d/l] [select] |
|
| [reply] |
Re: Find common prefix from a list of strings
by bobn (Chaplain) on Jul 14, 2003 at 19:43 UTC
|
my $posn = -1;
my $same = 1;
while(defined $same and $posn <= length $files[0]){
my $chr = substr($files[0], ++$posn, 1);
for my $name (@files) {
undef $same and last if(substr($name, $posn, 1) ne $chr
or length $name < $posn);
}
}
print 'Prefix is "', substr($files[0], 0, $posn), '"';
But I can't help but think there's a better way, such as:
# UNTESTED
$max = ( sort { $a <=> $b } map { length } @files )[0];
# length of smallest filename - I hope.
LIST:
for $len ( 1..$max) )
{
$s = substr($file[0],0,$len);
last LIST unless scalar grep { /^$s/ } @files == @files;
# precedence issue here?
}
print "prefix: '",substr($file[0],1,$len-1), "'\n";
Update: added line to get min length of any filename.
--Bob Niederman, http://bob-n.com | [reply] [d/l] [select] |
|
Drat! Thanks for the reality check - the die was just there for testing, I should have taken it out as it does not add anything, similarly the for was a hold over from an earlier attempt. I've got to proof read my posts more!
You obviously have the same must-be-a-better-way (MBABW?) feeling, and I like the idea of the grep trick, though I must admit I usually have a fear of labels. Anyway your code will work with a few minor fixes:
my $s;
LIST:
for my $len ( 1..length $files[0]) {
$s = substr($files[0],0,$len - 1);
last LIST unless (scalar grep { /^$s/ } @files) == @files;
}
chop $s;
print "prefix: $s\n";
Update: Not silly at all bobn, I should really have specified that an empty string is the correct answer for no common prefix - thanks for pointing that out.
--
I'd like to be able to assign to an luser | [reply] [d/l] |
|
Actually, there's one other "bug" - if there are never any matches, I will print out that the prefix is '' - well actually, as a zero-length string, that's actually not untrue, but still silly.
--Bob Niederman, http://bob-n.com
| [reply] |
|
Loved your chop $s; - it's the easiest and coolest way to deal with the "I found the condition that makes me stop, but i already added one too many characters" - in fact, I've used it that way myself.
--Bob Niederman, http://bob-n.com
| [reply] |
Re: Find common prefix from a list of strings
by Jenda (Abbot) on Jul 14, 2003 at 20:20 UTC
|
@files = qw(model4run1 model2run1 model4run2 model1run1);
my $first = shift(@files);
my $combined = $first;
foreach (@files) {
$combined &= $_;
}
$combined ^= $first;
$combined =~ s/[^\x00].*//;
my $prefix = substr($first, 0, length($combined));
print qq{Prefix is "$prefix"\n};
First I AND together all the elements of the array (therefore only bits that are the same for all items are set), then I XOR the result with the first element to get zeroes for all characters that are the same, then strip everything from the first nonzero character and last get as many characters from the first element as you have zeroes.
Update: You can of course replace the my $first = shift(@files); by my $first = $files[0];, the result will be the same and @files will be preserved.
Update: The code is wrong. It returns an incorrect result for @files = qw(model4run1 model5run1);
See a fixed version here. It makes my solution a little slower, but it's still the fastest.
Jenda
Always code as if the guy who ends up maintaining your code
will be a violent psychopath who knows where you live.
-- Rick Osborne
Edit by castaway: Closed small tag in signature | [reply] [d/l] [select] |
|
Here is a sort version of Jenda's solution
my @files = qw(model4run1 model2run1 model4run2 model1run1);
my $same = $files[0];
$same &=$_ for @files;
($same) = split 0x00, $same,2;
print "Prefix is \"$same\"\n";
| [reply] [d/l] |
|
Uh oh ... I wanted to point out that your solution is wrong, found a conter example and ... it broke my code as well :-(
Here is a fixed solution:
@files = qw(model4run1 model5run1);
#@files = qw(model4run1 model2run1 model4run2 model1run1);
my $first = shift(@files);
my $and = $first;
my $or = $first;
foreach (@files) {
$and &= $_;
$or |= $_;
}
my $combined = $and ^ $or;
$combined =~ s/[^\x00].*//;
my $prefix = substr($first, 0, length($combined));
print qq{Prefix is "$prefix"};
Jenda
Always code as if the guy who ends up maintaining your code
will be a violent psychopath who knows where you live.
-- Rick Osborne
Edit by castaway: Closed small tag in signature | [reply] [d/l] |
|
|
|
Re: Find common prefix from a list of strings
by artist (Parson) on Jul 14, 2003 at 19:48 UTC
|
@files = qw(model4run1 modexl2run1 model4run2 model1run1);
my @array = split //,shift @files;
my @prefix;
foreach (@files) {
@prefix = ();
while ($c = shift @array) {
next unless /^$c/;
s/^$c//;
push @prefix,$c;
}
@array = @prefix;
}
print join ""=>@prefix,"\n";
| [reply] [d/l] |
Re: Find common prefix from a list of strings
by Jenda (Abbot) on Jul 14, 2003 at 21:11 UTC
|
I benchmarked the suggested solutions, here are the results using the Albannach's list:
Benchmark: timing 100000 iterations of Albannach, Jenda, artist, bobn,
+ demerphq, tilly, tye, yosefm...
Albannach: 9 wallclock secs ( 8.75 usr + 0.00 sys = 8.75 CPU) @ 11
+424.65/s
Jenda: 2 wallclock secs ( 2.21 usr + 0.00 sys = 2.21 CPU) @ 45
+187.53/s
artist: 89 wallclock secs (86.88 usr + 0.00 sys = 86.88 CPU) @ 11
+50.96/s
bobn: 7 wallclock secs ( 7.69 usr + 0.00 sys = 7.69 CPU) @ 13
+002.21/s
demerphq: 28 wallclock secs (27.22 usr + 0.00 sys = 27.22 CPU) @ 36
+73.90/s
tilly: 4 wallclock secs ( 3.89 usr + 0.00 sys = 3.89 CPU) @ 25
+733.40/s
tye: 5 wallclock secs ( 4.37 usr + 0.00 sys = 4.37 CPU) @ 22
+899.02/s
yosefm: 25 wallclock secs (24.50 usr + 0.00 sys = 24.50 CPU) @ 40
+82.30/s
and here using a slightly longer list:@files = qw(model4run1 model4rundfsdf model2run1 model4run2 model1run1
+ modelka
modelujeme modeluji modelme modelsdfgsdfg);
...
Benchmark: timing 100000 iterations of Albannach, Jenda, artist, bobn,
+ demerphq, tilly, tye, yosefm.
..
Albannach: 17 wallclock secs (15.34 usr + 0.01 sys = 15.35 CPU) @ 65
+13.81/s
Jenda: 4 wallclock secs ( 3.31 usr + 0.01 sys = 3.32 CPU) @ 30
+075.19/s
artist: 268 wallclock secs (250.31 usr + 0.04 sys = 250.35 CPU) @
+ 399.44/s
bobn: 13 wallclock secs (12.90 usr + 0.00 sys = 12.90 CPU) @ 77
+52.54/s
demerphq: 59 wallclock secs (57.94 usr + 0.01 sys = 57.95 CPU) @ 17
+25.54/s
tilly: 11 wallclock secs ( 8.01 usr + 0.02 sys = 8.03 CPU) @ 12
+453.30/s
tye: 11 wallclock secs (10.35 usr + 0.00 sys = 10.35 CPU) @ 96
+58.10/s
yosefm: 33 wallclock secs (31.75 usr + 0.00 sys = 31.75 CPU) @ 31
+50.00/s
Jenda
Always code as if the guy who ends up maintaining your code
will be a violent psychopath who knows where you live.
-- Rick Osborne
Edit by castaway: Closed small tag in signature | [reply] [d/l] [select] |
Re: Find common prefix from a list of strings
by Zaxo (Archbishop) on Jul 15, 2003 at 01:12 UTC
|
Wanting the common prefix makes for a nifty shortcut. Sort the strings alphabetically and you only need to compare the first and last of them,
sub common_prefix {
my ($first, $last) = (sort @_)[0,-1];
my $i = 0;
while (substr($first, $i, 1) eq substr($last, $i, 1)) { $i++}
substr $first, 0, $i;
}
After Compline, Zaxo | [reply] [d/l] |
Re: Find common prefix from a list of strings
by yosefm (Friar) on Jul 14, 2003 at 20:15 UTC
|
@files = qw(model1bbb model2ccc model3i moduuu);
PREF: for my $i (0..length($files[0])-1) {
my $substr = substr($files[0], 0, $i+1);
(/^$substr/ or ((print 'Pref', substr($files[0], 0, $i)), last PREF))
+for (@files[1..$#files]);
}
Hope this helps.
Update: some people preceded me while I tested this... | [reply] [d/l] |
Re: Find common prefix from a list of strings
by tilly (Archbishop) on Jul 14, 2003 at 20:17 UTC
|
Here is a bad solution just for variety.
my $sep = chr(033);
my $str = join $sep, @files;
if ($str =~ /^([^$sep]*)[^$sep]*($sep\1[^$sep]*)*\z/s) {
print "Common substring: '$1'\n";
}
else {
die "Pattern unexpectedly failed to match?";
}
Of course it assumes that chr(033) does not appear in the strings... | [reply] [d/l] |
Re: Find common prefix from a list of strings
by demerphq (Chancellor) on Jul 14, 2003 at 20:44 UTC
|
sub insert {
my $trie=shift;
my $str=shift;
$trie=$trie->{$_}||={}
foreach (split //,$str);
}
sub common {
my $trie=shift;
my $common="";
while (1==scalar keys %$trie) {
my $char=(keys %$trie)[0];
$common.=$char;
$trie=$trie->{$char};
}
$common;
}
my %trie;
insert(\%trie,$_) foreach qw(model4run1 model2run1 model4run2 model1ru
+n1);
print common(\%trie);
But it could be argued I have Trie's on my brain. :-)
---
demerphq
<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
| [reply] [d/l] [select] |
|
c:\@Work\Perl\monks\Albannach>perl -wMstrict -le
"sub insert {
my $trie=shift;
my $str=shift;
$trie=$trie->{$_}||={}
foreach (split //,$str);
}
;;
sub common {
my $trie=shift;
my $common=\"\";
while (1==scalar keys %$trie) {
my $char=(keys %$trie)[0];
$common.=$char;
$trie=$trie->{$char};
}
$common;
}
;;
my %trie;
insert(\%trie,$_) foreach qw(a ab abc);
print common(\%trie);
"
abc
Shouldn't 'a' be the longest common prefix?
| [reply] [d/l] [select] |
|
sub insert {
my $trie=shift;
my $str=shift;
$trie=$trie->{$_}||={}
foreach (split //,$str);
$trie->{''}= $str;
}
sub common {
my $trie=shift;
my $common="";
while (!exists($trie->{''}) and 1==scalar keys %$trie) {
my $char=(keys %$trie)[0];
$common.=$char;
$trie=$trie->{$char};
}
$common;
}
my %trie;
insert(\%trie,$_) foreach qw(a ab abc);
print common(\%trie);
Sorry it took so long to reply.
---
$world=~s/war/peace/g
| [reply] [d/l] |
Re: Find common prefix from a list of strings
by antirice (Priest) on Jul 15, 2003 at 00:42 UTC
|
Yes, it is inspired by tye's version. However, in the spirit of TMTOWTDI:
sub Prefix {
my ($m,$n) = (sort @_)[0,-1];
my $dif = $m ^ $n;
my $len= length( ( $dif =~ /^(\0*)/ )[0] );
substr( $m, 0,$len );
}
Why sort? Since it's a builtin, it should be rather fast. Also, you're guaranteed that the first and last element of the sorted array will have the shortest prefix match of any two strings within the array. This sub is actually very fast.
antirice The first rule of Perl club is - use Perl The ith rule of Perl club is - follow rule i - 1 for i > 1 | [reply] [d/l] [select] |
|
sub Prefix {
my ($ar_strings,
) = @_;
return '' unless @$ar_strings;
my ($m, $n) = (sort @$ar_strings)[0,-1];
($m ^ $n) =~ m{ \A \x00* }xms;
return substr $m, 0, $+[0];
}
| [reply] [d/l] |
Re: Find common prefix from a list of strings
by Willard B. Trophy (Hermit) on Jul 14, 2003 at 22:06 UTC
|
Though the problem's well and answered by the time I got to this, it did strike me that Text::Abbrev would sort of work to solve this.
Indeed, if you do:
%abbrevs = abbrev(@files);
@possibles = sort { length($a) <=> length($b) } ( keys(%abbrevs) );
the first entry in @possibles is most of the way to the solution. I don't have time to look into this further, alas, but it might be another way to do it.
-- bowling trophy thieves, die! | [reply] [d/l] |
Re: Find common prefix from a list of strings
by Anonymous Monk on Jul 14, 2003 at 20:49 UTC
|
There's a functional module somewhere with reduce in it, I think.
Anyway, you said map, here's my solution...
my @ans =
grep {$_}
map {
my $func = $_;
reduce( sub{
my($first, $second) = @_;
return ($first eq $second) && $first;
}
, map(&$func, @data));
}
map {
my $ctr = $_;
sub
{
my $str = $_;
substr($str, 0, $ctr);
}
}
(1..length($data[0]));
print pop(@ans), "\n";
I faked a reduce to test it, it works on this dataset. | [reply] [d/l] |
|
You're right that there is a functional module with reduce in it.
It is called List::Util.
| [reply] |
Re: Find common prefix from a list of strings
by sgifford (Prior) on Jul 14, 2003 at 22:07 UTC
|
Here's another golfish one:
sub maxprefix
{
my $p = shift;
grep {chop($p) until (/^$p/)} @_;
$p;
}
| [reply] [d/l] |
|
I liked this one, but why would you use a grep there..? Also, metacharacters in $p will break it.
sub maxprefix {
my $p = shift;
for (@_) { chop $p until /^\Q$p/ }
return $p;
}
Depending on the nature of your data this might be a win:
sub maxprefix {
my $s = reverse shift;
my $p = '';
for (@_) { $p .= quotemeta chop $s while /^$p/ }
chop $p;
return $p;
}
Makeshifts last the longest.
| [reply] [d/l] [select] |
|
That approach is quite nice. Here's a unsightly twist on it:
#!/usr/bin/perl -wl
use strict;
use warnings;
sub Prefix {
my $p = shift;
($p)=$_=~join'','\A',map("(\Q$_",split//,$p),join'?',(')')x length(
+$p)or return'' for @_;
$p
}
while (<DATA>) {
print "$_=> ", Prefix(split ' ',$_);
}
__DATA__
model4run1 model2run1 model4run2 model1run1
model4run1 model2run1 model4run2 abbot
model4run1 model2run1 abbot model1run1
model4run1 model2run1 model4run2 monk
model4run1 model2run1 monk model1run1
testing terse testing time
but I prefer the chop. | [reply] [d/l] |
|
|
|
|
sub maxprefix {
my $p = shift(@_);
for(@_) { chop $p until /^\Q$p/ }
return $p;
}
Update: The one you added has problems as well:
sub maxprefix {
my $s = reverse shift;
my $p = '';
for (@_) { $p .= quotemeta chop $s while /^$p/ }
chop $p;
return $p;
}
It will find the maximum prefix between the first string and the string of the others that has the longest common prefix with the first string:
print maxprefix("testing","terse","tester","time"),$/;
prints test not t.
| [reply] [d/l] [select] |
|
|
sub common_suffix {
my $comm = shift @_;
while ($_ = shift @_) {
$_ = substr($_,-length($comm))
if (length($_) > length($comm));
$comm = substr($comm,-length($_))
if (length($_) < length($comm));
if (( $_ ^ $comm ) =~ /(\0*)$/) {
$comm = substr($comm, -length($1));
} else {
return undef;
}
}
return $comm;
}
sub common_prefix {
my $comm = shift @_;
while ($_ = shift @_) {
$_ = substr($_,0,length($comm))
if (length($_) > length($comm));
$comm = substr($comm,0,length($_))
if (length($_) < length($comm));
if (( $_ ^ $comm ) =~ /^(\0*)/) {
$comm = substr($comm,0,length($1));
} else {
return undef;
}
}
return $comm;
}
| [reply] [d/l] |
Re: Find common prefix from a list of strings
by qmole (Beadle) on Jul 14, 2003 at 21:00 UTC
|
Not exactly golfed to death, but:
@files = qw(model4run1 model2run1 model4run2 model1run1);
$x++,"$files[0] "=~/(.{$x})/ while (grep/^$1/,@files)==@files;
chop($_=$1);
print;
| [reply] [d/l] |
Re: Find common prefix from a list of strings
by Jasper (Chaplain) on Jul 15, 2003 at 10:10 UTC
|
@files = qw(model4run1 model2run1 model4run2 model);
$reg = ' \\1[^ ]*' x $#files;
"@files" =~ /^([^ ]*)[^ ]*$reg$/;
print $1;
Of course, all the usual catches apply, like having spaces, return characters, etc. in your filenames.
No idea how slow this is..
Jasper | [reply] [d/l] |