Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Find common prefix from a list of strings

by Albannach (Prior)
on Jul 14, 2003 at 19:22 UTC ( #274114=perlquestion: print w/replies, xml ) Need Help??

Albannach has asked for the wisdom of the Perl Monks concerning the following question:

Given: a list of strings, in my case a list of filenames (log files from a simulation package), e.g. @files = qw(model4run1 model2run1 model4run2 model1run1)
Output: the set of characters common to the begining of all the strings, in the above example "model"

I am having a mental block on this little snippet which, though working, simply does not look elegant to me yet. Something tells me it could be a simple map application, but my mental map interface seems to be read-only. I'd appreciate any and all suggestions, comments, or even golf.

my $posn = -1; my $same = 1; while(defined $same){ die if $posn > length $files[0]; my $chr = substr($files[0], ++$posn, 1); for my $name (1..$#files) { undef $same and last if(substr($files[$name], $posn, 1) ne $chr or length $files[$name] < $posn); } } print 'Prefix is "', substr($files[0], 0, $posn), '"';

--
I'd like to be able to assign to an luser

Replies are listed 'Best First'.
Re: Find common prefix from a list of strings (tye)
by tye (Sage) on Jul 14, 2003 at 19:55 UTC
    sub FindCommonPrefix { my $model= pop @_; my $len= length($model); for my $item ( @_ ) { my $dif= $model ^ substr($item,0,$len); $len= length( ( $dif =~ /^(\0+)/ )[0] ); substr( $model, $len )= ""; } return $model; }
                    - tye

      Nice solution; ++tye.

      Changing

      $len= length( ( $dif =~ /^(\0+)/ )[0] );
      to
      $len= length( ( $dif =~ /^(\0+)/ )[0] || '' );
      will make it run quiet under warnings when the common prefix is the empty string.

      -sauoq
      "My two cents aren't worth a dime.";
      

        Oops. That + should have been a *. Thanks for the hint. (:

                        - tye
Re: Find common prefix from a list of strings
by bobn (Chaplain) on Jul 14, 2003 at 19:43 UTC

    die is not the way to normally end execution. And for can iterate a list without indexing.

    my $posn = -1; my $same = 1; while(defined $same and $posn <= length $files[0]){ my $chr = substr($files[0], ++$posn, 1); for my $name (@files) { undef $same and last if(substr($name, $posn, 1) ne $chr or length $name < $posn); } } print 'Prefix is "', substr($files[0], 0, $posn), '"';
    But I can't help but think there's a better way, such as:
    # UNTESTED $max = ( sort { $a <=> $b } map { length } @files )[0]; # length of smallest filename - I hope. LIST: for $len ( 1..$max) ) { $s = substr($file[0],0,$len); last LIST unless scalar grep { /^$s/ } @files == @files; # precedence issue here? } print "prefix: '",substr($file[0],1,$len-1), "'\n";

    Update: added line to get min length of any filename.



    --Bob Niederman, http://bob-n.com
      Drat! Thanks for the reality check - the die was just there for testing, I should have taken it out as it does not add anything, similarly the for was a hold over from an earlier attempt. I've got to proof read my posts more!

      You obviously have the same must-be-a-better-way (MBABW?) feeling, and I like the idea of the grep trick, though I must admit I usually have a fear of labels. Anyway your code will work with a few minor fixes:

      my $s; LIST: for my $len ( 1..length $files[0]) { $s = substr($files[0],0,$len - 1); last LIST unless (scalar grep { /^$s/ } @files) == @files; } chop $s; print "prefix: $s\n";
      Update: Not silly at all bobn, I should really have specified that an empty string is the correct answer for no common prefix - thanks for pointing that out.

      --
      I'd like to be able to assign to an luser

        Actually, there's one other "bug" - if there are never any matches, I will print out that the prefix is '' - well actually, as a zero-length string, that's actually not untrue, but still silly.

        --Bob Niederman, http://bob-n.com
        Loved your chop $s; - it's the easiest and coolest way to deal with the "I found the condition that makes me stop, but i already added one too many characters" - in fact, I've used it that way myself.

        --Bob Niederman, http://bob-n.com
Re: Find common prefix from a list of strings
by Jenda (Abbot) on Jul 14, 2003 at 20:20 UTC
    @files = qw(model4run1 model2run1 model4run2 model1run1); my $first = shift(@files); my $combined = $first; foreach (@files) { $combined &= $_; } $combined ^= $first; $combined =~ s/[^\x00].*//; my $prefix = substr($first, 0, length($combined)); print qq{Prefix is "$prefix"\n};
    First I AND together all the elements of the array (therefore only bits that are the same for all items are set), then I XOR the result with the first element to get zeroes for all characters that are the same, then strip everything from the first nonzero character and last get as many characters from the first element as you have zeroes.

    Update: You can of course replace the my $first = shift(@files); by my $first = $files[0];, the result will be the same and @files will be preserved.

    Update: The code is wrong. It returns an incorrect result for

    @files = qw(model4run1 model5run1);
    See a fixed version here. It makes my solution a little slower, but it's still the fastest.

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne

    Edit by castaway: Closed small tag in signature

      Here is a sort version of Jenda's solution
      my @files = qw(model4run1 model2run1 model4run2 model1run1); my $same = $files[0]; $same &=$_ for @files; ($same) = split 0x00, $same,2; print "Prefix is \"$same\"\n";

        Uh oh ... I wanted to point out that your solution is wrong, found a conter example and ... it broke my code as well :-(

        Here is a fixed solution:

        @files = qw(model4run1 model5run1); #@files = qw(model4run1 model2run1 model4run2 model1run1); my $first = shift(@files); my $and = $first; my $or = $first; foreach (@files) { $and &= $_; $or |= $_; } my $combined = $and ^ $or; $combined =~ s/[^\x00].*//; my $prefix = substr($first, 0, length($combined)); print qq{Prefix is "$prefix"};

        Jenda
        Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
           -- Rick Osborne

        Edit by castaway: Closed small tag in signature

Re: Find common prefix from a list of strings
by artist (Parson) on Jul 14, 2003 at 19:48 UTC
    An alternative:
    @files = qw(model4run1 modexl2run1 model4run2 model1run1); my @array = split //,shift @files; my @prefix; foreach (@files) { @prefix = (); while ($c = shift @array) { next unless /^$c/; s/^$c//; push @prefix,$c; } @array = @prefix; } print join ""=>@prefix,"\n";
Re: Find common prefix from a list of strings
by Jenda (Abbot) on Jul 14, 2003 at 21:11 UTC

    I benchmarked the suggested solutions, here are the results using the Albannach's list:

    Benchmark: timing 100000 iterations of Albannach, Jenda, artist, bobn, + demerphq, tilly, tye, yosefm... Albannach: 9 wallclock secs ( 8.75 usr + 0.00 sys = 8.75 CPU) @ 11 +424.65/s Jenda: 2 wallclock secs ( 2.21 usr + 0.00 sys = 2.21 CPU) @ 45 +187.53/s artist: 89 wallclock secs (86.88 usr + 0.00 sys = 86.88 CPU) @ 11 +50.96/s bobn: 7 wallclock secs ( 7.69 usr + 0.00 sys = 7.69 CPU) @ 13 +002.21/s demerphq: 28 wallclock secs (27.22 usr + 0.00 sys = 27.22 CPU) @ 36 +73.90/s tilly: 4 wallclock secs ( 3.89 usr + 0.00 sys = 3.89 CPU) @ 25 +733.40/s tye: 5 wallclock secs ( 4.37 usr + 0.00 sys = 4.37 CPU) @ 22 +899.02/s yosefm: 25 wallclock secs (24.50 usr + 0.00 sys = 24.50 CPU) @ 40 +82.30/s
    and here using a slightly longer list:
    @files = qw(model4run1 model4rundfsdf model2run1 model4run2 model1run1 + modelka modelujeme modeluji modelme modelsdfgsdfg); ... Benchmark: timing 100000 iterations of Albannach, Jenda, artist, bobn, + demerphq, tilly, tye, yosefm. .. Albannach: 17 wallclock secs (15.34 usr + 0.01 sys = 15.35 CPU) @ 65 +13.81/s Jenda: 4 wallclock secs ( 3.31 usr + 0.01 sys = 3.32 CPU) @ 30 +075.19/s artist: 268 wallclock secs (250.31 usr + 0.04 sys = 250.35 CPU) @ + 399.44/s bobn: 13 wallclock secs (12.90 usr + 0.00 sys = 12.90 CPU) @ 77 +52.54/s demerphq: 59 wallclock secs (57.94 usr + 0.01 sys = 57.95 CPU) @ 17 +25.54/s tilly: 11 wallclock secs ( 8.01 usr + 0.02 sys = 8.03 CPU) @ 12 +453.30/s tye: 11 wallclock secs (10.35 usr + 0.00 sys = 10.35 CPU) @ 96 +58.10/s yosefm: 33 wallclock secs (31.75 usr + 0.00 sys = 31.75 CPU) @ 31 +50.00/s

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne

    Edit by castaway: Closed small tag in signature

Re: Find common prefix from a list of strings
by Zaxo (Archbishop) on Jul 15, 2003 at 01:12 UTC

    Wanting the common prefix makes for a nifty shortcut. Sort the strings alphabetically and you only need to compare the first and last of them,

    sub common_prefix { my ($first, $last) = (sort @_)[0,-1]; my $i = 0; while (substr($first, $i, 1) eq substr($last, $i, 1)) { $i++} substr $first, 0, $i; }

    After Compline,
    Zaxo

Re: Find common prefix from a list of strings
by yosefm (Friar) on Jul 14, 2003 at 20:15 UTC
    @files = qw(model1bbb model2ccc model3i moduuu); PREF: for my $i (0..length($files[0])-1) { my $substr = substr($files[0], 0, $i+1); (/^$substr/ or ((print 'Pref', substr($files[0], 0, $i)), last PREF)) +for (@files[1..$#files]); }

    Hope this helps.

    Update: some people preceded me while I tested this...

Re: Find common prefix from a list of strings
by tilly (Archbishop) on Jul 14, 2003 at 20:17 UTC
    Here is a bad solution just for variety.
    my $sep = chr(033); my $str = join $sep, @files; if ($str =~ /^([^$sep]*)[^$sep]*($sep\1[^$sep]*)*\z/s) { print "Common substring: '$1'\n"; } else { die "Pattern unexpectedly failed to match?"; }
    Of course it assumes that chr(033) does not appear in the strings...
Re: Find common prefix from a list of strings
by demerphq (Chancellor) on Jul 14, 2003 at 20:44 UTC

    Use a Trie.

    sub insert { my $trie=shift; my $str=shift; $trie=$trie->{$_}||={} foreach (split //,$str); } sub common { my $trie=shift; my $common=""; while (1==scalar keys %$trie) { my $char=(keys %$trie)[0]; $common.=$char; $trie=$trie->{$char}; } $common; } my %trie; insert(\%trie,$_) foreach qw(model4run1 model2run1 model4run2 model1ru +n1); print common(\%trie);

    But it could be argued I have Trie's on my brain. :-)


    ---
    demerphq

    <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
      c:\@Work\Perl\monks\Albannach>perl -wMstrict -le "sub insert { my $trie=shift; my $str=shift; $trie=$trie->{$_}||={} foreach (split //,$str); } ;; sub common { my $trie=shift; my $common=\"\"; while (1==scalar keys %$trie) { my $char=(keys %$trie)[0]; $common.=$char; $trie=$trie->{$char}; } $common; } ;; my %trie; insert(\%trie,$_) foreach qw(a ab abc); print common(\%trie); " abc

      Shouldn't  'a' be the longest common prefix?

        Yeah. We have to track the end position and stop accordingly. A simple fix is to use the '' slot in the hash to hold "end state" data for the trie. Eg:

        sub insert { my $trie=shift; my $str=shift; $trie=$trie->{$_}||={} foreach (split //,$str); $trie->{''}= $str; } sub common { my $trie=shift; my $common=""; while (!exists($trie->{''}) and 1==scalar keys %$trie) { my $char=(keys %$trie)[0]; $common.=$char; $trie=$trie->{$char}; } $common; } my %trie; insert(\%trie,$_) foreach qw(a ab abc); print common(\%trie);

        Sorry it took so long to reply.

        ---
        $world=~s/war/peace/g

Re: Find common prefix from a list of strings
by antirice (Priest) on Jul 15, 2003 at 00:42 UTC

    Yes, it is inspired by tye's version. However, in the spirit of TMTOWTDI:

    sub Prefix { my ($m,$n) = (sort @_)[0,-1]; my $dif = $m ^ $n; my $len= length( ( $dif =~ /^(\0*)/ )[0] ); substr( $m, 0,$len ); }

    Why sort? Since it's a builtin, it should be rather fast. Also, you're guaranteed that the first and last element of the sorted array will have the shortest prefix match of any two strings within the array. This sub is actually very fast.

    antirice    
    The first rule of Perl club is - use Perl
    The
    ith rule of Perl club is - follow rule i - 1 for i > 1

      Passes same tests as above, but with no capture or length call, might be slightly faster — but no benchmarking done. (I passed all arrays by reference in my testing.)

      sub Prefix { my ($ar_strings, ) = @_; return '' unless @$ar_strings; my ($m, $n) = (sort @$ar_strings)[0,-1]; ($m ^ $n) =~ m{ \A \x00* }xms; return substr $m, 0, $+[0]; }
Re: Find common prefix from a list of strings
by Willard B. Trophy (Hermit) on Jul 14, 2003 at 22:06 UTC
    Though the problem's well and answered by the time I got to this, it did strike me that Text::Abbrev would sort of work to solve this.

    Indeed, if you do:

    %abbrevs = abbrev(@files); @possibles = sort { length($a) <=> length($b) } ( keys(%abbrevs) );

    the first entry in @possibles is most of the way to the solution. I don't have time to look into this further, alas, but it might be another way to do it.

    --
    bowling trophy thieves, die!

Re: Find common prefix from a list of strings
by Anonymous Monk on Jul 14, 2003 at 20:49 UTC

    There's a functional module somewhere with reduce in it, I think.

    Anyway, you said map, here's my solution...

    my @ans = grep {$_} map { my $func = $_; reduce( sub{ my($first, $second) = @_; return ($first eq $second) && $first; } , map(&$func, @data)); } map { my $ctr = $_; sub { my $str = $_; substr($str, 0, $ctr); } } (1..length($data[0])); print pop(@ans), "\n";

    I faked a reduce to test it, it works on this dataset.

      You're right that there is a functional module with reduce in it.

      It is called List::Util.

Re: Find common prefix from a list of strings
by sgifford (Prior) on Jul 14, 2003 at 22:07 UTC
    Here's another golfish one:
    sub maxprefix { my $p = shift; grep {chop($p) until (/^$p/)} @_; $p; }
      I liked this one, but why would you use a grep there..? Also, metacharacters in $p will break it.
      sub maxprefix { my $p = shift; for (@_) { chop $p until /^\Q$p/ } return $p; }
      Depending on the nature of your data this might be a win:
      sub maxprefix { my $s = reverse shift; my $p = ''; for (@_) { $p .= quotemeta chop $s while /^$p/ } chop $p; return $p; }

      Makeshifts last the longest.

        That approach is quite nice. Here's a unsightly twist on it:
        #!/usr/bin/perl -wl use strict; use warnings; sub Prefix { my $p = shift; ($p)=$_=~join'','\A',map("(\Q$_",split//,$p),join'?',(')')x length( +$p)or return'' for @_; $p } while (<DATA>) { print "$_=> ", Prefix(split ' ',$_); } __DATA__ model4run1 model2run1 model4run2 model1run1 model4run1 model2run1 model4run2 abbot model4run1 model2run1 abbot model1run1 model4run1 model2run1 model4run2 monk model4run1 model2run1 monk model1run1 testing terse testing time
        but I prefer the chop.
        sub maxprefix { my $p = quotemeta shift; for (@_) { chop $p until /^$p/ } return $p; }

        This will still often fail on metacharacters because you'll get a "trailing \ in regexp" error.

        sub maxprefix { my $p = shift(@_); for(@_) { chop $p until /^\Q$p/ } return $p; }

        Update: The one you added has problems as well:

        sub maxprefix { my $s = reverse shift; my $p = ''; for (@_) { $p .= quotemeta chop $s while /^$p/ } chop $p; return $p; }

        It will find the maximum prefix between the first string and the string of the others that has the longest common prefix with the first string:

        print maxprefix("testing","terse","tester","time"),$/;
        prints test not t.

                        - tye
        sub common_suffix { my $comm = shift @_; while ($_ = shift @_) { $_ = substr($_,-length($comm)) if (length($_) > length($comm)); $comm = substr($comm,-length($_)) if (length($_) < length($comm)); if (( $_ ^ $comm ) =~ /(\0*)$/) { $comm = substr($comm, -length($1)); } else { return undef; } } return $comm; } sub common_prefix { my $comm = shift @_; while ($_ = shift @_) { $_ = substr($_,0,length($comm)) if (length($_) > length($comm)); $comm = substr($comm,0,length($_)) if (length($_) < length($comm)); if (( $_ ^ $comm ) =~ /^(\0*)/) { $comm = substr($comm,0,length($1)); } else { return undef; } } return $comm; }
Re: Find common prefix from a list of strings
by qmole (Beadle) on Jul 14, 2003 at 21:00 UTC
    Not exactly golfed to death, but:
    @files = qw(model4run1 model2run1 model4run2 model1run1); $x++,"$files[0] "=~/(.{$x})/ while (grep/^$1/,@files)==@files; chop($_=$1); print;
Re: Find common prefix from a list of strings
by Jasper (Chaplain) on Jul 15, 2003 at 10:10 UTC
    why not try a regexp?
    @files = qw(model4run1 model2run1 model4run2 model); $reg = ' \\1[^ ]*' x $#files; "@files" =~ /^([^ ]*)[^ ]*$reg$/; print $1;

    Of course, all the usual catches apply, like having spaces, return characters, etc. in your filenames.

    No idea how slow this is..

    Jasper

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://274114]
Approved by premchai21
Front-paged by tye
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2019-12-11 06:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?