http://www.perlmonks.org?node_id=974762

technojosh has asked for the wisdom of the Perl Monks concerning the following question:

I have a group of strings, that all share a common beginning substring. This substring will not always be identical, the only thing I know is that in any given array of strings, there *will* be a common substring at the beginning.

Context: As of today, I have a reporting script that will produce an HTML table, row-by-row, of a set of tests. I want to show a "header" row, that displays the test category, and this is found in the substring problem I mention above.

Two example sets of strings:

my @array_of_test_names = ( 'History - Family History - Suite with Patient - Clinic - Select R +emove(-) Icon of Relative. - Clinic', 'History - Family History - Suite with Patient - Clinic - Save aft +er Entering values in all fields - Clinic', 'History - Family History - Suite with Patient - Clinic - Select A +dd(+) Icon of Relative. - Clinic', 'History - Family History - Suite with Patient - Clinic - Multiple + selection of relatives - Clinic', 'History - Family History - Suite with Patient - Clinic - Interact +ion Checking message not provided if procedure code is selected - Cli +nic', 'History - Family History - Suite with Patient - Clinic - Interact +ion Checking message provided if no procedure code is selected - Clin +ic' ); my @second_sample_array = ( 'WorkCenter - List Patients - To check the items displayed in Filt +er By combo box', 'WorkCenter - List Patients - To check Filter By combo box when no + patients are assigned on initial login', 'WorkCenter - List Patients - Order in which patients are displaye +d' );

Any guidance on how to capture a common beginning substring would be greatly appreciated.

Replies are listed 'Best First'.
Re: Find the common beginning substring in an array of strings (Updated)
by BrowserUk (Patriarch) on Jun 06, 2012 at 16:43 UTC

    Updated: To correct odd number problem identified my mbethke and AnomalousMonk below.

    @array_of_test_names = ( ... );; $mask = '';; $mask ^= $_ for @array_of_test_names;; $mask ^= $array_of_test_names[0] if @array_to_test_names &1' $mask =~ m[^(\0+)] and $len = length( $1 );; print substr $array_of_test_names[ 0 ], 0, $len;; History - Family History - Suite with Patient - Clinic -

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Pretty smart and super fast, only works with an even number of strings though. Off the top of my head I'd say for an uneven number just duplicate one of the strings.

      Gotta go and pick up my son from kindergarten, otherwise there'd be code to support my claim :)

        I've updated my node with a correction. Thanks.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

      But that doesn't work for arrays with odd numbers of elements:

      >perl -wMstrict -MData::Dump -le "my @array_of_test_names = qw(ABCxyz ABCDEfoo ABCDbar); ;; my $mask = '';; $mask ^= $_ for @array_of_test_names;; dd $mask; ;; my $len; $mask =~ m[^(\0+)] and $len = length( $1 );; my $min = substr $array_of_test_names[ 0 ], 0, $len;; print qq{'$min'}; " "ABCx^}\35o" Use of uninitialized value $len in substr at -e line 1. ''

      Update: I didn't see mbethke's post until after I posted, but I agree that simple fix will solve the odd-element problem.

      Not sure if I miss something, but it seems there is a problem.

      With this array for example:

      my @array_of_test_names = ( "History--abcdef", "History--aaa", "History--123", "History--1X", );
      output: History--a

      updated: bad try

      A try:

      my $mask = ''; $mask ^= $_ ^ $array_of_test_names[0] for @array_of_test_names; $mask =~ m[^(\0+)] and $len = length( $1 ); print substr $array_of_test_names[ 0 ], 0, $len; print "\n";

      update2: btw, a not pretty solution

      my $res = ''; for my $i ( 0 .. (length($array_of_test_names[0]) -1)) { my $letter = substr ($array_of_test_names[0],$i,1); last if ( scalar grep { $letter ne substr($array_of_test_names[$_] +,$i,1)} (0 .. $#array_of_test_names)); $res .= $letter; }; print $res,"\n";
Re: Find the common beginning substring in an array of strings
by AnomalousMonk (Archbishop) on Jun 06, 2012 at 18:35 UTC

    Assuming a character set with a single-character-to-byte mapping:

    >perl -wMstrict -le "use List::Util qw(reduce); ;; my @ra = qw(ABCDxyz ABCfoo ABCDEbar); ;; use vars qw($a $b); my $min_start_seq = reduce { length($a) < length($b) ? $a : $b } map m{ \A \x00* }xmsg, map $ra[0] ^ $_, @ra ; my $common_starting_substring = $ra[0] & ~$min_start_seq; print qq{'$common_starting_substring'}; " 'ABC'
      This is the code that does the trick against all the examples I have so far... I don't have enough time to post it, but there is a 180-element array I have on my laptop that was not working with the code from the other comment. This was just tricky enough that I thought seeing the way some different monks approached it would be fun. I remain open to other solutions, as this is kind of a fun problem to think about.

      Thanks!!

        In that case, though I'm certain the masking method others have posted is the best solution, here's a recursive/substr method, just for fun:

        #!/usr/bin/env perl use Modern::Perl; sub findstr { if( @_ == 1 ){ return shift; } my $x = shift; my $y = shift; my $l = length $x > length $y ? length $y : length $x; for( my $i=$l; $i>0; $i--){ if( substr($x, 0, $i) eq substr($y, 0, $i)){ return findstr(substr($x,0,$i), @_); } } } my @array_of_test_names = ( 'History - Family History - Suite with Patient - Clinic - Select R +emove(-) Icon of Relative. - Clinic', 'History - Family History - Suite with Patient - Clinic - Save aft +er Entering values in all fields - Clinic', 'History - Family History - Suite with Patient - Clinic - Select A +dd(+) Icon of Relative. - Clinic', 'History - Family History - Suite with Patient - Clinic - Multiple + selection of relatives - Clinic', 'History - Family History - Suite with Patient - Clinic - Interact +ion Checking message not provided if procedure code is selected - Cli +nic', 'History - Family History - Suite with Patient - Clinic - Interact +ion Checking message provided if no procedure code is selected - Clin +ic', ); my $match = findstr(@array_of_test_names); say $match;

        Aaron B.
        Available for small or large Perl jobs; see my home node.

        In that case (and also because I'm only now reading this node), there's this based on a common prefix removal whim I had in the past (with partial credits to moritz for tidying up my original code as seen on my scratchpad):
        #!/usr/bin/perl use strict; use warnings; my @array_of_test_names = ( 'History - Family History - Suite with Pati +ent - Clinic - Select Remove(-) Icon of Relative. - Clinic', 'History - Family History - Suite with Pati +ent - Clinic - Save after Entering values in all fields - Clinic', 'History - Family History - Suite with Pati +ent - Clinic - Select Add(+) Icon of Relative. - Clinic', 'History - Family History - Suite with Pati +ent - Clinic - Multiple selection of relatives - Clinic', 'History - Family History - Suite with Pati +ent - Clinic - Interaction Checking message not provided if procedure + code is selected - Clinic', 'History - Family History - Suite with Pati +ent - Clinic - Interaction Checking message provided if no procedure +code is selected - Clinic', ); my @second_sample_array = ( 'WorkCenter - List Patients - To check the +items displayed in Filter By combo box', 'WorkCenter - List Patients - To check Filt +er By combo box when no patients are assigned on initial login', 'WorkCenter - List Patients - Order in whic +h patients are displayed', ); sub getleastcommonprefix { my @searcharray = @_; my $common = $searcharray[0]; foreach my $index (1 .. $#searcharray) { $_ = $searcharray[0] . reverse $searcharray[$index]; m/(.*)(.*)(??{quotemeta reverse $1})/s; if (length $1 < length $common) { $common = $1; } } ## end foreach my $index (1 .. $#searcharray) return $common; } ## end sub getleastcommonprefix print 'Common prefix for first sample [' . getleastcommonprefix(@array +_of_test_names) . "]\n"; print 'Common prefix for second sample [' . getleastcommonprefix(@seco +nd_sample_array) . "]\n";
        Which gives the following output:
        Common prefix for first sample [History - Family History - Suite with +Patient - Clinic - ] Common prefix for second sample [WorkCenter - List Patients - ]
Re: Find the common beginning substring in an array of strings
by cheekuperl (Monk) on Jun 06, 2012 at 16:34 UTC
    Just getting clarification:
    In the first sample array, the common beginning will be
    History - Family History - Suite with Patient - Clinic -

    and in the second one
    WorkCenter - List Patients -

    Right?