Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: An efficient way to gather a common portion of several strings' beginnings

by BrowserUk (Patriarch)
on Nov 15, 2015 at 08:16 UTC ( [id://1147718]=note: print w/replies, xml ) Need Help??


in reply to An efficient way to gather a common portion of several strings' beginnings

This is O(n) rather than the O(n2) of your solution:

#! perl -slw use strict; my @strings = ( 'string that I need to gather the common base from: number 1 and some +other junk in it', 'string that I need to gather the common base from: number 2 and some +other junk in it', 'string that I need to gather the common base number 4 and some other +junk in it', 'string that I need to gather the common base from: number 3 and some +other junk in it', ); my( $mask ) = ( $strings[ 0 ] ^ $strings[ 1 ] ) =~ m[(^\0+)]; my $common = substr $strings[ 0 ], 0, length $mask; for my $i ( 2 .. $#strings ) { if( substr( $strings[ $i ], 0, length $common ) ne $common ) { ( $mask ) = ( $strings[ 0 ] ^ $strings[ $i ] ) =~ m[(^\0+)]; $common = substr $strings[ 0 ], 0, length $mask; } } print "'$common'"; __END__ C:\test>1147616 'string that I need to gather the common base '

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.
  • Comment on Re: An efficient way to gather a common portion of several strings' beginnings
  • Download Code

Replies are listed 'Best First'.
Re^2: An efficient way to gather a common portion of several strings' beginnings
by AnomalousMonk (Archbishop) on Nov 15, 2015 at 17:01 UTC

    The  m[(^\0+)] match results in an undefined value for  $mask when there is no common base substring at all, which then hiccups a "Use of uninitialized..." warning in the substr expression (if warnings are enabled). Happily, this is easily fixed by using  m[(^\0*)] instead!


    Give a man a fish:  <%-{-{-{-<

      Good catch and nice solution; but if the OP is expecting there to be a common prefix to his lines, then that warning might be a godsend in the event of a dataset where that is not the case.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: An efficient way to gather a common portion of several strings' beginnings
by AnomalousMonk (Archbishop) on Nov 15, 2015 at 18:16 UTC

    This solution fails for the null-infested  (qq{\000\000\000}, qq{\000\000}, qq{\000}) list of strings (as does GrandFather's, but GrandFather (update: explicitly) assumes nulls will not be present in any strings). The same list reversed produces the proper result. Some other solutions seem to accept nulls happily.


    Give a man a fish:  <%-{-{-{-<

      This solution fails for the null-infested (qq{\000\000\000}, qq{\000\000}, qq{\000}) list of strings

      And what you got from the OPs sample data is that his data is infested with nulls?

      but GrandFather assumes nulls will not be present in any strings

      And mine doesn't?

      Oh. I see. He states that he's assuming it rather than leaving the bloody obvious to be bloody obvious. (Whatever did I do to rattle your cage recently?)


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.
        ... the bloody obvious ...

        At certain seasons and in certain weathers, the throbbing of the pieces of shrapnel still lodged in my feet from the unspoken assumptions, corner cases and such like of projects past leads me obsessively to ponder the bloody obvious.

        Whatever did I do to rattle your cage recently?

        What you did was to post an interesting ++reply to an engaging question. I did not mean to offer offense; I apologize if I did so.


        Give a man a fish:  <%-{-{-{-<

Re^2: An efficient way to gather a common portion of several strings' beginnings
by Anonymous Monk on Nov 15, 2015 at 18:45 UTC

    Speaking of algorithmic efficiency. Consider the case where $string[17] eq "x" and preceding values all span kilobytes. Clearly, there's some room for improvement.

      Speaking of algorithmic efficiency. Consider the case where $string17 eq "x" and preceding values all span kilobytes. Clearly, there's some room for improvement.

      Hm. You could sort the strings, but sorting is O(n log n) instead of O(n).

      Still think so?


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: An efficient way to gather a common portion of several strings' beginnings
by lonewolf28 (Beadle) on Nov 17, 2015 at 00:09 UTC

    ++ Superb solution /u/BrowserUk. It took me some time to figure out. Cheers.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1147718]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (4)
As of 2024-04-25 05:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found