Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

Re: longest common substring (with needed tweaks)

by Lennotoecom (Pilgrim)
on Oct 27, 2013 at 17:27 UTC ( #1059925=note: print w/replies, xml ) Need Help??

in reply to longest common substring (with needed tweaks)

$_ = <DATA>; $_ = $` if /$/; @a = split //, $_; for $i (0 .. $#a){ $e = $a[$i]; $hash{$e} = 1; for $y ($i+1 .. $#a){ $e .= $a[$y]; $hash{$e} = 1; } } while(<DATA>){ $_ = $` if /$/; @a = split //, $_; %thash = (); for $i (0 .. $#a){ $e = $a[$i]; $thash{$e} = 1 if defined $hash{$e}; for $y ($i+1 .. $#a){ $e .= $a[$y]; $thash{$e} = 1 if defined $hash{$e}; } } foreach $key (keys %hash){ $hash{$key}++ if defined $thash{$key}; } } $max = ''; foreach $key (keys %hash){ if($hash{$key} == 3){ $max = $key if length($max) < length($key); } } print "$max\n"; __DATA__ strrringggg ssttrrringggg stttrrringgg
output will be
which is the longest common substring in this case so about your options, change 3 to 2 in the last cycle
and you'll get the common substring for at least to lines.
I know this is ugly. No time)
the algorithm of the monstrosity above:
1. taking first line.
2. create all possible combinations of substrings out of it and put it into hash.
3. in the cycle take each line, and create all possible substrings out of it creating temporal hash.
4. compare temp. hash with the first, increment only those which are in both.
5. at the last cycle the number in the if construction, sorts how many lines have to have desirable longest substring.

Replies are listed 'Best First'.
Re^2: longest common substring (with needed tweaks)
by R56 (Sexton) on Oct 28, 2013 at 16:37 UTC

    Great piece of code Lennotoecom :)

    Struggling a bit to understand it tho, as it's greatly simplified!

    Can you explain me this first line in detail? Never seen the $` before...

    $_ = <DATA>; $_ = $` if /$/; @a = split //, $_;

    Thank you!

      for example:
      $a = 'aa ab c c'; $a=~m/b/; now $` contains 'aa a' $& contains 'b' $' contains ' c c'
      in other words all symbols of a line before the found result
      found result,
      and all the symbols after found results
        #takes first line from the <DATA> and split values by ' ' into $lines +and $matches ($lines, $matches) = split /\s/, <DATA>; #takes the next line from the <DATA>, chop off the \n and split result +ed string #into @a array by symbols $_ = <DATA>; $_ = $` if /$/; @a = split //, $_; #in this cycle(1) we create all possible combinations of substrings ou +t of the #@a array, (out of the first line) and equals them to 1 for $i (0 .. $#a){ $e = $a[$i]; $hash{$e} = 1; for $y ($i+1 .. $#a){ $e .= $a[$y]; $hash{$e} = 1; } } #in this cycle(2) we read file line by line and for every line #we do exactly the same as the previous cycle but into #temporal hash and then in the foreach cycle(3) we increment #existed keys from the first hash if they are in the current line while(<DATA>){ $_ = $` if /$/; @a = split //, $_; %thash = (); for $i (0 .. $#a){ $e = $a[$i]; $thash{$e} = 1 if defined $hash{$e}; for $y ($i+1 .. $#a){ $e .= $a[$y]; $thash{$e} = 1 if defined $hash{$e}; } } foreach $key (keys %hash){ $hash{$key}++ if defined $thash{$key}; } } #and finally here we go through the hash #and print only those keys which have their value == $matches $max = ''; foreach $key (keys %hash){ if($hash{$key} == $matches){ print "$key\n"; # $max = $key if length($max) < length($key); } } print "$max\n"; __DATA__ 3 2 strrringggg ssttrrringggg stttrrringgg
        this whole script has a flaw:
        the whole resulting hash is build upon the first text line
        so in order to fix it in the cycle number 3 if the hash value is undefined you
        should create one, not omit like in this example

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1059925]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (5)
As of 2018-05-20 18:34 GMT
Find Nodes?
    Voting Booth?