Re^2: longest common substring (with needed tweaks)

Great piece of code Lennotoecom :)

Struggling a bit to understand it tho, as it's greatly simplified!

Can you explain this first line to me in detail? I've never seen $` before...

```
$_ = <DATA>; $_ = $` if /$/; @a = split //, $_;
```

Thank you!

Re^3: longest common substring (with needed tweaks)
by Lennotoecom (Pilgrim) on Oct 28, 2013 at 17:29 UTC
for example:
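```
$a = 'aa ab c c';
$a =~ m/b/;
```

now

$` contains 'aa a'
$& contains 'b'
$' contains ' c c'

In other words: everything on the line before the match, the match itself, and everything after the match.

In the line you asked about, /$/ matches just before the trailing newline, so assigning $` back to $_ leaves the line without its \n.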
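A minimal sketch for trying this out in isolation. The helper names match_parts and strip_newline are made up for illustration and are not part of the script in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (prematch, match, postmatch), or an empty list.
sub match_parts {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    # Read the match variables right away; the next successful match overwrites them.
    return ($`, $&, $');
}

# Hypothetical helper: the script's newline-stripping idiom as a function.
sub strip_newline {
    my ($line) = @_;
    $line = $` if $line =~ /$/;   # /$/ matches before a trailing "\n", or at the very end
    return $line;
}

print join('|', match_parts('aa ab c c', qr/b/)), "\n";   # prints: aa a|b| c c
print '[', strip_newline("strrringggg\n"), "]\n";          # prints: [strrringggg]
```

On Perl 5.10 and later, matching with the /p modifier and reading ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} should give the same three values, and chomp is the usual way to drop the newline.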
```
# takes the first line from <DATA> and splits it on whitespace
# into $lines and $matches
($lines, $matches) = split /\s/, <DATA>;

# takes the next line from <DATA>, chops off the \n and splits the
# resulting string into the @a array, one character per element
$_ = <DATA>; $_ = $` if /$/; @a = split //, $_;

# in this cycle (1) we create all possible substrings of the first
# line (now in @a) and set each of them to 1 in %hash
for $i (0 .. $#a){
    $e = $a[$i]; $hash{$e} = 1;
    for $y ($i+1 .. $#a){
        $e .= $a[$y]; $hash{$e} = 1;
    }
}

# in this cycle (2) we read the file line by line and for every line
# do exactly the same as in the previous cycle, but into a temporary
# hash; then in the foreach cycle (3) we increment the keys that
# already exist in the first hash if they also occur in the current line
while(<DATA>){
    $_ = $` if /$/; @a = split //, $_; %thash = ();

    for $i (0 .. $#a){
        $e = $a[$i]; $thash{$e} = 1 if defined $hash{$e};
        for $y ($i+1 .. $#a){
            $e .= $a[$y]; $thash{$e} = 1 if defined $hash{$e};
        }
    }
    foreach $key (keys %hash){
        $hash{$key}++ if defined $thash{$key};
    }
}

# and finally we go through the hash and print only those keys
# whose value == $matches
$max = '';
foreach $key (keys %hash){
    if($hash{$key} == $matches){
        print "$key\n";
#       $max = $key if length($max) < length($key);
    }
}

print "$max\n";

__DATA__
3 2
strrringggg
ssttrrringggg
stttrrringgg
```
This whole script has a flaw: the resulting hash is built from the first text line only, so a substring that does not occur in that line is never counted. To fix it, cycle (3) should create the hash entry when it is undefined, rather than skipping it the way the example above does.
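To see the flaw in isolation, here is a rough sketch that counts the same way in both modes. The subs substrings and count_lines and the $first_line_only flag are invented for this illustration; they are not taken from either script.

```perl
use strict;
use warnings;

# Hypothetical helper: every contiguous substring of $s, without duplicates.
sub substrings {
    my ($s) = @_;
    my %seen;
    for my $i (0 .. length($s) - 1) {
        for my $len (1 .. length($s) - $i) {
            $seen{ substr($s, $i, $len) } = 1;
        }
    }
    return keys %seen;
}

# Count in how many lines each substring occurs.
# With $first_line_only true, only substrings of the first line are tracked,
# which mimics the flawed behaviour; with it false, missing keys are created.
sub count_lines {
    my ($first_line_only, @lines) = @_;
    my %count;
    $count{$_} = 1 for substrings(shift @lines);
    for my $line (@lines) {
        for my $sub (substrings($line)) {
            next if $first_line_only && !exists $count{$sub};
            $count{$sub}++;
        }
    }
    return \%count;
}

my @lines  = qw(ac bc b);
my $flawed = count_lines(1, @lines);
my $fixed  = count_lines(0, @lines);
printf "flawed: b counted in %d line(s)\n", $flawed->{b} // 0;   # 0
printf "fixed:  b counted in %d line(s)\n", $fixed->{b}  // 0;   # 2
```

With the lines ac, bc and b, the substring b occurs in two lines but never in the first one, so only the second mode reports it.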
```
# fixed version: substrings of every line are counted, not only those of the first line
sub f {
    @a = split //, shift; $ih = shift;
    for $i (0 .. $#a){
        $e = $a[$i]; ${$ih}{$e} = 1;
        for $y ($i+1 .. $#a){
            $e .= $a[$y]; ${$ih}{$e} = 1;
        }
    }
}

($l, $m) = split /\s/, <DATA>;
$_ = <DATA>; chomp; %h = (); f($_, \%h);

while(<DATA>){
    chomp; %th = (); f($_, \%th);
    $h{$_}++ foreach (keys %th);
}

foreach $key (keys %h){
    if($h{$key} == $m){
        $r[length($key)] = [] if ! exists $r[length($key)];
        # push on a bare scalar reference only worked on Perl 5.14 to 5.22; dereference explicitly
        push @{ $r[length($key)] }, $key;
    }
}

print "@{$r[$#r]}\n";

__DATA__
3 2
ac
bc
b
```
1: Defines a sub named f that takes two parameters, a string and a reference to a hash, and puts every substring of the given string into that hash.
2: Splits the first line of the file into two variables, $l and $m.
3: Takes the next line from the file and passes it to f together with a reference to the empty hash %h.
4: At this point every substring of the first text line is a key in %h with the value 1.
5: Reads the rest of the file line by line, passing each line to f along with a reference to an empty hash %th. Right after that, for every key in %th the matching entry in %h is incremented, and created if it did not exist yet.
6: Runs through %h, and if a key's value equals the number of overlaps we need ($m), pushes the key onto the @r array of arrays, indexed by the key's length.
7: The last line prints all the longest overlaps, i.e. all the keys that share the greatest length.
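Steps 6 and 7 can be sketched on their own, assuming the counts are already in a hash. The sub name longest_with_count and the hard-coded %counts are illustrative only; the values are what the counting steps would leave behind for the sample lines ac, bc and b.

```perl
use strict;
use warnings;

# Hypothetical helper: given substring => line-count pairs, return the longest
# substrings whose count equals $wanted (all of them, if several share that length).
sub longest_with_count {
    my ($counts, $wanted) = @_;
    my @by_length;                                   # $by_length[$n] holds the keys of length $n
    for my $key (keys %$counts) {
        next unless $counts->{$key} == $wanted;
        push @{ $by_length[ length $key ] }, $key;   # the inner array is autovivified
    }
    return () unless @by_length;
    my @longest = sort @{ $by_length[-1] };          # last slot = greatest length seen
    return @longest;
}

my %counts = (a => 1, c => 2, ac => 1, b => 2, bc => 1);
print join(' ', longest_with_count(\%counts, 2)), "\n";   # prints: b c
```

Sorting the result is an addition for stable output; hash key order in Perl is not fixed, so the unsorted script may print the same keys in a different order from run to run.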

