Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Hello R56,

Three points:

(1) Unless you’re sure that the first line of the data file begins with a single digit, followed by a single whitespace character, followed by a single digit, your regex will not work. You need to add quantifiers: /^(\d+)\s+(\d+)/.

I would write the opening code like so:

#! perl use strict; use warnings; use autodie; use constant { IN_FILE => 'text.txt', SIZE => ???, MIN_MATCH => ???, }; my $size = SIZE; my $min_match = MIN_MATCH; my @array; open(my $in, '<', IN_FILE); if (defined(my $line = <$in>)) { if ($line =~ /^(\d+)\s+(\d+)/) { $size = $1; $min_match = $2; } else { push @array, $line; } push @array, $_ while <$in>; } close $in;

Obviously, you will need to supply sensible default values for SIZE and MIN_MATCH.

(2) I do not know what this means:

The first line of the input contains two numbers. First one tells us how many strings we should consider, second one tells us the minimum number of original strings where the substring should appear.

However, I suggest you postpone consideration of these constraints until you’ve tackled the more important problem of finding the longest common substring for 3 or more strings.

(3) Your post implies that the general problem can be solved by “tweaking” the algorithm for finding the longest common substring between 2 strings. However, this may be harder than you think. Consider the following 3 strings:

adam***betty ^^charlie^^^^^adam^^betty^^^^^ #####adam######charlie##

By inspection, the longest substring common to this data set is adam. But application of your 2-string algorithm to the first 2 strings gives betty only. You need an algorithm which finds:
    the set S1 of all substrings common to strings 1 and 2,
    the set S2 of all substrings common to strings 1 and 3, and
    the set S3 of all substrings common to strings 2 and 3,
and then finds the solution as the longest string in the intersection of S1, S2, and S3.

A glance at Longest_common_substring_problem suggests that the algorithm you need is far from trivial. But the dynamic programming pseudocode given there should get you started in the right direction.

Hope that helps (a little),

Update: Improved the paragraph beginning “By inspection”.

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,


In reply to Re: longest common substring (with needed tweaks) by Athanasius
in thread longest common substring (with needed tweaks) by R56

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2024-03-29 05:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found