Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

If I read this correctly, you do not want (SOCIOBIOLOGIST, OLOGIST) but do want (SOCIOBIOLOGIST, BIOLOGIST) and (BIOLOGIST, OLOGIST).

This leads to a simple iterative solution, where you first find all L > S tuples with the longest L and shortest S, then "split" those tuples for cases where some W exists such that L > W > S until you have no more tuples to split. Each split replaces one entry L > S in the list with a pair of entries L > W and W > S.

I am unsure, but suspect that you may be able to get better performance by applying scalar reverse to each string and working with "mirrored" strings, turning suffix matches (hard) into prefix matches (easy).

Now that I think about it, sorting those "mirrored" strings should put all possible L > S tuples into neat contiguous groups, with the shortest S at the beginning, longest L at the end, and any W in the middle, if you use the default cmp for the sort.

Does the entire lexicon fit into memory or do you need to sort them on disk?

Edit to add: At least this stage is simple enough that I would suggest solving this with the shell sort command and awk if you have those tools. I firmly believe that every monk here should learn AWK and use it for these types of simple hobby tasks — doing so will make you a better Perl programmer. (The GNU Awk manual is positively "light" reading compared to the enormous library of POD that comes with Perl.)


In reply to Re: minimal superstrings/maximal substrings by jcb
in thread minimal superstrings/maximal substrings by stevena

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others browsing the Monastery: (3)
    As of 2019-10-19 22:59 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      Notices?