Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: minimal superstrings/maximal substrings

by jcb (Chaplain)
on Aug 20, 2019 at 01:25 UTC ( #11104714=note: print w/replies, xml ) Need Help??


in reply to minimal superstrings/maximal substrings

If I read this correctly, you do not want (SOCIOBIOLOGIST, OLOGIST) but do want (SOCIOBIOLOGIST, BIOLOGIST) and (BIOLOGIST, OLOGIST).

This leads to a simple iterative solution, where you first find all L > S tuples with the longest L and shortest S, then "split" those tuples for cases where some W exists such that L > W > S until you have no more tuples to split. Each split replaces one entry L > S in the list with a pair of entries L > W and W > S.

I am unsure, but suspect that you may be able to get better performance by applying scalar reverse to each string and working with "mirrored" strings, turning suffix matches (hard) into prefix matches (easy).

Now that I think about it, sorting those "mirrored" strings should put all possible L > S tuples into neat contiguous groups, with the shortest S at the beginning, longest L at the end, and any W in the middle, if you use the default cmp for the sort.

Does the entire lexicon fit into memory or do you need to sort them on disk?

Edit to add: At least this stage is simple enough that I would suggest solving this with the shell sort command and awk if you have those tools. I firmly believe that every monk here should learn AWK and use it for these types of simple hobby tasks — doing so will make you a better Perl programmer. (The GNU Awk manual is positively "light" reading compared to the enormous library of POD that comes with Perl.)

Replies are listed 'Best First'.
Re^2: minimal superstrings/maximal substrings
by Laurent_R (Canon) on Aug 20, 2019 at 08:42 UTC
    Does the entire lexicon fit into memory or do you need to sort them on disk?
    It most probably does. Word lists and dictionaries have at most a few hundreds of thousands of entries, and that's quite small compared to computer memory nowadays, probably even on Raspberry PI nanocomputers.

    On your edit: yes, I agree that AWK is a great tool that I have used quite a lot in the past, but, personally, I have almost stopped using AWK when I started to know Perl well enough to make it a more powerful and more flexible replacement.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11104714]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (10)
As of 2019-10-15 11:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?