Beefy Boxes and Bandwidth Generously Provided by pair Networks Bob
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: finding tuples

by Roy Johnson (Monsignor)
on Jun 23, 2009 at 18:48 UTC ( #774138=note: print w/ replies, xml ) Need Help??


in reply to finding tuples

Your description of the problem is a bit puzzling. It looks like you want to run through the string in order and select four characters from it. After you've taken those four out, you want to go back and get four more from what's left, and so on, until you can't form any more tuples. The rules for selecting 4-tuples are that they must all be the same letter (chosen from a cluster of at least 4), or they must be one letter from each of four consecutive clusters of letters.

Is that right?

And then to add to the difficulty, you want to get the largest possible set. I think that is a hard problem. I don't have a solution for you, but I hope I've made your requirements clearer to others.

Update
I have come up with a script that does what I think you want, although it does not skip matches in favor of more optimal ones. It just finds the leftmost tuple until they're all gone. (update again: think it's debugged now) Anyway, it's a start.

#!perl use strict; use warnings; while (<DATA>) { chomp; my ($tuple, @set); print "Starting with $_\n"; while (($tuple) = /((\w)\2\2\2|(\w)(?:\3*)(?!\3)(\w)(?:\4*)(?!\4)(\w +)(?:\5*)(?!\5)(\w))/) { $tuple =~ y///cs; # Only one of any character if (length($tuple) == 1) { $tuple x= 4; } print "Found $tuple!\n"; # Remove tuple for my $char (split //, $tuple) { # If it is the last of its kind, # no more matches across it are possible # I put a space in there, so it won't match \w s/(?<!$char)$char(?!$char)/ / or s/$char//; } print "Next round: $_\n"; } } __DATA__ AAAAADDDDDEFFGMMSSTVVVVV AADDDEEEEFFFFGGMMMMMMMMMMSTV AADEEFFG

Caution: Contents may have been coded under pressure.


Comment on Re: finding tuples
Download Code
Re^2: finding tuples
by Anonymous Monk on Jun 23, 2009 at 19:43 UTC

    Your ruleset is basically right, except it's a set, not a string, and order does not matter. I sorted it and made it into a string for convenience because perl regex operate on strings, and there is a substring function, but not a subset function. I repeat back what you wrote with updated wording:

    I want to run through the set and select four characters from it. After I've taken those four out, I want to go back and get four more from what's left, and so on, until I can't form any more tuples. The rules for selecting 4-tuples are that they must all be the same letter (quantity of at least 4), or they must be one letter from each of four consecutive letters.

    I have tried your program, and it does only very little. I do not understand that big regex. I will explain what is missing. For the first data line, the remainder after taking out the same-tuples is ADEFFGMMSSTV. It should be split this way: ADEF;FGMS;MSTV. For the second data line, after taking out the same-tuples the remainder is ADEFFGGMMSTV. From this, no solution can be found anymore, as no matter how you turn it, there are only at most two 4-tuples left that satisfy the second condition and the last 4 are not consecutive. For the third data line, the program says nothing, it correctly recognises this set has no solution, but shouldn't it at least try to remove the obvious 4-tuples ADEF or DEFG?

      There is one algorithm called EM algorithm... it can probably give you more insight since it is related to shot-gun sequencing. I hope it can help Tell me what comes up your way Take care

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://774138]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2014-04-18 06:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (462 votes), past polls