Re: Is there a way to compare strings without using an array?
by MidLifeXis (Monsignor) on Oct 18, 2011 at 19:01 UTC
I need a little bit of clarification. Is this a correct interpretation of your question?
- given two strings
- find all duplicate characters
- order of matches is not important
This also looks homeworkish. If so, please state as much. In any case, please show what you have tried.
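A minimal sketch of the interpretation above. It assumes that "duplicate characters" means characters occurring in both strings, and the subroutine name common_chars is hypothetical. A hash serves as the lookup set, so no array of characters is kept around:

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct characters that occur in both
# strings, sorted. split produces a temporary list; the only stored
# structures are the two hashes used as sets.
sub common_chars {
    my ($s1, $s2) = @_;
    my %in_first = map { $_ => 1 } split //, $s1;
    my %common;
    $common{$_} = 1 for grep { $in_first{$_} } split //, $s2;
    return join '', sort keys %common;
}

print common_chars('CABGFEBFA', 'DBFDDF'), "\n";    # prints "BF"
```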
Yes, that is correct, and no, it isn't a homework problem. I have individual protein families, each containing a varying number of proteins. These proteins are not unique to a protein family; they can be in many families. I'm trying to find the best way to achieve total protein coverage with the fewest protein families, and I'm using a greedy algorithm. Instead of storing all 5 million (rough guess) or so proteins in an array and checking them off once I've retrieved them, I would like to put them in a really long string to save some space. I haven't tried using a string yet, because I've been working with a combination of hashes and arrays.
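The greedy approach described here is essentially greedy set cover. The sketch below makes a few assumptions: families are held in a hash of array refs keyed by family name, and greedy_cover plus the sample family names are hypothetical, not the poster's code. A hash of uncovered IDs turns each "check off" into a single delete. Keep in mind that a hash with millions of keys is memory-hungry, which is the cost the poster hoped to avoid. Also, the greedy choice is not guaranteed to find the smallest possible set of families, only a good approximation.

```perl
use strict;
use warnings;

# Hypothetical greedy set cover: repeatedly pick the family that covers the
# most not-yet-covered proteins until every protein is covered.
# $families is a hash ref: family name => array ref of protein IDs.
sub greedy_cover {
    my ($families) = @_;

    # Every protein that appears in any family starts out uncovered.
    my %uncovered;
    for my $ids (values %$families) {
        $uncovered{$_} = 1 for @$ids;
    }

    my @chosen;
    while (%uncovered) {
        my ($best, $best_gain) = (undef, 0);
        # Sorted names make tie-breaking deterministic (first name wins).
        for my $name (sort keys %$families) {
            my $gain = grep { $uncovered{$_} } @{ $families->{$name} };
            ($best, $best_gain) = ($name, $gain) if $gain > $best_gain;
        }
        last unless defined $best;    # defensive; every uncovered ID belongs to some family
        delete @uncovered{ @{ $families->{$best} } };    # check these proteins off
        push @chosen, $best;
    }
    return @chosen;
}

# Illustrative data only (hypothetical family names and IDs).
my %families = (
    fam1 => [qw(648040620 637132715)],
    fam2 => [qw(637132715 649986572 600000001)],
    fam3 => [qw(648040620)],
);
print join(',', greedy_cover(\%families)), "\n";    # prints "fam2,fam1"
```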
Re: Is there a way to compare strings without using an array?
by jgamble (Pilgrim) on Oct 18, 2011 at 19:36 UTC
The keyword you're looking for is "Levenshtein".
Google (or DuckDuckGo) will find you information on his algorithm. You'll also find modules on CPAN that implement it (put "Levenshtein" in the search box and leave the "in" box set to "All").
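This is a from-scratch sketch of the standard dynamic-programming Levenshtein distance, written to show what the algorithm measures. The name levenshtein is hypothetical and not a CPAN module's API. It counts single-character insertions, deletions and substitutions, which is why the second example below scores two different 9-digit IDs as only 1 edit apart:

```perl
use strict;
use warnings;
use List::Util qw(min);

# Edit distance between $s and $t, keeping only the previous DP row.
# $prev[$j] is the distance between the current prefix of $s and the first $j chars of $t.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);    # distances from the empty prefix of $s
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1,            # deletion
                           $cur[$j - 1] + 1,         # insertion
                           $prev[$j - 1] + $cost);   # substitution (or match)
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print levenshtein('kitten', 'sitting'), "\n";         # prints 3
print levenshtein('648040620', '648040621'), "\n";    # prints 1
```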
Actually, I'm not sure this will work, because my string will look something like this: 648040620,637132715,649986572, etc.
Levenshtein distance works character by character, so it would count unwanted modifications, such as changing a single digit, which turns the entire 9-digit number into a different ID.
I need each number to be treated as a whole that cannot be changed.
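One way to keep each 9-digit ID whole while still storing the collection in a single string, as the original post wanted, is Perl's built-in vec(). It treats a string as a bit vector indexed by the ID itself. This is a sketch under the assumption that IDs are non-negative integers. The helper names mark_covered and is_covered are hypothetical. The string grows to one bit per possible ID up to the largest one marked, so marking 648040620 makes it about 81 MB long regardless of how many proteins there are:

```perl
use strict;
use warnings;

# Hypothetical helpers: a plain Perl string used as a bit vector.
# Bit number $id is set once protein $id has been covered, so each ID is
# used whole as an index and never compared digit by digit.
sub mark_covered {
    my ($bits_ref, @ids) = @_;
    vec($$bits_ref, $_, 1) = 1 for @ids;    # writing past the end extends the string with zero bytes
}

sub is_covered {
    my ($bits_ref, $id) = @_;
    return vec($$bits_ref, $id, 1);    # bits past the end of the string read as 0
}

# Example with the IDs quoted above, read from comma-separated strings.
my $covered = '';
mark_covered(\$covered, split /,/, '648040620,637132715');
for my $id (split /,/, '648040620,637132715,649986572') {
    print "$id ", (is_covered(\$covered, $id) ? 'covered' : 'not yet covered'), "\n";
}
```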
Wait ... you mean your proteins are not single letters, but 9-digit numbers like 648040620? Then the character-class regex solution elsewhere in this thread will not work, and I think you had better put your data in arrays and use List::Compare to calculate the set intersection and such.
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
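List::Compare, as suggested, packages up set operations on lists. For comparison, here is a core-Perl sketch of the same idea. It splits the comma-separated strings into whole-ID tokens and uses hashes as sets. The helper id_set and the second ID list are illustrative assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "id,id,id" into a hash used as a set, so every
# 9-digit ID is compared as one token rather than digit by digit.
sub id_set { return map { $_ => 1 } split /,/, shift }

my %first  = id_set('648040620,637132715,649986572');
my %second = id_set('637132715,649986572,600000001');    # illustrative data

# Numeric sort keeps the output stable regardless of hash order.
my @both        = sort { $a <=> $b } grep {  $second{$_} } keys %first;     # intersection
my @only_first  = sort { $a <=> $b } grep { !$second{$_} } keys %first;     # in first list only
my @only_second = sort { $a <=> $b } grep { !$first{$_}  } keys %second;    # in second list only

print "both:        @both\n";
print "only first:  @only_first\n";
print "only second: @only_second\n";
```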
Re: Is there a way to compare strings without using an array?
by Anonymous Monk on Oct 18, 2011 at 19:01 UTC
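use feature 'say';
my ($line1, $line2) = ('CABGFEBFA', 'DBFDDF');
say $line2 =~ /[\Q$line1\E]/g;                                 # chars of $line2 also in $line1 (\Q...\E escapes class metachars)
say $line1 =~ /[^\Q$line2\E]/g, $line2 =~ /[^\Q$line1\E]/g;    # chars found in only one string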
With duplicates removed and the output sorted:
use Modern::Perl;
my $line1= 'CABGFEBFA';
my $line2= 'DBFDDF';
say sort keys %{{map {$_ => 1} $line2 =~ /[$line1]/g}};
say sort keys %{{map {$_ => 1} $line1 =~ /[^$line2]/g, $line2 =~ /[^$line1]/g}};
CountZero
Note that say has to be enabled (Perl 5.10 or later) when Modern::Perl is not loaded:
use feature 'say';