PerlMonks  

Re: Is there a way to compare strings without using an array?

by jgamble (Pilgrim)
on Oct 18, 2011 at 19:36 UTC (#932225)


in reply to Is there a way to compare strings without using an array?

The keyword you're looking for is "Levenshtein".

Google (or DuckDuckGo) will find you information on the algorithm. You'll also find modules that implement it on CPAN (put "Levenshtein" in the search box and leave the "in" box set to "All").
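For readers who have not met the term, here is a minimal sketch of the Levenshtein (edit) distance in core Perl. It is the textbook dynamic-programming version, not the code of any particular CPAN module, and the sub name `levenshtein` is just an illustrative choice.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Textbook edit distance: the number of single-character insertions,
# deletions and substitutions needed to turn $s into $t.
# Only two rows of the DP table are kept at any time.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));    # distance from '' to each prefix of $t
    for my $i (1 .. length($s)) {
        my @cur = ($i);              # distance from a prefix of $s to ''
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print levenshtein('kitten', 'sitting'), "\n";    # 3
```

Note that it works character by character, so two strings that differ in one digit are at distance 1.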


Re^2: Is there a way to compare strings without using an array?
by Jeri (Scribe) on Oct 18, 2011 at 19:41 UTC

    oooo thanks, this might just be what I need

Re^2: Is there a way to compare strings without using an array?
by Jeri (Scribe) on Oct 18, 2011 at 19:48 UTC

    Actually, I'm not sure this will work, because my string will look something like this:

    648040620,637132715,649986572 etc.

    Levenshtein would count modifications I don't want, such as turning one 9-digit number into a completely different one by changing a single digit.

    I need each number to be treated as a unit that cannot be changed.

      Wait ... you mean your proteins are not single letters, but 9-digit numbers like 648040620? Then the regex solution mentioned above will not work, and I think you had better put your data in arrays and use List::Compare to calculate the set intersection and such.

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
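As a rough illustration of the set-intersection idea, here is a core-Perl sketch that uses a hash lookup instead of List::Compare. It does not show List::Compare's own interface. The variable names are hypothetical, and the ID 600000001 is made up so that the two lists differ.

```perl
use strict;
use warnings;

# Hypothetical input: two comma-separated strings of 9-digit IDs.
my $str_a = '648040620,637132715,649986572';
my $str_b = '637132715,600000001,648040620';    # 600000001 is made up

my @list_a = split /,/, $str_a;
my @list_b = split /,/, $str_b;

# Index one list in a hash so each membership test is a single lookup.
my %in_a = map { $_ => 1 } @list_a;

my @common = grep {  $in_a{$_} } @list_b;    # present in both lists
my @only_b = grep { !$in_a{$_} } @list_b;    # present in B but not in A

print "common: @common\n";
print "only in B: @only_b\n";
```

Each ID is compared as a whole string, so a one-digit difference simply means "not the same ID". The results keep the order of the second list, and an ID repeated in that list would be repeated in the output.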

        I like List::Compare. It looks good. I'm just worried about how it will cope, because my array will have at least 5 million elements. Can you offer me any peace of mind?

        I'm not 100% sure what you want to do here, Jeri. The sample below assumes you have a single string composed of 9-character sequences. When the 'for' has run, the keys of %uniq will be the unique 9-character sequences.

        Over 5 million 9-character sequences, it would take a while and a lot of memory!

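        use strict;
        use warnings;

        # Four 9-digit IDs run together; the first is repeated at the end.
        my $str = join '', (648040620, 637132715, 649986572, 648040620);
        my $proteins_count = length($str) / 9;

        # Cut the string into 9-character fields and record each one once.
        my %uniq;
        $uniq{$_} = 1 for unpack "(A9)$proteins_count", $str;

        # Hash keys have no fixed order, so sort before printing.
        print join(' ', sort keys %uniq), "\n";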
        Prints ..
        637132715 648040620 649986572
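The sample above assumes the IDs are run together with no separator, whereas the string quoted earlier in the thread is comma-separated. A minimal sketch for that case, assuming commas are the only separators (the sub name `unique_ids` is hypothetical):

```perl
use strict;
use warnings;

# Return the distinct IDs from a comma-separated string, in first-seen order.
sub unique_ids {
    my ($str) = @_;
    my %seen;
    # $seen{$_}++ is false only the first time a given ID appears.
    return grep { !$seen{$_}++ } split /,/, $str;
}

my @ids = unique_ids('648040620,637132715,649986572,648040620');
print "@ids\n";    # 648040620 637132715 649986572
```

The hash still holds one key per distinct ID, so the memory concern for millions of IDs applies here as well.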

        I'm going to give it a shot today. I'll let you know what happens. Thanks.
