PerlMonks  

Is there a way to compare strings without using an array?

by Jeri (Scribe)
on Oct 18, 2011 at 18:17 UTC (#932212=perlquestion)
Jeri has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks! Is there a way to match strings without using an array?
my @line1 = qw(C A B G F E);
my @line2 = qw(D B F);
my $line1 = join '', @line1;
my $line2 = join '', @line2;

The first string would be CABGFE and the second DBF.

I want to find either what the two strings have in common or what differs between them. In this case, BF is what they share and CAGED is what differs.

Re: Is there a way to compare strings without using an array?
by Anonymous Monk on Oct 18, 2011 at 19:01 UTC
    say $line2 =~ /[$line1]/g;
    say $line1 =~ /[^$line2]/g, $line2 =~ /[^$line1]/g;
      say for $line1 =~ /[$line2]/g;

      It complains:

      Bareword "say" not allowed while "strict subs"

      When I delete say, it complains:

      syntax error at evan.pl line 8, near "$line1 =~"

      What's the trick to this string comparison?

      Thanks!
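      Both errors have the same root: say is an optional feature that must be switched on (Perl 5.10 or later), and once say is deleted, "for $line1 =~ ..." is a statement modifier with no statement in front of it. The following is a minimal sketch of the regex approach with say enabled. It assumes the strings hold only plain letters; characters such as ], ^, - or \ would change the meaning of the character class. The variable names @common and @different are illustrative.

```perl
use strict;
use warnings;
use feature 'say';    # enables say(); requires Perl 5.10 or later

my $line1 = 'CABGFE';
my $line2 = 'DBF';

# Characters of $line2 that also occur in $line1.
my @common = $line2 =~ /[$line1]/g;

# Characters that occur in only one of the two strings.
my @different = ( $line1 =~ /[^$line2]/g, $line2 =~ /[^$line1]/g );

say join '', @common;       # BF
say join '', @different;    # CAGED
```

      Loading Modern::Perl, as in the reply below, also enables say.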
      With duplicates removed and the output sorted:
      use Modern::Perl;
      my $line1 = 'CABGFEBFA';
      my $line2 = 'DBFDDF';
      say sort keys %{{map {$_ => 1} $line2 =~ /[$line1]/g}};
      say sort keys %{{map {$_ => 1} $line1 =~ /[^$line2]/g, $line2 =~ /[^$line1]/g}};

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Is there a way to compare strings without using an array?
by MidLifeXis (Prior) on Oct 18, 2011 at 19:01 UTC

    I need a little bit of clarification. Is this a correct interpretation of your question?

    • given two strings
    • find all duplicate characters
    • order of matches is not important

    This also looks homeworkish. If so, please state as much. In any case, please show what you have tried.

    --MidLifeXis

      Yes, that is correct, and no, it isn't a homework problem. I have individual protein families, each containing a varying number of proteins. The proteins are not unique to a family; one protein can be in many families. I'm trying to find the best way to cover all proteins with the fewest protein families, and I'm using a greedy algorithm. Instead of storing all 5 million or so proteins (a rough guess) in an array and checking them off once I've retrieved them, I would like to put them in one really long string to save some space. I haven't tried using a string yet, because I've been working with a hash/array combination.
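      The greedy coverage idea described here can be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: the %families hash, the family names, and the IDs 640000001 and 650000002 are hypothetical sample data. Each round picks the family that covers the most proteins not yet covered.

```perl
use strict;
use warnings;

# Hypothetical sample data: family name => list of protein IDs.
my %families = (
    famA => [qw(648040620 637132715 649986572)],
    famB => [qw(637132715 640000001)],
    famC => [qw(649986572 640000001 650000002)],
);

# Every protein that still needs covering, stored as hash keys
# so that membership checks and deletions are cheap.
my %uncovered;
for my $members ( values %families ) {
    $uncovered{$_} = 1 for @$members;
}

my @chosen;
while (%uncovered) {
    # Pick the family covering the most still-uncovered proteins.
    # Sorting the names makes tie-breaking repeatable.
    my ( $best, $best_gain ) = ( undef, 0 );
    for my $name ( sort keys %families ) {
        my $gain = grep { $uncovered{$_} } @{ $families{$name} };
        ( $best, $best_gain ) = ( $name, $gain ) if $gain > $best_gain;
    }
    last unless defined $best;    # no family adds coverage

    push @chosen, $best;
    delete @uncovered{ @{ $families{$best} } };
}

print "Chosen families: @chosen\n";
```

      A hash is used for the uncovered set because checking off a protein is then a single delete, whereas a long string would have to be searched and rewritten on every step.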

        In which case, have you looked at BioPerl yet? It might have more tools that you'll be interested in. There's a large community at www.bioperl.org

        perl -e 'print qq(Just another Perl Hacker\n)' # where's the irony switch?
Re: Is there a way to compare strings without using an array?
by jgamble (Pilgrim) on Oct 18, 2011 at 19:36 UTC

    The keyword you're looking for is "Levenshtein".

    Google (or DuckDuckGo) will find you information on his algorithm. You'll also find modules that handle that in CPAN (put "Levenshtein" in the search box, keep the "in" box to "All").

      Oooo, thanks, this might just be what I need.

      Actually, I'm not sure this will work, because my string will look something like this:

      648040620,637132715,649986572 etc.

      Levenshtein distance would count unwanted modifications, such as turning one 9-digit number into a different one by changing a single digit.

      I need each number to be treated as a whole, not as something that can be edited digit by digit.

        Wait ... you mean your proteins are not single letters, but actually 9-digit numbers like 648040620? Then the regex solution mentioned above will not work, and I think you had better put your data in arrays and use List::Compare to calculate the set intersection and such.

        CountZero
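        For whole-ID comparison, here is a minimal sketch that uses plain hashes in core Perl instead of the List::Compare module suggested above. The input strings are assumptions: the first comes from the sample in this thread, while the second, including the ID 650000002, is hypothetical.

```perl
use strict;
use warnings;

# Assumed input: two comma-separated strings of 9-digit protein IDs.
my $line1 = '648040620,637132715,649986572';
my $line2 = '637132715,650000002';    # hypothetical second list

# Split into whole IDs so a number is never compared digit by digit.
my %in1 = map { $_ => 1 } split /,/, $line1;
my %in2 = map { $_ => 1 } split /,/, $line2;

# Intersection: IDs present in both strings.
my @common = grep { $in2{$_} } sort keys %in1;

# Symmetric difference: IDs present in exactly one string.
my @only1     = grep { !$in2{$_} } sort keys %in1;
my @only2     = grep { !$in1{$_} } sort keys %in2;
my @different = ( @only1, @only2 );

print "common:    @common\n";
print "different: @different\n";
```

        Because each ID is a hash key, duplicates collapse automatically and lookups do not depend on the length of the list.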

