Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Is it possible to find the number of matching and non-matching positions in strings using perl code?

by moritz (Cardinal)
on May 11, 2012 at 06:46 UTC ( #969913=note: print w/ replies, xml ) Need Help??


in reply to Is it possible to find the number of matching and non-matching positions in strings using perl code?

Finding characters where two string differs can be done with bitwise operations. If you binary XOR two strings, positions where both characters are the same come out as a null byte. When doing several comparisons, one can accumulate the differing positions using binary OR:

use warnings; use strict; use 5.010; # for say() my $a='AAATGCCTT'; my $b='AAAAGCGTC'; my $c='AAAGGCGTC'; my $mask = chr(0) x length $a; for ($b, $c) { $mask |= $a ^ $_; } # just to illustrate what the mask looks like: use Data::Dumper; $Data::Dumper::Useqq = 1; # count number of 0-bytes my $matches =()= $mask =~ /\0/g; say "Matches: ", $matches; say "Non-matches: ", length($a) - $matches;

This approach should scale well for longer strings, since the binary operations are faster than looping over all characters.


Comment on Re: Is it possible to find the number of matching and non-matching positions in strings using perl code?
Download Code
Re^2: Is it possible to find the number of matching and non-matching positions in strings using perl code?
by BrowserUk (Pope) on May 11, 2012 at 07:29 UTC

    Nice explanation++.

    One minor change: s/binary operations are faster than looping over all characters/looping over the characters in C is faster than looping over the characters in Perl/


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      The question is whether or how much C-level looping is there actually. Some CISC processors include string operators so the XOR might actually be a single instruction.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

        Some CISC processors include string operators so the XOR might actually be a single instruction.

        They still loop, just at the microcode level.

        There is always a conditional test; address register(s) get incremented; and often a count register gets decremented; for some number of repetitions or until some condition is met. It's still a 'loop' by any standard.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Re^2: Is it possible to find the number of matching and non-matching positions in strings using perl code?
by sauoq (Abbot) on May 11, 2012 at 11:26 UTC
      But the for loop just obfuscates things without adding anything at all.

      It adds generality beyond three strings to compare.

      Also, calling your variable $mask is questionable as you don't really intend to use it as a mask.

      So what do you suggest instead? Your usage of $bits isn't any better, because you don't care about bits, but bytes. But $bytes also wouldn't explain the purpose of the variable.

        It adds generality beyond three strings to compare.

        I might give you that if you wrote it that way. for ($b, $c) isn't any more general than my single statement. We'd both have to go change our code if we suddenly had to compare 4 strings.

        So what do you suggest instead? Your usage of $bits isn't any better, because you don't care about bits, but bytes.

        I'm not suggesting it's a good name for anything more than a throwaway example, but I wouldn't say it isn't any better. I do actually care about bits as I'm using it with bitwise operators. And the real point is that calling it $mask implies that you intend to use it as a mask. When you don't, confusion results.

        -sauoq
        "My two cents aren't worth a dime.";

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://969913]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (9)
As of 2014-12-26 07:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (168 votes), past polls