PerlMonks  

Re: The sum of absolute differences in the counts of chars in two strings.

by Limbic~Region (Chancellor)
on Nov 19, 2011 at 17:24 UTC ( #938986=note )


in reply to The sum of absolute differences in the counts of chars in two strings.

BrowserUk,
How long are the strings (min, max, avg)? Will they be the same length? Are the characters single byte (8 bits)? Assuming 1 byte characters, is the alphabet of possible characters restricted or can we assume all 256 possibilities?

Also, can you give a manual example? I am concerned about misunderstanding the requirements.

Cheers - L~R


Re^2: The sum of absolute differences in the counts of chars in two strings.
by BrowserUk (Pope) on Nov 19, 2011 at 18:24 UTC

    The strings could contain anything and be of any length, but the example I'm working with is genomic data and less than 100 chars.

    A worked example:

    aaaaacaacaaagcc :: a=>10 c=>4 g=>1 t=>0
    acaggtgacaaaaaa :: a=>9  c=>2 g=>3 t=>1
    absolute diffs  :: 1 2 2 1
    sum of diffs    :: 6

    It would be wrong to assume an alphabet of 4 even for genomic data.
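As a minimal sketch of the calculation in the worked example, the counts can be kept in hashes so that no fixed alphabet is assumed. The sub names `char_counts` and `count_diff_sum` are hypothetical, chosen only for illustration, and the `//` operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: tally how often each character occurs in a string.
sub char_counts {
    my ($str) = @_;
    my %count;
    $count{$_}++ for split //, $str;
    return \%count;
}

# Hypothetical helper: sum of absolute differences between the
# per-character counts of two strings. Any character may appear.
sub count_diff_sum {
    my ($s1, $s2) = @_;
    my ($c1, $c2) = (char_counts($s1), char_counts($s2));

    # Union of the characters seen in either string.
    my %seen = (%$c1, %$c2);

    my $sum = 0;
    for my $ch (keys %seen) {
        # A character missing from one string counts as 0 there.
        $sum += abs( ($c1->{$ch} // 0) - ($c2->{$ch} // 0) );
    }
    return $sum;
}

# The two strings from the worked example above.
print count_diff_sum('aaaaacaacaaagcc', 'acaggtgacaaaaaa'), "\n";   # prints 6
```

Hashes are slower than a fixed-size array, but they cope with characters outside a, c, g and t without any extra handling.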


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      How about counting the others (anything not a, g, c or t) as an exception?

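      my $aa = "AGCTAAABBBCCC";
      # One pattern per base, plus a catch-all for anything that is not a, g, c or t.
      my %k = (a => "a", g => "g", c => "c", t => "t", else => "[^agct]");
      my %all;
      while ( my ($k, $pattern) = each %k ) {
          # Count every case-insensitive match of this pattern in the string.
          $all{$k}++ while $aa =~ m/$pattern/gi;
      }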
      BrowserUk,
      I don't understand the example. Should that t => 0 and t => 1 be 1 not 2?

      Update: I am not going to have a chance to play with any of my ideas, so I will just share them here in case they are of any help. I was hoping that there would be a way to "cancel out terms" so that there was less work to do. I had two ideas for that. The first was to perform bitwise operations on the strings to find out which characters they have in common, and then count only the remaining ones. The second was to process the strings in chunks rather than character by character. If I were trying to write a generic solution, though, I would likely go with Inline::C, using an array rather than a hash: increment values for the first string, decrement values for the second string, and sum the absolute values of the 256 indices for the result at the end.
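The increment/decrement idea can be sketched in pure Perl rather than Inline::C, so this shows the approach and not the suggested implementation. It assumes single-byte characters (ordinals 0 to 255), and the sub name `count_diff_sum_bytes` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one 256-slot array of running differences.
# Assumes every character is a single byte (ordinal 0..255).
sub count_diff_sum_bytes {
    my ($s1, $s2) = @_;
    my @delta = (0) x 256;

    # Increment for each byte of the first string ...
    $delta[$_]++ for unpack 'C*', $s1;
    # ... and decrement for each byte of the second.
    $delta[$_]-- for unpack 'C*', $s2;

    # Each slot now holds count1 - count2 for that byte value,
    # so the answer is the sum of the absolute values.
    my $sum = 0;
    $sum += abs($_) for @delta;
    return $sum;
}

print count_diff_sum_bytes('aaaaacaacaaagcc', 'acaggtgacaaaaaa'), "\n";   # prints 6
```

A single array of signed differences avoids building two separate count tables, and the final pass always touches exactly 256 slots regardless of string length.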

      Cheers - L~R

        Should that t => 0 and t => 1 be 1 not 2?

        Yes. Now corrected.


