PerlMonks  


Hello, I wonder if anyone could be so kind as to help me with a bit of a problem that has been frustrating the hell out of me for a while now! I'm sure it's relatively simple, but I am quite new to Perl and my brain is frazzled...

The root of the problem is pretty basic; I have a data file that looks like this:

tomD gly4 phil
aesG tomD gly4 phil
aesG phil
aesG tomD gly4
etc...

and what I would like to extract is, for each pairwise combination of names, the number of lines on which both names occur, summed over the whole file (apologies if this is a bit confusing...).

To clarify by example: for line 1 tomD_gly4 occurs once, tomD_phil occurs once and gly4_phil also occurs once, where an underscore between the two names simply indicates the relevant pairwise combo. Similarly, for line 2 aesG_tomD = 1, aesG_gly4 = 1, aesG_phil = 1, tomD_gly4 = 1, tomD_phil = 1 and gly4_phil = 1. So if the file were just these two lines then I would like an output that looked something like:

aesG_tomD = 1
aesG_gly4 = 1
aesG_phil = 1
tomD_gly4 = 2
tomD_phil = 2
gly4_phil = 2

thereby counting the number of pairwise occurrences of each possible combination across the file. To give it a bit of context, I'm counting the number of genes that are shared between two genomes. I've been wrestling with hash counts and whatnot, but I can't think of a way to do it without having to declare a separate variable for each pairwise combination, a la:

my $tomD_phil;
# etc...
foreach (@line_of_input_file) {
    if (($_ =~ /tomD/) and ($_ =~ /phil/)) { $tomD_phil++ }
    # etc...
}

but as the number of names increases, the number of combos grows quadratically (n names give n(n-1)/2 pairs), and this becomes really repetitive and unfeasible (not to mention probably totally unnecessary and very amateurish...).
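For what it's worth, here is a minimal sketch of one way to avoid a separate variable per pair: a single hash keyed on the joined pair of names. It assumes names are whitespace-separated, that a name repeated on one line should count only once, and that the order within a pair doesn't matter, so each key is built with the two names in alphabetical order (`gly4_tomD` rather than `tomD_gly4`). The sub name `count_pairs` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count, for every unordered pair of names, the number of lines on
# which both names appear. Keys look like "nameA_nameB", with the two
# names in alphabetical order so each pair has exactly one key.
sub count_pairs {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # Split on whitespace and drop names repeated within the line.
        my %seen;
        my @names = grep { !$seen{$_}++ } split ' ', $line;
        @names = sort @names;
        # Visit each unordered pair exactly once (i < j).
        for my $i (0 .. $#names - 1) {
            for my $j ($i + 1 .. $#names) {
                $count{"$names[$i]_$names[$j]"}++;
            }
        }
    }
    return %count;
}

# The two example lines from the post.
my @example = ('tomD gly4 phil', 'aesG tomD gly4 phil');
my %count = count_pairs(@example);
print "$_ = $count{$_}\n" for sort keys %count;
```

For the two example lines this gives 1 for each pair involving aesG and 2 for each of the other three pairs, matching the counts in the post apart from the alphabetical key order. Lines read from a filehandle could presumably be passed in the same way, since `split ' '` ignores the trailing newline.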

Any help would be fantastically appreciated, and I can give more info if it would help!

Cheers!


In reply to counting pairwise incidences by reubs85
