
Hello, I wonder if anyone could be so kind as to help me with a bit of a problem that has been frustrating the hell out of me for a while now! I'm sure it's relatively simple, but I am quite new to Perl and my brain is frazzled...

The root of the problem is pretty basic; I have a data file that looks like this:

tomD gly4 phil
aesG tomD gly4 phil
aesG phil
aesG tomD gly4
etc...

and what I would like to extract is, for each pairwise combination of names, the number of lines on which both names occur together, summed over the whole file (apologies if this is a bit confusing...).

To clarify by example: for line 1 tomD_gly4 occurs once, tomD_phil occurs once and gly4_phil also occurs once, where an underscore between the two names simply indicates the relevant pairwise combo. Similarly, for line 2 aesG_tomD = 1, aesG_gly4 = 1, aesG_phil = 1, tomD_gly4 = 1, tomD_phil = 1 and gly4_phil = 1. So if the file were just these two lines then I would like an output that looked something like:

aesG_tomD = 1
aesG_gly4 = 1
aesG_phil = 1
tomD_gly4 = 2
tomD_phil = 2
gly4_phil = 2

thereby counting the number of pairwise occurrences of each possible combination across the file. To give it a bit of context, I'm counting the number of genes that are shared between two genomes. I've been wrestling with hash counts and whatnot, but I can't think of a way to do it without having to declare a separate variable for each pairwise combination, à la:

my $tomD_phil;
# etc...
foreach (@line_of_input_file) {
    if (($_ =~ /tomD/) and ($_ =~ /phil/)) {
        $tomD_phil++;
    }
    # etc...
}

but as the number of names increases, the number of combinations grows quadratically (n names give n(n-1)/2 pairs), and this becomes really repetitive and infeasible (not to mention probably totally unnecessary and very amateurish...).

Any help would be fantastically appreciated, and I can give more info if it would help!


In reply to counting pairwise incidences by reubs85
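
One way to avoid a separate variable per pair is to build the pair name as a hash key on the fly. What follows is only a minimal sketch under a few assumptions: names are separated by whitespace, a name repeated on one line counts once for that line, and each pair is keyed with its two names in sorted order (so the key is gly4_tomD rather than tomD_gly4). The sub name count_pairs is made up for illustration, and the two input lines are the ones described in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count, over all lines, how many lines contain each unordered pair of names.
# Takes a list of lines; returns a hash reference keyed by "nameA_nameB".
# (count_pairs is a hypothetical helper name, not from the original post.)
sub count_pairs {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # Collect the distinct names on this line, then sort them so that
        # a given pair always produces the same key.
        my %seen;
        $seen{$_} = 1 for split ' ', $line;
        my @names = sort keys %seen;

        # Visit every unordered pair (i < j) exactly once.
        for my $i (0 .. $#names - 1) {
            for my $j ($i + 1 .. $#names) {
                $count{"$names[$i]_$names[$j]"}++;
            }
        }
    }
    return \%count;
}

# The two example lines from the question, hard-coded for the sketch.
my @lines = ('tomD gly4 phil', 'aesG tomD gly4 phil');
my $count = count_pairs(@lines);
print "$_ = $count->{$_}\n" for sort keys %$count;
```

For a real data file, the same sub could be fed lines read from a filehandle instead of the hard-coded list. The nested loop does work proportional to the square of the number of names on a line, which should be fine unless individual lines are very long.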
