Hello, I wonder if anyone could be so kind as to help me with a problem that has been frustrating the hell out of me for a while now! I'm sure it's relatively simple, but I am quite new to Perl and my brain is frazzled...
The root of the problem is pretty basic; I have a data file that looks like this:
and what I would like to extract is, for each pairwise combination of names, the number of lines on which both names occur together, summed over the whole file (apologies if this is a bit confusing...).
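Put as a formula, and assuming each line is treated as the set of names on it (so a name repeated within one line counts once, which is an assumption), the quantity wanted for each unordered pair of distinct names x and y is:

```latex
C(x, y) = \sum_{\ell \,\in\, \text{file}} \mathbf{1}\big[\, x \in \ell \;\wedge\; y \in \ell \,\big], \qquad x \neq y
```

Here \ell ranges over the lines of the file, and the indicator is 1 when both names appear on the line and 0 otherwise. For the two example lines described in this post, C(tomD, gly4) = 1 + 1 = 2.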
To clarify by example: for line 1 tomD_gly4 occurs once, tomD_phil occurs once and gly4_phil also occurs once, where an underscore between the two names simply indicates the relevant pairwise combo. Similarly, for line 2 aesG_tomD = 1, aesG_gly4 = 1, aesG_phil = 1, tomD_gly4 = 1, tomD_phil = 1 and gly4_phil = 1. So if the file were just these two lines then I would like an output that looked something like:
thereby counting the number of pairwise occurrences of each possible combination across the file. To give it a bit of context, I'm counting the number of genes that are shared between two genomes. I've been wrestling with hash counts and whatnot, but I can't think of a way to do it without having to declare a separate variable for each pairwise combination, à la:
but as the number of names increases, the number of combos grows quadratically (n names give n(n-1)/2 pairs), and this quickly becomes repetitive and unfeasible (not to mention probably totally unnecessary and very amateurish...).
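One common way around declaring a variable per combination is a single hash whose keys are built at runtime from the two names. Below is a minimal Perl sketch of that idea. It rests on assumptions not stated in the post: names on a line are whitespace-separated, a name repeated on one line counts once, and each pair's key is the two names sorted alphabetically and joined with "_" (so tomD_gly4 appears as gly4_tomD). count_pairs is a hypothetical helper name, and the two example lines are reconstructed from the description, not taken from the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each unordered pair of names, count how many
# lines contain both. Returns a hash reference "nameA_nameB" => count.
# ASSUMPTION: names are whitespace-separated; change the split if not.
sub count_pairs {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # Drop repeated names on the same line (ASSUMPTION: a repeat counts once)
        my %seen;
        my @unique = grep { !$seen{$_}++ } split ' ', $line;

        # Sort so a given pair always produces the same key,
        # whatever order the names appear in on the line
        my @names = sort @unique;

        # Visit every pair (i < j) exactly once
        for my $i (0 .. $#names - 1) {
            for my $j ($i + 1 .. $#names) {
                $count{ join '_', $names[$i], $names[$j] }++;
            }
        }
    }
    return \%count;
}

# Example input reconstructed from the description (not the real file)
my @example = (
    "tomD gly4 phil",
    "aesG tomD gly4 phil",
);

my $counts = count_pairs(@example);
for my $pair (sort keys %$counts) {
    print "$pair\t$counts->{$pair}\n";
}
```

Run as-is, it prints the six pairs for the two example lines: gly4_phil, gly4_tomD and phil_tomD with a count of 2, and the three aesG pairs with a count of 1. To process a real file instead, one option is to replace @example with my @lines = <>; and pass the file name on the command line. Because split ' ' splits on any run of whitespace and ignores the trailing newline, tab- or space-separated lines both work without a chomp.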
Any help would be fantastically appreciated, and I can give more info if it would help!