PerlMonks  

Re: Consolidating biological data into occurance of each biological unit per sample in table

by Cristoforo (Deacon)
on Jan 14, 2013 at 16:11 UTC (#1013237)


in reply to Consolidating biological data into occurance of each biological unit per sample in table

"I am new to programming and have been successfully manipulating my data in perl until now."

Hello again ejohnston7!

Considering your remark above about being new to programming, the code I gave probably contained some new and unfamiliar things. I know that when I started with Perl, I didn't get some of the common idioms that experienced Perl programmers use. In this case those are map (apply a change to each element of a list and get back a new list with the changes) and the hash of hashes data structure and how to access it.
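
To make those two idioms a little more concrete, here is a small sketch. The names @numbers, @doubled, %count and @bears are made up for the illustration, and the counts are simply borrowed from the a.csv output further down.

```perl
use strict;
use warnings;

# map: apply an expression to every element of a list and collect the results.
my @numbers = (1, 2, 3);
my @doubled = map { $_ * 2 } @numbers;

# Hash of hashes: the outer key selects an inner hash, the inner key a value.
my %count = (
    A => { bear => 6, wolf => 4 },
    B => { bear => 2, wolf => 7 },
);

print "doubled: @doubled\n";
print "wolves in sample B: $count{B}{wolf}\n";

# Both idioms together: one value per sample, with the samples in sorted order.
my @bears = map { $count{$_}{bear} } sort keys %count;
print "bears per sample: @bears\n";
```

The last map is the same pattern the program uses to build one row of the CSV file.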

Here is the comma separated value approach, followed by some explanation. If you have any questions, do come back and someone will try to answer them.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %data;
    while (<DATA>) {
        chomp;
        my (undef, $sample, @fields) = split /[\t;]/;
        for (@fields) {
            my ($type, $value) = split /__/;
            $data{$type}{$sample}{$value}++ if $value;
        }
    }

    for my $type (keys %data) {
        my $entity = $data{$type};
        my @samples = sort keys %$entity;
        my %seen;
        my @keys = grep !$seen{$_}++, map keys %$_, values %$entity;

        open my $fh, '>', "$type.csv" or die "Unable to create '$type.csv'. $!";
        print $fh join(",", ' ', @samples), "\n";
        for my $key (@keys) {
            print $fh join(",", $key, map $entity->{$_}{$key} || 0, @samples), "\n";
        }
        close $fh or die "Unable to close '$type.csv'. $!";
    }

    __DATA__
    occurence1	A	a__bear;c__black
    occurence2	B	a__wolf;c__grey
    occurence3	A	a__wolf;c__white
    occurence4	A	a__bear;c__
    occurence5	C	a__wolf;c__grey
    occurence6	C	a__bear;c__brown
    occurence7	A	a__wolf;c__
    occurence8	B	a__wolf;c__
    occurence9	C	a__bear;c__black
    occurence10	C	a__wolf;c__
    occurence11	A	a__wolf;c__red
    occurence12	B	a__wolf;c__grey
    occurence13	C	a__wolf;c__grey
    occurence14	C	a__wolf;c__grey
    occurence15	B	a__bear;c__brown
    occurence16	C	a__bear;c__brown
    occurence17	A	a__bear;c__
    occurence18	A	a__bear;c__brown
    occurence19	C	a__wolf;c__white
    occurence20	B	a__wolf;c__grey
    occurence21	B	a__bear;c__
    occurence22	B	a__wolf;c__grey
    occurence23	A	a__wolf;c__grey
    occurence24	A	a__bear;c__brown
    occurence25	C	a__bear;c__brown
    occurence26	A	a__bear;c__brown
    occurence27	C	a__bear;c__
    occurence28	C	a__bear;c__brown
    occurence29	B	a__wolf;c__red
    occurence30	B	a__wolf;c__grey

Files created by the program above and readable by Excel are:

    C:\Old_Data\perlp>type a.csv
    ,A,B,C
    bear,6,2,6
    wolf,4,7,5

    C:\Old_Data\perlp>type c.csv
    ,A,B,C
    white,1,0,1
    black,1,0,1
    brown,3,1,4
    red,1,1,0
    grey,1,5,3

The first thing I'd like to do is provide a picture of what the %data hash contains, using Data::Dumper. (I got this by placing the statements use Data::Dumper; print Dumper \%data; right after the while loop and before the for loop.) I use Data::Dumper a lot to see exactly what is in a data structure I created and to check that everything is all right.

    C:\Old_Data\perlp>perl t5.pl
    $VAR1 = {
      'c' => {
        'A' => {
          'white' => 1,
          'black' => 1,
          'brown' => 3,
          'red' => 1,
          'grey' => 1
        },
        'C' => {
          'white' => 1,
          'black' => 1,
          'brown' => 4,
          'grey' => 3
        },
        'B' => {
          'red' => 1,
          'brown' => 1,
          'grey' => 5
        }
      },
      'a' => {
        'A' => {
          'bear' => 6,
          'wolf' => 4
        },
        'C' => {
          'bear' => 6,
          'wolf' => 5
        },
        'B' => {
          'bear' => 2,
          'wolf' => 7
        }
      }
    };

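
If you want to try Data::Dumper on its own first, a minimal sketch could look like this. The tiny %data here is a hypothetical miniature, not your real data, and setting $Data::Dumper::Sortkeys is an optional extra I added so the keys come out in the same order on every run.

```perl
use strict;
use warnings;
use Data::Dumper;

# Optional: print hash keys in sorted order so two runs are easy to compare.
$Data::Dumper::Sortkeys = 1;

# Hypothetical miniature of the %data structure, for illustration only.
my %data = ( a => { A => { bear => 1 } } );

# Pass a reference (\%data) so the whole hash is dumped as one structure.
my $dump = Dumper(\%data);
print $dump;
```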
I created a hash of a hash of a hash with this statement: $data{$type}{$sample}{$value}++ if $value;

$type could be 'a' or 'c' (from your sample data), $sample is 'A', 'B' or 'C', and $value is the name of the animal or the color. Note that the statement ends with if $value;. In your explanation of the problem, you didn't want to count values that had no name. For example:
occurence7    A    a__wolf;c__

There is no color here, so nothing is added to the hash for it.
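
Here is a small sketch of that counting statement on its own, using the two fields from that line. The fixed $sample and the explicit loop variable $field are assumptions made just to keep the example short.

```perl
use strict;
use warnings;

my %data;
my $sample = 'A';

# The two fields of the occurence7 line; the second has no color after "__".
for my $field ('a__wolf', 'c__') {
    my ($type, $value) = split /__/, $field;
    # The inner hashes are created automatically the first time they are used
    # (autovivification), but only when the condition after "if" is true.
    $data{$type}{$sample}{$value}++ if $value;
}

# Only the type "a" was recorded; nothing was created for "c".
print join(',', sort keys %data), "\n";
print "wolf count: $data{a}{A}{wolf}\n";
```

For 'c__' the split returns only one element, so $value is undefined and the whole statement is skipped.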

while (<DATA>) is shorthand for while (defined($_ = <DATA>)). The parentheses around the assignment are needed because defined binds more tightly than =.
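
To see why the defined test matters, here is a sketch that writes the long form out. It reads from an in-memory filehandle instead of DATA (my assumption, just to keep it self-contained), and the last line is a bare 0 with no newline, which a plain truth test would have treated as the end of input.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for DATA in this demonstration.
my $text = "first\n0";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @lines;
while (defined($_ = <$fh>)) {    # the long form of while (<$fh>)
    chomp;                       # chomps $_ because no argument is given
    push @lines, $_;
}
close $fh;

print scalar(@lines), " lines read\n";
```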

chomp with no argument chomps $_ by default.

Likewise, split given only a pattern and no string, as in split /[\t;]/, operates on $_.
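
A quick sketch of the two spellings side by side, using one line from the sample data (with real tab characters between the first three columns, which is what I am assuming your file has). The array names are only for illustration.

```perl
use strict;
use warnings;

my (@implicit, @explicit);
for ("occurence7\tA\ta__wolf;c__") {
    @implicit = split /[\t;]/;        # no string given, so $_ is split
    @explicit = split /[\t;]/, $_;    # the same call written out in full
}

print join('|', @implicit), "\n";
print join('|', @explicit), "\n";
```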

In the for loop, for (@fields), each element of the array being iterated over is aliased to $_ in turn. This is not the same $_ as in the while loop: for localizes $_ for the duration of the loop and restores the outer value afterwards, so the two do not clash.
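
You can check this for yourself with a sketch like the following. Here a plain assignment to $_ stands in for the line read by the while loop, which is my simplification.

```perl
use strict;
use warnings;

# Stand-in for the line that the while loop put into $_.
$_ = 'line from the while loop';

my @inner;
for (qw(a__wolf c__grey)) {
    push @inner, $_;    # inside the loop, $_ is the current list element
}

# Once the for loop is finished, the outer $_ is back unchanged.
my $after = $_;
print "inside the loop: @inner\n";
print "after the loop:  $after\n";
```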

That's just some of the explanation, but hopefully enough to help you begin to understand. I have to leave now, but do ask about anything you don't understand.

Hope this explains a little for you.

