c4onastick has asked for the wisdom of the Perl Monks concerning the following question:
Greetings Monks,
I've got some data in tab-separated values that comes in the form:
Data Set 1 | Data Set 2 | Data Set 3 |
X units | Y units | X units | Y units | X units | Y units |
x1 | y1 | x1 | y1 | x1 | y1 |
x2 | y2 | x2 | y2 | x2 | y2 |
x3 | y3 | x3 | y3 | x3 | y3 |
... | ... | ... | ... | ... | ... |
I'd like to get that in to either a hash or an array by reading the file in from STDIN. I need to be able to query the data asking things like, "What's Data Set 1's y value at Data Set 3's x3 value?" Wherein lies the problem, I need to keep the "Data Set" labels associated with the data. I started writing it like this:
use warnings;
my %data_sets;
while(<>){
# I did the $line thing because there's some garbage at the top of
+the file that I don't want
my $line++;
if($line == 2){
while(/\G([^\t]+)/g){
$data_sets{$_} = {};
}
}
if($line > 3){
$data_sets{}; #<- ran into the problem right here! The script do
+esn't know what data set its on!
}
So my question is: what would be the appropriate complex internal data structure to parse the data into (I was going with a hash of hashes, maybe a hash of arrays?) I should mention that the reason I have this problem is that Data Set 1's x1 does not necessarily equal Data Set 2's x1. I've looked at Data::Table, and for the analysis that I need to do on the data, it doesn't quite seem to fit my needs. Thanks in advance for your wisdom.
Re: Help munging tabular data
by BrowserUk (Patriarch) on Aug 02, 2007 at 00:13 UTC
|
Several structures are possible. I've used a HoHoA here. You might want to strip spaces from the hash keys if you have to make many references to them in your code.
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
my %data;
chomp( my @L1Headers = split "\t", scalar <DATA> );
chomp( my @L2Headers = split "\t", scalar <DATA> );
@data{ @L1Headers } = map{
+{ map{ $_ => undef } @L2Headers[ 0, 1 ] }
} 1 .. @L1Headers;
while( <DATA> ) {
chomp;
my @values = split "\t";
for my $l1 ( @L1Headers ) {
for my $l2 ( keys %{ $data{ $l1 } } ) {
push @{ $data{ $l1 }{ $l2 } }, shift @values;
}
}
}
print pp \%data;
## "What's Data Set 1's y value at Data Set 3's x3 value?"
for ( 0 .. $#{ $data{ 'Data Set 1' }{ 'X units' } } ) {
$data{ 'Data Set 3' }{ 'X units' }[ $_ ] eq 'x3'
and print "DS3-X == x3, DS1-Y == ",
$data{ 'Data Set 1' }{ 'Y units' }[ $_ ];
}
__DATA__
Data Set 1 Data Set 2 Data Set 3
X units Y units X units Y units X units Y units
x1 y1 x1 y1 x1 y1
x2 y2 x2 y2 x2 y2
x3 y3 x3 y3 x3 y3
Outputs c:\test>junk2
{
"Data Set 1" =>{ "X units" =>["x1","x2","x3"], "Y units" =>["y1","y2
+","y3"] },
"Data Set 2" =>{ "X units" =>["x1","x2","x3"], "Y units" =>["y1","y2
+","y3"] },
"Data Set 3" =>{ "X units" =>["x1","x2","x3"], "Y units" =>["y1","y2
+","y3"] },
}
DS3-X == x3, DS1-Y == y3
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
Re: Help munging tabular data
by GrandFather (Saint) on Aug 02, 2007 at 00:58 UTC
|
Depends how you want to access it more than anything else. Either a HoHoA, a HoAoA or a HoAoH would fit the bill. This is a HoHoA variant:
use strict;
use warnings;
my %data_sets;
my @dataHeaders = split /,/, ($_ = <DATA>, chomp, $_);
my @setHeaders = split /,/, ($_ = <DATA>, chomp, $_);
splice @setHeaders, 2; # Assume paired columns and the same for all da
+ta sets
while (<DATA>) {
chomp;
my @fields = split /,/;
Outer: for my $dataSet (@dataHeaders) {
for my $field (@setHeaders) {
last Outer unless @fields;
push @{$data_sets{$dataSet}{$field}}, shift @fields;
}
}
}
my $lu = $data_sets{'Data Set 3'}{'X units'}[2]; # Data Set 3's x3 val
+ue
my @matches = grep {$data_sets{'Data Set 1'}{'X units'}[$_] eq $lu}
0 .. $#{$data_sets{'Data Set 1'}{'X units'}};
for my $index (@matches) {
print "Data Set 1 element X" .
($index + 1) . "'s Y = $data_sets{'Data Set 1'}{'Y units'}[$in
+dex]\n";
}
__DATA__
Data Set 1,Data Set 2,Data Set 3
X units,Y units,X units,Y units,X units,Y units
1,y1,x1,y1,2,y1
2,y2,x2,y2,3,y2
3,y3,x3,y3,1,y3
Prints:
Data Set 1 element X1's Y = y1
The s/\t/,/g was to avoid unnoticed editor foibles.
DWIM is Perl's answer to Gödel
| [reply] [d/l] [select] |
|
| [reply] |
Re: Help munging tabular data
by aquarium (Curate) on Aug 02, 2007 at 00:13 UTC
|
| [reply] |
Re: Help munging tabular data
by Cristoforo (Curate) on Aug 04, 2007 at 03:42 UTC
|
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my @data;
chomp(my @sets = split "\t", <DATA>);
<DATA>;
while (<DATA>) {
my @xys = split;
for my $dataset (0..@xys/2-1) {
push @{ $data[$dataset] }, [splice @xys, 0, 2];
}
}
my $dbh = DBI->connect("dbi:SQLite:dbname=datapoints_01.lite","","") o
+r die $!;
$dbh->do(qq{ CREATE TABLE data_sets
(id INTEGER,
name TEXT)
});
my $sth = $dbh->prepare(q{INSERT INTO data_sets VALUES(?, ?)}) or die
+$dbh->errstr;
my $id;
for my $name (@sets) {
$sth->execute(++$id, $name) or die $dbh->errstr;
}
$dbh->do(qq{ CREATE TABLE data_pts
(set_id INTEGER,
id INTEGER,
x INTEGER,
y INTEGER)
});
$sth = $dbh->prepare(q{INSERT INTO data_pts VALUES(?, ?, ?, ?)})
or die $dbh->errstr;
for my $set_id (0..$#data) {
for my $point_id (0..$#{ $data[$set_id] }) {
$sth->execute($set_id+1, $point_id+1, @{ $data[$set_id][$point
+_id] })
or die $dbh->errstr;
}
}
$sth->finish;
$dbh->disconnect or die $dbh->errstr;
__DATA__
Data Set 1 Data Set 2 Data Set 3
X units Y units X units Y units X units Y units
1 2 10 20 100 200
3 4 30 40 300 400
5 6 50 60 500 600
__END__
C:\perlp>sqlite3 datapoints_01.lite
SQLite version 3.3.8
Enter ".help" for instructions
sqlite> select * from data_sets;
1|Data Set 1
2|Data Set 2
3|Data Set 3
sqlite> select * from data_pts;
1|1|1|2
1|2|3|4
1|3|5|6
2|1|10|20
2|2|30|40
2|3|50|60
3|1|100|200
3|2|300|400
3|3|500|600
sqlite>
| [reply] [d/l] |
|
|