Beefy Boxes and Bandwidth Generously Provided by pair Networks Cowboy Neal with Hat
Keep It Simple, Stupid
 
PerlMonks  

Combine data in table

by DrAxeman (Beadle)
on Aug 07, 2005 at 16:54 UTC ( #481701=perlquestion: print w/ replies, xml ) Need Help??
DrAxeman has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to combined data in a csv file. The data looks like:
colIP,colPartition,colPartitionSize,colPartitionFree 10.160.0.10,C:,8400,4600 10.160.0.11,C:,19500,16000 10.160.0.11,D:,185500,97700 10.160.0.12,C:,19500,15100 10.160.0.12,D:,48200,45900

What I am looking for is the deletion of the 2nd column, and then for all the rows who are the same in the first field, a sum of the 3rd and 4th fields. There could be 1 to 4 rows with the same data in field 1. Output should look like.

colIP,colPartitionSize,colPartitionFree 10.160.0.10,8400,4600 10.160.0.11,205000,113700 10.160.0.12,67700,61000

If this were columns I think I could do it, but since it's rows I'm not sure of the SQL.

Comment on Combine data in table
Select or Download Code
Re: Combine data in table
by davido (Archbishop) on Aug 07, 2005 at 17:03 UTC

    Is the dataset so large that you need a relational database's SQL to manage it, or could you simply use Text::CSV to translate the CSV file into an easily manipulated datastructure?


    Dave

      I've been using DBI and TEXT::CSV for my work. The files are not that large.
      I'd suggest that the size of the dataset is not the deciding factor bewtween using something like Text::CSV_XS and DBD::CSV, although it could be a factor in deciding between DBD::CSV and more robust DBMSs like SQLite or PostgreSQL or MySQL. IMNSHO, the deciding factors in choosing between a Perl parsing approach and a database approach would be the comfort level of the programmer with SQL and the kind and complexities of operations. Sure, parsing modules and parsing without modules can delete, and insert, and update, and sum, and average, and modify column structure, etc. But when one finds oneself doing many of those operations, a database approach may be called for.
Re: Combine data in table
by davidrw (Prior) on Aug 07, 2005 at 17:40 UTC
    It's just a single SQL statement, which should be easy since you have it in DBD::CSV already..
    SELECT colIP, SUM(colPartitionSize) as size, SUM(colPartitionFree) as free FROM table_name GROUP BY colIP
    Otherwise, to do it in perl, you would loop over the rows, hashing on the IP. Something like:
    my $rows = ...; # ref to AoH my %ips; foreach my $row (@$rows){ $ips{ $row->{colIP} }->{ colPartitionSize } += $row->{colPartitionSi +ze}; $ips{ $row->{colIP} }->{ colPartitionFree } += $row->{colPartitionFr +ee}; } print Dumper \%ips;
      This is what I did:

      # Connect to the database my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => "file.csv"}; my $row = "IP,Partition,PartitionSize,PartitionFree"; my %ips; foreach my $row (@$row){ $ips{ $row->{colIP} }->{ colPartitionSize } += $row->{colPartitionSi +ze}; $ips{ $row->{colIP} }->{ colPartitionFree } += $row->{colPartitionFr +ee}; } print Dumper \%ips;

      But I am getting:
      Can't use string ("IP,Partition,PartitionSize,Parti") as an ARRAY ref +while "strict refs" in use at ./psfile.pl line 39.
        I would recommend using the "SELECT ... GROUP BY" SQL statement -- it's very conscise and exactly what you want, and leverages the fact you're using DBD/CSV ..


        As for your code in this post, though, you have my $row = "IP,Partition,PartitionSize,PartitionFree"; which just sets some string as the value for $row .. you need it to be all the data as an array of hashrefs ..
        my $rows = $dbh->selectall_arrayref("SELECT * FROM your_table", {Slice +=>{}}, ); my %ips; foreach my $row (@$rows){ $ips{ $row->{colIP} }->{ colPartitionSize } += $row->{colPartitionSi +ze}; $ips{ $row->{colIP} }->{ colPartitionFree } += $row->{colPartitionFr +ee}; } print Dumper \%ips;
        (Note that this is inefficient, as well -- would be better to use a $sth and fetch one row at a time instead of sucking in all the rows at once)
Re: Combine data in table
by AReed (Pilgrim) on Aug 07, 2005 at 18:04 UTC
    One possible way in perl:
    use strict; use warnings; my %ips; # Deal with the column headers my $cols = <>; chomp ($cols); my @cols = split(',',$cols); print "$cols[0],$cols[2],$cols[3]\n"; while (<>) { my ($address, undef, $size, $free) = split(','); $ips{$address}{'size'} += $size; $ips{$address}{'free'} += $free; } # Note that the order of the lines in the output may not reflect # the order of the addresses in the input file. for my $address(sort keys %ips) { print "$address,$ips{$address}{'size'},$ips{$address}{'free'}\n"; }
Re: Combine data in table
by jZed (Prior) on Aug 07, 2005 at 18:22 UTC
    The solution is even easier than the other (good) suggestions in this thread. Use the new CREATE TABLE AS SELECT syntax to combine the three operations (remove a column, sum the columns, create a new table with the sums) into a single statement. This is a complete script that does exactly that, it creates a new CSV file called "partition_new.csv" with exactly the result set you specified:
    #!/usr/bin/perl -w use strict; use DBI; my $dbh=DBI->connect("dbi:CSV(RaiseError=1):csv_eol=\n"); $dbh->{csv_tables}->{partition}->{file}='partition.csv'; $dbh->{csv_tables}->{partition_new}->{file}='partition_new.csv'; $dbh->do("DROP TABLE IF EXISTS partition_new"); $dbh->do(" CREATE TABLE partition_new AS SELECT colIP , SUM(colPartitionSize) AS colPartitionSize , SUM(colPartitionFree) AS colPartitionFree FROM partition GROUP BY colIP "); print `cat partition_new.csv`;

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://481701]
Approved by polettix
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (5)
As of 2014-04-19 20:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (483 votes), past polls