Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation

Pre-process csv files before using

by DrAxeman (Beadle)
on Aug 06, 2005 at 17:42 UTC ( #481525=perlquestion: print w/replies, xml ) Need Help??
DrAxeman has asked for the wisdom of the Perl Monks concerning the following question:

I have a a script that parses a csv file and calculates teh averages of the columns and then outputs it in csv format. During this process, I'd like to strip out certain columns (before processing the averages). I'm stumped on how to do this. I also need to strip periods (.) from the header row. How can I do this?
#!/usr/bin/perl use strict; use warnings; use DBI; use Text::CSV; use Data::Types qw(:all); # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @cols = @{$sth->{NAME}}; shift @cols; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my @avgs = $dbh->selectrow_array($avgSQL); my %avgsHash; @avgsHash{ @cols } = @avgs; # uses a hash slice to populate %avgs +Hash print "Col,Avg\n"; printf("%s,%f\n", $_, $avgsHash{$_}) for @cols;

Replies are listed 'Best First'.
Re: Pre-process csv files before using
by jZed (Prior) on Aug 06, 2005 at 18:12 UTC
    To strip the periods, use a regex like s/\.//g;. Then save the file. Then open the file with DBD::CSV and use this kind of approach to create a new file that only has the columns you want:
    # my $dbh = DBI->connect( ... ); my $wanted_cols = join ',',qw( colname1 colname2 ...); $dbh->do(" CREATE TABLE new_table AS SELECT ($wanted_cols) FROM old_table ");
    That will make a new CSV file that is an exact duplicate of the original except leaving out the columns you don't want. The "CREATE TABLE AS SELECT ..." syntax is only available in the newest version of SQL::Statement (the SQL engine for DBD::CSV).
Re: Pre-process csv files before using
by sk (Curate) on Aug 06, 2005 at 18:19 UTC
    Will you know which columns numbers you want to keep? If so, you can just return the required portion of the array and use it for average calculation.

    perl -e '@array = qw (hello world 1 2 3); @req = qw (2 3 4); $str = pr +int join (",",@array[@req]), $/;' __END__ 1,2,3

    In the above code i have an array called array then i have a required columns array called req. @array[@req] returns a list with values from the required columns.

    Regarding the removal of "." from the header, at what point do you want to do it? Before you start reading the CSV into DBI or when you write out? Before you use it in DBI would require you to do an inplace edit. Writing out the correct header (without period) should be very easy. A simple regex on the your "variable column" array.

    perl -e '@wperiod = qw (hi.1 hi.2 hi.3); print join (",", map {s/\.//g +; $_} @wperiod), $/;' __END__ hi1,hi2,hi3


      Is there any way that I can add a line after
      <code> shift @cols; <code> That will look at the @cols array and then strip column names that meet a specific regular expression?
        Would something like this do?

        perl -e'@cols =qw (need1 dont1 need2 dont2 need3); for (@cols) { push( +@req,$_) unless /dont.*/;} print join (",",@req),$/;' __END__ need1,need2,need3

        You mentiioned regex so the above should help you. However if the names you want to choose are sent as input then you might want to construct a hash from the list and then drop the ones that should be excluded by checking the hash


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://481525]
Approved by kwaping
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (5)
As of 2017-08-23 22:30 GMT
Find Nodes?
    Voting Booth?
    Who is your favorite scientist and why?

    Results (360 votes). Check out past polls.