Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Pre-process csv files before using

by DrAxeman (Beadle)
on Aug 06, 2005 at 17:42 UTC ( #481525=perlquestion: print w/ replies, xml ) Need Help??
DrAxeman has asked for the wisdom of the Perl Monks concerning the following question:

I have a a script that parses a csv file and calculates teh averages of the columns and then outputs it in csv format. During this process, I'd like to strip out certain columns (before processing the averages). I'm stumped on how to do this. I also need to strip periods (.) from the header row. How can I do this?
#!/usr/bin/perl use strict; use warnings; use DBI; use Text::CSV; use Data::Types qw(:all); # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @cols = @{$sth->{NAME}}; shift @cols; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my @avgs = $dbh->selectrow_array($avgSQL); my %avgsHash; @avgsHash{ @cols } = @avgs; # uses a hash slice to populate %avgs +Hash print "Col,Avg\n"; printf("%s,%f\n", $_, $avgsHash{$_}) for @cols;

Comment on Pre-process csv files before using
Download Code
Re: Pre-process csv files before using
by jZed (Prior) on Aug 06, 2005 at 18:12 UTC
    To strip the periods, use a regex like s/\.//g;. Then save the file. Then open the file with DBD::CSV and use this kind of approach to create a new file that only has the columns you want:
    # my $dbh = DBI->connect( ... ); my $wanted_cols = join ',',qw( colname1 colname2 ...); $dbh->do(" CREATE TABLE new_table AS SELECT ($wanted_cols) FROM old_table ");
    That will make a new CSV file that is an exact duplicate of the original except leaving out the columns you don't want. The "CREATE TABLE AS SELECT ..." syntax is only available in the newest version of SQL::Statement (the SQL engine for DBD::CSV).
Re: Pre-process csv files before using
by sk (Curate) on Aug 06, 2005 at 18:19 UTC
    Will you know which columns numbers you want to keep? If so, you can just return the required portion of the array and use it for average calculation.

    perl -e '@array = qw (hello world 1 2 3); @req = qw (2 3 4); $str = pr +int join (",",@array[@req]), $/;' __END__ 1,2,3

    In the above code i have an array called array then i have a required columns array called req. @array[@req] returns a list with values from the required columns.

    Regarding the removal of "." from the header, at what point do you want to do it? Before you start reading the CSV into DBI or when you write out? Before you use it in DBI would require you to do an inplace edit. Writing out the correct header (without period) should be very easy. A simple regex on the your "variable column" array.

    perl -e '@wperiod = qw (hi.1 hi.2 hi.3); print join (",", map {s/\.//g +; $_} @wperiod), $/;' __END__ hi1,hi2,hi3

    -SK

      Is there any way that I can add a line after
      <code> shift @cols; <code> That will look at the @cols array and then strip column names that meet a specific regular expression?
        Would something like this do?

        perl -e'@cols =qw (need1 dont1 need2 dont2 need3); for (@cols) { push( +@req,$_) unless /dont.*/;} print join (",",@req),$/;' __END__ need1,need2,need3

        You mentiioned regex so the above should help you. However if the names you want to choose are sent as input then you might want to construct a hash from the list and then drop the ones that should be excluded by checking the hash

        .

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://481525]
Approved by kwaping
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (3)
As of 2014-07-29 05:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (211 votes), past polls