Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re^6: Pre-process csv files before using

by DrAxeman (Beadle)
on Aug 06, 2005 at 21:46 UTC ( #481556=note: print w/ replies, xml ) Need Help??


in reply to Re^5: Pre-process csv files before using
in thread Pre-process csv files before using

I put the header data in a code tag, but it didn't get wrapped. Corion and I spoke about it.

@origcols is the problem. I am getting that from reading my csv file. I can't hardcode it since the column headers are not constants. I tried to do the following:

#!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.*|MSTCPLoop.*/ ; } print ("@cols \n");

It errors out with:
Use of uninitialized value in join or string at ./runsql3.sql line 19.
Use of uninitialized value in join or string at ./runsql3.sql line 23.

It also prints out the column headers except the one with the period in it. My guess is that there is an error happening with the SQL SELECT statement. I don't think it likes the period.

Just to clear things up, here is all my code:
#!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.*|MSTCPLoop.*/ ; }; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my @avgs = $dbh->selectrow_array($avgSQL); my %avgsHash; @avgsHash{ @cols } = @avgs; print "Col,Avg\n"; printf("%s,%f\n", $_, $avgsHash{$_}) for @cols;
Everything works except when there is a . in one of the headers. Then it fails.

Data in the columns that I am averaging can contain decimals, otherwise I'd just strip out all periods before starting.
I'm trying to add the slpit statement, but am not sure where/how to use it.


Comment on Re^6: Pre-process csv files before using
Select or Download Code
Re^7: Pre-process csv files before using
by sk (Curate) on Aug 06, 2005 at 23:20 UTC
    I have not used DBI::CSV to comment on issues with "." in names. It seems logical that it will break coz of dots as . is typically not a valid name charcater.

    Anyways you need to do an inplace edit of your CSV file before you pass it on to DBI::CSV. The code below demonstrates how to do an inplace edit of your file. This will create a backup of the file (mydata.csv.bak) but you can set the $^I flag to "" to pure inplace effect.

    WARNING: In Place edits are dangerous. Test out the code on a sample file before you run it on the original file

    { local $^I = ".bak"; local *ARGV; @ARGV = "mydata.csv"; while (<>) { s/\.//g if ($. == 1); print; } }
    Input
    PDHCSV40EasternDaylightTime.240,ERWWCOMMUNITIESMemory.PagesPERsec,ERWW +COMMUNITIESNetworkI nterfaceEthernetAdapterModuleBytesTotalPERsec,ERWW.COMMUNITIESNetworkI +nterface.EthernetAd apterCurrentBandwidth 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13

    Note the header - I have threw in some dots there

    After I run the program my mydata.csv becomes this -

    PDHCSV40EasternDaylightTime240,ERWWCOMMUNITIESMemoryPagesPERsec,ERWWCO +MMUNITIESNetworkInt erfaceEthernetAdapterModuleBytesTotalPERsec,ERWWCOMMUNITIESNetworkInte +rfaceEthernetAdapte rCurrentBandwidth 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13

    The dots in the header are gone but the ones in the data still remain. Now your CSV file is ready for DBI.

    Hope this helps!

    cheers

    SK

      I am working on test data. I can destroy this data as many times as I like! :-)

      As for the code, it's perfect! Thank you very much!
      Just one quick question. How about if I wanted to keep the original file intact, and have this create a new file with the modified data (over-writting an existing file if it exists)?

      Say it reads, data.csv and writes a new file named <origional-file-name>-ready.csv (or something like that). This way, if there is an error, I can re-run the perl script with out having to re-copy the data.
      --Scratch that--
      I figured out that there is a rename() function.
        You are making it complicated by doing a rename and in place edit.

        I would do -

        open (IN, 'origdata.csv'); # Open the file for read open (OUT,'>','copy.csv'); # Open the file for write. Will overwrite +if exists while(<IN>) { # read contents of input file # <IN>, typucally used inside a while will populate variable $_ # more proecessing print OUT ($_); # Write out to new file }

        I think you should start reading more about how to work with files and other perl idioms.

        I would recommend that you start with "Learning perl book". Also there are lot of stuff in Categorized Questions and Answers section!

        -SK

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://481556]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (7)
As of 2014-07-12 16:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (240 votes), past polls