http://www.perlmonks.org?node_id=481534


in reply to Re^3: Pre-process csv files before using
in thread Pre-process csv files before using

The problem is that I don't nessessarily know what all the column names are. I know that any column that ends with "Bandwidth" or has "MSTCPLoopback" in it I don't want.

I'm trying to approach this from:
shift @cols; #Remove the first column for (@cols) { delete $cols[$regex here] }
Then start my sql stuff.

Replies are listed 'Best First'.
Re^5: Pre-process csv files before using
by davidrw (Prior) on Aug 06, 2005 at 19:26 UTC
    You can just grep out the ones you want.
    my @cols = ...; shift @cols; # blindy throw away first column @cols = grep( $_ !~ /(Bandwidth|MSTCPLoopback)$/ , @cols); # exclude +ones ending with "Bandwidth" or "MSTCPLoopback"
    re: your code, note that delete is for hashes .. if you wanted to modify your code, you would use the slice function, but grep is much more powerful and perl-ish.
Re^5: Pre-process csv files before using
by sk (Curate) on Aug 06, 2005 at 20:34 UTC
    I am not going to do Re:Re:Re now as it makes it hard to read. This reply is for this node Re^8: Pre-process csv files before using

    Could you please wrap your column header info inside code tags? Long lines don't wrap otherwise. Thanks!

    #!/usr/bin/perl use strict; use warnings; my $str = <DATA>; my @origcols = split /,/,$str; my @cols = (); foreach (@origcols) { $_ =~ s/\.//g; push ( @cols , $_ ) unless /Bandwidth.*|MSTCPLoop.*/ ; } map {print $_,$/ } @cols; __DATA__ PDHCSV40EasternDaylightTime.240,ERWW.COMMUNITIES.MemoryPagesPER.sec,ER +WWCOMMUNITIESNetwor kInterfaceEthernetAdapterModuleBytesTotalPERsec,ERWWCOMMUNITIESNetwork +InterfaceEthernetAd apterCurrentBandwidth

    Output

    PDHCSV40EasternDaylightTime240 ERWWCOMMUNITIESMemoryPagesPERsec ERWWCOMMUNITIESNetworkInterfaceEthernetAdapterModuleBytesTotalPERsec

    as you could see the periods are gone and no warnings. Also the column with bandwidth is not listd. Can you make sure your origcols array is correctly populated?

    -SK

      I put the header data in a code tag, but it didn't get wrapped. Corion and I spoke about it.

      @origcols is the problem. I am getting that from reading my csv file. I can't hardcode it since the column headers are not constants. I tried to do the following:
      #!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.*|MSTCPLoop.*/ ; } print ("@cols \n");

      It errors out with:
      Use of uninitialized value in join or string at ./runsql3.sql line 19.
      Use of uninitialized value in join or string at ./runsql3.sql line 23.

      It also prints out the column headers except the one with the period in it. My guess is that there is an error happening with the SQL SELECT statement. I don't think it likes the period.

      Just to clear things up, here is all my code:
      #!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.*|MSTCPLoop.*/ ; }; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my @avgs = $dbh->selectrow_array($avgSQL); my %avgsHash; @avgsHash{ @cols } = @avgs; print "Col,Avg\n"; printf("%s,%f\n", $_, $avgsHash{$_}) for @cols;
      Everything works except when there is a . in one of the headers. Then it fails.

      Data in the columns that I am averaging can contain decimals, otherwise I'd just strip out all periods before starting.
      I'm trying to add the slpit statement, but am not sure where/how to use it.
        I have not used DBI::CSV to comment on issues with "." in names. It seems logical that it will break coz of dots as . is typically not a valid name charcater.

        Anyways you need to do an inplace edit of your CSV file before you pass it on to DBI::CSV. The code below demonstrates how to do an inplace edit of your file. This will create a backup of the file (mydata.csv.bak) but you can set the $^I flag to "" to pure inplace effect.

        WARNING: In Place edits are dangerous. Test out the code on a sample file before you run it on the original file

        { local $^I = ".bak"; local *ARGV; @ARGV = "mydata.csv"; while (<>) { s/\.//g if ($. == 1); print; } }
        Input
        PDHCSV40EasternDaylightTime.240,ERWWCOMMUNITIESMemory.PagesPERsec,ERWW +COMMUNITIESNetworkI nterfaceEthernetAdapterModuleBytesTotalPERsec,ERWW.COMMUNITIESNetworkI +nterface.EthernetAd apterCurrentBandwidth 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13

        Note the header - I have threw in some dots there

        After I run the program my mydata.csv becomes this -

        PDHCSV40EasternDaylightTime240,ERWWCOMMUNITIESMemoryPagesPERsec,ERWWCO +MMUNITIESNetworkInt erfaceEthernetAdapterModuleBytesTotalPERsec,ERWWCOMMUNITIESNetworkInte +rfaceEthernetAdapte rCurrentBandwidth 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13

        The dots in the header are gone but the ones in the data still remain. Now your CSV file is ready for DBI.

        Hope this helps!

        cheers

        SK

Re^5: Pre-process csv files before using
by sk (Curate) on Aug 06, 2005 at 19:20 UTC
    The code I gave you above should do what you are looking for.

    perl -e'@cols =qw (field1 fieldband field3 fieldMSTCP okfield5); for ( +@cols) { push(@req,$_) unless /band.*|MSTCP.*/;} print join (",",@req +),$/;' __END__ field1,field3,okfield5

    Here i am excluding names with band and MSTCP field names. Since I don't have any field names I am just hard coding it here

    . -SK

      I'm getting an odd error with this
      shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//; push ( @cols , $_ ) unless /Bandwidth.*|MSTCPLoop.*/ ; }

      It says:
      Use of uninitialized value in substitution (s///) at ./runsql3.sql lin +e 20. Use of uninitialized value in pattern match (m//) at ./runsql3.sql lin +e 21.
      Line 20 is my "s/\.//" line
        I would check if there are proper values in your @origcols array.

        Do something like this before you get into the loop to modify it

        map { print $_,$/ } @origcols;
        If you see proper output then need to look at other places. Example of your header data will be helpful!

        Actually davidrw's grep solution is much cleaner!