Re^4: Pre-process csv files before using

in reply to Re^3: Pre-process csv files before using
in thread Pre-process csv files before using

The problem is that I don't nessessarily know what all the column names are. I know that any column that ends with "Bandwidth" or has "MSTCPLoopback" in it I don't want.

I'm trying to approach this from:

shift @cols; #Remove the first column
for (@cols) {
   delete $cols[$regex here]
   }
[download]

Then start my sql stuff.

Comment on Re^4: Pre-process csv files before using Download Code

Replies are listed 'Best First'.
Re^5: Pre-process csv files before using by davidrw (Prior) on Aug 06, 2005 at 19:26 UTC
You can just grep out the ones you want. `my @cols = ...; shift @cols; # blindy throw away first column @cols = grep( $_ !~ /(Bandwidth\|MSTCPLoopback)$/ , @cols); # exclude +ones ending with "Bandwidth" or "MSTCPLoopback"` [download] re: your code, note that `delete` is for hashes .. if you wanted to modify your code, you would use the `slice` function, but `grep` is much more powerful and perl-ish.	[reply] [d/l] [select]
Re^5: Pre-process csv files before using by sk (Curate) on Aug 06, 2005 at 20:34 UTC
I am not going to do Re:Re:Re now as it makes it hard to read. This reply is for this node Re^8: Pre-process csv files before using Could you please wrap your column header info inside code tags? Long lines don't wrap otherwise. Thanks! `#!/usr/bin/perl use strict; use warnings; my $str = <DATA>; my @origcols = split /,/,$str; my @cols = (); foreach (@origcols) { $_ =~ s/\.//g; push ( @cols , $_ ) unless /Bandwidth.\|MSTCPLoop./ ; } map {print $_,$/ } @cols; __DATA__ PDHCSV40EasternDaylightTime.240,ERWW.COMMUNITIES.MemoryPagesPER.sec,ER +WWCOMMUNITIESNetwor kInterfaceEthernetAdapterModuleBytesTotalPERsec,ERWWCOMMUNITIESNetwork +InterfaceEthernetAd apterCurrentBandwidth` [download] Output `PDHCSV40EasternDaylightTime240 ERWWCOMMUNITIESMemoryPagesPERsec ERWWCOMMUNITIESNetworkInterfaceEthernetAdapterModuleBytesTotalPERsec` [download] as you could see the periods are gone and no warnings. Also the column with bandwidth is not listd. Can you make sure your origcols array is correctly populated? -SK	[reply] [d/l] [select]
Re^6: Pre-process csv files before using by DrAxeman (Scribe) on Aug 06, 2005 at 21:46 UTC
I put the header data in a code tag, but it didn't get wrapped. Corion and I spoke about it. @origcols is the problem. I am getting that from reading my csv file. I can't hardcode it since the column headers are not constants. I tried to do the following: #!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.\|MSTCPLoop./ ; } print ("@cols \n"); [download] It errors out with: Use of uninitialized value in join or string at ./runsql3.sql line 19. Use of uninitialized value in join or string at ./runsql3.sql line 23. It also prints out the column headers except the one with the period in it. My guess is that there is an error happening with the SQL SELECT statement. I don't think it likes the period. Just to clear things up, here is all my code: #!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.\|MSTCPLoop./ ; }; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my @avgs = $dbh->selectrow_array($avgSQL); my %avgsHash; @avgsHash{ @cols } = @avgs; print "Col,Avg\n"; printf("%s,%f\n", $_, $avgsHash{$_}) for @cols; [download] Everything works except when there is a . in one of the headers. Then it fails. Data in the columns that I am averaging can contain decimals, otherwise I'd just strip out all periods before starting. I'm trying to add the slpit statement, but am not sure where/how to use it.	[reply] [d/l] [select]
Re^7: Pre-process csv files before using by sk (Curate) on Aug 06, 2005 at 23:20 UTC
I have not used DBI::CSV to comment on issues with "." in names. It seems logical that it will break coz of dots as . is typically not a valid name charcater. Anyways you need to do an inplace edit of your CSV file before you pass it on to DBI::CSV. The code below demonstrates how to do an inplace edit of your file. This will create a backup of the file (mydata.csv.bak) but you can set the `$^I` flag to "" to pure inplace effect. WARNING: In Place edits are dangerous. Test out the code on a sample file before you run it on the original file `{ local $^I = ".bak"; local ARGV; @ARGV = "mydata.csv"; while (<>) { s/\.//g if ($. == 1); print; } }` [download] Input* `PDHCSV40EasternDaylightTime.240,ERWWCOMMUNITIESMemory.PagesPERsec,ERWW +COMMUNITIESNetworkI nterfaceEthernetAdapterModuleBytesTotalPERsec,ERWW.COMMUNITIESNetworkI +nterface.EthernetAd apterCurrentBandwidth 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13` [download] Note the header - I have threw in some dots there After I run the program my mydata.csv becomes this - `PDHCSV40EasternDaylightTime240,ERWWCOMMUNITIESMemoryPagesPERsec,ERWWCO +MMUNITIESNetworkInt erfaceEthernetAdapterModuleBytesTotalPERsec,ERWWCOMMUNITIESNetworkInte +rfaceEthernetAdapte rCurrentBandwidth 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13` [download] The dots in the header are gone but the ones in the data still remain. Now your CSV file is ready for DBI. Hope this helps! cheers SK	[reply] [d/l] [select]
Re^8: Pre-process csv files before using by DrAxeman (Scribe) on Aug 07, 2005 at 02:00 UTC
Re^8: Pre-process csv files before using by DrAxeman (Scribe) on Aug 07, 2005 at 02:44 UTC
Re^9: Pre-process csv files before using by sk (Curate) on Aug 07, 2005 at 03:47 UTC
Re^5: Pre-process csv files before using by sk (Curate) on Aug 06, 2005 at 19:20 UTC
The code I gave you above should do what you are looking for. `perl -e'@cols =qw (field1 fieldband field3 fieldMSTCP okfield5); for ( +@cols) { push(@req,$_) unless /band.\|MSTCP./;} print join (",",@req +),$/;' __END__ field1,field3,okfield5` [download] Here i am excluding names with band and MSTCP field names. Since I don't have any field names I am just hard coding it here . -SK	[reply] [d/l]
Re^6: Pre-process csv files before using by DrAxeman (Scribe) on Aug 06, 2005 at 20:03 UTC
I'm getting an odd error with this `shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//; push ( @cols , $_ ) unless /Bandwidth.\|MSTCPLoop./ ; }` [download] It says: `Use of uninitialized value in substitution (s///) at ./runsql3.sql lin +e 20. Use of uninitialized value in pattern match (m//) at ./runsql3.sql lin +e 21.` [download] Line 20 is my "s/\.//" line	[reply] [d/l] [select]
Re^7: Pre-process csv files before using by sk (Curate) on Aug 06, 2005 at 20:11 UTC
I would check if there are proper values in your `@origcols` array. Do something like this before you get into the loop to modify it `map { print $_,$/ } @origcols;` [download] If you see proper output then need to look at other places. Example of your header data will be helpful! Actually davidrw's grep solution is much cleaner!	[reply] [d/l] [select]
Re^8: Pre-process csv files before using by DrAxeman (Scribe) on Aug 06, 2005 at 20:22 UTC

In Section Seekers of Perl Wisdom