http://www.perlmonks.org?node_id=996485


in reply to This looks like whitespace in my CSV but doesn't seem to be

This is one way to delete leading and trailing spaces from CSV input.
#!/usr/bin/perl -w
use strict;
use Text::CSV;
use Data::Dumper;        # a core module
use Data::Dump qw(pp);

open my $io, '<', \q{"My Company","Gavin","Henry ","ghenry@ghenry.co.uk"," 1.00"};

my $csv = Text::CSV->new();   # nothing "fancy" needed here
                              # defaults are fine

while ( my $row = $csv->getline($io) )
{
    my @fields = map { s/^\s*//;  #delete leading spaces
                       s/\s*$//;  #delete trailing spaces
                       $_;        #this is the "return value" from map
                                  #map returns the value of the last
                                  #statement - without this $_; you
                                  #get the zero or non-zero scalar value
                                  #of the last s/// statement
                     } @$row;

    print "Using pp from Data::Dump...\n";
    print pp \@fields;
    print "\n";

    print "Using Dumper from Data::Dumper...\n";
    print Dumper \@fields;
    print "\n";

    print "5th field is $fields[4]\n";
}
__END__
Prints:
Using pp from Data::Dump...
["My Company", "Gavin", "Henry", "ghenry\@ghenry.co.uk", "1.00"]
Using Dumper from Data::Dumper...
$VAR1 = [
          'My Company',
          'Gavin',
          'Henry',
          'ghenry@ghenry.co.uk',
          '1.00'
        ];

5th field is 1.00
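One thing to be aware of: because map aliases $_ to each element, the s/// calls above also change @$row in place. If the original row should stay untouched, a variation along these lines may help. This is only a minimal sketch using core Perl; trim_fields is a name made up for illustration, and the /r flag assumes Perl 5.14 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# trim_fields is a hypothetical helper name, not part of Text::CSV.
# The /r flag makes s/// return a modified copy instead of changing
# $_ in place, so the caller's array is left untouched.
# Assumption: Perl 5.14 or later (needed for /r).
sub trim_fields {
    return map { s/^\s+|\s+$//gr } @_;
}

my @row     = ('My Company', 'Gavin', 'Henry ', 'ghenry@ghenry.co.uk', ' 1.00');
my @trimmed = trim_fields(@row);

print join('|', @trimmed), "\n";   # trimmed copies
print join('|', @row),     "\n";   # original values, spaces still there
```

With Text::CSV the same helper could be called as trim_fields(@$row) inside the getline loop.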

Replies are listed 'Best First'.
Re^2: This looks like whitespace in my CSV but doesn't seem to be
by Tux (Canon) on Sep 30, 2012 at 18:00 UTC

    Text::CSV_XS has a built-in way to delete leading and trailing whitespace:

    my $csv = Text::CSV_XS->new ({ binary => 1, allow_whitespace => 1, auto_diag => 1 });

    But that would not help in this case, for two reasons:

    • It does not strip whitespace inside the quotes, only whitespace surrounding sep_char:
      allow_whitespace When this option is set to true, whitespace (TAB's and SPACE's) surrounding the separation character is removed when parsing. If either TAB or SPACE is one of the three major characters "sep_char", "quote_char", or "escape_char" it will not be considered whitespace.
    • The whitespace stripped is only SPACE and TAB, not Unicode whitespace such as the non-breaking space
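To illustrate the second point, here is a hedged sketch of stripping Unicode whitespace by hand after parsing. It is not something Text::CSV_XS does for you; trim_unicode is a hypothetical helper name, and the sketch assumes the field already holds decoded characters rather than raw UTF-8 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# trim_unicode is a hypothetical helper, not part of Text::CSV_XS.
# \p{White_Space} is a Unicode property, so it also matches
# NO-BREAK SPACE (U+00A0), EM SPACE (U+2003) and similar characters.
# Assumption: $text holds decoded characters, not raw UTF-8 bytes.
sub trim_unicode {
    my ($text) = @_;
    $text =~ s/\A\p{White_Space}+//;   # leading whitespace
    $text =~ s/\p{White_Space}+\z//;   # trailing whitespace
    return $text;
}

# A field with a no-break space and a plain space in front,
# and an em space behind.
my $field = "\x{A0} 1.00\x{2003}";
my $clean = trim_unicode($field);

printf "%d characters before, %d after\n", length($field), length($clean);
```

If the data comes from a file, it would need to be decoded first, for example by opening the file with an :encoding(UTF-8) layer.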

    Spreadsheet::Read, however, can strip leading and trailing whitespace from every field. I could extend that on request to allow it to strip Unicode whitespace too.

    strip
        If set, "ReadData ()" will remove trailing- and/or leading-
        whitespace from every field.

          strip  leading  trailing
          -----  -------  --------
            0      n/a      n/a
            1     strip     n/a
            2      n/a     strip
            3     strip    strip
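The four strip values behave like a two-bit mask: 1 for leading, 2 for trailing, 3 for both. A rough core-Perl sketch of that behaviour follows. It only mirrors the table and is not Spreadsheet::Read's actual code; strip_field is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_field is a hypothetical helper that mimics the documented
# strip modes; it is NOT taken from Spreadsheet::Read itself.
#   0 = leave as is, 1 = leading, 2 = trailing, 3 = both
sub strip_field {
    my ($value, $mode) = @_;
    $value =~ s/^\s+// if $mode & 1;   # modes 1 and 3 strip leading
    $value =~ s/\s+$// if $mode & 2;   # modes 2 and 3 strip trailing
    return $value;
}

# Show the effect of each mode on the same padded value.
for my $mode (0 .. 3) {
    printf "strip => %d: [%s]\n", $mode, strip_field('  x  ', $mode);
}
```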

    Enjoy, Have FUN! H.Merijn
      This sounds just fine.

      I haven't worked with any CSV files with leading spaces. That idea appears to be uncommon. But your suggestion sounds good.

Re^2: This looks like whitespace in my CSV but doesn't seem to be
by Anonymous Monk on Sep 30, 2012 at 09:47 UTC