PerlMonks  

Issue parsing CSV into hashes?

by tx2010 (Novice)
on Sep 21, 2010 at 16:45 UTC ( [id://861113]=perlquestion )

tx2010 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, wise ones- I'm hoping you can help with what seems like it *should* be a rather simple issue. I have a text file of comma separated key:value pairs, like this:

    "evt":"Login","time":"now","msg":"Login success, welcome back!"

I have been trying to use Text::CSV_XS to parse this twice- once to split the line into "key":"value" pairs, and a second time to split these into a hash. My issue is that CSV_XS seems to choke. Here's my test snippet:

    use Text::CSV_XS;
    use strict;
    use warnings;

    my $csvfile = shift or die "No filename specified";

    my $csv = Text::CSV_XS->new( {
        'quote_char' => '"',
        'sep_char'   => ":",
        'binary'     => 1,
    } );

    my @columns;
    open(FILE, $csvfile) or die "Can't open $csvfile: $!";
    while (<FILE>) {
        $csv->parse($_) or die "parse() failed: " . $csv->error_input();
        my @data = $csv->fields();
        for my $i (0..$#data) {
            push @{$columns[$i]}, $data[$i];
        }
    }
    close(FILE);

    my %hash = map {shift @$_ => $_} @columns;
    use Data::Dumper;
    print Dumper(\%hash);

    # output:
    parse() failed: "evt":"Login","time":"now","msg":"Login success, welcome back!"

I can't figure out what is going on, since I've done this a million times. The only difference here is that my data isn't usually in "quoted" pairs, but that should make it easier! Thanks for any help!

Replies are listed 'Best First'.
Re: Issue parsing CSV into hashes?
by dasgar (Priest) on Sep 21, 2010 at 17:10 UTC

    Either I'm missing something in your code or your code is missing a step. Conceptually, here's how I'd approach the task.

    1. Process file by line.
    2. For each line, parse the line to get the key/value pairs.
    3. For each key/value pair, parse to separate the key and the value.
    4. Add the key and value pair to the hash.

    I think that I see steps 1, 3, and 4 in your code, but I don't see step 2. That might not be the source of your issue, but it might help.
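
    The four steps above can be sketched in core Perl. This is only an illustration under assumptions: it swaps in Text::ParseWords (a core module that nobody in this thread proposed) for the quote-aware splitting, and the helper name parse_pairs is hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: turn one line of "key":"value" pairs into a hash ref.
sub parse_pairs {
    my ($line) = @_;
    my %hash;

    # Step 2: split the line on commas that sit outside quotes.
    # keep = 1 leaves the quotes in place for the next pass.
    for my $pair (parse_line(',', 1, $line)) {

        # Step 3: split each pair on the colon that sits outside quotes.
        # keep = 0 strips the quotes from key and value.
        my ($key, $value) = parse_line(':', 0, $pair);

        # Step 4: add the key and value to the hash.
        $hash{$key} = $value;
    }
    return \%hash;
}

# Step 1: process the input line by line (here, a one-line sample).
my @lines   = ('"evt":"Login","time":"now","msg":"Login success, welcome back!"');
my @records = map { parse_pairs($_) } @lines;

print "$_ => $records[0]{$_}\n" for sort keys %{ $records[0] };
```

    With a real file, the @lines array would be replaced by a chomp-ed read loop; the comma inside the msg value survives because it sits inside quotes.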

      Thanks, you are correct, and that's how I'm attacking it. This was just to show the failure on a single line, without even bothering with key:value pairs.
Re: Issue parsing CSV into hashes?
by stephen (Priest) on Sep 21, 2010 at 18:02 UTC
    The parse is failing because of the quotation marks. From the manual: "Fields that contain a special character (comma, newline, or double quote), must be enclosed in double quotes."

    Change your initialization to:

    my $csv = Text::CSV_XS->new( { 'allow_loose_quotes' => 1, } );
    and it should parse.

    Resetting 'sep_char' to ':' won't work, because that will make CSV split on ':' characters, and you want it to split on commas.

    I suspect that for what you're doing, you're not actually getting any value from Text::CSV_XS. I think you'd do better just to use split(), but that's up to you.

    stephen

Re: Issue parsing CSV into hashes?
by BioLion (Curate) on Sep 21, 2010 at 17:37 UTC

    I agree with dasgar - you are only doing one split! Either do it as dasgar suggests, or keep it simple and just use split:

        DB<5> $s = q/"foo:bar","test:boing!","whirrr:clunk"/;
        DB<6> %hsh = split /[:,]/, $s; ## do both the required splits in one step
        DB<7> x %hsh
        0  '"whirrr'
        1  'clunk"'
        2  '"foo'
        3  'bar"'
        4  '"test'
        5  'boing!"'

    Admittedly this test will be caught out by delimiters inside the quotes and spaces between the key/val pairs, but it is a start as to how this problem could be solved... TIMTOWTDI!
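
    The caveat about delimiters inside quotes shows up with the sample line from the question. The sketch below first counts the fields produced by the character-class split, then shows one possible alternative that matches whole pairs instead. The regex is an assumption of mine and only holds if keys and values never contain a double quote.

```perl
use strict;
use warnings;

my $line = '"evt":"Login","time":"now","msg":"Login success, welcome back!"';

# The character-class split also cuts at the comma inside the last value,
# so the list has an odd number of elements and cannot pair up cleanly.
my @naive = split /[:,]/, $line;
printf "naive split: %d fields\n", scalar @naive;

# Matching whole "key":"value" pairs keeps the embedded comma intact.
# Assumes neither keys nor values contain a double quote.
my %pairs = $line =~ /"([^"]*)":"([^"]*)"/g;
print "$_ => $pairs{$_}\n" for sort keys %pairs;
```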

    Just a something something...
Re: Issue parsing CSV into hashes?
by johngg (Canon) on Sep 21, 2010 at 19:14 UTC

    Just using split as suggested by others. Potentially rather fragile but this works for your example text. Some form of parsing solution would be more robust, especially if you have to cope with spaces around delimiters or escaped embedded quotes.

        knoppix@Microknoppix:~$ perl -E '
        > $str = q{"evt":"Login","time":"now","msg":"Login success, welcome back!"};
        > %hash =
        >     map { split m{:} }
        >     split m{(?<="),(?=")}, $str;
        > say qq{$_ => $hash{ $_ }} for keys %hash;'
        perl: warning: Setting locale failed.
        "msg" => "Login success, welcome back!"
        "evt" => "Login"
        "time" => "now"
        knoppix@Microknoppix:~$
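
    For the fragile cases mentioned above, a slightly more tolerant regex is one option. This is a rough sketch, not a replacement for a real parser: the input string is made up, and it assumes that embedded quotes are escaped with a backslash (rather than doubled, as in standard CSV).

```perl
use strict;
use warnings;

# Hypothetical input with spaces around the delimiters and an escaped quote.
my $str = q{"evt" : "Login" , "msg":"She said \"hi\", then left"};

my %hash;

# A quoted string: any run of ordinary characters or backslash-escaped ones.
my $quoted = qr/"((?:[^"\\]|\\.)*)"/;

while ($str =~ /$quoted\s*:\s*$quoted/g) {
    my ($key, $value) = ($1, $2);
    s/\\(.)/$1/g for $key, $value;    # drop the escaping backslashes
    $hash{$key} = $value;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

    Unlike the lookaround split, the keys and values come back without their surrounding quotes.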

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Issue parsing CSV into hashes?
by Gangabass (Vicar) on Sep 22, 2010 at 10:28 UTC

    Without @columns (look at allow_loose_quotes, escape_char and especially at sep_char)

        use Text::CSV_XS;
        use strict;
        use warnings;

        my $csvfile = shift or die "No filename specified";

        my $csv = Text::CSV_XS->new( {
            quote_char         => '"',
            sep_char           => ",",
            binary             => 1,
            allow_loose_quotes => 1,
            escape_char        => '\\',
        } );

        open(FILE, $csvfile) or die "Can't open $csvfile: $!";
        my %hash;
        while (<FILE>) {
            $csv->parse($_) or die "parse() failed: " . $csv->error_input();
            my @data = $csv->fields();
            foreach my $pair (@data) {
                my ( $key, $value ) = split /:/, $pair;
                $hash{$key} = $value;
            }
        }
        close(FILE);

        use Data::Dumper;
        print Dumper(\%hash);
Re: Issue parsing CSV into hashes?
by Tux (Canon) on Sep 22, 2010 at 12:36 UTC

    Rule 1: thou shalt not read the lines yourself! When you read with the diamond operator, (embedded) line separation gets lost. So instead of:

        my $csv = Text::CSV_XS->new ({
            quote_char => '"',
            sep_char   => ":",
            binary     => 1,
            });
        my @columns;
        open (FILE, $csvfile) or die "Can't open $csvfile: $!";
        while (<FILE>) {
            $csv->parse ($_) or die "parse() failed: " . $csv->error_input ();
            my @data = $csv->fields ();
            for my $i (0..$#data) {
                push @{$columns[$i]}, $data[$i];
                }
            }

    Use:

        my $csv = Text::CSV_XS->new ({
            binary             => 1,
            auto_diag          => 1,
            allow_loose_quotes => 1,
            });
        my @columns;
        open my $fh, "<", $csvfile or die "$csvfile: $!";
        while (my $row = $csv->getline ($fh)) {
            for (@$row) {
                my ($key, $value) = split m/"?:"?/, $_, 2;
                # ...
                }
            }
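
    The third argument to split in the snippet above is the LIMIT: with 2, the string is cut at the first separator only, so a value that itself contains a colon stays in one piece. A small illustration with a made-up pair (not data from the question):

```perl
use strict;
use warnings;

# Hypothetical pair whose value itself contains colons.
my $pair = 'time:12:30:00';

my @unlimited     = split /:/, $pair;       # value is cut into pieces
my ($key, $value) = split /:/, $pair, 2;    # cut at the first colon only

print scalar(@unlimited), " pieces without a limit\n";
print "$key => $value\n";
```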

    Enjoy, Have FUN! H.Merijn
Re: Issue parsing CSV into hashes?
by ikegami (Patriarch) on Sep 27, 2010 at 20:45 UTC

    CSV files are two dimensional (lines of fields). Your data is also two dimensional. To parse your data as a CSV file, each hash element would need to be parsed as a line.

        use strict;
        use warnings;

        use Data::Dumper qw( Dumper );
        use Text::CSV_XS 0.74;  # eol bug fix.

        my $csv = Text::CSV_XS->new({
            binary   => 1,
            sep_char => ':',
            eol      => ',',
        });

        while (<DATA>) {
            chomp;

            my %h;
            open(my $fh, '<', \$_) or die;
            while (my $row = $csv->getline($fh)) {
                $h{ $row->[0] } = $row->[1];
            }

            $csv->eof or $csv->error_diag();

            print(Dumper(\%h));
        }

        __DATA__
        "evt":"Login","time":"now","msg":"Login success, welcome back!"

        $VAR1 = {
                  'msg' => 'Login success, welcome back!',
                  'time' => 'now',
                  'evt' => 'Login'
                };

    One catch: eol doesn't work for anything but "\n" in 0.73, and 0.74 isn't out yet. (The bug has been fixed, but a release hasn't been created yet.)

         One catch: eol doesn't work for anything but "\n" in 0.73, and 0.74 isn't out yet. (The bug has been fixed, but a release hasn't been created yet.)

      Almost true. It failed only for eol values that lack a trailing \r or \n, which was caused by the underlying implementation using perl's internal getline () mechanism without modifying $/ locally.

      I plan to release version 0.74 this week; some documentation changes are pending. The _PP counterpart is also ready as we speak.

      ikegami, you should start using the auto_diag attribute :)


      Enjoy, Have FUN! H.Merijn
        auto_diag: nice.

        Perl doesn't know or care about "\r", so it must be a special case. I tested it, revealing that "\r" was broken and still is. I reopened the ticket.
