tx2010 has asked for the wisdom of the Perl Monks concerning the following question:
Hello, wise ones- I'm hoping you can help with what seems like it *should* be a rather simple issue.
I have a text file of comma separated key:value pairs, like this:
"evt":"Login","time":"now","msg":"Login success, welcome back!"
I have been trying to use Text::CSV_XS to parse this twice- once to split the line into "key":"value" pairs, and a second time to split these into a hash.
My issue is that CSV_XS seems to choke. Here's my test snippet:
use Text::CSV_XS;
use strict;
use warnings;
my $csvfile = shift or die "No filename specified";
my $csv = Text::CSV_XS->new( { 'quote_char' => '"',
'sep_char' => ":",
'binary' => 1, } );
my @columns;
open(FILE, $csvfile) or die "Can't open $csvfile: $!";
while (<FILE>) {
    $csv->parse($_) or die "parse() failed: " . $csv->error_input();
    my @data = $csv->fields();
    for my $i (0..$#data) {
        push @{$columns[$i]}, $data[$i];
    }
}
close(FILE);
my %hash = map {shift @$_ => $_} @columns;
use Data::Dumper;
print Dumper(\%hash);
# output:
parse() failed: "evt":"Login","time":"now","msg":"Login success, welcome back!"
I can't figure out what is going on, since I've done this a million times.
The only difference here is that my data isn't usually in "quoted" pairs, but that should make it easier!
Thanks for any help!
Re: Issue parsing CSV into hashes?
by dasgar (Priest) on Sep 21, 2010 at 17:10 UTC
Either I'm missing something in your code or your code is missing a step. Conceptually, here's how I'd approach the task.
1. Process the file line by line.
2. For each line, parse the line to get the key/value pairs.
3. For each key/value pair, parse it to separate the key from the value.
4. Add the key and value to the hash.
I think that I see steps 1, 3, and 4 in your code, but I don't see step 2. That might not be the source of your issue, but it might help.
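A minimal sketch of the four steps above, using plain split() rather than Text::CSV_XS. The helper name parse_pairs_naive and the shortened sample lines are made up for illustration, and the sketch assumes that no key or value contains a ':' or ',' (so it would not cope with the "msg" field from the original post).

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: parses one line of
# "key":"value" pairs. Assumes no ':' or ',' inside keys or values.
sub parse_pairs_naive {
    my ($line) = @_;
    chomp $line;
    my %record;
    for my $pair (split /,/, $line) {               # step 2: line -> pairs
        my ($key, $value) = split /:/, $pair, 2;    # step 3: pair -> key, value
        for ($key, $value) { s/\A"//; s/"\z//; }    # strip surrounding quotes
        $record{$key} = $value;                     # step 4: store in the hash
    }
    return \%record;
}

# Step 1: process the input line by line (two in-memory lines here).
my @records = map { parse_pairs_naive($_) } (
    qq{"evt":"Login","time":"now"\n},
    qq{"evt":"Logout","time":"later"\n},
);
print "$_->{evt} at $_->{time}\n" for @records;
```

Each line becomes its own hash here; whether you want one hash per line or one merged hash depends on what the file represents.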
| [reply] |
|
Thanks, you are correct, and that's how I'm attacking it. This was just to show the failure on a single line, without even bothering with the key:value pairs.
Re: Issue parsing CSV into hashes?
by stephen (Priest) on Sep 21, 2010 at 18:02 UTC
my $csv = Text::CSV_XS->new( {
'allow_loose_quotes' => 1,
} );
and it should parse.
Resetting 'sep_char' to ':' won't work, because that will make CSV split on ':' characters, and you want it to split on commas.
I suspect that for what you're doing, you're not actually getting any value from Text::CSV_XS. I think you'd do better just to use split(), but that's up to you.
Re: Issue parsing CSV into hashes?
by BioLion (Curate) on Sep 21, 2010 at 17:37 UTC
I agree with dasgar - you are only doing one split! Either do it as dasgar suggests, or keep it simple and just use split:
DB<5> $s = q/"foo:bar","test:boing!","whirrr:clunk"/;
DB<6> %hsh = split /[:,]/, $s; ## do both the required splits in one step
DB<7> x %hsh
0 '"whirrr'
1 'clunk"'
2 '"foo'
3 'bar"'
4 '"test'
5 'boing!"'
Admittedly this test will be caught out by delimiters inside the quotes and spaces between the key/val pairs, but it is a start as to how this problem could be solved... TIMTOWTDI!
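To make that caveat concrete, here is a small sketch that applies the same one-step split to the sample line from the original question. Nothing here goes beyond core Perl; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $line = q{"evt":"Login","time":"now","msg":"Login success, welcome back!"};

# The same one-step split, applied to the original poster's sample line.
my @parts = split /[:,]/, $line;

# The comma inside the quoted message is treated as a delimiter,
# so the message is cut in two and the list has an odd length.
printf "%d parts\n", scalar @parts;
print "  [$_]\n" for @parts;
```

Assigning that odd-length list to a hash would pair the wrong elements and trigger an "Odd number of elements in hash assignment" warning under use warnings.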
Just a something something...
Re: Issue parsing CSV into hashes?
by johngg (Canon) on Sep 21, 2010 at 19:14 UTC
Just using split as suggested by others. Potentially rather fragile but this works for your example text. Some form of parsing solution would be more robust, especially if you have to cope with spaces around delimiters or escaped embedded quotes.
knoppix@Microknoppix:~$ perl -E '
> $str = q{"evt":"Login","time":"now","msg":"Login success, welcome back!"};
> %hash =
> map { split m{:} }
> split m{(?<="),(?=")}, $str;
> say qq{$_ => $hash{ $_ }} for keys %hash;'
perl: warning: Setting locale failed.
"msg" => "Login success, welcome back!"
"evt" => "Login"
"time" => "now"
knoppix@Microknoppix:~$
I hope this is helpful.
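For the more robust parsing mentioned above, one possible sketch is a small \G-anchored tokenizer. It is only an illustration under stated assumptions: the helper name parse_quoted_pairs is invented, keys and values are assumed to be always double-quoted, and embedded quotes are assumed to be backslash-escaped (the thread never says how the real data escapes them).

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread. Assumes keys and values are
# always double-quoted and that embedded quotes are backslash-escaped.
sub parse_quoted_pairs {
    my ($line) = @_;
    my $string = qr/"((?:[^"\\]|\\.)*)"/;   # quoted string, \" allowed inside
    my %record;
    # Walk the line pair by pair, tolerating spaces around ':' and ','.
    while ($line =~ /\G\s*$string\s*:\s*$string\s*(?:,|\z)/gc) {
        my ($key, $value) = ($1, $2);
        s/\\(.)/$1/g for $key, $value;      # unescape \" and \\
        $record{$key} = $value;
    }
    return \%record;
}

my $rec = parse_quoted_pairs(
    q{"evt":"Login", "msg" : "Login success, welcome back!","who":"the \"admin\""}
);
print "$_ => $rec->{$_}\n" for sort keys %$rec;
```

The loop simply stops at the first thing it cannot match, so a production version would want to check that pos($line) reached the end of the string and report an error otherwise.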
Re: Issue parsing CSV into hashes?
by Gangabass (Vicar) on Sep 22, 2010 at 10:28 UTC
Without @columns (look at allow_loose_quotes, escape_char and especially at sep_char)
use Text::CSV_XS;
use strict;
use warnings;
my $csvfile = shift or die "No filename specified";
my $csv = Text::CSV_XS->new( { quote_char => '"',
sep_char => ",",
binary => 1,
allow_loose_quotes => 1,
escape_char => '\\'} );
open(FILE, $csvfile) or die "Can't open $csvfile: $!";
my %hash;
while (<FILE>) {
    $csv->parse($_) or die "parse() failed: " . $csv->error_input();
    my @data = $csv->fields();
    foreach my $pair (@data) {
        my ( $key, $value ) = split /:/, $pair;
        $hash{$key} = $value;
    }
}
close(FILE);
use Data::Dumper;
print Dumper(\%hash);
Re: Issue parsing CSV into hashes?
by Tux (Canon) on Sep 22, 2010 at 12:36 UTC
Rule 1: thou shalt not read the lines yourself! When you read with the diamond operator, (embedded) line separation gets lost. So instead of:
my $csv = Text::CSV_XS->new ({
quote_char => '"',
sep_char => ":",
binary => 1,
});
my @columns;
open (FILE, $csvfile) or die "Can't open $csvfile: $!";
while (<FILE>) {
    $csv->parse ($_) or die "parse() failed: " . $csv->error_input ();
    my @data = $csv->fields ();
    for my $i (0..$#data) {
        push @{$columns[$i]}, $data[$i];
    }
}
Use:
my $csv = Text::CSV_XS->new ({
    binary             => 1,
    auto_diag          => 1,
    allow_loose_quotes => 1,
});
my @columns;
open my $fh, "<", $csvfile or die "$csvfile: $!";
while (my $row = $csv->getline ($fh)) {
    for (@$row) {
        my ($key, $value) = split m/"?:"?/, $_, 2;
        # ...
    }
}
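To illustrate Rule 1 without needing the module installed, here is a core-Perl sketch of what the diamond operator does to a record with an embedded newline. The two-record sample data is invented for the demonstration.

```perl
use strict;
use warnings;

# Two CSV records; the second has a quoted field with an embedded newline.
my $data = qq{"id","note"\n"1","line one\nline two"\n};

open my $fh, '<', \$data or die "in-memory open failed: $!";
my @chunks = <$fh>;     # plain line-by-line read, as in the original loop
close $fh;

# Two records went in, but three chunks come out: the quoted field
# has been cut at its embedded newline.
printf "%d chunks read\n", scalar @chunks;
```

Handing each of those chunks to parse () gives the parser half a field at a time, whereas getline () reads from the handle itself and can keep going until the quoted field is closed (with binary => 1 set).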
Enjoy, Have FUN! H.Merijn
Re: Issue parsing CSV into hashes?
by ikegami (Patriarch) on Sep 27, 2010 at 20:45 UTC
CSV files are two dimensional (lines of fields). Your data is also two dimensional. To parse your data as a CSV file, each hash element would need to be parsed as a line.
use strict;
use warnings;
use Data::Dumper qw( Dumper );
use Text::CSV_XS 0.74; # eol bug fix.
my $csv = Text::CSV_XS->new({
binary => 1,
sep_char => ':',
eol => ',',
});
while (<DATA>) {
chomp;
my %h;
open(my $fh, '<', \$_) or die;
while (my $row = $csv->getline($fh)) {
$h{ $row->[0] } = $row->[1];
}
$csv->eof
or $csv->error_diag();
print(Dumper(\%h));
}
__DATA__
"evt":"Login","time":"now","msg":"Login success, welcome back!"
$VAR1 = {
'msg' => 'Login success, welcome back!',
'time' => 'now',
'evt' => 'Login'
};
One catch: eol doesn't work for anything but "\n" in 0.73, and 0.74 isn't out yet. (The bug has been fixed, but a release hasn't been created yet.)
| [reply] [d/l] [select] |
|
One catch: eol doesn't work for anything but "\n" in 0.73, and 0.74 isn't out yet. (The bug has been fixed, but a release hasn't been created yet.)
Almost true. It didn't work for eol values without a trailing \r or \n, because the underlying implementation used perl's internal getline () mechanism without modifying $/ locally.
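For readers unfamiliar with $/, the following core-Perl sketch shows what "modifying $/ locally" means for an ordinary readline. It is not the module's implementation, just an illustration of the mechanism; the sample data is shortened and contains no commas inside quotes, which a plain $/ read would not respect.

```perl
use strict;
use warnings;

my $data = q{"evt":"Login","time":"now"};
open my $fh, '<', \$data or die "in-memory open failed: $!";

my @records;
{
    local $/ = ',';              # record separator for readline, this block only
    while (my $rec = <$fh>) {
        chomp $rec;              # chomp removes the current $/, i.e. the comma
        push @records, $rec;
    }
}
close $fh;
print "$_\n" for @records;
```

Outside the block $/ is back to its previous value, so other reads in the program are unaffected.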
I plan to release version 0.74 this week; some documentation changes are pending. The _PP counterpart is also ready as we speak.
ikegami, you should start using the auto_diag attribute :)
Enjoy, Have FUN! H.Merijn