in reply to Re: Multiple double quotes within csv in thread Multiple double quotes within csv
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ',' });
while (my $line = <DATA>)
if ($csv->parse($line)) {
my @fields = $csv->fields();
print
print "$fields[0],";
print "$fields[1],";
print "$fields[2]\n";
} else {
warn "Line could not be parsed: $line\n";
}
}
__DATA__
0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
I was hoping to get the output
0,"Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
but also acceptable as
0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
Re^3: Multiple double quotes within csv
by Tux (Canon) on May 13, 2017 at 07:29 UTC
|
use feature 'say';   # needed for say() below
use Text::CSV;
my $csv = Text::CSV->new ({
auto_diag => 1,
allow_loose_quotes => 1, # optional, also works without this attribute
allow_loose_escapes => 1,
});
while (my $row = $csv->getline (*DATA)) {
say for @$row;
}
__END__
0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
will produce
0
"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052
9
|
I tested this and verified that:
my $csv = Text::CSV_XS->new({allow_loose_escapes => 1,});
parses the OP's example correctly with my code using Text::CSV_XS.
Input Line:
8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
Output Line:
8|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
Apparently although the OP's CSV line is technically incorrect according to the CSV spec, some smart folks who wrote Text::CSV anticipated this and have an option for it.
Thank you Tux!
As a comment: I always use the pipe character, |, when I generate "CSV" files. Using a character other than the comma is allowed by the spec. What is shown as the "output" line above would be my "input" line, with no need to use a complex module to parse things (in most, but not all cases). Of course, we have to deal with what we get from others... such is the nature of the beast.
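A minimal sketch of the pipe-delimited idea, assuming no field ever contains a literal | or an embedded newline (if either can occur, a real CSV parser is still the safer choice). The helper name split_pipe_record is made up for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one pipe-delimited record into its fields.
# Assumption: fields never contain "|" or a newline.
sub split_pipe_record {
    my ($line) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    return split /\|/, $line, -1;
}

my @fields = split_pipe_record(
    '8|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9'
);
print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

The embedded double quotes need no special handling here because they are ordinary characters once the comma and quote rules of CSV are out of the picture.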
|
Thank you to all responders. As noted, we have to deal with data we are given. All help is much appreciated and with a couple changes it appears that parsing of all data is successful.
Again, thanx.
|
Your line 4 from your experiment set however still cannot be parsed, it throws the error 2027 "EIQ - Quoted field not terminated".
|
Re^3: Multiple double quotes within csv
by NetWallah (Canon) on May 13, 2017 at 00:46 UTC
|
Your code does not compile.
Here is some Working code, and the output it produces:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ',', quote_char => undef, escape_char => undef });
while (my $line = <DATA>){
if ($csv->parse($line)) {
my @fields = $csv->fields();
print "$fields[0],";
print "$fields[1],";
print "$fields[2]\n";
} else {
warn "Line could not be parsed: '$line'\n";
my ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
print "DIAG:(CDE=$cde, STR=$str, POS=$pos, REC=$rec, FLD=$fld)\n";
}
}
__DATA__
0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
>perl test2.pl
0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
The quotes at the start of Rat Control are problematic, and produce this error on default settings:
DIAG:(CDE=2023, STR=EIQ - QUO character not allowed, POS=4, REC=1, FLD=2)
...Disinformation is not as good as datinformation.
Don't document the program; program the document.
|
#!/usr/bin/perl
use strict;
use warnings;
# OP's code from question at #1190163
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ',' });
while (my $line = <DATA>) { # added open curly
if ($csv->parse($line)) {
(my @fields) = $csv->fields(); # enclosed my @fields in () so $csv does not mask earlier
print
print "$fields[0],";
print "$fields[1],";
print "$fields[2]\n";
} else {
warn "Line could not be parsed: $line\n";
}
}
__DATA__
0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
If what you posted is what you ran -- complete with strict and warnings -- then your attempt to run your code should have at least hinted at what was wrong. Adding use diagnostics (or perhaps use diagnostics -verbose), or filtering your program through splain, would have at least allowed you to post code without errors... something the Monks hold to be an indicator that you're serious about learning rather than merely using us as human debuggers.
As to the underlying problems, attend carefully to Marshall's thorough examination and exposition ... and join him in thanks to Tux for the module.
|
Don't set quote_char and escape_char to undef, as that will cause any field that contains a sep_char to break your data. For the question at hand, options like allow_loose_quotes and allow_loose_escapes are usually the way to go. In this particular case, setting escape_char to undef might work, but never set quote_char to undef to "fix" these kinds of situations.
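To make the warning concrete, here is a small sketch that parses line 5 of Marshall's test data twice, once with default quoting and once with quoting switched off. The helper parse_with is invented for this demonstration and is not part of Text::CSV:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: parse one line with the given constructor
# attributes and return the resulting fields.
sub parse_with {
    my ($attr, $line) = @_;
    my $csv = Text::CSV->new($attr)
        or die "Cannot create parser: " . Text::CSV->error_diag;
    $csv->parse($line)
        or die "Cannot parse: " . $csv->error_diag;
    return $csv->fields;
}

my $line = '5,"123,456",abc';    # the quoted field contains a comma

my @default  = parse_with({}, $line);
my @no_quote = parse_with({ quote_char => undef, escape_char => undef }, $line);

print scalar(@default),  " fields with default quoting\n";
print scalar(@no_quote), " fields with quote_char => undef\n";
print "[$_]\n" for @no_quote;
```

With quoting disabled, the comma inside the quoted field is treated as a separator, so the record gains an extra field and the quote characters stay in the data.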
|
Thank you (++). I wasn't familiar with Text::CSV, and was unaware of "allow_loose_quotes".
This site is always a source of enlightenment! It has kept me returning for over 12 years.
Re^3: Multiple double quotes within csv
by Marshall (Canon) on May 13, 2017 at 01:22 UTC
|
Update: I think I'm closer:
RFC-4180, paragraph "If double-quotes are used to enclose fields,
then a double-quote appearing inside a field must be escaped by
preceding it with another double quote."
So:
2,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
is malformed, incorrect CSV, this should be:
2,"""Rat Control"" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
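As a rough illustration of that rule, the following sketch doubles every embedded double quote and then wraps the field in quotes. The helper quote_field is hypothetical and written only for this example; when generating real files, the module's own output methods are the better tool:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: quote a single field as RFC 4180 describes,
# escaping each embedded double quote by doubling it.
sub quote_field {
    my ($field) = @_;
    (my $escaped = $field) =~ s/"/""/g;
    return qq{"$escaped"};
}

my $from = '"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052';

# Only the middle field needs quoting in this record.
print join(',', 2, quote_field($from), 9), "\n";
```

Run as written, this prints the corrected form of the line shown above, with three quotes before Rat Control and two after it.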
I made some experiments. Here are my (updated) results:
#!/usr/bin/perl
use strict;
use warnings;
$|=1; ## turn off buffering for STDOUT
use Text::CSV_XS qw( csv );
my $csv = Text::CSV_XS->new(); #using the defaults
while (my $line = <DATA>)
{
if ($csv->parse($line)) {
my @fields = $csv->fields();
print join ("|",@fields),"\n";
} else {
warn "Line could not be parsed: $line\n";
}
}
=Prints:
1|Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
2|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
3|Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
Line could not be parsed: 4,"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
5|123,456|abc
6|Rat|xyz
7|Rat Control|xyz
Line could not be parsed: 8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
=cut
__DATA__
1,Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
2,"""Rat Control"" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
3,Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
4,"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
5,"123,456",abc
6,"Rat",xyz
7,"Rat Control",xyz
8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
I do not understand why Line 4, which starts with unnecessary quotes, is not parsed. Update: it could be that the double quotes must apply to the whole field, and therefore the syntax in Line 2 must be used. See Line 5, which has an embedded comma, requires the quotes, and is parsed correctly. See also Lines 6 and 7. I don't think the starting quotes are the issue; it appears that other "special" characters in Field 2 are causing the problem.
Re^3: Multiple double quotes within csv
by Anonymous Monk on May 13, 2017 at 01:37 UTC
|
binary => 1,
allow_loose_quotes => 1,
blank_is_undef => 1,
escape_char => undef,