PerlMonks  

Re^2: Multiple double quotes within csv

by mhooper (Novice)
on May 13, 2017 at 00:22 UTC ( [id://1190172]=note )


in reply to Re: Multiple double quotes within csv
in thread Multiple double quotes within csv

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new({ sep_char => ',' });

    while (my $line = <DATA>)
        if ($csv->parse($line)) {
            my @fields = $csv->fields();
            print print "$fields[0],";
            print "$fields[1],";
            print "$fields[2]\n";
        } else {
            warn "Line could not be parsed: $line\n";
        }
    }
    __DATA__
    0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

I was hoping to get the output

    0,"Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

but this would also be acceptable:

    0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

Replies are listed 'Best First'.
Re^3: Multiple double quotes within csv
by Tux (Canon) on May 13, 2017 at 07:29 UTC
        use feature 'say';    # say needs this (or use 5.010)
        use Text::CSV;

        my $csv = Text::CSV->new ({
            auto_diag           => 1,
            allow_loose_quotes  => 1,  # optional, also works without this attribute
            allow_loose_escapes => 1,
            });

        while (my $row = $csv->getline (*DATA)) {
            say for @$row;
            }
        __END__
        0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

    will produce

        0
        "Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052
        9

      I tested this and verified that:
      my $csv = Text::CSV_XS->new({allow_loose_escapes => 1,});
      parses the OP's example correctly with my code using Text::CSV_XS.
      Input Line:  8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
      Output Line: 8|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
      Apparently although the OP's CSV line is technically incorrect according to the CSV spec, some smart folks who wrote Text::CSV anticipated this and have an option for it.

      Thank you Tux!

      As a comment: I always use the pipe character, |, when I generate "CSV" files.
      Using a different character than "comma" is allowed by the spec. What is shown as the "output" line above would be my "input" line, with no need to use a complex module to parse things (in most, but not all, cases). Of course we have to deal with what we get from others; such is the nature of the beast.
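
      A minimal sketch of that idea, using only core Perl. It assumes that no field ever contains a literal "|" or a newline; if that assumption does not hold for your data, a real CSV parser is still needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pipe-separated "output" line from the test above, reused as input.
my $line = '8|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9';

# Assumption: "|" never occurs inside a field, so a plain split is enough.
# The -1 limit keeps trailing empty fields instead of dropping them.
my @fields = split /\|/, $line, -1;

print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

      The quotes around Rat Control stay in the second field untouched, because nothing here treats them as special.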

        Thank you to all responders. As noted, we have to deal with data we are given. All help is much appreciated and with a couple changes it appears that parsing of all data is successful. Again, thanx.
        Line 4 from your experiment set, however, still cannot be parsed; it throws error 2027, "EIQ - Quoted field not terminated".
Re^3: Multiple double quotes within csv
by NetWallah (Canon) on May 13, 2017 at 00:46 UTC
    Your code does not compile.

    Here is some working code, and the output it produces:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::CSV;

        my $csv = Text::CSV->new({ sep_char => ',', quote_char => undef, escape_char => undef });

        while (my $line = <DATA>) {
            if ($csv->parse($line)) {
                my @fields = $csv->fields();
                print "$fields[0],";
                print "$fields[1],";
                print "$fields[2]\n";
            } else {
                warn "Line could not be parsed: '$line'\n";
                my ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
                print "DIAG:(CDE=$cde, STR=$str, POS=$pos, REC=$rec, FLD=$fld)\n";
            }
        }
        __DATA__
        0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

        >perl test2.pl
        0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

    The quotes at the start of Rat Control are problematic, and produce this error on default settings:
    DIAG:(CDE=2023, STR=EIQ - QUO character not allowed, POS=4, REC=1, FLD=2)

            ...Disinformation is not as good as datinformation.               Don't document the program; program the document.

      Re the good Abbot NetWallah's observation that your code "doesn't compile" -- spot on and ++, even though his code could still encounter problems with slight variations in the non-conformity of the CSV (discussion below) -- here's why ...and how to fix the compilation-failure part of the problem:

          #!/usr/bin/perl
          use strict;
          use warnings;

          # OP's code from question at #1190163

          use Text::CSV;
          my $csv = Text::CSV->new({ sep_char => ',' });

          while (my $line = <DATA>) {    # added open curly
              if ($csv->parse($line)) {
                  (my @fields) = $csv->fields();    # enclosed my @fields in () so $csv does not mask earlier print
                  print "$fields[0],";
                  print "$fields[1],";
                  print "$fields[2]\n";
              } else {
                  warn "Line could not be parsed: $line\n";
              }
          }
          __DATA__
          0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

      If what you posted -- complete with strict and warnings -- is what you ran, then your attempt to run your code should have at least hinted at what was wrong. Adding use diagnostics (or perhaps use diagnostics -verbose, or filtering your program's messages thru splain) would have at least allowed you to post code without errors... something the Monks hold to be an indicator that you're serious about learning rather than merely using us as human debuggers.

      As to the underlying problems, attend carefully to Marshall's thorough examination and exposition ... and join him in thanks to Tux for the module.

      Don't set quote_char and escape_char to undef, as that will cause any field that contains a sep_char to break your data. For the question at hand, options like allow_loose_quotes and allow_loose_escapes are usually the way to go. In this particular case, setting escape_char to undef might work, but never set quote_char to undef to "fix" these kinds of situations.
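
      To illustrate the first point, here is a rough stand-in that uses a plain split rather than Text::CSV. It only approximates what a parser does once quotes no longer have any special meaning, and it is run against line 5 of Marshall's test data, where the second field contains a comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Line 5 from Marshall's test data: the quoted field contains a comma.
my $line = '5,"123,456",abc';

# Stand-in for a parser with quoting disabled (an approximation, not
# Text::CSV itself): every comma is treated as a field separator.
my @fields = split /,/, $line, -1;

print scalar(@fields), " fields: ", join(' | ', @fields), "\n";
```

      The line comes back as four fields instead of three, with the quoted value cut in two at its embedded comma.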

        Thank you (++). I wasn't familiar with Text::CSV, and was unaware of "allow_loose_quotes".
        This site is always a source of enlightenment! It has kept me returning for over 12 years.

                ...Disinformation is not as good as datinformation.               Don't document the program; program the document.

Re^3: Multiple double quotes within csv
by Marshall (Canon) on May 13, 2017 at 01:22 UTC
    Update: I think I'm closer:
    RFC-4180 says: "If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote."

    So:

    2,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
    is malformed CSV; it should be:
    2,"""Rat Control"" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
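
    As a minimal sketch of that rule in core Perl: the helper rfc4180_quote below is a hypothetical name, not part of Text::CSV or Text::CSV_XS. It doubles every embedded double quote and then encloses the whole field in quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from Text::CSV): quote one field as the RFC
# rule quoted above describes.
sub rfc4180_quote {
    my ($field) = @_;
    (my $escaped = $field) =~ s/"/""/g;    # double each embedded quote
    return qq{"$escaped"};                 # then enclose the whole field
}

# The raw field values, before any CSV quoting is applied.
my @fields = (2, '"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052', 9);

# Only the middle field needs quoting in this example.
my $line = join ',', $fields[0], rfc4180_quote($fields[1]), $fields[2];
print "$line\n";
```

    This prints the corrected line shown above, with three quotes opening the field and two after Control.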
    I made some experiments. Here are my (updated) results:
        #!/usr/bin/perl
        use strict;
        use warnings;
        $|=1;   ## turn off buffering for STDOUT

        use Text::CSV_XS qw( csv );

        my $csv = Text::CSV_XS->new();   # using the defaults

        while (my $line = <DATA>) {
            if ($csv->parse($line)) {
                my @fields = $csv->fields();
                print join ("|",@fields),"\n";
            } else {
                warn "Line could not be parsed: $line\n";
            }
        }

        =Prints:
        1|Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
        2|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
        3|Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
        Line could not be parsed: 4,"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
        5|123,456|abc
        6|Rat|xyz
        7|Rat Control|xyz
        Line could not be parsed: 8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
        =cut

        __DATA__
        1,Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
        2,"""Rat Control"" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
        3,Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
        4,"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
        5,"123,456",abc
        6,"Rat",xyz
        7,"Rat Control",xyz
        8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

    I do not understand why Line 4, which starts with unnecessary quotes, is not parsed. Update: it could be that the double quotes must apply to the whole field, and therefore the syntax in Line 2 must be used. See Line 5, which has an embedded comma, requires the quotes, and is parsed correctly. See also Lines 6 and 7. I don't think the starting quotes are the issue; it appears that other "special" characters in Field2 are causing the problem.
Re^3: Multiple double quotes within csv
by Anonymous Monk on May 13, 2017 at 01:37 UTC
    Instead of sep_char try
    binary => 1, allow_loose_quotes => 1, blank_is_undef => 1, escape_char => undef,
