Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re^2: Replace commas with spaces between quotes, parsing CSV

by Marshall (Canon)
on Apr 15, 2016 at 19:28 UTC ( [id://1160577]=note: print w/replies, xml ) Need Help??


in reply to Re: Replace commas with spaces between quotes, parsing CSV
in thread Replace commas with spaces between quotes, parsing CSV

I ran this on the OP's data set and it looks like it gives the correct answer, but also throws a couple of warnings in the process. I haven't quite figured out why? Update, new code, try to eliminate warnings... Original code...
#!usr/bin/perl use warnings; use strict; $|=1; my $line = '1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U" +,16,IFC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, +PURE",4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D00 +1EF449E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512 +050524F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 +9:30'; my $i=0; my $line2 = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; print "line 1=$line\n"; print "line 2=$line2\n"; __END__ Odd number of elements in anonymous hash at C:\Projects_Perl\handledou +blequotecsv.pl line 9. Odd number of elements in anonymous hash at C:\Projects_Perl\handledou +blequotecsv.pl line 9. line 1=1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,I +FC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE" +,4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF44 +9E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D51205052 +4F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 line 2=1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC + 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/2 +8/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3A +B97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F44 +5F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30
New version with OP's input line:
#!usr/bin/perl use warnings; use strict; $|=1; my $line = '1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U" +,16,IFC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, +PURE",4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D00 +1EF449E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512 +050524F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 +9:30'; my $i=0; my $line2 = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; print "line 1=$line\n"; print "line 2=$line2\n"; my $j=0; # recoded into a slightly different way... my @csv; foreach (split /"/,$line) { s/,/ /g if $j++ % 2; push @csv, $_; } print "\n\n$line\n"; print @csv,"\n"; __END__ first version, 2 warnings.... Odd number of elements in anonymous hash at C:\Projects_Perl\handledou +blequotecsv.pl line 9. Odd number of elements in anonymous hash at C:\Projects_Perl\handledou +blequotecsv.pl line 9. line 1=1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,I +FC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE" +,4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF44 +9E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D51205052 +4F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 line 2=1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC + 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/2 +8/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3A +B97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F44 +5F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 2nd version no warnings.... 1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,IFC 8112 +NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE",4/28/2 +015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97 +F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F5 +04F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC 8112NP +,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/28/2015, +1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97F0B9 +8C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F504F5 +331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 Process completed successfully
Of course @csv chould be joined back together since it was split on '"' instead of ',' and then re-split on comma. This algorithm does appear to work when reformulated and does its job of getting rid of commas within double quotes without run-time warnings. Replacing the "," with a space might not be the "best". possible improvement is to reduce multiple spaces to single spaces.

Replies are listed 'Best First'.
Re^3: Replace commas with spaces between quotes, parsing CSV
by BigRedEO (Acolyte) on Apr 15, 2016 at 19:43 UTC

    I had tried throwing his line in my script here -

    my $filename = 'tested.csv'; open my $FH, $filename or die "Could not read from $filename <$!>, program halting."; # Read the header line. chomp(my $line = <$FH>); my $i = 0; $line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; my @fields = split(/,/, $line); print Dumper(@fields), $/; my @data; # Read the lines one by one. while($line = <$FH>) {

    and got this when I tried to run the script -

    Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5.
Re^3: Replace commas with spaces between quotes, parsing CSV
by BigRedEO (Acolyte) on Apr 15, 2016 at 20:02 UTC
    Curious - what is the line with the pipe here doing - $|=1;
      The line $|=1 turns off buffering on STDOUT. I always use that when debugging code. STDERR is unbuffered by default and STDOUT is buffered by default.

      What that means is that when writing to STDOUT, you may have to write say 4Kbytes of data before it actually gets output to the screen or the hard disk. This is what you want for the program to run fast. However, that means that the unbuffered STDERR messages come out immediately and "out of order" with the STDOUT printout. In other words, it can happen that the error message appears first and then much later the stuff associated with the error appears.

      There is a big performance penalty for using $|=1, especially to the HD. The disk system works most efficiently with "blocks". A typical set up will be 512 byte fundamental "writeable" units on the HD, these are grouped together 8 at a time, to make a 4K block. And that is what is used to write to the disk and what is tracked by the directory system.

      $|=1 will force a write for every "print" statement instead of have this buffered up to some larger number that the I/O system would rather deal with. You will find that even a one bye text file takes a minimum size unit on the disk.

      Update with a demo:

      #!/usr/bin/perl use warnings; use strict; my $x = undef; print "asdwf\n"; print "n,zxncv\n"; print "$x\n"; =Prints Use of uninitialized value $x in concatenation (.) or string at C:\Pro +jects_Perl\demounbuffer.pl line 8. asdwf n,zxncv =cut $|=1; $x = undef; print "asdwf\n"; print "n,zxncv\n"; print "$x\n"; =Prints asdwf n,zxncv Use of uninitialized value $x in concatenation (.) or string at C:\Pro +jects_Perl\demounbuffer.pl line 20. =cut
      In the first code, the error message comes out first although this is the third print statement!!! In the second code (the same code, but with $|=1;), the error comes out 3rd in the print order! That more realistically presents the time line order of what actually happened.

      I also show another "Perl trick", I used the PerlDoc feature to insert some explanation into the code. This is normally used to insert documentation at the beginning of the code, but I "cheated".

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1160577]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (5)
As of 2024-04-19 10:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found