Re: Converting File Delimiters

by kcott (Archbishop)
on Aug 09, 2012 at 00:24 UTC


in reply to Converting File Delimiters

Text::CSV can help in parsing the quoted fields with commas.

Quoting the fields containing commas in the original data stops those commas from being interpreted as separator characters. Changing the separator to a pipe character (|) removes that requirement. Consider whether |ccc, ddd| is sufficient for your needs, or whether you really want |"ccc, ddd"|.

If the former, this skeleton code shows the technique:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Text::CSV;

    my $csv = Text::CSV::->new() or die Text::CSV::->error_diag();

    while (my $row = $csv->getline(\*DATA)) {
        print join('|' => @$row), "\n";
    }

    __DATA__
    aaa,bbb,"ccc, ddd",fff

Output:

    $ pm_csv_to_psv.pl
    aaa|bbb|ccc, ddd|fff

If you want to retain the quotation marks, you can change the print line to:

print join('|' => map { /,/ ? '"'.$_.'"' : $_ } @$row), "\n";

(There may be a more elegant way to do that.)

Output:

    $ pm_csv_to_psv.pl
    aaa|bbb|"ccc, ddd"|fff

-- Ken

Re^2: Converting File Delimiters
by mmueller44 (Novice) on Aug 09, 2012 at 22:22 UTC

    I used your example and added file references but the script is not outputting all of the rows.

        #!/usr/bin/env perl

        use strict;
        use warnings;

        use Text::CSV;

        my $FileIn = 'aaa.csv';
        my $FileOut = 'Converted.txt';

        my $csv = Text::CSV::->new() or die Text::CSV::->error_diag();

        #open my $FH, "<", "aaa.csv";
        open my $FH, "<", $FileIn;
        open (OUTFILE, "+>$FileOut");

        while (my $row = $csv->getline($FH)) {
            print OUTFILE join('|' => @$row), "\n";
        };

    Not sure what I'm doing wrong?

      "I used your example and added file references but the script is not outputting all of the rows."

      You haven't shown any input or output!

      I am unable to reproduce your problem using your code above (with just the filenames changed). My script (pm_csv_to_psv_fhs.pl) has:

          my $FileIn = './pm_csv_to_psv_fhs.in';
          my $FileOut = './pm_csv_to_psv_fhs.out';

      Here's a verbatim run showing input and (before and after) output:

          ken@ganymede: ~/tmp $ cat pm_csv_to_psv_fhs.in
          a,b,"c,d",e
          "f,g,h",i,j,"k,l"
          m,n,o,p,q,r,s,t
          "u,v,w,x,y,z"
          ken@ganymede: ~/tmp $ cat pm_csv_to_psv_fhs.out
          cat: pm_csv_to_psv_fhs.out: No such file or directory
          ken@ganymede: ~/tmp $ pm_csv_to_psv_fhs.pl
          ken@ganymede: ~/tmp $ cat pm_csv_to_psv_fhs.out
          a|b|c,d|e
          f,g,h|i|j|k,l
          m|n|o|p|q|r|s|t
          u,v,w,x,y,z
          ken@ganymede: ~/tmp $

      Please show equivalent information for a run of your script.

      Here are some other points to consider, all documented in open:

      • Check if you're successfully opening the files: open ... or die "Can't open ...: $!";
      • You've used the recommended 3-argument form for the input file. Why not for the output file?
      • The mode for the output file is '+>'. Why? Perhaps you wanted append mode ('>>') or read-write mode without clobbering the file first ('+<').
      • You only show code for writing to the output file. If you really do want read-write mode, where's the code for the reading part?
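The points in the list above can be sketched roughly as follows. This is a minimal illustration, not a drop-in fix for the script in question: the file name demo.txt, the temporary directory, and the sample lines are invented for the example, and it uses only core Perl so it runs without Text::CSV.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch location and file name, used only for this demo.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.txt";

# 3-argument open in plain write mode ('>'), with the result checked.
open my $out_fh, '>', $file or die "Can't open '$file' for writing: $!";
print {$out_fh} "a|b|c,d|e\n";
close $out_fh or die "Can't close '$file': $!";

# Append mode ('>>') adds to the file instead of clobbering it.
open my $app_fh, '>>', $file or die "Can't open '$file' for appending: $!";
print {$app_fh} "f,g,h|i|j|k,l\n";
close $app_fh or die "Can't close '$file': $!";

# Read the file back with a checked 3-argument open.
open my $in_fh, '<', $file or die "Can't open '$file' for reading: $!";
my @lines = <$in_fh>;
close $in_fh or die "Can't close '$file': $!";

print @lines;
```

Plain '>' is used for the first write because nothing is read back through that handle; '+>' would also truncate the file, but opens it read-write, which this sketch does not need.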

      Given you're new to Perl, you may be finding the documentation for open to be a little heavy going. If so, read perlopentut first - it provides a gentler introduction to the subject.

      -- Ken

        OK, I figured out that the script fails on the first row that has double quotes around a field containing a comma.

        See rows two and three in the data below. This is a very small sample; the actual file has more columns and 25,000 rows.

            988A5521, 98_1V_HB, Hel Product Pool (Pds Dd), false, store, false
            988A5707, 98_1V_HB, "Chinook, IPT ME Support", false, store, false
            988A5708, 98_1V_HB, "Chinook, ME Factory Supt", false, store, false
            988A5761, 98_1V_HB, Tandem Rotor Configuration, false, store, false

        In my case it stops on row 68, which is row 2 in the above sample. I would like the output to look like the sample below:

            988A5521|98_1V_HB|Hel Product Pool (Pds Dd)|false| store|false
            988A5707|98_1V_HB|Chinook, IPT ME Support|false|store|false
            988A5708|98_1V_HB|Chinook, ME Factory Supt|false|store|false
            988A5761|98_1V_HB|Tandem Rotor Configuration|false|store|false

        Any ideas why it would die on the row with the double quotes?

        Thanks, Mike
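
One possible explanation, offered as an assumption since the thread does not confirm it: in the sample data each comma is followed by a space, so the opening double quote is not the first character of its field, and Text::CSV with default settings is likely to treat that as a parse error. The sketch below uses the module's allow_whitespace option, which strips whitespace around separators, on two of the sample rows. The @lines and @converted arrays, and the use of parse() on strings instead of getline() on a file handle, are choices made for this illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# allow_whitespace => 1 ignores spaces and tabs around the separator,
# so a field written as , "Chinook, IPT ME Support", parses as quoted.
my $csv = Text::CSV->new({ binary => 1, allow_whitespace => 1 })
    or die Text::CSV->error_diag();

# Two rows copied from the sample data in the post above.
my @lines = (
    '988A5521, 98_1V_HB, Hel Product Pool (Pds Dd), false, store, false',
    '988A5707, 98_1V_HB, "Chinook, IPT ME Support", false, store, false',
);

my @converted;
for my $line (@lines) {
    # Report the parser's own diagnostic instead of stopping silently.
    $csv->parse($line)
        or die "Can't parse '$line': " . $csv->error_diag();
    push @converted, join('|', $csv->fields());
}

print "$_\n" for @converted;
```

When reading from a file handle with getline(), adding `$csv->eof or $csv->error_diag();` after the loop should show why parsing stopped before the end of the file.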
