Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Converting File Delimiters

by mmueller44 (Novice)
on Aug 08, 2012 at 23:27 UTC ( [id://986386]=perlquestion: print w/replies, xml ) Need Help??

mmueller44 has asked for the wisdom of the Perl Monks concerning the following question:

How can I convert delimiters within a text file from "," to "|", but leave the "," within the quoted field?

i.e. aaa,bbb,"ccc, ddd", fff would convert to aaa|bbb|"ccc, ddd"|fff

I have already tried the Perl command line:

 perl -pi.bak -e "s/, /|/g" aaa.csv

But this replaced all of the commas, even within the quoted field

I'm new to Perl! Thanks, Mike

Replies are listed 'Best First'.
Re: Converting File Delimiters
by kcott (Archbishop) on Aug 09, 2012 at 00:24 UTC

    Text::CSV can help in parsing the quoted fields with commas.

    Quoting the fields containing commas in the original data stops those commas from being interpreted as separator characters. Changing the separator to a pipe character (|) removes that requirement. Consider whether |ccc, ddd| is sufficient for your needs or do you really want |"ccc, ddd"|.

    If the former, this skeleton code shows the technique:

    #!/usr/bin/env perl use strict; use warnings; use Text::CSV; my $csv = Text::CSV::->new() or die Text::CSV::->error_diag(); while (my $row = $csv->getline(\*DATA)) { print join('|' => @$row), "\n"; } __DATA__ aaa,bbb,"ccc, ddd",fff

    Output:

    $ pm_csv_to_psv.pl aaa|bbb|ccc, ddd|fff

    If you want to retain the quotation marks, you can change the print line to:

    print join('|' => map { /,/ ? '"'.$_.'"' : $_ } @$row), "\n";

    (There may be a more elegant way to do that.)

    Output:

    $ pm_csv_to_psv.pl aaa|bbb|"ccc, ddd"|fff

    -- Ken

      I used your example and added file references but the script is not outputting all of the rows.

      #!/usr/bin/env perl use strict; use warnings; use Text::CSV; my $FileIn = 'aaa.csv'; my $FileOut = 'Converted.txt'; my $csv = Text::CSV::->new() or die Text::CSV::->error_diag(); #open my $FH, "<", "aaa.csv"; open my $FH, "<", $FileIn; open (OUTFILE, "+>$FileOut"); while (my $row = $csv->getline($FH)) { print OUTFILE join('|' => @$row), "\n"; };

      Not sure what I'm doing wrong?

        "I used your example and added file references but the script is not outputting all of the rows."

        You haven't shown any input or output!

        I am unable to reproduce your problem using your code above (with just the filenames changed). My script (pm_csv_to_psv_fhs.pl) has:

        my $FileIn = './pm_csv_to_psv_fhs.in'; my $FileOut = './pm_csv_to_psv_fhs.out';

        Here's a verbatim run showing input and (before and after) output:

        ken@ganymede: ~/tmp $ cat pm_csv_to_psv_fhs.in a,b,"c,d",e "f,g,h",i,j,"k,l" m,n,o,p,q,r,s,t "u,v,w,x,y,z" ken@ganymede: ~/tmp $ cat pm_csv_to_psv_fhs.out cat: pm_csv_to_psv_fhs.out: No such file or directory ken@ganymede: ~/tmp $ pm_csv_to_psv_fhs.pl ken@ganymede: ~/tmp $ cat pm_csv_to_psv_fhs.out a|b|c,d|e f,g,h|i|j|k,l m|n|o|p|q|r|s|t u,v,w,x,y,z ken@ganymede: ~/tmp $

        Please show equivalent information for a run of your script.

        Here's some other points to consider: all documented in open.

        • Check if you're successfully opening the files: open ... or die "Can't open ...: $!";
        • You've used the recommended 3-argument form for the input file. Why not for the output file?
        • The mode for the output file is '+>'. Why? Perhaps you wanted append mode ('>>') or read-write mode without clobbering the file first ('+<').
        • You only show code for writing to the output file. If you really do want read-write mode, where's the code for the reading part?

        Given you're new to Perl, you may be finding the documentation for open to be a little heavy going. If so, read perlopentut first - it provides a gentler introduction to the subject.

        -- Ken

Re: Converting File Delimiters
by BrowserUk (Patriarch) on Aug 08, 2012 at 23:44 UTC

    $s = 'aaa,bbb,"ccc, ddd", fff'; print join '|', $s =~ m[("[^"]+"|[^,]+)]g; aaa|bbb|"ccc, ddd"| fff

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      That is so much nicer than what I came up with, but I'll add mine, too (otherwise it would have felt like a total waste of time):
      #!/usr/bin/env perl use strict; use warnings; use feature 'say'; my $string = qq(aaa,bbb,"ccc,ddd",fff,ggg,"hhh,iii",jjj); my @split_on_comma = split /,/, $string; my @quoted; my @ready_to_join; for (@split_on_comma) { if ( /^"/ .. /"$/ ) { s/"//g; #remove this if you actually want "s in your output push @quoted, $_; } else { push @ready_to_join, join ',', @quoted if scalar @quoted; @quoted = (); push @ready_to_join, $_; } } my $pipe_delim = join '|', @ready_to_join; say $pipe_delim;
      Output: aaa|bbb|ccc,ddd|fff|ggg|hhh,iii|jjj

        This code is what I want, but how do I get it to use a file for input and output?

        I have tried various revisions but not having any luck

        Thanks, Mike

      The above as a one-liner

      perl -ple"$_ = join '|', m[(\x22[^\x22]+\x22|[^,]+)]g" in.csv > piped. +csv

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?

      On a client's system, where I'd have to run the change management gauntlet to install Text::CSV, I remembered this node. I had to adjust it to handle empty numeric fields (,,,):

      $_ = '"",,,"abc",0,,'; print join '|', m/ "[^"]+" # quoted string | [^,]+ # or non-commas | (?<=,)(?=,|$) # or nothing, surrounded by commas or EOL /gx; # prints: ""|||"abc"|0||

        Nice extension!


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        RIP Neil Armstrong

Re: Converting File Delimiters
by aaron_baugher (Curate) on Aug 09, 2012 at 00:39 UTC

    You could let a module do the work of keeping the quotes and delimiters straight:

    #!/usr/bin/env perl use Modern::Perl; use Text::CSV; my $ic = Text::CSV->new() or die Text::CSV->error_diag(); my $oc = Text::CSV->new({sep_char => '|', eol => $/ }) or die Text::CSV->error_diag(); open my $if, '<', 'infile' or die $!; open my $of, '>', 'outfile' or die $!; while( my $r = $ic->getline($if)){ $oc->print($of, $r); } close $if; close $of;

    Update: I'm not normally a "use modules for everything" kind of guy, but this is one task where I think it's a no-brainer. Note that all the other solutions above will give you a borked output file if any of your non-quoted fields contain a pipe character. Of course, you may know that will never be an issue with your data. But in general, letting the module handle quoting on both input and output is the safest bet.

    Aaron B.
    Available for small or large Perl jobs; see my home node.

      Your code work great, except for it dies on the first row containing double-quotes on the 3rd field in the input file.

      It processes the first 67 rows, puts double-quotes around all fields and changes the delimiter to |.

      i.e. 988A1093|" 98_08_NR"|" Environmental Affairs"|" false"|" store"|" false"

      Not sure how to handle. Thanks, Mike

        What does "dies on the first row" mean? Does it give you an error message? Can you show the input line that it dies on, perhaps with a couple lines before and after it for context? Also, check the documentation for Text::CSV for info on using binary or different encodings, in case your input has any characters that are not simple ASCII text.

        Aaron B.
        Available for small or large Perl jobs; see my home node.

Re: Converting File Delimiters
by CountZero (Bishop) on Aug 09, 2012 at 14:00 UTC
    As a one-liner using App::CCSV:
    perl -MApp::CCSV -ne "$csv->sep_char('|'); csay @f; $csv->sep_char(',' +)" <comma.csv >pipe.CSV

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      I'm getting an error running the code below

      perl -MApp::CCSV -ne "$csv->sep_char('|'); csay @f; $csv->sep_char(','+)" <aaa.txt >pipe.txt

      syntax error at -e line 1, at EOF Execution of -e aborted due to compilation errors.

      I think it has something to do with the double quotes

        I think it has something to do with the double quotes

        Yes, and with your shell. As a rule of thumb, you need to quote one-liners with double quotes on Windows, and with single quotes on Unix systems including Linux, *BSD, Mac OS X and the cygwin shell.

        If you use double quotes on Unix, the shell interpolates the quoted string, and because you rarely have an environment variable named "csv", each "$csv" is replaced with an empty string.

        This does not happen on Windows with command.com and cmd.exe, because their quoting rules are, um, different. Strange may be a better word, bug-compatible back to MS-DOS 1.0 describes what actually happens.

        And don't forget that Windows does not expand wildcards for you. This is not a problem with this one-liner, but often you want to use Win32::Autoglob:

        X:\>perl -E "say for @ARGV" *.exe *.exe X:\>perl -MWin32::Autoglob -E "say for @ARGV" *.exe cksum.exe cp.exe cut.exe date.exe sort.exe sum.exe X:\>

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Converting File Delimiters
by mmueller44 (Novice) on Aug 09, 2012 at 02:58 UTC

    Never mind, found the way to install Text/CSV module using Active Perl v5.14.

    Had to use the ppm command. From DOS command prompt: ppm install Text-CSV

    Thanks for all of the support and examples!

    Hopefully I will get it working tomorrow. Thanks, Mike

Re: Converting File Delimiters
by mmueller44 (Novice) on Aug 09, 2012 at 01:52 UTC

    I need the Text/CSV.pm to run the Text::CSV, where can I download the module?

      Use your favorite CPAN utility. On Unix, cpan, cpanm, cpanp, or others. On Windows, I think there are similar tools.

      Aaron B.
      Available for small or large Perl jobs; see my home node.

Re: Converting File Delimiters
by mmueller44 (Novice) on Aug 08, 2012 at 23:57 UTC

    Ok, so how do I get it to use a text file as input and output the changes to another file?

    Mike

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://986386]
Approved by BrowserUk
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (9)
As of 2024-05-17 06:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found