mmueller44 has asked for the wisdom of the Perl Monks concerning the following question:
How can I convert delimiters within a text file from "," to "|", but leave the "," within the quoted field? i.e. aaa,bbb,"ccc, ddd", fff would convert to aaa|bbb|"ccc, ddd"|fff I have already tried the Perl command line: perl -pi.bak -e "s/, /|/g" aaa.csvBut this replaced all of the commas, even within the quoted field I'm new to Perl! Thanks, Mike
Re: Converting File Delimiters
by kcott (Archbishop) on Aug 09, 2012 at 00:24 UTC
|
Text::CSV can help in parsing the quoted fields with commas.
Quoting the fields containing commas in the original data stops those commas from being interpreted as separator characters. Changing the separator to a pipe character (|) removes that requirement. Consider whether |ccc, ddd| is sufficient for your needs or do you really want |"ccc, ddd"|.
If the former, this skeleton code shows the technique:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV::->new() or die Text::CSV::->error_diag();
while (my $row = $csv->getline(\*DATA)) {
print join('|' => @$row), "\n";
}
__DATA__
aaa,bbb,"ccc, ddd",fff
Output:
$ pm_csv_to_psv.pl
aaa|bbb|ccc, ddd|fff
If you want to retain the quotation marks, you can change the print line to:
print join('|' => map { /,/ ? '"'.$_.'"' : $_ } @$row), "\n";
(There may be a more elegant way to do that.)
Output:
$ pm_csv_to_psv.pl
aaa|bbb|"ccc, ddd"|fff
| [reply] [d/l] [select] |
|
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $FileIn = 'aaa.csv';
my $FileOut = 'Converted.txt';
my $csv = Text::CSV::->new() or die Text::CSV::->error_diag();
#open my $FH, "<", "aaa.csv";
open my $FH, "<", $FileIn;
open (OUTFILE, "+>$FileOut");
while (my $row = $csv->getline($FH))
{
print OUTFILE join('|' => @$row), "\n";
};
Not sure what I'm doing wrong? | [reply] [d/l] |
|
my $FileIn = './pm_csv_to_psv_fhs.in';
my $FileOut = './pm_csv_to_psv_fhs.out';
Here's a verbatim run showing input and (before and after) output:
ken@ganymede: ~/tmp
$ cat pm_csv_to_psv_fhs.in
a,b,"c,d",e
"f,g,h",i,j,"k,l"
m,n,o,p,q,r,s,t
"u,v,w,x,y,z"
ken@ganymede: ~/tmp
$ cat pm_csv_to_psv_fhs.out
cat: pm_csv_to_psv_fhs.out: No such file or directory
ken@ganymede: ~/tmp
$ pm_csv_to_psv_fhs.pl
ken@ganymede: ~/tmp
$ cat pm_csv_to_psv_fhs.out
a|b|c,d|e
f,g,h|i|j|k,l
m|n|o|p|q|r|s|t
u,v,w,x,y,z
ken@ganymede: ~/tmp
$
Please show equivalent information for a run of your script.
Here's some other points to consider: all documented in open.
-
Check if you're successfully opening the files: open ... or die "Can't open ...: $!";
-
You've used the recommended 3-argument form for the input file. Why not for the output file?
-
The mode for the output file is '+>'. Why? Perhaps you wanted append mode ('>>') or read-write mode without clobbering the file first ('+<').
-
You only show code for writing to the output file. If you really do want read-write mode, where's the code for the reading part?
Given you're new to Perl, you may be finding the documentation for open to be a little heavy going. If so, read perlopentut first - it provides a gentler introduction to the subject.
| [reply] [d/l] [select] |
|
|
|
Re: Converting File Delimiters
by BrowserUk (Patriarch) on Aug 08, 2012 at 23:44 UTC
|
$s = 'aaa,bbb,"ccc, ddd", fff';
print join '|', $s =~ m[("[^"]+"|[^,]+)]g;
aaa|bbb|"ccc, ddd"| fff
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
| [reply] [d/l] |
|
That is so much nicer than what I came up with, but I'll add mine, too (otherwise it would have felt like a total waste of time):
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
my $string = qq(aaa,bbb,"ccc,ddd",fff,ggg,"hhh,iii",jjj);
my @split_on_comma = split /,/, $string;
my @quoted;
my @ready_to_join;
for (@split_on_comma) {
if ( /^"/ .. /"$/ ) {
s/"//g; #remove this if you actually want "s in your output
push @quoted, $_;
}
else {
push @ready_to_join, join ',', @quoted if scalar @quoted;
@quoted = ();
push @ready_to_join, $_;
}
}
my $pipe_delim = join '|', @ready_to_join;
say $pipe_delim;
Output: aaa|bbb|ccc,ddd|fff|ggg|hhh,iii|jjj | [reply] [d/l] [select] |
|
This code is what I want, but how do I get it to use a file for input and output? I have tried various revisions but not having any luck Thanks, Mike
| [reply] |
|
perl -ple"$_ = join '|', m[(\x22[^\x22]+\x22|[^,]+)]g" in.csv > piped.
+csv
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
| [reply] [d/l] |
|
On a client's system, where I'd have to run the change management gauntlet to install Text::CSV, I remembered this node. I had to adjust it to handle empty numeric fields (,,,):
$_ = '"",,,"abc",0,,';
print join '|',
m/
"[^"]+" # quoted string
| [^,]+ # or non-commas
| (?<=,)(?=,|$) # or nothing, surrounded by commas or EOL
/gx;
# prints:
""|||"abc"|0||
| [reply] [d/l] |
|
Nice extension!
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP Neil Armstrong
| [reply] |
Re: Converting File Delimiters
by aaron_baugher (Curate) on Aug 09, 2012 at 00:39 UTC
|
#!/usr/bin/env perl
use Modern::Perl;
use Text::CSV;
my $ic = Text::CSV->new()
or die Text::CSV->error_diag();
my $oc = Text::CSV->new({sep_char => '|', eol => $/ })
or die Text::CSV->error_diag();
open my $if, '<', 'infile' or die $!;
open my $of, '>', 'outfile' or die $!;
while( my $r = $ic->getline($if)){
$oc->print($of, $r);
}
close $if;
close $of;
Update: I'm not normally a "use modules for everything" kind of guy, but this is one task where I think it's a no-brainer. Note that all the other solutions above will give you a borked output file if any of your non-quoted fields contain a pipe character. Of course, you may know that will never be an issue with your data. But in general, letting the module handle quoting on both input and output is the safest bet.
Aaron B.
Available for small or large Perl jobs; see my home node.
| [reply] [d/l] |
|
Your code work great, except for it dies on the first row containing double-quotes on the 3rd field in the input file. It processes the first 67 rows, puts double-quotes around all fields and changes the delimiter to |. i.e. 988A1093|" 98_08_NR"|" Environmental Affairs"|" false"|" store"|" false" Not sure how to handle. Thanks, Mike
| [reply] |
|
What does "dies on the first row" mean? Does it give you an error message? Can you show the input line that it dies on, perhaps with a couple lines before and after it for context?
Also, check the documentation for Text::CSV for info on using binary or different encodings, in case your input has any characters that are not simple ASCII text.
Aaron B.
Available for small or large Perl jobs; see my home node.
| [reply] |
|
|
|
Re: Converting File Delimiters
by CountZero (Bishop) on Aug 09, 2012 at 14:00 UTC
|
perl -MApp::CCSV -ne "$csv->sep_char('|'); csay @f; $csv->sep_char(','
+)" <comma.csv >pipe.CSV
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
| [reply] [d/l] |
|
I'm getting an error running the code below
perl -MApp::CCSV -ne "$csv->sep_char('|'); csay @f; $csv->sep_char(','+)" <aaa.txt >pipe.txt
syntax error at -e line 1, at EOF
Execution of -e aborted due to compilation errors. I think it has something to do with the double quotes
| [reply] [d/l] |
|
I think it has something to do with the double quotes
Yes, and with your shell. As a rule of thumb, you need to quote one-liners with double quotes on Windows, and with single quotes on Unix systems including Linux, *BSD, Mac OS X and the cygwin shell.
If you use double quotes on Unix, the shell interpolates the quoted string, and because you rarely have an environment variable named "csv", each "$csv" is replaced with an empty string.
This does not happen on Windows with command.com and cmd.exe, because their quoting rules are, um, different. Strange may be a better word, bug-compatible back to MS-DOS 1.0 describes what actually happens.
And don't forget that Windows does not expand wildcards for you. This is not a problem with this one-liner, but often you want to use Win32::Autoglob:
X:\>perl -E "say for @ARGV" *.exe
*.exe
X:\>perl -MWin32::Autoglob -E "say for @ARGV" *.exe
cksum.exe
cp.exe
cut.exe
date.exe
sort.exe
sum.exe
X:\>
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
| [reply] [d/l] |
Re: Converting File Delimiters
by mmueller44 (Novice) on Aug 09, 2012 at 02:58 UTC
|
Never mind, found the way to install Text/CSV module using Active Perl v5.14. Had to use the ppm command. From DOS command prompt: ppm install Text-CSV Thanks for all of the support and examples! Hopefully I will get it working tomorrow. Thanks, Mike
| [reply] |
Re: Converting File Delimiters
by mmueller44 (Novice) on Aug 09, 2012 at 01:52 UTC
|
I need the Text/CSV.pm to run the Text::CSV, where can I download the module?
| [reply] |
|
Use your favorite CPAN utility. On Unix, cpan, cpanm, cpanp, or others. On Windows, I think there are similar tools.
Aaron B.
Available for small or large Perl jobs; see my home node.
| [reply] [d/l] [select] |
Re: Converting File Delimiters
by mmueller44 (Novice) on Aug 08, 2012 at 23:57 UTC
|
Ok, so how do I get it to use a text file as input and output the changes to another file? Mike
| [reply] |
|
|