ZlR has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks & Monkettes,
Here I am once again with another problem that is too hard for me!
I was given numerous CSV files that use a comma as the field separator.
To my surprise, some fields contain commas as part of their value. Since I intend to split on commas, this kinda wrecks it all up. Those fields are enclosed in quotes, so one can see that the commas inside are values and not field separators.
Maybe I could use a CSV module, but that's too easy! But seriously, it got me thinking about this regexp problem:
# a 4-field, comma-separated CSV line
$l = 'w,ww,"a,bb,ccc,3 ,ee,",4' ;
How can I turn it into:
# a 4-field CSV line that can safely be split on commas
'w,ww,"a-bb-ccc-3 -ee-",4' ;
I do not have a single idea so far... I guess it would be like running a regexp (s/,/-/g) on a match (m/"(.*)"/), but I don't know how to do that.
Does anyone know how?
Thanks !
Re: Replacing commas in a substring when it is between quotes (2 ways)
by tye (Sage) on Mar 28, 2012 at 19:38 UTC
Using Text::CSV seems like a more likely way to get it right on the first try. But your idea isn't hard to do:
s{(".*?")}{ my $s = $1; $s =~ s/,/-/g; $s }ge
You can also replace the ".*?" regex with one that handles escaped quotes.
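One hedged example of such a regex: standard CSV (RFC 4180) escapes a quote inside a quoted field by doubling it (""), and the pattern below assumes that convention. It is a sketch, not code from this thread, and the sample line is made up.

```perl
use strict;
use warnings;

# Assumption: an embedded quote is written as "" (RFC 4180 style).
# "(?:[^"]|"")*" matches a whole quoted field: any non-quote character
# or a doubled quote, repeated, between the outer quotes.
my $line = 'w,"say ""hi, there""",3';    # made-up sample line
(my $out = $line) =~ s{("(?:[^"]|"")*")}{ (my $f = $1) =~ tr/,/-/; $f }ge;
print "$out\n";    # w,"say ""hi- there""",3
```

For backslash escapes instead, the [^\\']*(?:\\.[^\\']*)* style of pattern in AnomalousMonk's reply in this thread is the usual choice.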
Just for variety, here is another approach:
my $in = 0;
s{(")|,}{ if($1){ $in= !$in; $1 }elsif($in){ '-' }else{ ',' } }ge
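When the toggle version is run over many lines (say, inside a while (<>) loop), $in should start at 0 for every line; otherwise one line with an unbalanced quote flips the state for the rest of the file. A sketch that scopes the flag inside a sub (the name dash_quoted_commas is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: same quote-toggle idea, with the state reset per call.
sub dash_quoted_commas {
    my ($line) = @_;
    my $in = 0;    # true while inside a pair of double quotes
    # A quote flips $in and is kept; a comma becomes '-' only inside quotes.
    $line =~ s{(")|,}{ defined $1 ? do { $in = !$in; '"' } : $in ? '-' : ',' }ge;
    return $line;
}

print dash_quoted_commas('w,ww,"a,bb,ccc,3 ,ee,",4'), "\n";    # w,ww,"a-bb-ccc-3 -ee-",4
```

A side effect of toggling on every quote: a CSV-style doubled quote ("") flips the state twice, so it leaves the field's quoted state unchanged.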
Re: Replacing commas in a substring when it is between quotes
by AnomalousMonk (Archbishop) on Mar 28, 2012 at 18:42 UTC
Yes, use a CSV module, but if you just gotta know (I use single-quotes instead of double-quotes to keep Windoze happy):
>perl -wMstrict -le
"my $s = q{w,ww,'a,bb,ccc,3 ,ee,',4,'a,\'b,c\',d',zz};
print qq{[$s]};
;;
$s =~ s{ ( ' [^\\']* (?: \\. [^\\']*)* ' ) }
{ (my $o = $1) =~ s{,}{-}xmsg; $o; }xmspge;
print qq{[$s]};
"
[w,ww,'a,bb,ccc,3 ,ee,',4,'a,\'b,c\',d',zz]
[w,ww,'a-bb-ccc-3 -ee-',4,'a-\'b-c\'-d',zz]
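Outside a Windows command line the shell-quoting workaround isn't needed. Here is a sketch of the same substitution as a standalone script using double quotes, as in the OP's data; the backslash-escaped sample field is carried over from the one-liner. The field pattern is the "unrolled loop" idiom: runs of ordinary characters separated by backslash escapes, so \" does not end the field.

```perl
use strict;
use warnings;

# q{} keeps the backslashes, so \" stays a two-character escape.
my $s = q{w,ww,"a,bb,ccc,3 ,ee,",4,"a,\"b,c\",d",zz};

# Quote, then ordinary chars, then any number of (escape, ordinary chars),
# then the closing quote; commas in the matched field become dashes.
$s =~ s{ ( " [^\\"]* (?: \\. [^\\"]* )* " ) }
       { (my $o = $1) =~ s{,}{-}g; $o }xge;

print "$s\n";    # w,ww,"a-bb-ccc-3 -ee-",4,"a-\"b-c\"-d",zz
```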
Re: Replacing commas in a substring when it is between quotes
by johngg (Canon) on Mar 28, 2012 at 21:42 UTC
Another way is to use the third argument of split to attack the string from both ends to isolate the element in double quotes.
knoppix@Microknoppix:~$ perl -E '
> $l = q{w,ww,"a,bb,ccc,3 ,ee,",4};
> @flds = split m{,}, $l, 3;
> push @flds, reverse
> map { scalar reverse $_ }
> split m{,}, reverse( pop @flds ), 2;
> $flds[ 2 ] =~ s{,}{-}g;
> $l = join q{,}, @flds;
> say $l;'
w,ww,"a-bb-ccc-3 -ee-",4
knoppix@Microknoppix:~$
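The split approach relies on knowing the layout: two plain fields before the quoted one and one after it. A sketch that takes those counts as parameters (dash_middle_field and its arguments are hypothetical, and it still assumes exactly one comma-bearing field per line):

```perl
use strict;
use warnings;

# Hypothetical helper: $before plain fields, then ONE field that may
# contain commas, then $after plain fields.
sub dash_middle_field {
    my ($line, $before, $after) = @_;

    # LIMIT = $before + 1: the last element keeps everything after the
    # first $before commas, untouched.
    my @head = split /,/, $line, $before + 1;
    my $rest = pop @head;

    # Split the reversed remainder so trailing fields are peeled off from
    # the end, then un-reverse each piece.
    my @tail = map { scalar reverse $_ }
               split /,/, scalar(reverse $rest), $after + 1;
    my $middle = pop @tail;    # the comma-bearing field
    @tail = reverse @tail;     # trailing fields back in original order

    $middle =~ tr/,/-/;
    return join ',', @head, $middle, @tail;
}

print dash_middle_field('w,ww,"a,bb,ccc,3 ,ee,",4', 2, 1), "\n";    # w,ww,"a-bb-ccc-3 -ee-",4
```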
I hope this is of interest.
Yep! It got me to look up split and find out what that third argument does. Thanks!!
Re: Replacing commas in a substring when it is between quotes
by Tux (Canon) on Mar 29, 2012 at 06:50 UTC
As Text::CSV ships with a pure-Perl implementation, Text::CSV_PP (which mirrors the C/XS module Text::CSV_XS), you can also start digging into CSV_PP.pm and see how the module solves this. Then again, if this urge to find out how to solve this (interesting) specific problem stems from a real-life situation, I'd still tell you not to dig and just use the module.
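For a taste of the general idea before opening CSV_PP.pm, here is a toy splitter built on a \G-anchored global match. To be clear, this is not the code in Text::CSV_PP: it assumes every field is either fully quoted (with "" for an embedded quote) or free of commas and quotes, keeps the quotes, and simply stops at the first malformed field. The name split_csv_line is hypothetical.

```perl
use strict;
use warnings;

# Toy splitter (hypothetical, not Text::CSV_PP's algorithm).
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    # Each match: one field (quoted, or a run without commas/quotes),
    # followed by a comma or the end of the string. \G resumes where the
    # previous match stopped.
    while ($line =~ /\G("(?:[^"]|"")*"|[^,"]*)(,|\z)/g) {
        push @fields, $1;
        last if $2 eq '';    # matched end of string: no more fields
    }
    return @fields;
}

print join('|', split_csv_line('w,ww,"a,bb,ccc,3 ,ee,",4')), "\n";    # w|ww|"a,bb,ccc,3 ,ee,"|4
```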
Enjoy, Have FUN! H.Merijn
Re: Replacing commas in a substring when it is between quotes
by Anonymous Monk on Mar 28, 2012 at 18:14 UTC
I could use a csv module, but that's too easy
It is easy to use the module, and that's why you should just do it.
Then later, when you find that, O.M.G., sometimes they're putting escaped quotes inside the quotes too... no problem, the module already covers it.
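Text::CSV has to be installed from CPAN. If only core modules are available, Text::ParseWords (part of the standard Perl distribution) is a swapped-in alternative that already knows about quotes. A sketch under that assumption; note that parse_line also treats single quotes as quoting characters and understands backslash escapes rather than CSV's doubled quotes, so it is not a full CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $l = 'w,ww,"a,bb,ccc,3 ,ee,",4';

# parse_line(DELIMITER, KEEP, LINE): a true KEEP leaves the quotes (and
# backslashes) in each field, so the line can be re-joined afterwards.
my @fields = parse_line(',', 1, $l);

for (@fields) {
    tr/,/-/ if /^"/;    # only rewrite the quoted fields
}
my $out = join ',', @fields;
print "$out\n";    # w,ww,"a-bb-ccc-3 -ee-",4
```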
Re: Replacing commas in a substring when it is between quotes
by Anonymous Monk on Mar 28, 2012 at 18:34 UTC
Re: Replacing commas in a substring when it is between quotes
by JavaFan (Canon) on Mar 29, 2012 at 10:39 UTC
use feature 'say';
$_ = 'w,ww,"a,bb,ccc,3 ,ee,",4';
@f=qw=,
-=;say join"",map{@f=@f[1,0]if/"/;/,/?$f[0]:$_}split//;
__END__
w,ww,"a-bb-ccc-3 -ee-",4
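Unpacked, the golf works like this: the qw= ... = spanning two lines is a qw list with = as the delimiter, split on whitespace (the newline), so @f is (',', '-'); split // yields single characters; every double quote swaps the two entries; and each comma is replaced by whichever entry is currently first. An ungolfed sketch of the same logic (variable names are my own):

```perl
use strict;
use warnings;

my $line = 'w,ww,"a,bb,ccc,3 ,ee,",4';
my @comma_as = (',', '-');    # [0] is what a comma turns into right now
my $out = '';

for my $ch (split //, $line) {                      # one character at a time
    @comma_as = reverse @comma_as if $ch eq '"';    # entering/leaving quotes
    $out .= $ch eq ',' ? $comma_as[0] : $ch;
}
print "$out\n";    # w,ww,"a-bb-ccc-3 -ee-",4
```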