Replacing commas in a substring when it is between quotes

by ZlR (Chaplain)
on Mar 28, 2012 at 18:08 UTC (#962237=perlquestion)
ZlR has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks & Monkettes,

Here I am once again with another problem that is too hard for me! I was given numerous CSV files that use a comma as the field separator. To my surprise, some fields contain commas as part of their value. Since I intend to split on commas, this wrecks everything. Those fields are enclosed in quotes, so one can see that the commas inside are values and not field separators.

Maybe I could use a csv module, but that's too easy! But seriously, it got me thinking about this regexp problem:

# a 4 field comma separated csv line
$l = 'w,ww,"a,bb,ccc,3 ,ee,",4' ;

How can I turn it into:

# a 4 field comma splittable csv line
'w,ww,"a-bb-ccc-3 -ee-",4' ;

I do not have a single idea so far...
I guess it would be like running a substitution ( s/,/-/g ) on the result of a match ( m/"(.*)"/ ), but I don't know how to do that.

Does anyone know how? Thanks!

Re: Replacing commas in a substring when it is between quotes
by Anonymous Monk on Mar 28, 2012 at 18:14 UTC
    I could use a csv module, but that's too easy

    It is easy to use the module, and that's why you should just do it.

    Then later, when you find that, O.M.G., sometimes they're putting escaped quotes inside the quotes too... no problem, the module already covers it.
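
To make the "let a parser do it" advice concrete, here is a minimal sketch. The thread recommends Text::CSV; this sketch swaps in the core Text::ParseWords module instead, only so that it runs on a stock Perl without installing anything. The helper name dash_quoted_commas is hypothetical. Note that parse_line follows shell-like quoting rules (backslash escapes, single quotes), which is not the same as a real CSV parser, so treat this as an illustration and not as a replacement for Text::CSV.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper name, not from the thread.
sub dash_quoted_commas {
    my ($line) = @_;
    # A true second argument keeps the surrounding quotes on quoted fields.
    my @fields = parse_line(',', 1, $line);
    # Any comma still inside a field was protected by quotes, so replace it.
    tr/,/-/ for @fields;
    return join ',', @fields;
}

# Sample line taken from the original question.
print dash_quoted_commas(q{w,ww,"a,bb,ccc,3 ,ee,",4}), "\n";
```

Because the parser decides where fields begin and end, the code never has to guess which commas are separators.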

Re: Replacing commas in a substring when it is between quotes
by Anonymous Monk on Mar 28, 2012 at 18:34 UTC
Re: Replacing commas in a substring when it is between quotes
by AnomalousMonk (Monsignor) on Mar 28, 2012 at 18:42 UTC

    Yes, use a CSV module, but if you just gotta know (I use single-quotes instead of double-quotes to keep Windoze happy):

    >perl -wMstrict -le
    "my $s = q{w,ww,'a,bb,ccc,3 ,ee,',4,'a,\'b,c\',d',zz};
     print qq{[$s]};
     ;;
     $s =~ s{ ( ' [^\\']* (?: \\. [^\\']*)* ' ) }
            { (my $o = $1) =~ s{,}{-}xmsg; $o; }xmspge;
     print qq{[$s]};
    "
    [w,ww,'a,bb,ccc,3 ,ee,',4,'a,\'b,c\',d',zz]
    [w,ww,'a-bb-ccc-3 -ee-',4,'a-\'b-c\'-d',zz]
Re: Replacing commas in a substring when it is between quotes (2 ways)
by tye (Cardinal) on Mar 28, 2012 at 19:38 UTC

    Using Text::CSV seems like a more likely way to get it right on the first try. But your idea isn't hard to do:

    s{(".*?")}{ my $s = $1; $s =~ s/,/-/g; $s }ge

    You can also replace the regex with something that handles escapes.

    Just for variety, here is another approach:

    my $in = 0;
    s{(")|,}{ if($1){ $in= !$in; $1 }elsif($in){ '-' }else{ ',' } }ge

    - tye        

      Thanks Tye!! That's nice and neat, just what I need and a real time saver right now :)

      I had no idea one could run code inside the replacement side of a substitution, that's great.
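
For readers who have not met it before, the feature in play is the /e modifier, which makes Perl evaluate the replacement part of s/// as code. The sketch below wraps the first substitution from the post above in a helper whose name (dash_inside_quotes) is hypothetical, and adds a tiny arithmetic example of /e on an invented string. Like the original, it assumes quotes are balanced and never escaped.

```perl
use strict;
use warnings;

# Hypothetical helper name; the substitution mirrors the s{(".*?")}{...}ge
# idea from the post above.
sub dash_inside_quotes {
    my ($line) = @_;
    # /e runs the replacement as Perl code; /g repeats it for each quoted chunk.
    $line =~ s{(".*?")}{ (my $chunk = $1) =~ tr/,/-/; $chunk }ge;
    return $line;
}

# A smaller illustration of /e: double every number in a string.
sub double_numbers {
    my ($text) = @_;
    $text =~ s/(\d+)/$1 * 2/ge;
    return $text;
}

print dash_inside_quotes(q{w,ww,"a,bb,ccc,3 ,ee,",4}), "\n";
print double_numbers('a1b22'), "\n";
```

The value of the last expression in the replacement block becomes the replacement text, which is why the block ends with a bare $chunk.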

Re: Replacing commas in a substring when it is between quotes
by johngg (Abbot) on Mar 28, 2012 at 21:42 UTC

    Another way is to use the third argument of split to attack the string from both ends to isolate the element in double quotes.

    knoppix@Microknoppix:~$ perl -E '
    > $l = q{w,ww,"a,bb,ccc,3 ,ee,",4};
    > @flds = split m{,}, $l, 3;
    > push @flds, reverse
    >     map { scalar reverse $_ }
    >     split m{,}, reverse( pop @flds ), 2;
    > $flds[ 2 ] =~ s{,}{-}g;
    > $l = join q{,}, @flds;
    > say $l;'
    w,ww,"a-bb-ccc-3 -ee-",4
    knoppix@Microknoppix:~$

    I hope this is of interest.

    Cheers,

    JohnGG

      Yep! It got me to look up split and find out what this third argument does. Thanks!!
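
As a quick illustration of that third argument (LIMIT): split stops after producing that many fields and leaves the rest of the string, separators included, in the last one. The helper name split_at_most and the 'a,b,c,d' input are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas into at most $limit pieces.
sub split_at_most {
    my ($line, $limit) = @_;
    return split /,/, $line, $limit;
}

# With a limit of 3, everything after the second comma stays together.
print join(' | ', split_at_most('a,b,c,d', 3)), "\n";

# Applied to the line from the question, the quoted field and the
# trailing field end up together in the third element.
print join(' | ', split_at_most(q{w,ww,"a,bb,ccc,3 ,ee,",4}, 3)), "\n";
```

This is why the post above splits a second time on the reversed remainder: it peels the last field off the other end, leaving only the quoted field in the middle.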
Re: Replacing commas in a substring when it is between quotes
by Tux (Monsignor) on Mar 29, 2012 at 06:50 UTC

    Text::CSV ships with a pure-Perl implementation, Text::CSV_PP (which mirrors Text::CSV_XS, written in C/XS), so you can also start digging into CSV_PP.pm and see how the module solves this. Then again, if this urge to find out how to solve this (interesting) specific problem stems from a real-life situation, I'd still tell you not to dig and to just use the module.


    Enjoy, Have FUN! H.Merijn
Re: Replacing commas in a substring when it is between quotes
by JavaFan (Canon) on Mar 29, 2012 at 10:39 UTC
    $_ = 'w,ww,"a,bb,ccc,3 ,ee,",4';
    @f=qw=, -=;say join"",map{@f=@f[1,0]if/"/;/,/?$f[0]:$_}split//;
    __END__
    w,ww,"a-bb-ccc-3 -ee-",4
