PerlMonks  

Re: Anything that is not ',' except .... RegEx question

by theorbtwo (Prior)
on Feb 27, 2008 at 14:40 UTC ( #670646=note )


in reply to Anything that is not ',' except .... RegEx question

Any decent CSV module will deal with commas embedded inside the data sections of CSV files.

However, to *somewhat* more directly speak to the question you asked, instead of the question you should have asked, I think you've made a basic mistake with regexes. The inside of a [...] is a bunch of chars, not a bunch of strings, and certainly not another regex. [\d|\.] doesn't mean a digit or a dot, it means a digit, a pipe, or a dot. [\d.] would be a digit or a dot.
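To see the difference between the two character classes, here is a minimal sketch. The sample strings are made up for illustration; only the two character classes come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical sample strings; only the character classes come from the thread.
my $with_pipe = qr/^[\d|\.]+$/;   # digit, pipe, or dot
my $no_pipe   = qr/^[\d.]+$/;     # digit or dot

for my $field ('12.5', '1|2', '|||') {
    printf "%-5s  [\\d|\\.]: %-3s  [\\d.]: %s\n",
        $field,
        ($field =~ $with_pipe ? 'yes' : 'no'),
        ($field =~ $no_pipe   ? 'yes' : 'no');
}
```

The first class accepts strings containing pipes, which is almost certainly not what was wanted for a numeric field.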

Anyway, writing your own CSV parser isn't going to be the best way to solve this problem. Instead, use somebody else's CSV parser, then deal with the data you get back from it.
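As a rough sketch of what that could look like, the following uses Text::CSV, one commonly used CPAN module (it is not in core, so it has to be installed first). The sample record is invented, and it assumes the field with the embedded comma is quoted; a parser can only split such a field correctly when the producer quotes it.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, not core; install it first

# Hypothetical record: the embedded comma sits inside a quoted field.
my $line = '"Bristol, City of",AB123,12.5,3.25';

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag;

$csv->parse($line)
    or die "Cannot parse line: " . $csv->error_diag;

# The quoted comma stays inside the first field.
my @fields = $csv->fields;
print scalar(@fields), " fields, first is: $fields[0]\n";
```

Once the line is split into fields, each numeric column can be checked separately with a small pattern such as /^[\d.]+$/ instead of one long regex over the whole line.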


Re^2: Anything that is not ',' except .... RegEx question
by Win (Novice) on Feb 27, 2008 at 14:50 UTC
    I think that something like the following:
    my $line =~ s/, City of/c City of/g;
    my $line =~ s/, County of/c County of/g;
    if ( $line =~ m/^([^,]+),([^,]+),[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+/ ) {
        my $line =~ s/c City of/, City of/g;
        my $line =~ s/c County of/, County of/g;
    would be easier.

      And here, ladies and gentlemen we have, once again, our prime exhibit of stupidity and unwillingness to learn. There really is no other like him in the monastery.

      You have doubts? Perhaps you think I am being uncouth and should admonish the poor creature more kindly? I will put your minds at ease with just two simple exhibits:

      Exhibit A (posted by the kindly orbtwo):

      The inside of a [...] is a bunch of chars, not a bunch of strings, and certainly not another regex. [\d|\.] doesn't mean a digit or a dot, it means a digit, a pipe, or a dot. [\d.] would be a digit or a dot.

      To which our valiant Sir Doesntwannaknow replied (Exhibit B):

      ... something like the following: ...
      m/^([^,]+),([^,]+),[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.],[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+/

      I rest my case and fall over laughing.


      All Wins posts are stupid.
        The line in question works as intended apart from the problem of the comma followed by " City of". In other words I have a nearly working solution and I don't want to do anything too radical.

      I was greatly entertained by how you ignored every last shred of advice in this topic. You didn't even take the part about [\d|\.] not meaning what you think it means, and used it anyway.

      Either way, I would talk to the person sending you this CSV, since it's not a legal CSV if they are putting data with commas in it without quoting those fields.

      BTW you can store repeated elements of your regex in a named variable. Like:

      my $num = qr/[\d.]+/;
      if ($line =~ m/^([^,]+),([^,]+),$num,$num,$num,$num,,$num,$num,$num,$num,,$num,$num,$num,$num/) {

      That way if you want to update that pattern you can update it just once.
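A shortened, self-contained sketch of the same idea follows. The sample line and its column layout are hypothetical and much shorter than the real record; the point is only that $num is defined once and interpolated wherever a numeric column is expected.

```perl
use strict;
use warnings;

# Hypothetical, shortened record: two text fields, then four numeric fields.
my $line = 'Somewhere,AB123,1.5,20,3.25,4';

# Defined once; every numeric column below reuses it.
my $num = qr/[\d.]+/;

if ( $line =~ m/^([^,]+),([^,]+),$num,$num,$num,$num$/ ) {
    print "name=$1 code=$2\n";
}
```

If the numeric columns later need to allow, say, a leading minus sign, changing the single qr// definition updates every column at once.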


      ___________
      Eric Hodges
