PerlMonks  

Anything that is not ',' except .... RegEx question

by Win (Novice)
on Feb 27, 2008 at 12:16 UTC
Win has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I would like a regex that will pick out a string that does not contain a comma except where ", City of" or ", County of" is present.

I tried the following:
if ( $_ =~ m/^([^,]+),([^,|,\sCity\sof|,\sCounty\sof]+),[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+/ ) {
without luck. Any help appreciated.

Update: I am working with a CSV file. However I am not sure if the CSV modules suggested below would get around the problem of commas existing where they shouldn't.

Re: Anything that is not ',' except .... RegEx question
by dave_the_m (Parson) on Feb 27, 2008 at 12:40 UTC
    It's a lot easier to do it using multiple patterns:
    print "$_\n" if !/,/ or /, City of/ or /, County of/;

    Dave.
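Note that the one-liner above also accepts a line such as "Leeds, West Yorkshire, City of", because it only checks that an allowed phrase appears somewhere. A slightly stricter sketch of the same multiple-pattern idea deletes the allowed ", City of" / ", County of" occurrences from a copy and then checks that no comma is left. The helper name only_allowed_commas and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every comma in $s is part of ", City of" or ", County of".
sub only_allowed_commas {
    my ($s) = @_;
    (my $copy = $s) =~ s/, (?:City|County) of//g;   # drop the allowed occurrences
    return $copy !~ /,/;                              # any comma left over is a stray one
}

for my $s ('Leeds', 'Westminster, City of', 'Durham, County of', 'Leeds, West Yorkshire, City of') {
    printf "%-32s %s\n", $s, only_allowed_commas($s) ? 'ok' : 'rejected';
}
```

The last sample is rejected, because removing ", City of" still leaves the comma after "Leeds".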

Re: Anything that is not ',' except .... RegEx question
by McDarren (Abbot) on Feb 27, 2008 at 12:42 UTC
    I declare this to be an XY Problem.

    My guess would be that you have some CSV data where the data itself contains commas, and you are tying yourself in knots trying to parse it with a regex.

    If that is the case, then I would point you to something like Text::CSV

    - Darren
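A minimal sketch of what a real parser buys you, assuming the embedded commas sit inside double-quoted fields. To stay runnable on a stock Perl it uses the core module Text::ParseWords (parse_line) as a stand-in for Text::CSV. Be aware that parse_line treats both " and ' as quote characters and uses backslash escapes rather than CSV's doubled "", so an unquoted apostrophe (e.g. a place name like King's Lynn) makes it return an empty list; Text::CSV follows the actual CSV rules. The sample line is invented.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module, used here in place of Text::CSV

# Hypothetical CSV line: the comma inside the place name is protected by quotes.
my $line = '"Westminster, City of",1.5,,2';

# Split on commas outside quotes; the 0 means "strip the quote characters".
my @fields = parse_line(',', 0, $line);
print scalar(@fields), " fields: ", join(' | ', @fields), "\n";
```

This yields four fields, with "Westminster, City of" kept whole and the empty third field preserved.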

      Hi,
      if it is really some CSV stuff, I'd also throw DBD::CSV in: it lets you work with CSV files using SQL statements. Quite cool IMHO.
      Regards,
      svenXY
Re: Anything that is not ',' except .... RegEx question
by Punitha (Priest) on Feb 27, 2008 at 12:47 UTC

    Hi Win, you could also try it like this:

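    use strict;
    use warnings;
    while (<DATA>) {
        chomp;
        # keep lines with no comma, or with a comma followed by " City of" / " County of"
        if ( ($_ !~ /,/) || ($_ =~ /,(?= (?:city of|county of))/i) ) { print "$_\n" }
    }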

    Punitha

      Anyone,

      Is it possible to incorporate this into the regex I have shown in the question, please? One issue I can see with the regex above is that there needs to be a space between the "," and the following string (City of|County of).
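One way to fold the exception into a single regex is to replace [^,]+ for the second field with a group that matches either a non-comma character or a comma followed (via a lookahead) by " City of" or " County of". Note that Punitha's lookahead already requires the space: it begins with a literal space. The sketch below assumes the field layout of the question's regex (two text fields, then 4 numbers, an empty field, 4 numbers, an empty field, 4 numbers) and uses [\d.] instead of [\d|\.]; the helper parse_record and the sample line are hypothetical.

```perl
use strict;
use warnings;

# A text field in which a comma is allowed only before " City of" / " County of".
my $name = qr/(?:[^,]|,(?= (?:City|County) of))+/;
my $num  = qr/[\d.]+/;    # one numeric field: digits and dots

# Hypothetical helper: returns the two text fields, or an empty list if the line doesn't match.
sub parse_record {
    my ($line) = @_;
    return $line =~ /^([^,]+),($name),$num,$num,$num,$num,,$num,$num,$num,$num,,$num,$num,$num,$num/
        ? ($1, $2)
        : ();
}

my ($id, $place) = parse_record('ABC123,Westminster, City of,1,2,3,4,,5,6,7,8,,9,10,11,12');
print "$id | $place\n";
```

A line such as "ABC123,Leeds, West Yorkshire,1,2,..." does not match, because the comma before " West" is not followed by an allowed phrase.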
Re: Anything that is not ',' except .... RegEx question
by jdporter (Canon) on Feb 27, 2008 at 14:38 UTC
    I am working with a CSV file. However I am not sure if the CSV modules suggested below would get around the problem of commas existing where they shouldn't.

    Why don't you at least try it, then? There's a good chance it will make your problem a lot simpler (because parsing CSV properly is harder than it looks), and in either case, you'll have learned something about the problem and about Perl.

    A word spoken in Mind will reach its own level, in the objective world, by its own weight

      No, it's Win. He'll bull his way through, ignore the advice he asked for, and complain when his way doesn't work and we won't help him make it work. When it eventually dawns on him that he's going about it the wrong way, he'll expect people here to hand him a working solution on a silver platter with a buttered scone on the side.

      Update: See? Did I call it or what . . .

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

Re: Anything that is not ',' except .... RegEx question
by theorbtwo (Prior) on Feb 27, 2008 at 14:40 UTC

    Any decent CSV module will deal with commas embedded inside the data sections of CSV files.

    However, to *somewhat* more directly speak to the question you asked, instead of the question you should have asked, I think you've made a basic mistake with regexes. The inside of a [...] is a bunch of chars, not a bunch of strings, and certainly not another regex. [\d|\.] doesn't mean a digit or a dot; it means a digit, a pipe, or a dot. [\d.] would be a digit or a dot.
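A quick way to see the character-class point: the sketch below (sample strings invented) tests the same two inputs against [\d|\.]+ and [\d.]+. The string "3|14" is accepted by the first class, because the pipe is just another member of the set, and rejected by the second.

```perl
use strict;
use warnings;

# Inside [...] each character is a member of the set, so the '|' below is a
# literal pipe character, not alternation.
my $with_pipe = qr/^[\d|\.]+$/;   # digits, '|' or '.'
my $digit_dot = qr/^[\d.]+$/;     # digits or '.'; '.' needs no backslash inside a class

for my $s ('3.14', '3|14') {
    printf "%-5s  [\\d|\\.]+: %-8s  [\\d.]+: %s\n", $s,
        ($s =~ $with_pipe ? 'match' : 'no match'),
        ($s =~ $digit_dot ? 'match' : 'no match');
}
```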

    Anyway, writing your own CSV parser isn't going to be the best way to solve this problem. Instead, use somebody else's CSV parser, then deal with the data you get back from it.
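A CSV module can only keep "Westminster, City of" together if that field is quoted. If the file really contains unquoted commas, as Win's update suggests, one hedged workaround is to split on every comma and then glue any piece that begins with " City of" or " County of" back onto the previous field. This sketch assumes those phrases never legitimately start a field of their own; the helper split_with_exceptions and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split an unquoted line on commas, then re-attach any piece
# that starts with " City of" or " County of" to the field before it.
sub split_with_exceptions {
    my ($line) = @_;
    my @fields;
    for my $piece (split /,/, $line, -1) {           # -1 keeps trailing empty fields
        if (@fields && $piece =~ /^ (?:City|County) of\b/) {
            $fields[-1] .= ",$piece";                 # restore the comma we split on
        }
        else {
            push @fields, $piece;
        }
    }
    return @fields;
}

my @f = split_with_exceptions('ABC123,Westminster, City of,1.5,,2');
print join('|', @f), "\n";   # ABC123|Westminster, City of|1.5||2
```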

      I think that something like the following:
      my $line =~ s/, City of/c City of/g;
      my $line =~ s/, County of/c County of/g;
      if ( $line =~ m/^([^,]+),([^,]+),[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+/ ) {
          my $line =~ s/c City of/, City of/g;
          my $line =~ s/c County of/, County of/g;
      would be easier.

        And here, ladies and gentlemen, we have, once again, our prime exhibit of stupidity and unwillingness to learn. There really is no other like him in the monastery.

        You have doubts? Perhaps you think I am being uncouth and should admonish the poor creature more kindly? I will put your minds at ease with just two simple exhibits:

        Exhibit A (posted by the kindly orbtwo):

        The inside of a [...] is a bunch of chars, not a bunch of strings, and certainly not another regex. [\d|\.] doesn't mean a digit or a dot, it means a digit, a pipe, or a dot. [\d.] would be a digit or a dot.

        To which our valiant Sir Doesntwannaknow replied (Exhibit B):

        ... something like the following: ...
        m/^([^,]+),([^,]+),[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+,,[\d|\.]+,[\d|\.]+,[\d|\.]+,[\d|\.]+/

        I rest my case and fall over laughing.


        All Win's posts are stupid.

        I was greatly entertained by how you ignored every last shred of advice in this topic. You didn't even take the part about [\d|\.] not meaning what you think it means, and used it anyway.

        Either way, I would talk to the person sending you this CSV, since it's not legal CSV if they are putting data with commas in it without quoting those fields.

        BTW you can store repeated elements of your regex in a named variable. Like:

        my $num = qr/[\d.]+/;
        if ($line =~ m/^([^,]+),([^,]+),$num,$num,$num,$num,,$num,$num,$num,$num,,$num,$num,$num,$num/) {

        That way if you want to update that pattern you can update it just once.
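Taking the same idea one step further, each run of four numeric fields can itself be built once with Perl's list repetition operator (x) and join, so the whole record pattern is assembled from a single definition of $num. The layout (4 numbers, an empty field, 4 numbers, an empty field, 4 numbers) is taken from the regex in the question; the sample line is invented.

```perl
use strict;
use warnings;

my $num    = qr/[\d.]+/;             # one numeric field: digits and dots
my $four   = join ',', ($num) x 4;   # four numeric fields separated by commas
# Two text fields, then 4 numbers, an empty field, 4 numbers, an empty field, 4 numbers
my $record = qr/^([^,]+),([^,]+),$four,,$four,,$four/;

my $line = 'ABC123,Leeds,1,2,3,4,,5,6,7,8,,9,10,11,12';   # hypothetical sample line
if ($line =~ $record) {
    print "id: $1, place: $2\n";
}
```

Changing what counts as a number (for example, allowing a leading minus sign) then means editing only the $num line.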


        ___________
        Eric Hodges

Node Type: perlquestion [id://670605]
Approved by ikegami