PerlMonks  

Regular expression double grouping negation headache

by JayBonci (Curate)
on Jun 29, 2002 at 12:21 UTC (#178232=perlquestion)

JayBonci has asked for the wisdom of the Perl Monks concerning the following question:

Hello kind monks. I am writing a quick and dirty script to do array command line processing. I want it to be able to accept arguments in the form of:

args.pl FOO=BAR TEST=VISION EVERYTHING=TWO

I'm pleased to say that it works with this script:
my $defaults = {};
%defaults = map {(($_ =~ /([^\=]+)\=([^\s]+)/)?("$1" => "$2"):())} @ARGV;
print join(",", keys %defaults);
This is a build script, for the curious. I do NOT want to use a module, because then I'm not going to learn why this script isn't working, so I thank you, but please none of those suggestions. This is for theory at this point. Now, I want to add the ability to take backslashed spaces from the command line (as in a directory with spaces in it, or some other sort of item, or anything else to make it more extensible). Like:

args.pl QUICK=BROWN\ FOX JUMPED=OVER\ THE\ LAZY DOG

Validly finding QUICK and JUMPED as keys, but ignoring DOG.
However, I can't seem to get it to work with the regular expression. Boiling it down:
$_ =~ /([^\=]+)\=([^\s]+)/
is where this hell lies. The way I'm thinking of it, I see it as:
  • At least one non equals characters, grouped
  • An equals character
  • At least one non-whitespace character, grouped
So, substituting this in a couple of ways:
$_ =~ /([^\=]+)\=([[^\s]|[\\\s]]+)/
  • At least one non equals characters, grouped
  • An equals character
  • At least one (non-whitespace or backslashed whitespace character), grouped
That didn't work

Neither did:
$_ =~ /([^\=]+)\=([^[[^\\]\s]]+)/
Same as above except the last is:
    ....
  • At least one (non- (non-backslashed whitespace)), grouped
This has the disadvantage of possibly being outright wrong (matching two characters?), but it's something I gave a shot. So before I take up the shotgun and go blasting into the night for answers, can someone point me in the right direction? Am I simply looking to get my negations right, or am I looking for more powerful regular expression gear? Thanks a bunch.
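
One likely reason both attempts fail is that bracketed character classes do not nest: inside `[...]` the characters `[`, `|` and `^` (when not first) are ordinary members of the class. "Non-whitespace or backslashed whitespace" can instead be written as an alternation inside a group. The following is only a sketch under assumptions that are not in the question: the whole command line arrives as one string in a hypothetical `$line` with its backslashes intact (a Unix shell would normally strip them first), and keys contain neither `=` nor whitespace.

```perl
use strict;
use warnings;

# Hypothetical input: the command line as a single string, backslashes intact.
my $line = 'QUICK=BROWN\ FOX JUMPED=OVER\ THE\ LAZY DOG';

my %defaults;
# Key:   one or more characters that are neither '=' nor whitespace.
# Value: a run of "backslash + whitespace" pairs or single non-whitespace chars.
while ($line =~ /([^=\s]+)=((?:\\\s|\S)+)/g) {
    $defaults{$1} = $2;
}

for my $key (sort keys %defaults) {
    print "$key => $defaults{$key}\n";
}
```

With this input the loop finds QUICK and JUMPED and skips DOG, because DOG is never followed by an equals sign. The values still contain the backslashes; removing them would be a separate step.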

    --jb

Replies are listed 'Best First'.
Re: Regular expression double grouping negation headache
by dreadpiratepeter (Priest) on Jun 29, 2002 at 14:10 UTC
    Ok, color me confused.
    Your script as written does exactly what you claim it doesn't when I run it (under Linux). However, your output fails to expose the flaw in your parsing: the script throws away part of the values. You only capture the value up to the first space, but spaces are valid characters in the value.
    I simplified your expression a little. I removed unnecessary backslashes, parens, and quotes, and I replaced [^\s] with \S. The new regex is:
    %defaults = map {/([^=]+)=(\S+)/?($1=>$2):()} @ARGV;

    Since you already know (thanks to the shell's processing of your arguments) that everything to the right of the equals sign is a valid part of the value, you can correct the problem by simplifying to this:
    %defaults = map {/([^=]+)=(.*)/?($1=>$2):()} @ARGV;
    And, I would also simplify the left hand side to be:
    %defaults = map {/(.*?)=(.*)/?($1=>$2):()} @ARGV;
    And, actually, you can take advantage of the fact that a regex match in list context returns the submatches as a list, and use:
    %defaults = map {(/(.*?)=(.*)/)} @ARGV;
    Hope this helps
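
A small sketch of the list-context behaviour described above, using a hard-coded `@args` array as a stand-in for `@ARGV` (the sample values are assumptions for illustration). A failed match returns an empty list, so arguments without an equals sign simply contribute nothing to the hash.

```perl
use strict;
use warnings;

# Stand-in for @ARGV after the shell has already grouped the words.
my @args = ('QUICK=BROWN FOX', 'JUMPED=OVER THE LAZY', 'DOG', 'A=B=C');

# In list context a successful match returns ($1, $2);
# a failed match returns the empty list.
my %defaults = map { /(.*?)=(.*)/ } @args;

print "$_ => $defaults{$_}\n" for sort keys %defaults;
```

Because `(.*?)` is non-greedy, the first equals sign is the delimiter, so `A=B=C` yields the key `A` with the value `B=C`.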

    -pete
    "Pain heals. Chicks dig scars. Glory lasts forever."
      Okay, so we've got;

      %defaults = map {(/(.*?)=(.*)/)} @ARGV;
      Which is cute. Except it fails under some conditions. Let's say you want to give the variable 'foo' the value of 'moo shoo= coo'. I'd assume it would be written 'foo=moo shoo\= coo'. Except your example doesn't allow for this.

      So I tried a little, and this is what I got;

      I was working from the param line as a single scalar variable for simplicity, and my first efforts seemed moderately successful;

      $line = 'foo=moo sho=clue\ woo\=moo';
      @pairs = split(/(?<!\\)\s/, $line);


      However, this system has the problem of only checking whether the previous character is a backslash; it doesn't allow for backslashed backslashes. The expression would need to understand that '\\=' signifies a backslash and then an unescaped literal. But it couldn't just check for two slashes and cancel, for it should accept '\\\=' as a backslashed backslash and a backslashed equals symbol. It would, effectively, need to look behind for an even number of slashes, and only split on that.

      I had;
      @pairs = split(/(?<!(\\\\+))\s/, $line);
      Which seemed perfect. Except we're not allowed variable-length lookbehind, because it's not been implemented yet. Which was very annoying to find out. So, we could match the preceding backslashes normally, but strap them back on. But that would be horrible. I have tried further routes, but found nothing. *sigh*
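
One way around the lookbehind limitation, offered only as a sketch, is to match the tokens rather than split on the separators. If each token is consumed left to right as a run of escape pairs (a backslash plus any character) and ordinary characters, an escaped backslash is eaten as a pair before the following whitespace is examined, so counting backslashes is never needed. The helper names `split_unescaped` and `unescape` are hypothetical, and a lone trailing backslash is silently dropped in this version.

```perl
use strict;
use warnings;

# Hypothetical helper: return the chunks of $line separated by whitespace
# that is not protected by a backslash.
sub split_unescaped {
    my ($line) = @_;
    # A token is a run of escape pairs (backslash + any character)
    # and ordinary characters (anything but backslash or whitespace).
    return $line =~ /((?:\\.|[^\\\s])+)/g;
}

# Hypothetical helper: remove one level of backslash escaping.
sub unescape {
    my ($text) = @_;
    $text =~ s/\\(.)/$1/g;
    return $text;
}

# Single-quoted, so the four backslashes below are two literal backslashes.
my $line = 'foo=moo\ shoo\=coo bar=a\\\\ b';

my @tokens = split_unescaped($line);
print unescape($_), "\n" for @tokens;
```

Here the escaped space and escaped equals sign stay inside the first token, while the space after the escaped backslash does split, giving three tokens in total.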

      tlhf
      xxx
        No, it would be written foo=moo\ shoo=coo and will be parsed right by my code. The big limitation is that you can't have an = in the key part but I can't see that being much of a problem. And actually an assertion would handle that case.

        -pete
        "Pain heals. Chicks dig scars. Glory lasts forever."
        Let's say you want to give the variable 'foo' the value of 'moo shoo= coo'. I'd assume it would be written 'foo=moo shoo\= coo'.

        I would think that would be foo=moo\ shoo=coo in a Unixish shell and "foo=moo shoo=coo" using an MSish one. In either case /(.*?)=(.*)/ does the job just fine since it will unconditionally use the first equals sign as the delimiter of the variable name, and then just slurp the rest of the string as the value, equals signs or not.

        An inadequacy does therefore arise when one wants an equals sign in the variable name, but I don't know if allowing for that special case is even desired, and even if so, whether it's worth going through the tremendous pain of handling escapes properly (a problem I've repeatedly broken my teeth on; it's not possible without a true, if simple, parser).

        Makeshifts last the longest.

(jeffa) Re: Regular expression double grouping negation headache
by jeffa (Bishop) on Jun 29, 2002 at 14:00 UTC
    I bashed out some regexes to no avail - not an easy problem (well unless you are japhy or I0 - UPDATE: or The dreadpiratepeter ... man that was so simple!). Couple of suggestions if you want to continue this route:
    1. [^\s] is the same thing as \S
    2. you might be better off joining @ARGV into a string:
      $_ = join ' ', @ARGV;
      my %defaults;
      $defaults{$1} = $2 while /([^\=]+)\=(\S+)/g;
      but that doesn't work for escaped spaces ...
    After getting tired of trying the regex way, i opted for something different:
    my %defaults = ();
    my ($k,$v);
    for (@ARGV) {
        if (($k,$v) = split('=',$_,2)) {
            $defaults{$k} = $v;
        }
        else {
            $defaults{$k} += " $v";
        }
    }
    This worked on the following execution:
    ./foo.pl QUICK=BROWN\ FOX JUMPED=OVER\ THE\ LAZY\ DOG
    
    Note that each space is escaped except the one before a 'key'; otherwise this does not work.
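
For the case where the words do arrive as separate arguments, here is a hedged sketch of the same loop idea with two changes that are assumptions about the intent rather than a reading of the post: the branch is chosen by testing for an equals sign (a list assignment from split is true for any non-empty argument, so the else branch above is never reached), and bare words are appended with `.=` rather than the numeric `+=`. The `@args` array is a hard-coded stand-in for `@ARGV`.

```perl
use strict;
use warnings;

# Stand-in for @ARGV when the spaces were NOT escaped or quoted.
my @args = ('QUICK=BROWN', 'FOX', 'JUMPED=OVER', 'THE', 'LAZY');

my %defaults;
my $k;    # most recently seen key
for my $arg (@args) {
    if ($arg =~ /=/) {
        # New key=value pair: split on the first '=' only.
        my $v;
        ($k, $v) = split /=/, $arg, 2;
        $defaults{$k} = $v;
    }
    elsif (defined $k) {
        # Bare word: treat it as a continuation of the previous value.
        $defaults{$k} .= " $arg";
    }
}

print "$_ => $defaults{$_}\n" for sort keys %defaults;
```

Note that this glues every bare word onto the previous value, so a trailing DOG would become part of JUMPED rather than being ignored.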

    Now, go ye forth and use a CPAN module to do this! ;) At least break one open and see how they solve the problem, which should hint at the complexity involved: what about multiple options? flag options? what about dashes in front of the keys? etc. Learning is good, but sometimes getting stuff done is better.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Regular expression double grouping negation headache
by BrowserUk (Pope) on Jun 29, 2002 at 14:22 UTC

    I'm probably way off here, but are you certain that the command line processing isn't interfering?

    Using your exact script on my Win32 system, I get:

    C:\test>178232.pl "QUICK=BROWN FOX" "JUMPED=OVER THE LAZY" DOG
    QUICK,JUMPED

    Of course, the brain-dead CMD required me to quote the parameters with spaces, but the fact that it then works could mean that it's your command-line processor (CLP) that is interfering here.

    Try inserting:

    local($,=",",$\="\n"); # << Corrected typo/brain fade
    print @ARGV;

    at the top of your prog and see what you're getting in.

    BTW. Is there any specific reason you are escaping the "=" with backslashes? Your regex seems to work fine without them.

    update: correct minor typo. (And again!) Ditto!.

      local(",=",","$\"="\n");
      Would you please break this down for me? I don't understand what it is supposed to do.

      ---
      "A Jedi uses the Force for knowledge and defense, never for attack."

        It should (of course) have been (and now is:)

        local ($,=",", $\="\n");

        Thanks for drawing it to my attention.
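
To break that line down with a small sketch: `$,` is the output field separator (printed between the items of a print list) and `$\` is the output record separator (printed after each print). The version below uses two separate `local` statements, which is a more conventional spelling and an assumption about style, not the exact line from the post. It prints into an in-memory buffer, with a hard-coded `@args` standing in for `@ARGV`, so the effect can be inspected.

```perl
use strict;
use warnings;

my @args = ('QUICK=BROWN FOX', 'DOG');    # stand-in for @ARGV

# Print into an in-memory buffer so the result can be examined afterwards.
my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
{
    local $, = ",";     # output field separator: goes between list items
    local $\ = "\n";    # output record separator: goes after each print
    print {$fh} @args;
}
close $fh;

print $buffer;
```

Seeing each argument separated by a comma makes it obvious how the shell split the command line before the script ever ran.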

Re: Regular expression double grouping negation headache
by flocto (Pilgrim) on Jun 30, 2002 at 09:12 UTC

    Maybe I am missing the point here, but if you were using split you wouldn't have to use any of these regexen. (m/(.+?)=(.+)/ as seen above isn't a nice way of dealing with this.) To abstract your problem: you want to eat up everything until an equals sign as the key, with the rest of the parameter being the value; if there is no equals sign, take the entire argument as a key without a value. Here's my solution:

    my %d;
    foreach (@ARGV) {
        my ($k, $v) = split (m/=/, $_, 2);
        $d{$k} = (defined $v ? $v : "");
    }
    #this is for checking only..
    foreach (keys %d) {
        print "$_:" . $d{$_} . "\n";
    }

    And it does the job. If you would rather stick to your own code, you might modify it to look like this:

    my $defaults = {};
    %defaults = map {(($_ =~ /([^\=]+)\=(.+)/)?($1 => $2):($_ => ''))} @ARGV;
    print join(",", keys %defaults);

    Regards,
    -octo-

      That was my original thought (connection problems last night prevented me from posting it though)
      my %defaults = map {/=/ ? split(/=/,$_,2) : ()} @ARGV;

      -Blake
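
The two split-based variants in this sub-thread differ in how they treat an argument with no equals sign. A short side-by-side sketch, using a hard-coded `@args` in place of `@ARGV` and hypothetical hash names `%keep_bare` and `%drop_bare`:

```perl
use strict;
use warnings;

my @args = ('QUICK=BROWN FOX', 'A=B=C', 'DOG');    # stand-in for @ARGV

# Variant 1: a bare word becomes a key with an empty value.
my %keep_bare;
for my $arg (@args) {
    # A limit of 2 means only the first '=' acts as the delimiter.
    my ($k, $v) = split /=/, $arg, 2;
    $keep_bare{$k} = defined $v ? $v : '';
}

# Variant 2: a bare word is dropped entirely.
my %drop_bare = map { /=/ ? split(/=/, $_, 2) : () } @args;

print "keep: ", join(",", sort keys %keep_bare), "\n";
print "drop: ", join(",", sort keys %drop_bare), "\n";
```

Which behaviour is wanted depends on whether DOG should count as a flag or be ignored; the original question asked for it to be ignored.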

Re: Regular expression double grouping negation headache
by Ay_Bee (Monk) on Jun 30, 2002 at 15:58 UTC
    I normally hesitate to post my thoughts in such learned company, but here goes.
    Under Win32,
    perl test.pl QUICK=BROWN\ FOX JUMPED=OVER\ THE\ LAZY DOG
    produced an @ARGV of
    $ARGV[0] = 'QUICK=BROWN\\';
    $ARGV[1] = 'FOX';
    $ARGV[2] = 'JUMPED=OVER\\';
    $ARGV[3] = 'THE\\';
    $ARGV[4] = 'LAZY';
    $ARGV[5] = 'DOG';
    So to achieve the result required I tried
    my $line = join ' ', @ARGV;
    $line =~ s/\s(?=\S+=)/\t/g;
    @pairs = split(/\t/, $line);
    %defaults = map {/([^=]+)       # everything up to the = sign into $1
                     =              # the = sign
                     ((\S+\\\s)+    # all words followed by "\ "
                     \S+)           # word after the last "\ "
                    /x?($1=>$2):()} @pairs;
    for ( keys %defaults) { print $_,"=",$defaults{$_},"\n"; };
    which produced
    QUICK=BROWN\ FOX
    JUMPED=OVER\ THE\ LAZY
    Ay_Bee
    -_-_-_-_-_-_-_-_-_-_-_- My memory concerns me - but I forget why !!!
Re: Regular expression double grouping negation headache
by meonkeys (Chaplain) on Aug 17, 2004 at 19:06 UTC
    Just an afterthought: Getopt::Declare could be used for command-line argument processing.
    #!/usr/bin/perl -w
    use strict;
    use Getopt::Declare;

    my $args = new Getopt::Declare <<'EOSPEC' or die;
    [strict]
    FOO <value>	first parameter
    TEST <number>	second parameter
    EVERYTHING <boolean>	third parameter
    EOSPEC

    for (qw(FOO TEST EVERYTHING)) {
        print "$_ ... ".$args->{$_}."\n" if defined($args->{$_});
    }

    ---
    "A Jedi uses the Force for knowledge and defense, never for attack."

Node Type: perlquestion [id://178232]
Approved by wil
Front-paged by wil