This regexp made simpler

by rovf (Priest)
on Apr 25, 2010 at 10:24 UTC (#836760=perlquestion)
rovf has asked for the wisdom of the Perl Monks concerning the following question:

I would like to match the following: Match A, optionally followed by any number of characters not containing Z, followed by Z, with the additional restriction that *if* there are characters between A and Z, the character immediately following A must be a space. In case of a match, grab what is between A and Z.

Examples for matching strings:

  • AZ
  • A SOMETHING Z
Example for non-matching strings:
  • ASOMETHINGZ
The obvious solution, as far as I can see, goes like this:
if(/^A (?: Z |(\s.*?) ) Z$/x) { $grabbed=$1//'' }
I don't like the repeating of Z in this pattern. Any suggestion for a more elegant way to do this?

Update: I just came up with this:
if(/^A ( \s (?:.*?) )? Z$/x) { $grabbed=$1//'' }
This seems to fulfil the condition. Still, if you have nice alternatives, I would like to see them. Plus, is there a way to write the regexp so that $1 is always defined in case of a match? Right now it is undef if the string to match is AZ (that's why I have to use the // operator).
-- 
Ronald Fischer <ynnor@mm.st>

Re: This regexp made simpler
by FunkyMonk (Canon) on Apr 25, 2010 at 10:42 UTC
    Not extensively tested, but does
    use 5.010;    # enables say and the // operator

    my @strings = ('AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A Z', 'A ZZ', 'AA ZZ', 'AAZZ');

    for (@strings) {
        if (/A( .*?)?Z/) {
            my $grabbed = $1 // '';
            say "'$_' grabbed '$grabbed'";
        }
        else {
            say "'$_' did not match"
        }
    }

    __END__
    'AZ' grabbed ''
    'A SOMETHING Z' grabbed ' SOMETHING '
    'ASOMETHINGZ' did not match
    'A Z' grabbed ' '
    'A ZZ' grabbed ' '
    'AA ZZ' grabbed ' '
    'AAZZ' grabbed ''

    do what you want?

    Update

    What should 'A ZZ', 'AAZZ' and 'AA ZZ' match? (added these as test cases)


    Unless I state otherwise, all my code runs with strict and warnings

      Contrary to my interpretation of the requirements of the OP, both your regex and the updated regex of rovf's OP allow a 'Z' between the first 'A' and the final 'Z', and also still need to have an undefined $1 rationalized to an empty string.

      What should 'A ZZ', 'AAZZ' and 'AA ZZ' match? (added these as test cases)
      They would (and should) not match at all...


      -- 
      Ronald Fischer <ynnor@mm.st>
        I should have taken more notice of the anchors in your OP :(
        Updating my post to accommodate the anchors and your update:
        use 5.010;    # enables say and the // operator

        my @strings = ('AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A Z', 'A ZZ', 'AA ZZ', 'AAZZ', 'A Z');

        for (@strings) {
            if (/^A( [^Z]*)?Z$/) {
                my $grabbed = $1 // '';
                say "'$_' grabbed '$grabbed'";
            }
            else {
                say "'$_' did not match"
            }
        }

        __END__
        'AZ' grabbed ''
        'A SOMETHING Z' grabbed ' SOMETHING '
        'ASOMETHINGZ' did not match
        'A Z' grabbed ' '
        'A ZZ' did not match
        'AA ZZ' did not match
        'AAZZ' did not match
        'A Z' grabbed ' '


        Unless I state otherwise, all my code runs with strict and warnings
Re: This regexp made simpler
by AnomalousMonk (Abbot) on Apr 25, 2010 at 11:02 UTC
    >perl -wMstrict -le
      "for (@ARGV) {
         if(/^A (?: Z | (\s.*?)) Z$/x) {
           my $grabbed = $1 // '';
           print qq{matched '$_' grabbed '$grabbed'};
           }
         }
      " AZ AZZ AXZ "A SOMETHING Z" ASOMETHINGZ
    matched 'AZZ' grabbed ''
    matched 'A SOMETHING Z' grabbed ' SOMETHING '

    I wonder why it is necessary to match something like 'AZZ' and yet grab an undefined value from it, which must later be rationalized to an empty string. (Additionally, the first regex of the OP does not match 'AZ', which seems to be required by the OP.)

    Wouldn't it make more sense only to grab stuff from strings that match? E.g., "if there is anything between A and Z, it must begin with a space and be followed by zero or more non-Z characters". (Has the advantage of matching 'AZ', no defined test needed.)

    >perl -wMstrict -le
      "for (@ARGV) {
         if(/^A ((?: \s [^Z]*)?) Z$/x) {
           print qq{matched '$_' grabbed '$1'};
           }
         }
      " AZ AZZ AXZ "A ZZ" "A SOMETHING Z" ASOMETHINGZ "A Z" "A  Z"
    matched 'AZ' grabbed ''
    matched 'A SOMETHING Z' grabbed ' SOMETHING '
    matched 'A Z' grabbed ' '
    matched 'A  Z' grabbed '  '

    Updates:

    1. However, the 'Z' still needs to be repeated in the regex! Oh, well...
    2. Added "A Z" and "A  Z" test cases to my solution.
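
To make the difference between the two capture styles discussed in this subthread concrete, here is a minimal sketch. The sub names grab_optional and grab_always are made up for illustration and do not come from any post; each returns an array ref holding $1 on a match and undef otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: the capture group itself is optional, so $1 is undef
# when the group does not take part in the match (e.g. for 'AZ').
sub grab_optional {
    my ($s) = @_;
    return $s =~ /^A( [^Z]*)?Z$/ ? [$1] : undef;
}

# Hypothetical helper: the capture group always participates and wraps an
# optional non-capturing group, so $1 is '' rather than undef for 'AZ'.
sub grab_always {
    my ($s) = @_;
    return $s =~ /^A((?: [^Z]*)?)Z$/ ? [$1] : undef;
}

for my $s ('AZ', 'A SOMETHING Z', 'ASOMETHINGZ') {
    for my $grab (\&grab_optional, \&grab_always) {
        my $r = $grab->($s);
        my $msg = !$r                ? 'no match'
                : defined($r->[0])   ? "grabbed '$r->[0]'"
                :                      'matched, but $1 is undef';
        print "'$s': $msg\n";
    }
}
```

Both patterns accept and reject the same strings; only the definedness of $1 for 'AZ' differs, which is what removes the need for the // fallback.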

      I wonder why it is necessary to match something like 'AZZ' and yet grab an undefined value from it.
      Good point. This made me rethink my problem. In my case, the grabbed part is not really kept in a variable (I wrote it that way in the hope of making the whole posting simpler), but used within a substitution (to be precise, an insertion): I need to change a text AXZ into AXIZ, where the X is optional. In other words, I have to insert I in front of the Z, so in the substitution I use

      s/..../A$1IZ/

      and if I know that $1 is always defined, I don't have to care about interpolating an undefined value. In hindsight, I now see that I should rather have written

      s/^(A(?:\s.*?)?)(Z)/$1I$2/

      :-(
      -- 
      Ronald Fischer <ynnor@mm.st>
Re: This regexp made simpler
by BrowserUk (Pope) on Apr 25, 2010 at 11:08 UTC

    printf( "\n$_: " ), m[A( [^Z]+)Z] and print "'$1'"
        for 'AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A  Z';;

    AZ:
    A SOMETHING Z: ' SOMETHING '
    ASOMETHINGZ:
    A  Z: '  '

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      But that doesn't match 'AZ', which the OP seems to require, and also doesn't match 'A Z' (single space between first and final characters), which also seems to be required.

        A simple variation fixes that:
        /A( [^Z]*)?Z/

        It surprises me how many monks in this thread seem to think that expressing the "no Z between ..." condition with .*? is a good idea.
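
The point about .*? can be checked directly. A minimal sketch, assuming the anchored forms used elsewhere in this thread (the variable names $lazy_dot and $negated are only for illustration): the lazy dot is free to run across a 'Z' whenever that is the only way to reach the anchored final 'Z', while the negated class cannot.

```perl
use strict;
use warnings;

# Both patterns are anchored; they differ only in how the middle is matched.
my $lazy_dot = qr/^A( .*?)?Z$/;    # .*? may still run across a Z
my $negated  = qr/^A( [^Z]*)?Z$/;  # [^Z]* can never consume a Z

for my $s ('A SOMETHING Z', 'A ZZ', 'A Z Z') {
    printf "%-16s lazy dot: %-9s negated class: %s\n", "'$s'",
        ($s =~ $lazy_dot ? 'match' : 'no match'),
        ($s =~ $negated  ? 'match' : 'no match');
}
```

Laziness only changes which match is tried first; it does not forbid a character, so "no Z in between" has to be stated with [^Z].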

        Ah yes, missed that. Maybe this improved version.

        printf( "\n$_: " ), m[A( [^Z]*|)Z] and print "'$1'"
            for 'AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A Z', 'A Z';;

        AZ: ''
        A SOMETHING Z: ' SOMETHING '
        ASOMETHINGZ:
        A Z: ' '
        A Z: ' '

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: This regexp made simpler
by rubasov (Friar) on Apr 25, 2010 at 12:48 UTC
    More variations on the theme (if I got it right - the Z is repeated though). The second one does not use captures at all.
    while (<DATA>) {
        print;
        #s/^A(|\s[^Z]*)Z$/A$1IZ/;
        s/^A(?:|\s[^Z]*)\K(?=Z$)/I/;
        print;
    }
    __DATA__
    AZ
    AZZ
    A SOMETHING Z
    ASOMETHINGZ
    A Z
    A ZZ
    AAZZ
    AA ZZ
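
For readers who have not met \K before, here is a small sketch of how the capture-free variant behaves on a few of the test strings. It wraps the same substitution in a helper whose name, insert_i, is made up for illustration; \K needs Perl 5.10 or later.

```perl
use 5.010;    # \K is available from Perl 5.10 on
use strict;
use warnings;

# Hypothetical wrapper around the \K substitution: works on a copy of its
# argument and returns the (possibly unchanged) result.
sub insert_i {
    my ($s) = @_;
    # \K drops everything matched so far from the text to be replaced, and
    # (?=Z$) only looks ahead, so the replaced span is empty: "I" is
    # inserted immediately before the final Z.
    $s =~ s/^A(?:|\s[^Z]*)\K(?=Z$)/I/;
    return $s;
}

for my $s ('AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A ZZ') {
    print "'$s' -> '", insert_i($s), "'\n";
}
```

Strings that do not satisfy the pattern come back unchanged, so the helper can be applied to every line without a separate match test.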
Re: This regexp made simpler
by Marshall (Prior) on Apr 27, 2010 at 21:26 UTC
    Another way to go using rubasov's data set plus a completely illegal line (ZA). The below is more "wordy" than other solutions, but I think what it does and how it does it is clear. If for example, RESULT=" " should be disallowed, there is a clear place to do that modification.
    #!/usr/bin/perl -w
    use strict;

    while (<DATA>)
    {
       chomp;
       my $result = is_match($_);
       defined($result) ? print "$_:\tRESULT=\"$result\"\n"
                        : print "$_:\tRESULT=NO MATCH\n";
    }

    sub is_match
    {
       my $term = shift;
       my $inner = ($term =~ m/^A(.*)Z$/)[0];
       return undef if (!defined($inner));
       return $inner if $inner eq "";
       return $inner if $inner =~ m/^\s/;
       return undef;
    }

    =prints:
    AZ:     RESULT=""
    AZZ:    RESULT=NO MATCH
    A SOMETHING Z:  RESULT=" SOMETHING "
    ASOMETHINGZ:    RESULT=NO MATCH
    A Z:    RESULT=" "
    A ZZ:   RESULT=" Z"
    AAZZ:   RESULT=NO MATCH
    AA ZZ:  RESULT=NO MATCH
    ZA:     RESULT=NO MATCH
    =cut

    __DATA__
    AZ
    AZZ
    A SOMETHING Z
    ASOMETHINGZ
    A Z
    A ZZ
    AAZZ
    AA ZZ
    ZA
    Update: I looked at the OP's spec again and it appears that this tweaking of is_match() would be better?:
    sub is_match
    {
       my $term = shift;
       my $inner = ($term =~ m/^A(.*)Z$/)[0];
       return undef if (!defined($inner));
       return undef if $inner eq "";
       return $inner if $inner =~ m/^\s+\S/;
       return undef;
    }

    prints:
    AZ:     RESULT=NO MATCH
    AZZ:    RESULT=NO MATCH
    A SOMETHING Z:  RESULT=" SOMETHING "
    ASOMETHINGZ:    RESULT=NO MATCH
    A Z:    RESULT=NO MATCH
    A ZZ:   RESULT=" Z"
    AAZZ:   RESULT=NO MATCH
    AA ZZ:  RESULT=NO MATCH
    ZA:     RESULT=NO MATCH

Node Type: perlquestion [id://836760]
Front-paged by ww