PerlMonks  

Re: Regex result being defined when it shouldn't be(?)

by haukex (Archbishop)
on Nov 14, 2017 at 15:36 UTC ( [id://1203389]=note )


in reply to Regex result being defined when it shouldn't be(?)

I haven't fully evaluated or tested your code, but a couple of comments and, if I understood correctly, the answer to your question:

  • For longer regexes, in addition to /x, which you're already using, I strongly recommend using different delimiters and especially named capture groups (perlre) and %+, as in:
    my $regex = qr{ (?<foo> fo+ ) }msx;
    "barfooobar" =~ $regex;
    print "<",$+{foo},">\n"; # prints "<fooo>"
  • In the regex you showed, if the overall regex matches, then ([@%\$]?) and ([\d]*) will not return undef but the empty string "", because those capture groups will always match at least () (that is, the empty string "").
  • Your expressions like if( defined ($sigil = $1) eq "" ) don't make much sense to me, because you're testing the return value of defined, a boolean value, against the empty string. If you just want to check for definedness, then write if( defined($sigil = $1) ), and if you want the assignment to $sigil to happen only if $1 is defined, then write if( defined $1 ) { $sigil = $1; ...
  • You might be interested in my module Config::Perl ;-)
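The difference between an empty-string capture and an undefined one can be sketched as follows. The pattern is a simplified, hypothetical stand-in (an optional sigil, a name, an optional closing bracket), not the original poster's full regex, and the group names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical, simplified pattern: optional sigil, a name, optional "]".
my $re = qr{ \A (?<sigil> [@%\$]? ) (?<name> \w+ ) (?<close> \] )? \z }msx;

my %got;
if ( "var" =~ $re ) {
    # Copy the named captures so they survive any later match.
    %got = ( sigil => $+{sigil}, name => $+{name}, close => $+{close} );
}

# Quantifier inside the group: the group matched the empty string.
print "sigil is ", ( defined $got{sigil} ? "defined: '$got{sigil}'" : "undef" ), "\n";
# Quantifier outside the group: the group did not take part in the match.
print "close is ", ( defined $got{close} ? "defined: '$got{close}'" : "undef" ), "\n";
```

This should report the sigil as defined but empty, and the closing bracket as undef.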

Replies are listed 'Best First'.
Re^2: Regex result being defined when it shouldn't be(?)
by chenhonkhonk (Acolyte) on Nov 14, 2017 at 16:46 UTC
    P.p.s: After thinking about why I would've been using the quantifiers outside vs inside, separate from maybe capturing only one repetition of a group, I figured it out:

    Alternations. If you want one word from among multiple choices, but only 0-1 times, you have a couple of options:
    (this|that|third_thing)?
    ((this)?|(that)?|(third_thing)?)
    The first one is pretty clear, I want 0 or 1 of any of those words. It will return undef if I have 0.

    The second one, I don't even trust it. I think I could match all 3 if they happen in a row. Additionally, there's probably 4 capture groups created as a result.

    A quick search to check whether I had used 'alternation' properly turned up: https://docstore.mik.ua/orelly/perl4/prog/ch05_08.htm
    "When you apply the ? to a subpattern that captures into a numbered variable, that variable will be undefined if there's no string to go there. If you used an empty alternative, it would still be false, but would be a defined null string instead."
      The second one, I don't even trust it. I think I could match all 3 if they happen in a row.

      No, it's fine, it reads like so: Match one of the three choices: "this" or "", "that" or "", or "third_thing" or "". Just like in your first example, the parentheses and alternation operator make sure that it will match only one of the three choices at that place in the regex.

      Additionally, there's probably 4 capture groups created as a result.

      Correct, but you can use non-capturing (?: ) parens to avoid that, i.e. ((?:this)?|(?:that)?|(?:third_thing)?) would make it have only one capturing group, like your first example. <update> And AnomalousMonk made an excellent point about (?| ) here. </update>

      I'd recommend a read of perlrequick, perlretut, and perlre for all of these features and the ones I mentioned earlier. Also, for playing around with regexes and testing out what they do, see my post here.
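As a rough check of the two points above, here is a minimal sketch that counts the captures returned in list context and tests whether two of the words in a row can match. The \A and \z anchors and the variable names are additions for the demonstration and are not part of the patterns discussed.

```perl
use strict;
use warnings;

# Same alternation, once with capturing and once with non-capturing inner groups.
my $four = qr{ \A ( (this)?   | (that)?   | (third_thing)?   ) \z }msx;
my $one  = qr{ \A ( (?:this)? | (?:that)? | (?:third_thing)? ) \z }msx;

# In list context a successful match returns one element per capture group.
my @c4 = "that" =~ $four;
my @c1 = "that" =~ $one;
print scalar(@c4), " capture groups vs ", scalar(@c1), "\n";

# Only one alternative can match at this position, so two words in a row fail.
my $both = ( "thisthat" =~ $four ) ? 1 : 0;
print "thisthat matches: $both\n";
```

The groups that did not take part in the match come back as undef, which is why the first list has four elements even though only one word matched.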

        I've already read about regexes. From a book. Most of the time each source doesn't explicitly bring up the full exceptions, or they use jargon (like 'alternations' and not 'or').

        I don't know if I intend offense or not, but comparing something like:
        ((?:this)?|(?:that)?|(?:third_thing)?)
        if( ! $1 ){} #or defined, but why bother #nvm, see below
        vs
        $_ =~ /(this|that|third_thing)?/;
        if( defined $1 eq "" ){}
        Seems like there's a huge difference in readability, not even getting into when you have many alternations. Even getting rid of eq "".

        And of course "" and "0" are defined but not a true value, so if you were looking for numeric characters, you can't use if($1). I guess I need to try raw values and see how Perl handles \0 in true/false/define settings

        Edit: ARGH: defined(undef) == 0 and defined(undef) eq "" are both true. "" == 0 is also true, but warns that the argument "" isn't numeric, while "" eq 0 is false, as is "" eq "0". undef == 0 is true but produces a warning, while undef eq 0 is false (and also warns).

        I'm putting those there in case anyone else ever comes across this bonanza of "false" comparisons.
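The comparisons listed above can be checked with a short script. This is only a sketch: the variable names are made up, and the "numeric" and "uninitialized" warnings are switched off inside the block so that just the truth values are printed.

```perl
use strict;
use warnings;

my $unset;                      # deliberately left undefined
my $false = defined $unset;     # Perl's special false value

my @rows = (
    [ 'defined($unset) == 0'   => $false == 0 ],
    [ 'defined($unset) eq ""'  => $false eq "" ],
    [ 'defined($unset) eq "0"' => $false eq "0" ],
);
{
    # These comparisons would warn under "use warnings".
    no warnings qw(numeric uninitialized);
    push @rows,
        [ '"" == 0'     => "" == 0 ],
        [ '$unset == 0' => $unset == 0 ],
        [ '$unset eq 0' => $unset eq 0 ];
}

printf "%-24s %s\n", $_->[0], ( $_->[1] ? "true" : "false" ) for @rows;
```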
      ((this)?|(that)?|(third_thing)?)
      ...
      ... I don't even trust it. ... there's probably 4 capture groups created as a result.

      Just as an aside, the (?|(pat)|(te)|(rn)) "branch reset" pattern introduced with Perl version 5.10 will suppress the creation of a slew of captures in a case like this:

      c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
       "my $s = 'apathetic';
        ;;
        my @captures = $s =~ m{ (pat) | (te) | (rn) }xms;
        dd \@captures;
        ;;
        @captures = $s =~ m{ (?| (pat) | (te) | (rn)) }xms;
        dd \@captures;
       "
      ["pat", undef, undef]
      ["pat"]
      See Extended Patterns in perlre.


      Give a man a fish:  <%-{-{-{-<

Re^2: Regex result being defined when it shouldn't be(?)
by chenhonkhonk (Acolyte) on Nov 14, 2017 at 16:18 UTC
    I'm not doing something if it is defined, I'm doing it if it's NOT defined.
    An annoyance to me (I come from a C background) is a variable failing to be defined does NOT return 0 or a 'FALSE' definition, it returns "". For safety reasons and explicitness, I program in the explicit results of tests, i.e. defined $var eq "" or defined $var ne "". Using simply 'defined $var' and '! defined $var' isn't as clear about what Perl is doing internally.

    If I do print "$3" from a match on 'var = 10' I do not get the same as print "". Regex DO NOT return "" on failing to match, they return undef. After further testing, it appears the difference is where the quantifier comes in:
    use strict;
    use warnings;

    my $string = "string";
    if( $string =~ m/([5]?)string/ ){
        print "? inside group: $1\n"; #prints fine
    }
    if( $string =~ m/([5])?string/ ){
        print "? outside group: $1\n"; #Use of uninitialized value $1 in concatenation (.) or string...
    }
    exit 0;
    P.s. the reason I'm doing this manually is because I'm making it as portable as possible and sensible to me. I'm running Perl on Windows 7/8/10, modern Linux, a Debian 2.6.32, etc. Production environment with too many distributions, internal/external network, all that jazz. I already had an issue where a CPAN module I would've liked had some Linux-only make commands.
      An annoyance to me (I come from a C background) is a variable failing to be defined does NOT return 0 or a 'FALSE' definition, it returns "".

      Actually, that's not exactly what is going on. Perl has a special "false" value that is 0 when used in numeric context and "" in a string context, so in Perl if (boolean) and if (!boolean) are actually "explicit" tests for truth and falsehood for functions that return "true" and "false" values (this applies to just about every builtin, of course there are some rare special cases). Have a look at Truth and Falsehood. Once you get used to this, I hope you'll find if (!defined(...)) (or any of its variants like if (not defined(...)) or unless (defined(...))) more natural. At least personally, I was initially confused when I read if ( defined($x = $1) eq "" ), and I thought you might accidentally be misapplying an idiom like if ( (my $x = $1) eq "foo" ) (which does the assignment and then the comparison).
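A minimal sketch of that behaviour, assuming nothing beyond core Perl: it uses the special false value in numeric and string context while collecting any warnings, and then shows that the three spellings of "not defined" take the same branch. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Collect warnings to see whether the conversions below trigger any.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $unset;                     # deliberately left undefined
my $false = defined $unset;    # the special false value

my $as_number = $false + 0;    # numeric context
my $as_string = "<$false>";    # string context

# Three equivalent spellings of "not defined".
my @taken;
push @taken, 'if (!defined ...)'    if !defined $unset;
push @taken, 'if (not defined ...)' if not defined $unset;
unless ( defined $unset ) { push @taken, 'unless (defined ...)' }

print "number: $as_number, string: $as_string, warnings: ", scalar(@warnings), "\n";
print "branch taken: $_\n" for @taken;
```

Unlike a plain empty string, the special false value can be used as a number without an "isn't numeric" warning.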

      If I do print "$3" from a match on 'var = 10' I do not get the same as print "". Regex DO NOT return "" on failing to match, they return undef. After further testing, it appears the difference is where the quantifier comes in:

      Right, which is why I left your $3, that is (])?, out of my explanation, and explicitly referred to your $1 (([@%\$]?)), which you were asking about :-)

      ... portable ... I already had an issue where a CPAN module I would've liked had some Linux-only make commands.

      According to CPAN Testers, Config::Perl runs on Linux, MSWin32, Cygwin, Darwin (Mac OS X), and various *BSD, and from Perl versions 5.8.1 thru 5.26.1.

      Update 2019-08-17: Updated the link to "Truth and Falsehood".

      defined $var or equivalently defined($var) will return the integer 1 (which is a TRUE value; 1 is also TRUE in C, so this shouldn't confuse you) if the variable is defined. It will return a FALSE value (see haukex's answer) if the variable is undefined. You then take that value, either 1 or the FALSE value, and stringify it. The integer 1 stringifies into "1". The FALSE value stringifies into "". If you don't want FALSE to become "", don't stringify. (The eq operator is forcing the stringification on both its arguments.)

      If you really just want a boolean that decides whether the $var is defined or not, just use the truthiness of the result of defined $var: that is explicitly the boolean test for whether the $var is defined, and the defined $var and !defined $var syntax explicitly say "variable is defined" and "variable is not defined". This is similar to C: if you define a function int is_five(int x) { return (x==5); }, then the return values of is_five(var) and !is_five(var) are explicit ways of testing whether or not the variable is 5. By your claim, in C, I would have to write is_five(var)==1 to verify that var is 5, and is_five(var)==0 to verify that var is not 5, which I vehemently disagree with: that notation obfuscates what C is doing rather than clarifying what it's doing internally. Just trust that Perl will do the right thing with boolean expressions in a boolean context, just like you trust that C does the right thing with boolean results in a boolean context.

      If it's the lack of parentheses that is confusing you, then use the parentheses.

      Aside: Urgh... I did one last refresh before hitting create, and saw that haukex beat me by a minute or two again. :-(. I went to all the trouble of writing this up, so I'll hit create anyway.

      update: I was wrong: defined($var) doesn't return undef or 1; it returns the special value, as haukex said.

      c:> perl -le "print defined($x)//'<undef>'; print defined($x)||'untrue'"

      untrue
      c:>

        defined $var ... will return undef (which is a FALSE value) if the variable is undefined.

        Sorry, that's not quite correct, compare the outputs of the following:

        $ perl -wMstrict -MDevel::Peek -le 'my $x = undef; Dump( $x )'
        $ perl -wMstrict -MDevel::Peek -le 'my $x = !1; Dump( $x )'
        $ perl -wMstrict -MDevel::Peek -le 'my $x = defined(undef); Dump( $x )'

        (Update: Whoops, just saw your update, you saw that yourself)

        The rest of your post is excellent though, so no worries about the duplicated efforts, TIMTOWTDI :-) I especially like the comparison with C, and it makes another important point: comparing a true/false value explicitly is brittle: If defined decided to return 0 as a false value instead, then defined(...) eq "" will break!

        In C and in Perl, the result of a true conditional is 1. EDIT: In Perl, false can be 0 or "" depending on context. Or not. I don't like this. /edit

        When I'm doing checks on a bunch of values, if I use the implicit return from an is_defined() function, I can only have a boolean in response. If I want multiple types of responses I must use an explicit equals. Even in the boolean context, nearly all my conditions have some sort of equality test, even in the case of less than or greater than. Not having that form is an exception when reviewing the code: I have to stop and say "wait, what is the function supposed to be returning? a number? a string? a reference/pointer?".

        You even state that writing is_five()==1 is somehow not intuitive, when that is literally what your function is doing: it is checking the truthfulness of whether that number is five.
