PerlMonks  

Re^4: Regex result being defined when it shouldn't be(?)

by chenhonkhonk (Acolyte)
on Nov 14, 2017 at 17:33 UTC ( [id://1203408]=note )


in reply to Re^3: Regex result being defined when it shouldn't be(?)
in thread Regex result being defined when it shouldn't be(?)

I've already read about regexes, from a book. Most of the time a given source doesn't explicitly bring up all the exceptions, or it uses jargon (like "alternation" instead of "or").

I don't know if I intend offense or not, but comparing something like:
    (?:this)?|(?:that)?|(?:third_thing)?)
    if( ! $1 ){} #or defined, but why bother #nvm, see below
vs
    $_ = /(this|that|third_thing)?/;
    if( defined $1 eq "" ){}
Seems like there's a huge difference on readability, not even getting into when you have many alternations. Even getting rid of eq "".

And of course "" and "0" are defined but not a true value, so if you were looking for numeric characters, you can't use if($1). I guess I need to try raw values and see how Perl handles \0 in true/false/define settings
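
A minimal sketch of that pitfall, using made-up input strings rather than anything from the real data: a captured "0" is defined but false, while an optional group that takes no part in the match leaves $1 undefined.

```perl
use strict;
use warnings;

# Hypothetical input: a single digit zero.
my $input = "0";

my ($truthy, $is_defined) = (0, 0);
if ( $input =~ /(\d)/ ) {
    $truthy     = $1 ? 1 : 0;            # "0" is false in boolean context
    $is_defined = defined($1) ? 1 : 0;   # but the group did capture something
}
print "truthy=$truthy defined=$is_defined\n";

# An optional group that does not participate in the match leaves $1 undef.
my $optional_defined = 1;
if ( "abc" =~ /^(\d)?abc$/ ) {
    $optional_defined = defined($1) ? 1 : 0;
}
print "optional group defined=$optional_defined\n";
```

So testing defined($1) rather than the truth of $1 seems like the safer check when a digit might be captured.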

Edit: ARGH: defined(undef) == 0, and defined(undef) eq "", but "" == 0 isn't a numeric comparison, and "" eq 0 is false, as is "" eq "0". undef == 0 is true but produces a warning, while undef eq 0 is false.

I'm putting those there in case anyone else ever comes across this bonanza of "false" comparisons.

Replies are listed 'Best First'.
Re^5: Regex result being defined when it shouldn't be(?) (updated)
by haukex (Archbishop) on Nov 14, 2017 at 18:10 UTC
    Seems like there's a huge difference on readability, not even getting into when you have many alternations.

    Definitely, but there are some mechanisms to make regexes more readable, such as /x (as you're already using) and the things I mentioned here, including precompiled regexes via qr//, which you can interpolate into other regexes, Building Regex Alternations Dynamically, or even advanced features like (?(DEFINE) ...) (perlre).

        my $re1    = qr{ ... }msx;
        my $re2    = qr{ ... }msx;
        my $big_re = qr{ (?: $re1 | $re2 ) }msx;
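
    To make the idea concrete, here is a rough sketch. The sub-patterns and the word list are placeholders I made up, not the elided patterns above. It shows qr// objects interpolated into a larger regex, and one common way to build an alternation from a list with quotemeta and join.

```perl
use strict;
use warnings;

# Hypothetical sub-patterns, standing in for the elided qr{ ... } pieces.
my $re_num  = qr{ \d+ }msx;
my $re_word = qr{ [a-z]+ }msx;
my $either  = qr{ \A (?: $re_num | $re_word ) \z }msx;

# Build an alternation from a list: quote each literal and sort longest
# first, so a shorter word cannot shadow a longer one sharing its prefix.
my @words = ('this', 'that', 'third_thing');
my $alt   = join '|',
            map  { quotemeta($_) }
            sort { length($b) <=> length($a) } @words;
my $re_words = qr{ \A (?: $alt ) \z }msx;

for my $candidate ('42', 'abc', '4b') {
    printf "%-12s %s\n", $candidate,
        ( $candidate =~ $either ? 'matches' : 'no match' );
}
for my $candidate ('third_thing', 'other') {
    printf "%-12s %s\n", $candidate,
        ( $candidate =~ $re_words ? 'matches' : 'no match' );
}
```

    The quotemeta step matters under /x, because unescaped whitespace or # in an interpolated string would otherwise be treated as regex syntax.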
    so if you were looking for numeric characters, you can't use if($1)

    As far as I can tell from what you've written so far, you seem to be very interested in whether a capture group matched something or not. This should make named capture groups, as I mentioned before, more interesting:

        use warnings;
        use strict;
        use Data::Dump qw/dd/; # for debugging

        my $re = qr{
            ^ \s*               # beginning of line
            (?<name> \w+ )      # the variable name
            \s* = \s*           # equals
            (?:                 # one of the following (
                (?<num> \d+ )   # a number
            |                   # or
                (?<str> \w+ )   # a word
            )                   # )
            \s* $               # end of line
        }msx;

        my @lines = split /\n/, <<'SAMPLE_INPUT';
        foo=bar
        quz = 5
        SAMPLE_INPUT

        for my $line (@lines) {
            $line =~ $re or die "Failed to parse '$line'";
            dd \%+; # debug
            print "Match! Name: '$+{name}'\n";
            if (exists $+{num}) {
                print "It was a number: '$+{num}'\n"
            }
            elsif (exists $+{str}) {
                print "It was a string: '$+{str}'\n"
            }
            else { die "internal error: neither str nor num" }
        }

        __END__

        {
          # tied Tie::Hash::NamedCapture
          name => "foo",
          str  => "bar",
        }
        Match! Name: 'foo'
        It was a string: 'bar'
        {
          # tied Tie::Hash::NamedCapture
          name => "quz",
          num  => 5,
        }
        Match! Name: 'quz'
        It was a number: '5'

    Update: I'm not sure when you made your "Edit" but I didn't see it until later. The explanation for the behavior you are seeing is this (note I'm ignoring overloading here):

    • Numeric comparisons like ==, !=, >, etc. cause their arguments to be taken as numbers. This means:

      • undef is converted to 0 but is subject to a warning.
      • "" is not a number so it is subject to a warning, and is converted to 0.
      • "0" is converted to 0.
      • 0 is already a number and doesn't need to be converted.
      • Perl's "false" (!1, including defined(undef)) already has a numeric value of 0, so that is used.
      • Perl will attempt to convert any other string into a number, warning if it cannot do so cleanly. The string "0 but true" is special-cased to be exempt from this warning.
    • String comparisons like eq, ne, gt etc. cause their arguments to be taken as strings. That means:

      • undef is converted to "" but is subject to a warning.
      • "", "0", and "0 but true" are already strings and don't need to be converted.
      • 0 is converted to "0", and of course any other number is stringified.
      • Perl's "false" (!1, including defined(undef)) already has a string value of "", so that is used.

      This is why "" eq 0 and undef eq 0 are false, because they're both the same as "" eq "0".
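
    The rules above can be checked with a small script. This is only a sketch: the check helper and the variable names are made up for illustration, and warnings are collected through a $SIG{__WARN__} handler so that each comparison can be reported as warning or not.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so we can tell which
# comparisons trigger one.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $undef;          # an undefined value
my $empty = "";     # the empty string

# Hypothetical helper: run a comparison, report its truth value and
# whether it produced a warning.
sub check {
    my ($label, $code) = @_;
    @warnings = ();
    my $result = $code->() ? 'true' : 'false';
    my $warned = @warnings ? 'warns' : 'quiet';
    printf "%-20s %-5s %s\n", $label, $result, $warned;
    return ($result, $warned);
}

# Numeric comparisons: both sides are taken as numbers.
check('undef == 0',        sub { $undef == 0 });         # uninitialized value
check('"" == 0',           sub { $empty == 0 });         # "" isn't numeric
check('"0" == 0',          sub { "0" == 0 });
check('"0 but true" == 0', sub { "0 but true" == 0 });   # special-cased string
check('!1 == 0',           sub { !1 == 0 });

# String comparisons: both sides are taken as strings.
check('undef eq 0',        sub { $undef eq 0 });         # "" eq "0"
check('"" eq 0',           sub { $empty eq 0 });         # "" eq "0"
check('!1 eq ""',          sub { !1 eq "" });
```

    Only the comparisons involving undef, plus "" == 0, should report a warning here; the rest should stay quiet.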

    See Relational Operators and Equality Operators. As for why you shouldn't use these operators to check boolean values, I've already explained that elsewhere.
