PerlMonks  

What does a failed regular expression match actually return?

by Anonymous Monk
on May 20, 2020 at 09:29 UTC ( #11116970=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

According to the perlop documentation the regular expression match operator (m//) will return true if it finds a match or false if it doesn't. Usually where Perl operators return false they do so by either returning an empty string or 0, however this doesn't appear to be the case for failed regular expression matches.

Normally it doesn't matter as Perl being Perl just does what I intended, but earlier today I encountered a bug when assigning the result of regular expression match as part of initialising a hash - the following code exhibits the issue:

use Data::Dumper;
warn Dumper( { a => "a" =~ m/b/, b => 'asdf' } );

Running this code results in:

$VAR1 = {
          'asdf' => undef,
          'a' => 'b'
        };

Eventually I realised that the failed match for the regular expression was somehow tricking the first comma to be evaluated in a scalar context, rather than a list context (note that if the regular expression matches then it returns 1 and the comma is evaluated in a list context). With this information I was able to fix the bug I had, but curiosity led me to dig deeper into what was happening, so I tried the following code:

use Data::Dumper;
print Dumper( "a" =~ m/a/ );  # outputs 1
print Dumper( "a" =~ m/b/ );  # No Output For This Line!
print Dumper( "a" eq "a" );   # outputs 1
print Dumper( "a" eq "b" );   # outputs an empty string ('')
print Dumper( "a" !~ m/a/ );  # outputs an empty string ('')
print Dumper( "a" !~ m/b/ );  # outputs 1
my $a = "a" =~ m/a/;
print Dumper( $a );           # outputs 1
my $b = "a" =~ m/b/;
print Dumper( $b );           # outputs an empty string ('')

From running this code it seems that the failed regular expression match is returning something that isn't really a traditional Perl value, but which does evaluate to false in most situations - a sort of "phantom" false value.

Which leaves me with two questions:

  1. What is the actual value returned by a failed regular expression match?
  2. Why does =~ return this "phantom" version of false, while !~ returns the more common empty string version?

Replies are listed 'Best First'.
Re: What does a failed regular expression match actually return? (updated)
by haukex (Chancellor) on May 20, 2020 at 09:36 UTC
    Usually where Perl operators return false they do so by either returning an empty string or 0

    Actually, they return both at the same time.

    What is the actual value returned by a failed regular expression match?

    Please see this table of regular expression return values. print Dumper( "a" =~ m/b/ ); provides list context to the arguments of Dumper, and in list context a failed match returns the empty list, so it's the same as writing print Dumper( );
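
The list-context behaviour described above can be checked directly. This is only a minimal sketch: the variable names and the count_args helper are made up for illustration and are not part of the original thread.

```perl
use strict;
use warnings;

# Assigning to an array puts the match in list context.
my @failed  = ( "a" =~ m/b/ );   # failed match: empty list
my @matched = ( "a" =~ m/a/ );   # successful match, no captures: the list (1)

printf "failed match:     %d element(s)\n", scalar @failed;    # 0
printf "successful match: %d element(s)\n", scalar @matched;   # 1

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

# Function arguments are evaluated in list context, just like Dumper's.
my $n = count_args( "a" =~ m/b/ );
print "arguments passed on a failed match: $n\n";               # 0
```

So a sub called with nothing but a failed match receives no arguments at all, which is why Dumper prints nothing for that line.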

    Why does ... !~ returns the more common empty string version?

    Because "a" !~ m/a/ is the same as !('a' =~ /a/), and that returns Perl's special false value.
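
A small sketch of what "Perl's special false value" looks like in scalar context, and of the earlier remark that it is an empty string and 0 at the same time. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Scalar assignment puts the right-hand side in scalar context.
my $negated = ( "a" !~ m/a/ );     # the negated match
my $spelled = !( "a" =~ m/a/ );    # the same thing written out
my $failed  = ( "a" =~ m/b/ );     # a failed match in scalar context

for my $value ( $negated, $spelled, $failed ) {
    # The special false value is defined, stringifies to '' and is
    # numerically 0 without an "isn't numeric" warning.
    printf "defined: %s, string: '%s', number: %d\n",
        ( defined $value ? "yes" : "no" ), $value, $value + 0;
}
```

All three lines should read "defined: yes, string: '', number: 0". The difference the question ran into comes from list versus scalar context, not from =~ and !~ using different false values.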

    Update: Your initial issue can be solved by scalar or PerlX::Maybe:

    warn Dumper( { a => scalar( "a" =~ m/b/ ), b => 'asdf' } );

    use PerlX::Maybe;
    warn Dumper( { maybe a => "a" =~ m/b/, b => 'asdf' } );

    Eventually I realised that the failed match for the regular expression was somehow tricking the first comma to be evaluated in a scalar context, rather than a list context (note that if the regular expression matches then it returns 1 and the comma is evaluated in a list context).

    Sorry, no, your analysis is not correct - the hash assignment is entirely in list context, in the examples you showed there's no scalar context going on there at all; only in the my $a = ... assignments. Update 2: To nitpick myself a little, the arguments to the =~ operator are in scalar context.

Re: What does a failed regular expression match actually return?
by AnomalousMonk (Bishop) on May 20, 2020 at 12:32 UTC

    You don't say one way or the other, but I'm assuming you did not have warnings enabled in your code. Had you done so, Perl would have given you another clue as to what was going on.

    c:\@Work\Perl\monks>perl -Mstrict -le "use warnings; use Data::Dumper; warn Dumper( { a => 'a' =~ m/b/, b => 'asdf' } ); "
    Odd number of elements in anonymous hash at -e line 1.
    $VAR1 = {
              'a' => 'b',
              'asdf' => undef
            };

    The "Odd number of elements in anonymous hash ..." warning can be explained as follows:
    As haukex has explained, the expression
        { a => 'a' =~ m/b/,  b => 'asdf' }
    evaluates to
        { 'a', (empty list), 'b', 'asdf' }
    which flattens to
        { 'a', 'b', 'asdf' }
    which gives you the perplexing hash structure.
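
To see the three-element list for yourself, here is a minimal sketch that builds the same list into an array first and then into a hash. The names @pairs and %h are illustrative, and the warning is silenced only so the example runs quietly.

```perl
use strict;
use warnings;

# Same expression as in the hash constructor, captured in an array.
my @pairs = ( a => 'a' =~ m/b/, b => 'asdf' );

print scalar(@pairs), " elements: ", join( ", ", @pairs ), "\n";   # 3 elements: a, b, asdf

my %h;
{
    # "Odd number of elements in hash assignment" is in the 'misc' category.
    no warnings 'misc';
    %h = @pairs;
}

# 'a' now maps to 'b', and 'asdf' is left as a key with no value.
print "a => $h{a}\n";
print "asdf => ", ( defined $h{asdf} ? $h{asdf} : "undef" ), "\n";
```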

    Bottom line: always use strict and warnings (if you weren't doing so already), even when you know you don't really need them! :)


    Give a man a fish:  <%-{-{-{-<

      Nit: Returning an empty list simply means "putting no scalars on the stack". It doesn't return something that gets flattened.

Re: What does a failed regular expression match actually return?
by ikegami (Pope) on May 20, 2020 at 20:33 UTC

    In scalar context? A false scalar.

    In list context? Nothing. It returns zero scalars.

    a => 'a' =~ m/b/, b => 'asdf'
    is just a fancy way of writing
    'a', 'a' =~ m/b/, 'b', 'asdf'

    The expression within the hash constructor ({}) is evaluated in list context. Since the above list was evaluated in list context, its individual elements were evaluated in list context. On a failed match, the above is therefore equivalent to

    'a', 'b', 'asdf'

    Solutions:

    a => scalar( 'a' =~ m/b/ ), b => 'asdf'
    a => !!( 'a' =~ m/b/ ), b => 'asdf'
    a => 'a' =~ m/b/ ? 1 : 0, b => 'asdf'
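
A quick sketch comparing the three solutions side by side; the hash names are made up for illustration. Each form forces the match to produce exactly one scalar, so the key/value pairing stays intact.

```perl
use strict;
use warnings;

my %with_scalar  = ( a => scalar( 'a' =~ m/b/ ), b => 'asdf' );
my %with_not_not = ( a => !!( 'a' =~ m/b/ ),     b => 'asdf' );
my %with_ternary = ( a => 'a' =~ m/b/ ? 1 : 0,   b => 'asdf' );

for my $h ( \%with_scalar, \%with_not_not, \%with_ternary ) {
    printf "keys: %s; a is %s; b is %s\n",
        join( ",", sort keys %$h ),
        ( $h->{a} ? "true" : "false" ),
        $h->{b};
}
```

Each line should print "keys: a,b; a is false; b is asdf". The only visible difference is that the ternary form stores 0, while the other two store Perl's special false value, which prints as an empty string.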
Re: What does a failed regular expression match actually return?
by perlfan (Priest) on May 21, 2020 at 06:10 UTC
    Perl doesn't have a single false value; it has several "falsey" values. Then it has defined, which returns a "truthy" value for everything except undef.

    For example:

    print(( 0 ) ? qq{not false\n} : qq{false\n});             # false
    print(( defined 0 ) ? qq{not false\n} : qq{false\n});     # not false
    print(( '' ) ? qq{not false\n} : qq{false\n});            # false
    print(( defined '' ) ? qq{not false\n} : qq{false\n});    # not false
    print(( undef ) ? qq{not false\n} : qq{false\n});         # false
    print(( defined undef ) ? qq{not false\n} : qq{false\n}); # false
    print(( ' ' ) ? qq{not false\n} : qq{false\n});           # not false
    print(( -1 ) ? qq{not false\n} : qq{false\n});            # not false
    print(( () ) ? qq{not false\n} : qq{false\n});            # false
    print(( [] ) ? qq{not false\n} : qq{false\n});            # not false
    print(( {} ) ? qq{not false\n} : qq{false\n});            # not false

    Which leads me to point out that Perl behaves correctly, even when faced with "No Output For This Line!" and '"phantom" version of false'.

    print(( "a" =~ m/a/ ) ? qq{not false\n} : qq{false\n}); # "not false"
    print(( "a" =~ m/b/ ) ? qq{not false\n} : qq{false\n}); # "false"
    print(( "a" eq "a" ) ? qq{not false\n} : qq{false\n});  # "not false"
    print(( "a" eq "b" ) ? qq{not false\n} : qq{false\n});  # "false"
    print(( "a" !~ m/a/ ) ? qq{not false\n} : qq{false\n}); # "false"
    print(( "a" !~ m/b/ ) ? qq{not false\n} : qq{false\n}); # "not false"

Node Type: perlquestion [id://11116970]
Approved by marto
Front-paged by davies