The second one, I don't even trust it. I think I could match all 3 if they happen in a row.
No, it's fine, it reads like so: Match one of the three choices: "this" or "", "that" or "", or "third_thing" or "". Just like in your first example, the parentheses and alternation operator make sure that it will match only one of the three choices at that place in the regex.
Additionally, there's probably 4 capture groups created as a result.
Correct, but you can use non-capturing (?: ) parens to avoid that, i.e. ((?:this)?|(?:that)?|(?:third_thing)?) would have only one capturing group, like your first example. Update: AnomalousMonk made an excellent point about (?| ) here.
I'd recommend a read of perlrequick, perlretut, and perlre for all of these features and the ones I mentioned earlier. Also, for playing around with regexes and testing out what they do, see my post here.
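To make the two claims above concrete (only one alternative matches at that position, and the number of capture groups differs), here is a minimal sketch. The ^ and $ anchors and the test strings are assumptions added for illustration; they are not part of the patterns quoted in the thread.

```perl
use strict;
use warnings;

# The two patterns from the discussion, anchored here for illustration.
my $capturing     = qr/^((this)?|(that)?|(third_thing)?)$/;
my $non_capturing = qr/^((?:this)?|(?:that)?|(?:third_thing)?)$/;

# In list context a successful match returns one element per capture group,
# including undef for groups that did not participate.
my @groups_capturing     = 'that' =~ $capturing;
my @groups_non_capturing = 'that' =~ $non_capturing;

my $count_capturing     = @groups_capturing;
my $count_non_capturing = @groups_non_capturing;

# The group matches only ONE alternative, so three in a row cannot match
# once the pattern is anchored at both ends.
my $matches_all_three = 'thisthatthird_thing' =~ $capturing ? 1 : 0;

print "capturing groups:     $count_capturing\n";
print "non-capturing groups: $count_non_capturing\n";
print "all three in a row:   $matches_all_three\n";
```

Without the anchors, the regex would still match 'thisthatthird_thing', but only by consuming the leading "this"; the rest of the string is simply left unmatched.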
I've already read about regexes, from a book. Most of the time a source doesn't explicitly bring up all the exceptions, or it uses jargon (like "alternation" rather than "or").
I don't know if I intend offense or not, but comparing something like:
((?:this)?|(?:that)?|(?:third_thing)?)
if( ! $1 ){} #or defined, but why bother #nvm, see below
vs
$_ =~ /(this|that|third_thing)?/;
if( defined $1 eq "" ){}
Seems like there's a huge difference on readability, not even getting into when you have many alternations. Even getting rid of eq "".
And of course "" and "0" are defined but not true values, so if you were looking for numeric characters, you can't use if($1). I guess I need to try raw values and see how Perl handles \0 in true/false/defined settings
Edit: ARGH: defined(undef) == 0, and defined(undef) eq "", but "" == 0 isn't a numeric comparison, and "" eq 0 is false, as is "" eq "0". undef == 0 is true but produces a warning, while undef eq 0 is false.
I'm putting those there in case anyone else ever comes across this bonanza of "false" comparisons.
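The difference between "true" and "defined" for a capture can be seen in a small sketch. The pattern and the sample inputs below are hypothetical, chosen only so that $1 ends up as "0", undef, and "7" in turn.

```perl
use strict;
use warnings;

my @report;
for my $input ('x=0', 'x=', 'y=7') {
    # (\d+)? is optional, so $1 is undef when no digits are present.
    if ( $input =~ /^\w+=(\d+)?$/ ) {
        my $truthy  = $1 ? 1 : 0;          # "0" and undef are both false here
        my $defined = defined $1 ? 1 : 0;  # only undef is "not defined"
        push @report, "$input truthy=$truthy defined=$defined";
    }
}
print "$_\n" for @report;
```

For 'x=0' the capture is the string "0", which is defined but false, so only the defined test tells it apart from the case where nothing was captured.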
Seems like there's a huge difference on readability, not even getting into when you have many alternations.
Definitely, but there are some mechanisms to make regexes more readable, such as /x (as you're already using) and the things I mentioned here, including precompiled regexes via qr//, which you can interpolate into other regexes, Building Regex Alternations Dynamically, or even advanced features like (?(DEFINE) ...) (perlre).
my $re1 = qr{ ... }msx;
my $re2 = qr{ ... }msx;
my $big_re = qr{ (?: $re1 | $re2 ) }msx;
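As a filled-in illustration of the skeleton above, here is a sketch with two small precompiled pieces interpolated into a larger regex. The names $re_int, $re_word, and $token, the sub-patterns, and the test strings are all made up for this example; the original post elides the actual patterns.

```perl
use strict;
use warnings;

# Hypothetical building blocks, each compiled once with qr//.
my $re_int  = qr{ [+-]? \d+ }msx;         # optionally signed integer
my $re_word = qr{ [A-Za-z_] \w* }msx;     # identifier-like word

# Interpolate the pieces into a bigger pattern; each piece keeps its own flags.
my $token = qr{ \A (?: $re_int | $re_word ) \z }msx;

my %is_token;
for my $candidate ('42', '-7', 'foo', '4x') {
    $is_token{$candidate} = $candidate =~ $token ? 1 : 0;
}
print "$_: $is_token{$_}\n" for sort keys %is_token;
```

Each named piece can be tested on its own, which tends to make the combined regex easier to read and debug than one long pattern.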
so if you were looking for numeric characters, you can't use if($1)
As far as I can tell from what you've written so far, you seem to be very interested in whether a capture group matched something or not. This should make named capture groups, as I mentioned before, more interesting:
use warnings;
use strict;
use Data::Dump qw/dd/; # for debugging
my $re = qr{
    ^ \s*               # beginning of line
    (?<name> \w+ )      # the variable name
    \s* = \s*           # equals
    (?:                 # one of the following (
        (?<num> \d+ )   # a number
    |                   # or
        (?<str> \w+ )   # a word
    )                   # )
    \s* $               # end of line
}msx;
my @lines = split /\n/, <<'SAMPLE_INPUT';
foo=bar
quz = 5
SAMPLE_INPUT
for my $line (@lines) {
    $line =~ $re or die "Failed to parse '$line'";
    dd \%+; # debug
    print "Match! Name: '$+{name}'\n";
    if (exists $+{num})
        { print "It was a number: '$+{num}'\n" }
    elsif (exists $+{str})
        { print "It was a string: '$+{str}'\n" }
    else { die "internal error: neither str nor num" }
}
__END__
{
# tied Tie::Hash::NamedCapture
name => "foo",
str => "bar",
}
Match! Name: 'foo'
It was a string: 'bar'
{
# tied Tie::Hash::NamedCapture
name => "quz",
num => 5,
}
Match! Name: 'quz'
It was a number: '5'
Update: I'm not sure when you made your "Edit" but I didn't see it until later. The explanation for the behavior you are seeing is this (note I'm ignoring overloading here):
Numeric comparisons like ==, !=, >, etc. cause their arguments to be taken as numbers. This means:
- undef is converted to 0 but is subject to a warning.
- "" is not a number so it is subject to a warning, and is converted to 0.
- "0" is converted to 0.
- 0 is already a number and doesn't need to be converted.
- Perl's "false" (!1, including defined(undef)) already has a numeric value of 0, so that is used.
- Perl will attempt to convert any other string into a number, warning if it cannot do so cleanly. The string "0 but true" is special-cased to be exempt from this warning.
String comparisons like eq, ne, gt etc. cause their arguments to be taken as strings. That means:
- undef is converted to "" but is subject to a warning.
- "", "0", and "0 but true" are already strings and don't need to be converted.
- 0 is converted to "0", and of course any other number is stringified.
- Perl's "false" (!1, including defined(undef)) already has a string value of "", so that is used.
This is why "" eq 0 and undef eq 0 are false, because they're both the same as "" eq "0".
See Relational Operators and Equality Operators. As for why you shouldn't use these operators to check boolean values, I've already explained that elsewhere.
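The conversion rules listed above can be checked with a short sketch. The variable names and the %result hash are made up for this example, and the "numeric" and "uninitialized" warnings are switched off inside the block only so that the comparisons run quietly; with warnings on, the undef and "" cases would warn as described.

```perl
use strict;
use warnings;

my ($undef, $empty, $zero_str, $zero) = (undef, "", "0", 0);
my $false = !1;   # Perl's built-in "false": "" as a string, 0 as a number

my %result;
{
    # Silence the warnings that the list above says would be emitted.
    no warnings qw(numeric uninitialized);
    %result = (
        'undef == 0'  => ( $undef    == $zero     ? 1 : 0 ),
        '"" == 0'     => ( $empty    == $zero     ? 1 : 0 ),
        '"0" == 0'    => ( $zero_str == $zero     ? 1 : 0 ),
        'false == 0'  => ( $false    == $zero     ? 1 : 0 ),
        'undef eq 0'  => ( $undef    eq $zero     ? 1 : 0 ),
        '"" eq 0'     => ( $empty    eq $zero     ? 1 : 0 ),
        '"" eq "0"'   => ( $empty    eq $zero_str ? 1 : 0 ),
        'false eq ""' => ( $false    eq $empty    ? 1 : 0 ),
    );
}
printf "%-12s %s\n", $_, $result{$_} ? 'true' : 'false' for sort keys %result;
```

All four numeric comparisons come out true because every operand numifies to 0, while the eq comparisons against 0 are false because they compare "" with "0".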
((this)?|(that)?|(third_thing)?)
...
... I don't even trust it. ... there's probably 4 capture groups created as a result.
Just as an aside, the (?|(pat)|(te)|(rn)) "branch reset" pattern introduced with Perl version 5.10 will suppress the creation of a slew of captures in a case like this:
c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my $s = 'apathetic';
;;
my @captures = $s =~ m{ (pat) | (te) | (rn) }xms;
dd \@captures;
;;
@captures = $s =~ m{ (?| (pat) | (te) | (rn)) }xms;
dd \@captures;
"
["pat", undef, undef]
["pat"]
See Extended Patterns in perlre.
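Applied to the alternation from earlier in the thread, branch reset might look like the sketch below. The \A and \z anchors and the input strings are assumptions added for illustration; it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my @seen;
for my $s ('this', 'that', 'third_thing') {
    # Inside (?| ... ) every alternative numbers its groups from the same
    # starting point, so whichever branch matches, the text lands in $1.
    my @captures = $s =~ m{ \A (?| (this) | (that) | (third_thing) ) \z }xms;
    push @seen, [ scalar(@captures), $captures[0] ];
}
print "groups=$_->[0] \$1=$_->[1]\n" for @seen;
```

The caller never has to work out which of $1, $2, or $3 was set, because there is only ever one group.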
Give a man a fish: <%-{-{-{-<