PerlMonks  

Re^3: reg expression question

by Athanasius (Archbishop)
on Jan 29, 2015 at 08:02 UTC ( [id://1114879]=note )


in reply to Re^2: reg expression question
in thread reg expression question

These special cases are fairly easy to accommodate:

  1. To prevent 4 consecutive digits from being wrongly identified as a year, specify that the digits occur at the start of the right-hand string:

    if (my @m = $right =~ /^\d{2}(\d{2})(.*)/)
    #                      ^ Add this

    Within a regex, the special character ^ means “match at the start of the line.”

  2. To remove spaces, use the substitution operator (with the /g modifier for global replacement):

    $right =~ s/\s+//g;

  3. As for 2.

  4. To prevent the string from ending in a dash, use the substitution operator again:

    $right =~ s/-$//;

    $ is another special regex character: it means “match at the end of the line.”

See “Metacharacters” in perlre.
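The four points above can be combined into one routine. The following is only a sketch: the helper name parse_right, its return values, and the sample inputs are assumptions made for illustration, since the original solution and its data are not reproduced in this node.

```perl
use strict;
use warnings;

# Hypothetical helper: applies points 2-4, then the anchored match from point 1.
sub parse_right {
    my ($right) = @_;
    $right =~ s/\s+//g;    # points 2 and 3: strip all whitespace
    $right =~ s/-$//;      # point 4: strip one trailing dash
    # point 1: four digits count as a year only at the start of the string
    if (my @m = $right =~ /^\d{2}(\d{2})(.*)/) {
        return @m;         # (two-digit year, remainder)
    }
    return;                # no leading four-digit year
}

# Sample inputs are invented for this sketch.
for my $input ('2015 - 01-', 'ab 2015-') {
    my @parts = parse_right($input);
    print "'$input' => ",
        (@parts ? "year '$parts[0]', rest '$parts[1]'" : 'no match'), "\n";
}
```

The substitutions run before the match so that the anchor sees the cleaned-up string; whether the original code uses that order is not confirmed by this thread.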

Note the value of using a test-driven approach: I was able to add the 4 new input/output pairs to %data, make changes to get the new test cases to pass, and know that these modifications did not invalidate the original solution (because the 4 original test cases still pass).
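A table-driven test of this kind might look roughly like the sketch below. The %data contents and the clean routine are hypothetical stand-ins, not the pairs or the code from the original solution; only the shape of the approach is the point.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the routine under test.
sub clean {
    my ($s) = @_;
    $s =~ s/\s+//g;    # remove whitespace
    $s =~ s/-$//;      # remove a trailing dash
    return $s;
}

# Input => expected output. New cases are added alongside the old ones,
# so every run re-checks the earlier behaviour as well.
my %data = (
    'abc-def'   => 'abc-def',    # pre-existing case
    'abc - def' => 'abc-def',    # new case: embedded spaces
    'abc-def-'  => 'abc-def',    # new case: trailing dash
);

is clean($_), $data{$_}, "input '$_'" for sort keys %data;

done_testing;
```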

Hope that helps,

Athanasius <°(((>< contra mundum  Iustus alius egestas vitae, eros Piratica,

Replies are listed 'Best First'.
Re^4: reg expression question
by Anonymous Monk on Jan 29, 2015 at 08:06 UTC

      You are right, when used in a regex without an /m modifier, ^ means “match the beginning of the string,” and is equivalent to \A. “Match at the beginning of the line” (which is the definition given in perlre#Regular-Expressions) is strictly correct only when ^ is used in a regex with an /m modifier. So maybe this part of the documentation should be re-worded?
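The difference is easy to check by counting matches in a two-line string. This is a small sketch with an invented sample string; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "first\nsecond";

# The "= () =" idiom forces list context and yields the number of matches.
my $no_m   = () = $text =~ /^\w+/g;      # ^ anchors at start of string only
my $with_m = () = $text =~ /^\w+/mg;     # ^ also anchors after each newline
my $abs    = () = $text =~ /\A\w+/mg;    # \A ignores /m

print "without /m: $no_m\n";             # prints "without /m: 1"
print "with /m:    $with_m\n";           # prints "with /m:    2"
print "\\A with /m: $abs\n";             # prints "\A with /m: 1"
```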

      I haven’t used Perl::Critic, and as to this particular policy — well, I’m suspicious of guidelines that say “always do X” regardless of context. In the present case, adding an /m modifier to the regex would, IMO, be misleading, as it would imply (or at least suggest) that the string being matched is expected to contain multiple newlines. But I acknowledge that this is a judgment call, and YMMV.

      Athanasius <°(((>< contra mundum  Iustus alius egestas vitae, eros Piratica,

        It is this "Without the /whatever modifier, whatever means ...; with the modifier it means ..." locution that is the rationale for the kneejerk practice of always using the /xms constellation of modifiers with every regex, regardless of whether . ^ $ appear in the regex, and regardless of the fact that /x arguably makes whitespace a bit messier to deal with.

        I have adopted this practice, first encountered in the (in)famous PBP of TheDamian, because for me, regexes are hard enough without the further confusion of unnecessary degrees of freedom. Now, what does . (dot) match? Everything! YBPMV.

        Update: See also the '/flags' mode of the re pragma, but I tend to avoid this in PM examples because it only appeared with Perl version 5.14 and is not yet widely familiar.
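To make the point about dot concrete, here is a minimal sketch with an invented sample string: under /s the dot matches a newline as well, and under /x literal whitespace in the pattern is ignored.

```perl
use strict;
use warnings;

my $text = "one\ntwo";

# Without /s, dot matches any character except newline.
my $plain = $text =~ /one.two/  ? 1 : 0;

# With /s, dot matches newline too.
my $dot_s = $text =~ /one.two/s ? 1 : 0;

# The /xms style: spaces in the pattern are layout, not literals.
my $xms   = $text =~ / \A one . two \z /xms ? 1 : 0;

print "plain: $plain, /s: $dot_s, /xms: $xms\n";   # prints "plain: 0, /s: 1, /xms: 1"
```

The cost mentioned above shows up as soon as a pattern needs a literal space: under /x it has to be written as \  or [ ] or \s.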


        Give a man a fish:  <%-(-(-(-<
