PerlMonks  

Re^4: example of 'm / / m' related example and compare to 'm / / s'

by Anonymous Monk
on Nov 29, 2011 at 19:17 UTC


in reply to Re^3: example of 'm / / m' related example and compare to 'm / / s'
in thread PERL regex modifiers for m//

Hm. That's like advocating always taking a swimsuit & sunblock and a raincoat & umbrella cos it saves listening to the weather forecast.

/m and /s don't add functionality. They modify functionality.

So using /ms is more like rewiring your oddly designed radio so that the tuning dial and the volume control actually work as you expect... thereby (for example) enabling you to successfully listen to weather forecasts.

The very reason it is hard, even for long-time Perlers with scads of frequent regex user miles, to remember which (/s /m) does what, is because they are so rarely required.

I'd argue that they are often required, just rarely used correctly.

In my experience, matching start- and end-of-line is far more commonly needed than matching start- and end-of-string. The default behaviour is wrong practically every time anyone has to deal with multi-line data.
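A minimal sketch of that difference, using a made-up three-line string (the "key: value" format is purely illustrative): without /m, ^ anchors only at the start of the whole string; with /m it also anchors just after each embedded newline.

```perl
use strict;
use warnings;

# Hypothetical multi-line input: three "key: value" lines in one string.
my $text = "name: Perl\nversion: 5\nstatus: ok\n";

# Without /m, ^ anchors only at the start of the whole string,
# so only the first key is found.
my @keys_default = $text =~ /^(\w+):/g;

# With /m, ^ also matches just after every embedded newline,
# so every line's key is found.
my @keys_m = $text =~ /^(\w+):/gm;

print "default: @keys_default\n";   # default: name
print "with /m: @keys_m\n";         # with /m: name version status
```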

Likewise, the vast majority of .* instances I see in deployed code are being used as "match anything", which they don't.
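To make the .* point concrete, here is a small sketch with an invented BEGIN/END block: by default . does not match "\n", so .* cannot run past a line break; under /s it matches any character at all.

```perl
use strict;
use warnings;

# Hypothetical block spanning several lines.
my $text = "BEGIN\nline one\nline two\nEND";

# Without /s, . refuses to match "\n", so .* cannot span lines
# and this match fails outright.
my $default = $text =~ /BEGIN(.*)END/ ? "matched" : "no match";

# With /s, . matches any character, including "\n".
my ($body) = $text =~ /BEGIN(.*)END/s;

# Show the captured newlines as a visible \n for printing.
(my $shown = $body) =~ s/\n/\\n/g;
print "$default\n";    # no match
print "[$shown]\n";    # [\nline one\nline two\n]
```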

BTW, it's easy to remember which is which: /s alters the behaviour of a single metacharacter (.) whereas /m alters the behaviour of multiple metacharacters (^ and $).

By using them everywhere they become the norm

Yes, that's precisely the point. The modified behaviours they provide should have been the norm from the start.

after a while people stop asking themselves why he is using that here. And that is bad.

Except that using them everywhere actually makes regexes work the way most people mistakenly think they already work. So even if they don't ask themselves why, they still get the "expected" behaviour.

In other words, using /ms consistently on regexes makes the (idiosyncratic) behaviour of regexes conform to people's (reasonable) expectations, rather than vice versa. It's a simple technique that fixes an infelicity in Perl 5. And that's why PBP (Perl Best Practices) recommends it.
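A rough sketch of what habitual /xms looks like in practice, run against an invented config-style string. Two consequences of the habit show up here: with /s always on, "anything except a newline" has to be spelled out as [^\n], and with /m always on, a true start-of-string is written \A (and end-of-string \z). The key/value format and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical config-like text used only for illustration.
my $config = "# comment\nhost = example.org\nport = 8080\n";

# Under /xms: whitespace in the pattern is ignored (/x), ^ and $ are
# line anchors (/m), and . would match any character (/s), so the
# value is matched with [^\n] to keep it on one line.
my %setting = $config =~ m{
    ^ \s* (\w+) \s* = \s* ([^\n]*?) \s* $   # one "key = value" line
}xmsg;

# \A means "start of the whole string" regardless of /m.
my $has_header = $config =~ m{ \A \# }xms ? 1 : 0;

print "$_=$setting{$_}\n" for sort keys %setting;
print "header: $has_header\n";
```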

Damian

Re^5: example of 'm / / m' related example and compare to 'm / / s'
by BrowserUk (Patriarch) on Nov 29, 2011 at 19:48 UTC

    Sorry, but I respectfully disagree.

    In my experience, matching start- and end-of-line is far more commonly needed than matching start- and end-of-string. The default behaviour is wrong practically every time anyone has to deal with multi-line data.

    I'll bet you 100 hours of my time on any (on-line accessible) project of your choosing, that if we do a survey of the regex uses on this site, not only will most of them be targeted at single line strings, an overwhelming majority will be targeted at single line strings.

    For the sake of putting a number on "overwhelming", let's say 10 single-line uses to every one multi-line. I'd probably be quite happy to go to 20 to 1 if it would sway you into accepting the bet.

    You might find a slightly reduced ratio if you searched CPAN, but I doubt it would be by much.

    And once you squash the idea that matching against multi-line strings is the norm, giving away the heads-up that seeing those options explicitly stated should give the programmer, in favour of cargo-culting a 'throw it all in there cos it probably won't cause any problems' mandate, is a really bad idea in my book. Preferring that mandate to asking the programmer to look up the documentation when they need it is dangerous.

    Every time educationalists have tried to "simplify the learning process", by dumbing down, it has increased the pass rate but also wholly devalued it. There's no point in having more people pass if they don't understand how to apply what they've learnt.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use every day'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I'll bet you 100 hours of my time on any (on-line accessible) project of your choosing, that if we do a survey of the regex uses on this site, not only will most of them be targeted at single line strings, an overwhelming majority will be targeted at single line strings.

      No bet. I don't doubt that most regex uses are single-line oriented. I never claimed otherwise. What I said was that people far more often intend ^ to mean start-of-line than start-of-string.

      But that's what makes the defaults of ^ and $ so unfortunate. Because they happen to work okay most of the time (i.e. line-by-line), they only bite people when those people attempt something less usual and more intrinsically difficult (such as multiline parsing).

      And once you squash the idea that matching against multi-line strings is the norm, giving away the heads-up that seeing those options explicitly stated should give the programmer, in favour of cargo-culting a 'throw it all in there cos it probably won't cause any problems' mandate, is a really bad idea in my book. Preferring that mandate to asking the programmer to look up the documentation when they need it is dangerous.

      Yes, that's fine for good programmers, such as yourself. But the problem is that most programmers don't know they need those options. They think regexes already work as if /s and /m were on.

      Every time educationalists have tried to "simplify the learning process", by dumbing down, it has increased the pass rate but also wholly devalued it. There's no point in having more people pass if they don't understand how to apply what they've learnt.

      This has nothing to do with simplifying any learning process. It has to do with making Perl work better (and, in particular, work better with the weaknesses and blindspots of human nature). I have argued that habitual use of /xms does that. You disagree. That's your right and privilege.

      However, the fact that Perl 6 has (the equivalent of) /s on by default, and also does away with /m by offering separate always-on start-of-line/end-of-line anchors suggests that I'm not alone in believing that permanent /ms is the more appropriate default.

      Damian

        But that's what makes the defaults of ^ and $ so unfortunate. Because they happen to work okay most of the time (i.e. line-by-line), they only bite people when those people attempt something less usual and more intrinsically difficult (such as multiline parsing).

        Without the options, any attempt to use a regex to match a multi-line string will fail early and obviously. With the options, you might get away without understanding what they do for a while, but eventually your misunderstanding will bite you, and instead of being immediately obvious, it will likely surface as a mysterious, hard-to-debug transient failure.

        Personally, I'd much rather be bitten by my misunderstandings the first time, or the first few times, I tried to do something that exposed them, than have them come to light only when my cargo-culting mysteriously fails to match my actual requirements.
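        The failure mode described above can be sketched with a hypothetical two-line input: the same line-oriented pattern fails outright under the default semantics, but with /ms it "succeeds" and quietly captures more than one line.

```perl
use strict;
use warnings;

# Hypothetical input: two "key: value" lines in one string.
my $text = "a: 1\nb: 2";

# Default semantics: $ cannot match before the embedded newline and
# .* cannot cross it, so the match fails visibly (empty list).
my @plain = $text =~ /^(\w+): (.*)$/;

# With /ms the same pattern succeeds, but (.*) now runs past the
# newline to the end of the string and captures both lines.
my @with_ms = $text =~ /^(\w+): (.*)$/ms;

print scalar(@plain), " captures without /ms\n";   # 0 captures without /ms
(my $value = $with_ms[1]) =~ s/\n/\\n/g;
print "with /ms: key=$with_ms[0] value=$value\n";  # with /ms: key=a value=1\nb: 2
```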


