PerlMonks  

Best practice or cargo cult?

by robinbowes (Beadle)
on Jun 20, 2006 at 14:37 UTC ( #556406=perlquestion )
robinbowes has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I commented on the qpsmtpd mailing list recently about regex best practices as recommended in Conway's book.

Specifically:

- always use /x
- always use /m
- always use /s
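
For readers who don't have the three switches memorized, here is a minimal sketch of what each one changes. The sample string and variable names are made up for illustration; only the flag behaviour is standard Perl.

```perl
use strict;
use warnings;

# Hypothetical sample text with two lines.
my $text = "first line\nsecond line\n";

# /x: whitespace (and comments) inside the pattern are ignored.
my $x_plain    = $text =~ /second\s+line/      ? 1 : 0;
my $x_extended = $text =~ / second \s+ line /x ? 1 : 0;

# /m: ^ and $ match at line boundaries, not only at the ends of the string.
my $m_off = $text =~ /^second/  ? 1 : 0;   # "second" is not at the start of the string
my $m_on  = $text =~ /^second/m ? 1 : 0;   # but it is at the start of a line

# /s: . also matches a newline.
my $s_off = $text =~ /first.*second/  ? 1 : 0;   # . stops at "\n"
my $s_on  = $text =~ /first.*second/s ? 1 : 0;   # . crosses "\n"

print "x: $x_plain/$x_extended  m: $m_off/$m_on  s: $s_off/$s_on\n";
```

Only /m and /s change what an existing pattern matches; /x changes how the pattern is written.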

I got a response saying that using these regex switches all the time was "cargo cult programming".

I disagree - I think it's good practice.

Thoughts? Comments?

R.

--

Robin Bowes | http://robinbowes.com

Re: Best practice or cargo cult?
by davorg (Chancellor) on Jun 20, 2006 at 14:47 UTC

    As I understand it, these options will all effectively be turned on by default in the Perl 6 regex engine. So either Larry has decided that they are, in fact, best practice or Damian has sneaked them into the specs whilst Larry wasn't watching.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      As I understand it, these options will all effectively be turned on by default in the Perl 6 regex engine. So either Larry has decided that they are, in fact, best practice or Damian has sneaked them into the specs whilst Larry wasn't watching.

      Firstly, Perl 6 is not Perl 5.

      Secondly, Perl 6 gives you \N, a convenient way to write <-[\n]> (that's [^\n]). It's worse than ., but acceptable. Writing [^\n] all the time is a hard exercise for one's fingers, and makes for messy code. That's why I strongly believe you should only use /s when you really want . to include the newline character.

      /m won't be turned on by default in Perl 6. Instead, we get different metacharacters for begin/end of line versus string. So again it gives best of BOTH worlds.

      As for /x... I have no strong opinion about that. I don't think /\A\d+\z/ is unreadable, but I don't mind /\A \d+ \z/x at all.

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

        \N could easily be added to blead. I'll check into it.
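
        As far as I know, \N did end up in Perl 5 (from 5.12 on), where it matches any character except a newline regardless of /s. A small sketch, with a made-up sample string, comparing it to [^\n] and to . under /s:

```perl
use strict;
use warnings;
use 5.012;   # assumption: \N as "not a newline" needs perl 5.12 or later

# Hypothetical two-line sample.
my $text = "ab\ncd";

my ($via_class) = $text =~ /([^\n]+)/s;   # explicit negated class
my ($via_N)     = $text =~ /(\N+)/s;      # \N ignores /s and still stops at "\n"
my ($via_dot)   = $text =~ /(.+)/s;       # . under /s runs across the newline

printf "class=%d chars, \\N=%d chars, dot=%d chars\n",
    length $via_class, length $via_N, length $via_dot;
```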

        ---
        $world=~s/war/peace/g

Re: Best practice or cargo cult?
by Herkum (Parson) on Jun 20, 2006 at 14:50 UTC

    While xms may not always be needed, that does not mean you should not bother using it UNTIL you need it. You will become confused when you forget to use a switch, wasting time debugging because you used x instead of xms.

    If you use xms all the time, you will be consistent in your code and your expectations.

      While xms may not always be needed, that does not mean you should not bother using it UNTIL you need it. You will become confused when you forget to use a switch, wasting time debugging because you used x instead of xms. If you use xms all the time, you will be consistent in your code and your expectations.

      Using /xms by default only turns things around. It doesn't fix anything, except apparently many of Damian's regexes. I don't have any global statistics, but I do know for sure that in my code, I don't need /m and /s in more than 95% of all of my regexes.

      /xms on by default just changes the default. Instead of turning flags on when you need them, you start turning them off when you don't need them. And you're caught in exactly the same debugging thing. Not that I ever spent a second debugging this, though: I'm very clear about what I expect. When I write /s, that's a clear indication of how I expect Perl to handle my ., and when I don't write /s, it's clear that I wanted the other thing.

      /m communicates to the reader of the code that the string is conceptually multi-line (as opposed to, for example: filenames, XML tags, etc...). Well, it used to, before PBP spread this nonsense.

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Re: Best practice or cargo cult?
by rinceWind (Monsignor) on Jun 20, 2006 at 14:56 UTC

    I have a copy of PBP in front of me, and the chapter on regular expressions goes into great detail explaining why these should be used all the time.

    If the statement "always use /xms" is presented in isolation, then someone presenting this must be prepared to explain why - or cite the pages in PBP.

    I don't think that this is advocating cargo cult programming practices any more than recommending "always use strict and warnings" is.

    --

    Oh Lord, won’t you burn me a Knoppix CD ?
    My friends all rate Windows, I must disagree.
    Your powers of persuasion will set them all free,
    So oh Lord, won’t you burn me a Knoppix CD ?
    (Misquoting Janis Joplin)

      I don't think that this is advocating cargo cult programming practices any more than recommending "always use strict and warnings" is.

      No, those help catch errors. They don't change semantics.

      /m and /s change semantics in ways that are most of the time unneeded, and sometimes even unwanted.

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Re: Best practice or cargo cult?
by McDarren (Abbot) on Jun 20, 2006 at 15:02 UTC
    Well, (and to state the obvious) - if you're doing it simply because you saw somebody else do it, or just heard that it was a "good idea", without really understanding why - then yeah, sure I'd say it's cargo cult.

    But if you've read the book, absorbed and understood the information and then made an informed decision to do it (or not) - then it isn't. (or if you've made the same informed decision without even seeing the book)

    However, "Best Practices" and "Cargo Cult" are not necessarily mutually exclusive terms (as your title might suggest). For example, I think most people would agree that it's good practice to always use strict and warnings. Yet, not everybody that does that will be able to tell you why it's a good practice - or perhaps more importantly - when it's okay to not do it. Does that mean that these people are "Cargo Culting"?

    Yes, probably - but that doesn't mean it's a bad practice :)

    Cheers,
    Darren :)

      However, "Best Practices" and "Cargo Cult" are not necessarily mutually exclusive terms (as your title might suggest).

      The title does say "or", not "xor" :-)

        Sheesh, another one of those '"Shall we have Italian or Chinese tonight?" - "Yes"' types.
Re: Best practice or cargo cult?
by dsheroh (Parson) on Jun 20, 2006 at 15:04 UTC
    I would say that it depends on how literally you take the word "always". e.g., If you've got a regex that is so simple as to need no explanation, I would say that including /x, but no internal whitespace/comments, is cargo cult because the /x has no effect. /m and /s are similarly pointless to include when dealing with data which is known to not contain newlines.

    I'm always skeptical when I encounter the phrase "best practice". (In my experience, it tends to mean "We think this is the right thing to do, but can't be bothered (or aren't able) to explain why.") In this particular case, using /xsm with /(\d{3})-(\d{4})/ makes about as much sense as preferring $foo =~ /^bar$/xsm over $foo eq 'bar'. KISS. If you don't need the extra power/complexity, don't take special measures to make it available.
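
    One thing worth spelling out: the regex and the eq test are not even equivalent, and /xsm makes the gap wider. A rough sketch (the input strings and hash keys are invented for the demonstration):

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen only to show where the tests disagree.
my @inputs = ("bar", "bar\n", "foo\nbar\nbaz");

my %result;
for my $foo (@inputs) {
    $result{$foo} = {
        string_eq => ($foo eq 'bar'      ? 1 : 0),
        plain     => ($foo =~ /^bar$/    ? 1 : 0),   # $ also matches before a final "\n"
        xsm       => ($foo =~ /^bar$/xsm ? 1 : 0),   # /m lets ^ and $ match at any line
    };
}

for my $foo (@inputs) {
    (my $shown = $foo) =~ s/\n/\\n/g;   # make newlines visible in the output
    printf "%-16s eq=%d plain=%d xsm=%d\n", qq{"$shown"},
        @{ $result{$foo} }{qw(string_eq plain xsm)};
}
```

    All three agree on "bar"; the plain regex also accepts "bar\n"; and the /xsm version additionally accepts any string that merely contains a line reading "bar".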

Re: Best practice or cargo cult?
by ikegami (Pope) on Jun 20, 2006 at 15:06 UTC

    Adhering to a programming style is a form of voluntary cargo cult programming. One adheres to a style to promote readability and maintainability through consistency. The style is not always completely appropriate in all circumstances, but that's a cost the adherent is willing to accept. As long as one knows when it's appropriate to break from the style, it's not cargo cult programming.

    So the real question is whether the style was created with thought, or whether it was the result of cargo cult programming. In this case, there are merits to always using those switches.

    • x encourages readability, the addition of comments, etc. at little or no expense.

    • Using m intermittently requires more work on the reader's part to understand the meaning of ^ and $.

      As to choosing whether to always use it or to never use it, using it adds two new instructions to your regexp library ("match start of line" and "match end of line") at some expense[*].

    • Using s intermittently requires more work on the reader's part to understand the meaning of ..

      As to choosing whether to always use it or to never use it, using it adds a new instruction to your regexp library at little or no expense. Where it could do "match any character but \n" before, it can now easily do the same ([^\n]) plus "match any character" (.).

    * — I consider having to use \A instead of ^ expensive (in terms of readability), but I already find $ dangerous to use in validation.
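
    To illustrate the footnote about $ in validation, a minimal sketch (the two helper subs are hypothetical names, not from any module): without any modifiers, $ still matches just before a trailing newline, while \z only matches at the very end of the string.

```perl
use strict;
use warnings;

# Hypothetical validator using ^ and $ as anchors.
sub is_digits_dollar { return $_[0] =~ /^\d+$/   ? 1 : 0 }

# Same check, anchored to the true start and end of the string.
sub is_digits_strict { return $_[0] =~ /\A\d+\z/ ? 1 : 0 }

for my $input ("123", "123\n") {
    (my $shown = $input) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-6s dollar=%d strict=%d\n",
        $shown, is_digits_dollar($input), is_digits_strict($input);
}
```

    The $ version accepts "123\n", which is usually not what a validation routine wants.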

      I suppose I can agree that adopting someone else's programming style is a form of voluntary cargo cult programming, but I do not believe that assembling your own style by considering various options and choosing those which you consider to be the best for you would be CC (even if the end result happens to be identical to someone else's style).

      As to the merits of always using the switches, they all come at the expense of changing the regex semantics to be different from the standard/default semantics. I do not consider this to be "little to no expense". When there are defined standards and well-known semantics, I find it much better to stick with them where possible and only deviate when needed rather than making deviation the default.

        but I do not believe that assembling your own style by considering various options and choosing those which you consider to be the best for you would be CC

        Definitely. That's why I said "the real question is whether the style was created with thought".

        they all come at the expense of changing the regex semantics to be different from the standard/default semantics.

        "standard/default"? They're not the same. Personally, I use the modifiers only when needed. I need s more often than not, so my standard usage is with s, which is not the default. From what I see, when people don't use s on a regexp that uses ., it rarely would cause no harm to use s, and it's often an error not to use s.

        When there are defined standards and well-known semantics, I find it much better to stick with them where possible and only deviate when needed rather than making deviation the default.

        When there are defined standards and well-known semantics, I'd love to hear about them. There are obviously differences in the standard usage of the modifiers (default vs Damian's, for example), and the semantics are far from constant.

Re: Best practice or cargo cult?
by Juerd (Abbot) on Jun 20, 2006 at 15:09 UTC

    - always use /x - always use /m - always use /s

    The thing I disagree with is the "always" part. Only use /s when you need /s, and only use /m when you need /m. These flags don't turn warnings or stricture on to help you avoid bugs, but they change semantics, arguably to a less common needed setting.

    Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

      I agree with you but for different reasons. "Always," because it's easier is just dumb, 'cause someone will find it easier not to.

      I'm with you on not modifying the default behavior. The reason is, most people know the default behavior first, and the non default second.

      I.e. ~/abc/ before ~/abc/i.

      To make it a company standard for some reason within the company is great. For instance if there was a training manual that taught /smx first. I personally think it's silly, but hey, there may be a valid reason out there somewhere.

      Teach people the basics and then push familiarity and standards WITH reasons that give flexibility and ability to do their job.

Re: Best practice or cargo cult?
by Anonymous Monk on Jun 20, 2006 at 17:01 UTC
    It sounds to me like you're raising a false dichotomy. If, by "cargo cult programming", you mean "doing something you don't fully understand"; well, that's true of a lot of programming, and indeed, a lot of life.

    People learn very quickly, even as children, how to turn on a light switch. They don't know how it works; it could be magic for all they know -- this is the very essence of a cargo cult.

    And even worse, almost nobody learns enough to really understand how to make a light bulb work; there's glassmaking, and metallurgy, manufacturing and storing inert gases, generation of electricity, power distribution systems, and probably more.

    And yet, we survive despite (or probably, because of) this "cargo cult" mentality. We live our lives doing things that we don't understand, in the hopes that a few really smart people do understand the pieces or subsystems involved (electrical engineers do the electrical stuff, mechanical engineers do the manufacturing stuff, etc.) This is true for our entire society; and it's stable.

    I don't know why computers should be any different. We should make the basic interfaces so simple that a child can understand them; and leave the hard stuff to the experts who really need to know. This principle is called "abstraction", and it's identical to a "cargo cult" with one critical difference: in a cargo cult, all the experts who once understood the abstractions are gone.

    Cargo cult culture is just a cry for more experts who can understand the old abstractions; it's the well known problem of missing documentation under a different guise. If the islanders had documentation on how to construct and maintain radio towers, and were willing to spend the money on education of the experts and maintenance of the infrastructure, they wouldn't have a cargo cult; they'd have working radios.

      It sounds to me like you're raising a false dichotomy. If, by "cargo cult programming", you mean "doing something you don't fully understand"; well, that's true of a lot of programming, and indeed, a lot of life.

      This choice of words was mine, and as I learned, wrong. I don't know what it is called, though. Maybe just "(following) bad advice".

Re: Best practice or cargo cult?
by JPeacock (Novice) on Jun 20, 2006 at 17:50 UTC

    One of the things that TheDamian emphasized in his talk at TPC that became the book PBP was that the rules he was espousing were the rules he thought were important. He didn't expect everyone to agree with every one of the items, but that he wanted each person to make a conscious decision about what they wanted to follow and stick with it. Having a set of default behaviors based on choice rather than habit was better than the opposite.

    So best practice is having a consistent behavior and cargo-cult is doing something because someone else told you to, without understanding the reasons yourself.

    John

      You summarized my major gripe with this otherwise great book.

      I feel it should be titled "Damian's Best Perl Practices", not "Perl Best Practices". Damian's opinion is considered authoritative, and already it is common that people who know little about the language automatically resolve any discussion in the direction of PBP. Effectively, this book has at least partly destroyed the freedom and intellectual discussion that has always surrounded Perl in business use.

        Until PBP, there was not a single Perl book which had focused on writing better standardized code. Most of the books that I have seen are little more than an overview of many subjects and do not go into any depth. A good example of this is Advanced Perl Programming, Second Edition. From the table of Contents, for Template Tools (Chapter 3) it introduces,

        1. Formats and Text::Autoformat
        2. Text::Template
        3. HTML::Template
        4. HTML::Mason
        5. Template Toolkit
        6. AxKit

        You have given them 6 options but not much insight into what problem they are going to solve. You could argue that a person should look at them all and make their own decision. On the other side, you have not given the user information on how to develop an application or increase productivity. This sort of material is at best shallow and useless after reading it once. (In comparison, I will go back and look things up again in PBP periodically).

        Damian got an overall consensus on his thoughts and solutions from others and wrote a book on it. Damian Conway did a lot of the work but he did not do it alone. The thing that PBP actually provides is real solutions to common problems (example: use Readonly vs. use constant). That is something that other books just don't even address, which leaves me turned off from even looking at them.

        To imply that somehow he has destroyed discussion because of the title and being an authoritative figure is being dishonest. There is nothing there that prevents you from writing your own book except your own motivations, skill as a writer and someone to publish your book.

Re: Best practice or cargo cult?
by vkon (Deacon) on Jun 20, 2006 at 18:33 UTC
    I saw many examples of code without these modifiers (in books, CPAN modules, documentation) and they are typed only when needed.

    So I think - no, those are not good practice, rather those are just tools which I either need or not in my particular case.

    Regarding the 'm' switch, I use it rarely, and every time I use it I have a clear picture of how my strings are constructed and exactly why I want to match at line beginnings.
    So it will be the first candidate to leave your list, IMO :):)

Re: Best practice or cargo cult?
by starbolin (Hermit) on Jun 20, 2006 at 20:43 UTC

    The problem with "Thou Shalt"s in programming ( and engineering in general ) is that someone is always advocating a colliding list of "Thou Shalt"s.

    From perlretut:

    You might wonder why '.' matches everything but "\n" - why not every character? The reason is that often one is matching against lines and would like to ignore the newline characters.
    Obviously Mark and Damian are coming at it from different perspectives. Probably based on the type of problem sets they work on. As perlretut points out, the two switches /s and /m create a total of four different behaviors. Each may suit a different class of problem.

    As to the use of /x, it only does one good if one actually comments one's code. Thanks to this thread I now have the ugly vision of seeing some SOPW's code sprinkled with m/../x but without any comments!

    And let me just remind everyone of a little bit of advice a boss once passed on to me:

    "Fortran 7 is the only real computer language."



    s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}

      If my RE is too long for a single line but rather simple, I just use /x without a comment within the RE (maybe there's one before it), e.g.:

      # replace all <a href="link"...>linktext</a>
      # with [url="link"]linktext[/url]
      $var =~ s~<a \s+ href = "([^"]+)" [^>]* > (.+?) </a>
               ~\[url="$1"\]$2\[/url\]~gsx;

      (Well, I know there are better ways to parse html than with REs, but it was the first example that came into my mind)

      I only use g, s, m when really needed and not as default options.

      Best regards,
      perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"

Re: Best practice or cargo cult?
by GrandFather (Cardinal) on Jun 20, 2006 at 21:04 UTC

    Anything done all the time "because someone said so" is "cargo cult".

    However, just because it is "cargo cult" doesn't mean it can't be a best practice. So the question more usefully becomes "Is always using xms on regexen a good idea?".

    Personally I can see the justification for generally using /x and /s - they tend to avoid surprises, like when a chunk of whitespace is a tab rather than the space expected and treating all characters in a non-special way (/s) saves involving a couple of brain cells during processing more often than it requires them.

    But /m is a beast of a different stripe requiring a significant switch of meaning for ^ and $. I haven't read the book (books are expensive here and Perl is something of a "sideline" for me) so I don't know the justification, but I'd be inclined to leave the /m switch off.


    DWIM is Perl's answer to Gödel

      I agree. There is also another problem I see; as the ms flags have been used much less, it is possible they are buggier. Quite a few optimization bugs have been found in the m case (and some discussed here actually).

      cheers --stephan
Re: Best practice or cargo cult?
by Argel (Prior) on Jun 20, 2006 at 22:00 UTC
    Here's a rough summary of TheDamian's reasoning:

    /x (PBP pgs 236-237)

    RegEx's are really programs and thus should follow the other guidelines presented in the book (the most obvious being to comment them).

    /m (PBP pgs 237-239)

    Perl should behave the way most people think it will behave. And thanks to common line-based UNIX utilities like sed, grep, and awk the default Perl behavior for ^ and $ will not behave as expected (by most people). So use /m to align expectations.

    /s (PBP pgs 240-241)

    Pretty much the same argument as used in /m -- Perl should behave the way people think it will behave.
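
    As I read that argument, the "grep-like" expectation only matters once a whole multi-line string is in one scalar. A small sketch of the difference /m makes there (the config text is invented for the example):

```perl
use strict;
use warnings;

# Hypothetical slurped config text: several lines in a single scalar.
my $config = "name: gerbil\nsize: small\n";

# Without /m, ^ only matches at the very start of the string.
my @keys_default = $config =~ /^(\w+):/g;

# With /m, ^ matches at the start of every line, as sed/grep/awk users expect.
my @keys_m = $config =~ /^(\w+):/mg;

print "default: @keys_default\n";
print "with /m: @keys_m\n";
```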

      Thank you. I haven't read the book, which prevents me from disagreeing with its conclusions without being an ass.

      I'm going to disagree now, but I still haven't read the book. I'm relying on your interpretation. So perhaps that's worth half.

      My half-assed opinion: /m is usually irrelevant to what people expect. The vast majority of programs end up reading things line-by-line. If you're only looking at a line at a time, then /m has no effect.

      Therefore, it is only relevant in the less-common circumstances when you have embedded newlines. I find that in those cases, I sometimes mean one thing, sometimes the other. As a result, I find /m sometimes useful, and sometimes it does the opposite of what I want. So I use it when it is what I need, and when reading such code, the existence of that option is a very useful indicator of what the expression is doing. Using it all the time, even when it has no effect, harms readability.

      As for /s, consider this common idiom:

      while (<>) {
          if (/^title:\s*(.*)/i) {
              $title = $1;
          }
      }
      What does that do? Well, it looks at a config file for a line saying "Title: The Adventures of the Were-Gerbil" and pulls out the title. If that included the newline, I would be rather annoyed. It's like asking someone "what's in the bag?" and receiving the answer "an apple, a sandwich, and a bunch of air". Smartass.
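
      A quick check of that claim, as a sketch with the line hard-coded instead of read from a file:

```perl
use strict;
use warnings;

# The config line from the example above, including its trailing newline.
my $line = "Title: The Adventures of the Were-Gerbil\n";

my ($without_s) = $line =~ /^title:\s*(.*)/i;    # default: . stops at "\n"
my ($with_s)    = $line =~ /^title:\s*(.*)/is;   # /s: . swallows the "\n" too

my $without_has_newline = $without_s =~ /\n/ ? 1 : 0;
my $with_has_newline    = $with_s    =~ /\n/ ? 1 : 0;

print "without /s: newline captured? $without_has_newline\n";
print "with /s:    newline captured? $with_has_newline\n";
```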

      Also, similarly to /m, I use /s specifically in situations where I want to do something unexpected with respect to newlines. I prefer using it sparingly, so that those situations are more noticeable. Most of the time, I'd rather use something like [\w\W] rather than a trailing /s, especially when what I really want is a mixture of both possible meanings of '.'.
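
      A minimal sketch of mixing both meanings in one pattern, using a made-up record format: the plain . keeps its default meaning for the key, while [\w\W] (or the scoped form (?s:.)) crosses newlines for the body.

```perl
use strict;
use warnings;

# Hypothetical record: a "key: ..." header, a multi-line body, then END.
my $record = "key: value\ncontinued text\nEND\n";

# . stays line-bound for the key; [\w\W] matches anything, newlines included.
my ($key, $body) = $record =~ /^(.+?):\s*([\w\W]*?)\nEND/;

# Same idea with an inline modifier that turns /s on for a single dot only.
my ($key2, $body2) = $record =~ /^(.+?):\s*((?s:.)*?)\nEND/;

print "key=$key\n";
print "body spans ", scalar(() = $body =~ /\n/g) + 1, " lines\n";
```

      Neither pattern carries a trailing /s, so the newline-crossing part is visible right where it happens.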

      Always using /x, on the other hand, makes sense to me, even though I haven't switched over to it yet. Even in the absence of comments, I often like using it simply for readability: /\s*\((.*?)\)/ is just a lot harder to follow than /\s* \( (.*?) \)/x, and the latter could get away without a comment, IMH(-A)O.

        Re /x ... is it? Let me see, the second example, what would that match? Let's see: some whitespace if any, space, opening brace, space, as little as possible of anything (and this is what I'm interested in), one space, closing brace and ... what the heck? OK, OK, back onto the trees, everything's wrong so let's go back and ignore the spaces, so we want some whitespace, opening brace, anything up till a closing brace and that's it. Why the heck cannot they write what they want right away?!?

        Unless the regexp is really complex, /x just gets in the way, confusing what is and what is not part of the regexp and what's to be ignored. And if the regexp IS that complicated, it's usually better to split it into several named pieces:

        my $tagname = '[\w!][\w\d-]*';
        my $paramname = '[\w!][\w\d-]*';
        ...
        die "Malformed filter definition line: $line\n"
            unless $line =~ /^$tagname(?:\s+$tagname)*(?:\s*:\s*$paramname(?:\s+$paramname)*)?$/o;
        ...
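
        The same "named pieces" idea can also be sketched with qr//, which precompiles each piece so /o is not needed. This is only an illustration: the line grammar is guessed from the snippet above, is_filter_line is a hypothetical helper, and \A/\z replace ^/$ here.

```perl
use strict;
use warnings;

# Named building blocks, each compiled once with qr//.
my $name  = qr/[\w!][\w-]*/;          # one tag or parameter name
my $names = qr/$name(?:\s+$name)*/;   # whitespace-separated list of names

# Whole line: tag names, optionally followed by ": parameter names".
my $filter_line = qr/\A $names (?: \s* : \s* $names )? \z/x;

# Hypothetical helper: 1 if the line fits the assumed grammar, else 0.
sub is_filter_line { return $_[0] =~ $filter_line ? 1 : 0 }

for my $line ("a b : c d", "a b :", ": x") {
    printf "%-12s %s\n", qq{"$line"}, is_filter_line($line) ? "ok" : "malformed";
}
```

        Each qr// piece keeps its own flags when interpolated, so only the outermost pattern is written with /x.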

Re: Best practice or cargo cult?
by hsmyers (Canon) on Jun 21, 2006 at 01:07 UTC
    I firmly believe that the phrase cargo cult programming is the best example of its own accusation. A kind of Hofstadter recursion--- I kinda like that...

    --hsm

    "Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Re: Best practice or cargo cult?
by spiritway (Vicar) on Jun 21, 2006 at 05:13 UTC

    "Cargo Cult Programming" is a bit harsh, but I can see some merit to this idea. IMNSHO, it's not a good idea to *always* do something. There are usually exceptions to a rule, and it's a good idea to know when those exceptions apply. I've been bitten by the /x switch, for example - blindly using it, without understanding that it was screwing up my regexp. It was just sloppiness on my part, a habit that was so ingrained that it took me hours to figure out it was the cause of the problem.

Re: Best practice or cargo cult?
by tphyahoo (Vicar) on Jun 21, 2006 at 11:55 UTC
    How hard would it be to write a pragma to do this? Or does one already exist? I'm thinking something like...
    use warnings;
    use strict;
    use PBP::xsm;
    =~ /xsm is on/;
    =~ /(?-x)(?-s)m is on, x and s are off/; # or whatever the appropriate syntax for switching off regex options... ??
    Could be a nifty way of anticipating the Coming of Perl 6. Whenever that'll be...
        s/ (?<= Regexp:: ) Autof (?= lags) /DefaultF/gxms ? 1 : 0;
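
        For what it's worth, later perls (5.14 and up, as far as I know) have something close to this built in: the re pragma can set default flags for a lexical scope. A minimal sketch, with made-up test strings:

```perl
use strict;
use warnings;
use 5.014;   # assumption: default regex flags via "use re" need perl 5.14 or later

my ($default_on, $s_off);
{
    use re '/xms';   # every regex in this block gets /x, /m and /s

    $default_on = "a\nb" =~ / a . b /       ? 1 : 0;   # . matches "\n" because /s is on
    $s_off      = "a\nb" =~ / a (?-s:.) b / ? 1 : 0;   # /s switched off for this one dot
}

# The pragma is lexically scoped, so the defaults are back out here.
my $outside = "a\nb" =~ /a.b/ ? 1 : 0;

print "inside: $default_on, with (?-s:.): $s_off, outside: $outside\n";
```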
Re: Best practice or cargo cult?
by girarde (Friar) on Jun 21, 2006 at 15:52 UTC
    "Cargo cult programming" is a misleading pejorative, in that real cargo cults are doing something that worked before, even though it is no longer effective, like the guys who worship Prince Philip, thinking he's a local Chinese guy from their island who emigrated and married way, WAY up.
      real cargo cults are doing something that worked before, even though it is no longer effective

      It's not quite even that. Cargo cults aren't just doing what worked before; they're a combination of a valid scientific methodology, and a belief system ("voodoo") which essentially suggests that function follows form, and not vice versa.

      Once you believe in voodoo, and discount reality to maintain your beliefs, cargo cults make a twisted kind of sense. Then again, once you're willing to discount reality, just about any religion or crazy belief system can make a twisted kind of sense, or at least redefine "sense" until it fits. :-)

Re: Best practice or cargo cult?
by Anonymous Monk on Jun 29, 2006 at 17:13 UTC
    He is right.

    The proof is that I just saw Conway demo a language he made that ran in Latin at YAPC 2006. You can arrange it in any order and it still works.

    That means he's right.
