Beefy Boxes and Bandwidth Generously Provided by pair Networks DiBona
"be consistent"

Where are the Perl::Critic policies for the Camel?

by tchrist (Pilgrim)
on Oct 05, 2013 at 18:40 UTC ( #1057058=perlquestion: print w/ replies, xml ) Need Help??
tchrist has asked for the wisdom of the Perl Monks concerning the following question:

The Situation

Over the past few years, I’ve come to be the “go-to guy” for an existing code base of several million lines of legacy Perl code running in a non-stop (365×24) environment across scores and indeed hundreds of machines spread all over the globe.

Whenever anything anywhere goes wrong in this code base, or if it needs enhancement, or if it simply needs explaining to people who don’t understand Perl very well, I’m the wizard-in-residence who gets tapped to explain what’s really going on and how one might go about enchancing, revising, fixing, or refactoring the old code.

Some of this code is as robust and sophisticated as the best of CPAN, written with a modular, testable, and often object-oriented design. But most of it is not. Much of it is written by very recent college graduates with no formal Perl training who just copied existing code without understanding it, making tiny tweaks along the way. So you see the same bad memes spread like a virus. Think of the bad old “script-kiddy” days of the very start of all the Perl CGI code people were writing back in the early 90s, and you’ll have a pretty good idea of what this code looks like. It is . . . astounding.

One enormous problem is that the code is written very un-idiomatically — even anti-idiomatically — as far as Perl norms are concerned. Often it looks like they’re writing a gigantic shell script or sometimes an ugly C program. They aren’t always very experienced with Unix, so something like the proper care and feeding of errno with all its vicissitudes and caveats completely eludes them, and process-management with fork, waitpid, pipes, and the correct 16-bit handling of $? is often missed.

The worst problem of all seems to be that they don’t have any experience in being able to recognize what natural, idiomatic, and maintainable Perl code even looks like in the first place. They are just not familiar with standard Perl idioms. So you see them doing strange things like foreach(<INPUT>), or resorting to character-by-character processing of strings using index and substr instead of using regexes. Almost all their data structures are flat arrays or hashes, nothing multilevel, and there are no structure/records to speak of. It’s at best Perl4ish.

The — or at least a — Solution(?)

So here is what I want to do, and my question for my fellow monks. I want to develop a highly tuned set of perlcritic policies to help provide automated advice for when they write new Perl code. This would consist of only a subset of the available Perl::Critic::Policies::* on CPAN, and even those would be tuned. That’s because I disagree with the severity levels of many of them, and with the very premises of others.

But some are really very helpful. I especially like the ones involving complexity metrics, both the McCabe stuff and the rest of it.

Question 1: Are there useful P::C Policies that are not on CPAN that I should be aware of?

It seems like other people must have made their own tweaked policies. Where are these? Are they hidden off on DarkPAN somewhere?

Question 2: Where are the policy sets that correspond not to Perl Best Practices, but rather to Programming Perl?

In Chapter 21 of the fourth edition of Programming Perl, “Common Practices”, there are three sections that very much lend themselves to conversion into P::C policies. These are:

  1. Efficiency (pp. 691–700)
  2. Programming with Style (pp. 701–704)
  3. Fluent Perl (pp. 705–714)

Some of those are of course conflicting, especially in the Efficiency section. But many are quite straightforward (and uncontroversial), and so should be easily enough coded up into separate policies.

But I can’t find any Camel policies for P::C. Do these not exist? Or are they hidden on DarkPAN? Surely someone somewhere must have begun work on this particular project of turning the relevant parts of Chapter 21 in perlcritic policies.

Or just maybe, since the book has only been out a couple of years in its current incarnation, this hasn’t been started yet. Would anyone out there be interested in such a set of policies being available?

I recognize that some of those “Style” matters fall more within the realm of perltidy than of perlcritic. That’s ok: I already have a perltidy config to help establish a standardized look for our new code base. It’s perlcritic policies that are giving me trouble right now, in that there just aren’t enough of them, and not the right kinds.

Question 3: Where are the Unicode P::C policies?

I notice that the triadic open policy completely ignores any advice to include the Encoding layer. That seems like a mistake. Specifying the encoding is so important that we included it as early as page 19 in the current Camel. It’s something that people really need to pay attention to.

That does remind me of something else. Chapter 21 doesn’t include any of the advice from my famous Stack Overflow Unicode advice list. There really needs to be a P::C policy set for all that stuff, too.

Question 4: How best to set up a project-wide set of defaults for perlcritic?

I have not yet read perlcritic’s documentation in full, but I don’t quite see how to have a per-project set of defaults. I can see how to have per-user overrides, but I’m trying to figure out the best “big project” approach. Is the best way to set the PERLCRITIC envariable to the project-specific config file? But what happens if individual users want to override those settings?

I guess what I would like is for it to do the project one first, the ~user one second, and the per-directory one third, with later values overriding earlier ones. But it looks to me like it can only read at most one config file, not a chain of them with priority. Am I misunderstanding or missing something?

Let’s Start a Project!

I would really like to see a set of Camel policies, plus various others. I also recognize that coding up all those policies is a lot of work, stuff I myself don’t have time for. That’s why I am hoping that someone else has already done so, or at least begun that work.

If not, then let’s please get that project going. Is this something that volunteers alone would be able to tackle? Or should we try to secure a funding grant to help pay for the development work?

I never, ever thought I would want anything like this, but if you were stuck with a few million lines of Perl Worst Practices code, I think you too might start thinking along these similar lines.

Thanks for all your ideas and help, suggestions and criticisms alike.


Comment on Where are the Perl::Critic policies for the Camel?
Re: Where are the Perl::Critic policies for the Camel?
by talexb (Canon) on Oct 05, 2013 at 19:02 UTC

    Great idea! While I have some free time these days (job-hunting takes a little time each day), I would to help out with this project.

    Having said that, I will promptly declare my bias towards agreeing with Perl Best Practices (and therefore perlcritic) -- but only 90-95% of the time.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Where are the Perl::Critic policies for the Camel?
by Anonymous Monk on Oct 05, 2013 at 19:09 UTC
Re: Where are the Perl::Critic policies for the Camel?
by toolic (Chancellor) on Oct 05, 2013 at 19:38 UTC
    I have no answers for you, just suggestions for more research:
    • Perhaps a browse through the bug list might reveal patches for the policies you want.
    • CONTACTING_THE_DEVELOPMENT_TEAM might unearth some policies, as well.
    • I scanned the POD with regards to perlcriticrc, and I agree with you that it does not seems to support prioritized config files.
Re: Where are the Perl::Critic policies for the Camel?
by vsespb (Hermit) on Oct 05, 2013 at 21:06 UTC
    Where are the Unicode P::C policies?
    Hm, do you think it's possible to implement any useful set of checks related to Unicode in static code analyzer?
    Afaik, perlcritic only checks one file, so it's impossible to say where data come from and if this text data or no. It's impossible to say if we have encoding layers for our file handles or no.

    Actually it's even impossible to tell at runtime if we have binary data or no, so it's probably even more hard to tell to static code analyzer.

    The thing I can think of is some warnings for regexps (like if we have "\d" in regexp it can mean something beyond 0-9, depending on flags (possible to analyze), 'use feature's (possible to analyze), data (impossible to analyze).
      vsespb kindly wrote:
      Hm, do you think it's possible to implement any useful set of checks related to Unicode in static code analyzer? Afaik, perlcritic only checks one file, so it's impossible to say where data come from and if this text data or no. It's impossible to say if we have encoding layers for our file handles or no.
      I agree that static analysis has its limitations. But there are still things that can be done, I think.

      These are all wrong, or at least have a code smell to them:

      • lc($a) cmp/eq/ne/... lc($b) should be using fc. Same story with uc.

      • Something like a-z should often be \p{Ll} or \p{lower}which mean different things!. Same with A-Z.

      • I would really like to see use warnings FATAL => "utf8".

      • I would like to see the charnames versions used instead of magic numbers, so for example \N{EM DASH} over \x{2014}.

      • The POSIX character classes are suspect. Thanks to Karl’s fine work, we do studiously follow UTS #18’s recommendations on compat properties here, but things like :punct: versus \p{punct} are a problem, because POSIX mixes symbols and puncts, and Unicode doesn’t.
        $ unichars -g '/[[:punct:]]/ != /\p{punct}/' U+0024 &#8237; $ GC=Sc DOLLAR SIGN U+002B &#8237; + GC=Sm PLUS SIGN U+003C &#8237; < GC=Sm LESS-THAN SIGN U+003D &#8237; = GC=Sm EQUALS SIGN U+003E &#8237; > GC=Sm GREATER-THAN SIGN U+005E &#8237; ^ GC=Sk CIRCUMFLEX ACCENT U+0060 &#8237; ` GC=Sk GRAVE ACCENT U+007C &#8237; | GC=Sm VERTICAL LINE U+007E &#8237; ~ GC=Sm TILDE

      • We should probably consider warning about using block properties instead of script properties.

      • There are all kinds of version-dependent Unicode character properties that nothing ever warns you about, and this drives me crazy. For example, you cannot use \p{space} or \p{Horiz_Space} or \h or \R in v5.8.8.

      • There was a shift in the allowable user-defined properties being restrict to InFoo or IsFoo at one point, which is not noted. Although I hope this will get better soon.

      • Should there be a policy that advises double-barrelled property names for the ones that apply? I find that that might help people who are confused about the script versus block properties. For example, \p{Latin} actually means \p{Script=Latin}, not any of \p{Block=Basic_Latin}, \p{Block=Latin_1}, \p{Block=Latin_1_Supplement}, \p{Block=Latin_Extended_A}, \p{Block=Latin_Extended_Additional}, \p{Block=Latin_Extended_B}, \p{Block=Latin_Extended_C}, or \p{Block=Latin_Extended_D}.

      • Similarly, I wonder about the GC categories. I just love \pL — and unlove the gratuitously embraced \p{L} — but when people write that or something like \p{Ll}, should there be a way to suggest that it would be more clearly written \p{GC=L}, \p{GC=Ll}, \p{General_Category=L}, \p{General_Category=Ll}, \p{General_Category=Letter}, or \p{General_Category=Lowercase_Letter}?

      • And even more importantly, shouldn't those be \p{Lowercase} or \p{lower}, since you otherwise miss the 159 lowercase code points that are not \p{GC=Ll}?

      • The approach of \PM\pM* instead of \X does not work.

      • Just as we tell people that /./ instead of /(?s:.)/ may sometimes get them into trouble, something that mentions that sometimes you really want /\X/ instead of /./ would be good. I just don’t know when to tell them that in a general not a specific case. I mean, I can look at it and know, but a program? Dunno.

      • Not using the \X-aware versions of substr, index, rindex, length can be a problem in some programs.

      • I have no earthly idea what to do about the maxim to NFD on the way in and NFC on the way out. At the least, people need to understand that unless they normalize their two variables, not even string comparisons between them will work. Well, barring Unicode::Collate, which is awfully heavy-weight but actually works. And don’t forget Unicode::Collate::Locale, either.

      • I can get rather nervous about seeing literal \n, \r, \f, \r\n in code. That really should be using \R but that only works in a regex.
        $ unichars '/ ^ \R $ /x' U+000A -- GC=Cc LINE FEED (LF) U+000B -- GC=Cc LINE TABULATION U+000C -- GC=Cc FORM FEED (FF) U+000D -- GC=Cc CARRIAGE RETURN (CR) U+0085 -- GC=Cc NEXT LINE (NEL) U+2028 -- GC=Zl LINE SEPARATOR U+2029 -- GC=Zp PARAGRAPH SEPARATOR
        Also, do note that "\r\n" = / \A \R \Z /x is also true, so like \X, \R can be more than one code point in length.

        Until Karl gets some support for $/ = q(\R) and readline and chomp into a future release, we’re stuck with this ugliness:

        @lines = do { local $/; split /\R/, <INPUT> };
        which is a big problem because it just doesn’t scale to superbig files or to interactive handles like two-way socket communications. And lord knows what $\ = q(\R) would ever mean for the output record separator. :)

        I do have an inspiration for a PerlIO::via::Unicode_Linebreak I/O layer to address all this, but haven’t thought it through.

      • Getting things like sorting right is hard. Under some operating systems, your cmp operator (and thus default sort, etc.) actually does work correctly provided that all these apply:
        1. You have done a POSIX::setlocale(LC_ALL, "en_US.utf8") somewhere in your program.
        2. Your string-comparison operators are within the lexical scope of a use locale pragma.
        3. (Maybe; haven’t checked.)Your internal Latin1 data is in its 2-byte UTF8 representation not the 1-byte Perl cheat/short-cut.

      • I really, really want to do something about encodings. Opening a text file without stating its encoding somewhere or other is a recipe for failure. The problem is that it can be stated in so many placed: a PERL_UNICODE environment variable, a use open pragma, a middle argument to open, or binmode — just to name a few. There are other ways, too.

      So even though there is a lot that cannot be done about Unicode, there is also a lot that could be done with Unicode.

      I really do think there could be some more P::C policies in these areas.chomp

        Well, actually recently I had experience writing not a small application (20k lines), which allows unicode everywhere and handles unicode correctly.

        But it does not need 90% of things that you listed

        Probably because my application does not try to analyze text data (it only stores it, converts, compares, reencodes), it does not need sort nor fc-aware comparison

        Your case is probably something that analyzes text (I can imagine now only something related to natural language processing or word processor or maybe a dictonary)

        So I think different applications need different level of unicode support

        Below some cases when policy you listed can be wrong in some circumstances:
        lc($a) cmp/eq/ne/... lc($b) should be using fc. Same story with uc.
        Something like a-z should often be \p{Ll} or \p{lower}
        If you write, say, code which have to deal with parsing http headers (no, that's not reinvention of wheel, like HTTP library, that can be a proxy server or REST library), then "cmp" and "a-z" would be correct choice, and fc() \p{lower} can introduce bugs (say, with "β" vs "ss").

        Other examples can be unit tests where you usually have to deal with pre-defined data sets, or internal program metadata which is always plain ASCII, or comparison of MD5/SHA hex values etc.
        Opening a text file without stating its encoding somewhere or other is a recipe for failure.
        Unless it's a binary file.

        @lines = do { local $/; split /\R/, <INPUT> };
        Hm. I think it's not correct to use something like U+2028 as line separator for files.

        You need code like this if you read from text file. Text file is something separated by LF or CRLF, other combinations are not portable.

        If you are writing word processor which should handle U+2028 you should not mix this with system file IO, instead introduce your own logic when you are spliting data to "lines" and paragraphs.

        I don't see where this can be correct to mix "lines" from your word processor logic and lines of text file on disk (or socket)
Re: Where are the Perl::Critic policies for the Camel? (the unthinking leading the unseasoned)
by tye (Cardinal) on Oct 06, 2013 at 05:15 UTC

    I'll just note that my experience of throwing automated linters at uninitiated programmers is very often similar to throwing wishes at classical genies. You'll end up with code that obeys the automated rules but much of it will be worse than what you had hoped to prevent.

    Keep that in mind and you might be able to mitigate that.

    - tye        

      Never mind automated linters, how many times have you seen this exchange?

      "You should always use strict and warnings."

      "I used to have those but they caused a lot of errors and my code wouldn't run so I took them out."

      (This has a relative: "Didn't the CPAN shell make test?" "Yeah, but it gave a lot of errors so I had to do a force install.")


        Andy Lester graciously wrote:

        Never mind automated linters, how many times have you seen this exchange?

        “You should always use strict and warnings.”

        “I used to have those but they caused a lot of errors and my code wouldn't run so I took them out.”

        How many times have I seen that particular exchange, Andy? As many times as there are grains of sand in the sea! Especially if you count including -Wall in your Makefile’s CCFLAGS.

        But I can top that. I’ve actually seen this, and not just once, either:

        Manager: “We can’t have any more coredumps from our C programs. It makes us look bad and interferes with production.”

        Programmer: signal(SIGSEGV, SIG_IGN);

        I dearly wish I were kidding; I’m not.

        I see stuff like this every day, stuff that doesn’t so much move you to righteous ranting as it does to being stricken dumb with sheer disbelief. I bet I now have keyboard-shaped wedges permanently imprinted on my forehead.

        It’s code that leads first to apoplexy and thence to syncope. The thing is, after they bring around the code-smelling salts and still dazed you awaken from your nightmare, it’s all still there: holy terrors of truly Biblical proportion.

        And by Biblical, I’m specifically thinking of the venerable Books of Job and of Lamentations and of Ecclesiastes: no Good News to be found. :)

        We’re well into sackcloth-and-ashes territory here.

        “Weeping may endure for an entire dark release, but joy cometh in the fresh light of refactoring.”

        ― Psalms 30:5, the Programmers’ edition

Re: Where are the Perl::Critic policies for the Camel?
by petdance (Parson) on Oct 06, 2013 at 14:42 UTC
    Some thoughts from someone who has written a number of P::C policies:
    1. The policies that get written are the ones that scratch an itch. I've never written a policy for foreach (<INPUT>) because I've never worked on a codebase where someone had written that. Vague names like $data? That's something I watch out for.
    2. It's really pretty easy to write a policy. The infrastructure is there and the testing is easy.
    3. The way I have perlcritic policies in my project is to have a .perlcriticrc in the root of the project and then have a "make critic" target.
    4. Severities are something I pretty much ignore. I see warnings mostly as binary. They're either there or they're not. If a project has so many of a given warning that it's not worth my time to flag it (say, on unnecessary string interpolation which is a pet peeve of mine but sometimes just something you have to live with), then I just turn it off in my perlcriticrc rather than have all the noise.
    5. P::C is not fast. When you have a small number of files you want to check before committing to your VCS, it's plenty fast. Running across, say, 5,000 files in a big project as a regular thing? That can take a while.
    6. Join the PC dev list. We'll be more than happy to have your input and to help guide.


Re: Where are the Perl::Critic policies for the Camel?
by hsmyers (Canon) on Oct 07, 2013 at 22:23 UTC
    Like some others have noted, I'm conflicted about PC for the same reason that I have been conflicted by versions of Lint for C and C++. That being the need to more easily reduce the 'error/warning' checks to something useful. I often feel that any code that makes it through with no output is either written by the gods or is too trivial to consider. Somewhat like Godel's notions about un-decidability. That said a PC module I'd certainly like to see would be based on 'Effective Perl Programming'. Strictly based, nothing else need apply. It also occurs to me that it would be nice to have named modules that could be the basis of a given run. This would of course allow the addition of local needs and caveats. Perhaps all of this is available now, it has been a while since I've looked(for the disclaimed reasons above) In any even EPP should be on every Perl programmers desk(alright bookshelf for you neat-freaks) as it is the best introduction to idiomatic Perl of any text out there. Yes it says that on the cover, but that doesn't prevent it from being correct.


    "Never try to teach a pig to wastes your time and it annoys the pig."

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1057058]
Front-paged by Corion
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (4)
As of 2014-04-18 10:49 GMT
Find Nodes?
    Voting Booth?

    April first is:

    Results (466 votes), past polls