http://www.perlmonks.org?node_id=11131226

hrcerq has asked for the wisdom of the Perl Monks concerning the following question:

Hello again.

I've got a script that uses named capture groups to parse file records. The record format is always the same (four fields wide, colon separated), just like a passwd file (except for the number of fields).

For example:

0:1:bob:Bob
1:1:bob:Bob
1:0:alice:Alice

You get the idea. At first I used split, but later I concluded that a regex with named groups would make the code easier to understand. The (simplified) example below should give you the picture.

if (/^(?<type>[01]):(?<valid>[01]):(?<name>[^:]+):(?<comment>[^:]+)$/) {
    print "Key name: ", $+{name}, "\n";
    print "Key Comment: ", $+{comment}, "\n";
    print "Not valid\n" unless $+{valid} || !$+{type};
}
else {
    print "Malformed input: $_\n";
}

Now, I've noticed (according to perlretut) that named capture groups were introduced in Perl 5.10. I don't expect the script to run in older versions, but would it be wise to use v5.10 pragma?

Would it have any unwanted side effects? I'm not using any of the features of the bundle, just wanted to state that version 5.10 is required. Is that a good idea?

Re: Should I use v5.10 because of named groups?
by Fletch (Bishop) on Apr 13, 2021 at 22:36 UTC

    First a syntax nit: personally I don't find that regex any more readable than:

    my( $type, $valid, $name, $comment ) = split( qr/:/, $_ );

    In fact, if you explicitly want it in a hash, I'd say something like this is still (IMHO) clearer:

    my %val; @val{ qw/ type valid name comment / } = split( qr/:/, $_ );
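The hash-slice approach can still reject malformed records with a few explicit checks. The sketch below is one possible way to do that and is not from the thread: `split_record` and `@FIELDS` are hypothetical names, and the rules (exactly four fields, [01] for the first two, non-empty name and comment) are assumed from the regex in the original question.

```perl
use strict;
use warnings;

# Field names in record order (assumed from the original regex).
my @FIELDS = qw/ type valid name comment /;

# Hypothetical helper: returns a hash ref for a well-formed record,
# or nothing (undef in scalar context) for a malformed one.
sub split_record {
    my ($line) = @_;
    chomp $line;

    # The -1 limit keeps trailing empty fields, so "0:1:bob:Bob:" is
    # seen as five fields instead of being silently accepted as four.
    my @parts = split /:/, $line, -1;
    return unless @parts == @FIELDS;

    my %val;
    @val{@FIELDS} = @parts;

    return unless $val{type}  =~ /\A[01]\z/;
    return unless $val{valid} =~ /\A[01]\z/;
    return unless length $val{name} and length $val{comment};

    return \%val;
}

for my $line ('0:1:bob:Bob', '1:0:alice:Alice', '0:1:bob:Bob:') {
    if (my $rec = split_record($line)) {
        print "Key name: $rec->{name}\n";
    }
    else {
        print "Malformed input: $line\n";
    }
}
```

Whether this reads better than a single regex is a matter of taste; it mostly trades one dense pattern for several small checks.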

    Granted, you're doing a minimal amount of validation by checking [01] on the first two fields, but (without more context on your data) that doesn't feel that compelling. If you do keep the regex with the capture groups, I'd at least suggest adding /x and putting whitespace around the colons so it's not as visually . . . run together.
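As a rough illustration of the /x suggestion, here is the pattern from the question laid out one field per line. The wrapper `parse_record` is a hypothetical name introduced only for this sketch; the pattern itself is unchanged apart from the added whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of the named captures,
# or nothing (undef in scalar context) if the record is malformed.
sub parse_record {
    my ($line) = @_;

    # Under /x, whitespace in the pattern is ignored (outside of
    # character classes), so each field can sit on its own line.
    return unless $line =~ m{
        ^
        (?<type>    [01]  ) :
        (?<valid>   [01]  ) :
        (?<name>    [^:]+ ) :
        (?<comment> [^:]+ )
        $
    }x;

    # Copy %+ right away; the next successful match would overwrite it.
    my %rec = %+;
    return \%rec;
}

for my $line ('0:1:bob:Bob', '1:0:alice:Alice', '2:1:eve') {
    if (my $rec = parse_record($line)) {
        print "Key name: $rec->{name}\n";
        print "Key Comment: $rec->{comment}\n";
        print "Not valid\n" unless $rec->{valid} || !$rec->{type};
    }
    else {
        print "Malformed input: $line\n";
    }
}
```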

    As to your question of using an explicit version: you can check the docs on feature to see what that turns on (specifically say, state, and the switch construct). Since named capture groups aren't toggled by that, all you're getting is maybe bailing out earlier in the (as you say) unlikely case that you're ever run under an older perl. So . . . meh? That being said, I'm habitually using (say :) say and state, so I typically require a much newer version in what I write. Having that explicitly set regularly helps find problems (typically PATH is messed up and the script is running under the ancient OS's /usr/bin/perl rather than the one it should be, which would cause other things to blow up, such as missing CPAN modules).

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      I guess you're right, that regex isn't any more readable than the alternatives you propose. In fact, I've rewritten it to:

      if (/^([01]):([01]):([^:]+):([^:]+)$/) { my ($type, $valid, $name, $comment) = ($1, $2, $3, $4); ...
      I kept the regex as I'm fairly sure I'll need to improve it later for more checks on the input.
        Just a nit, in case you didn't know, but you can do the $1,$2,$3,$4 assignment in the if statement:
        if ( my ($type, $valid, $name, $comment) =
                 $_ =~ m/^([01]):([01]):([^:]+):([^:]+)$/ ){}
        I don't think this makes much difference. I just have a coding preference to avoid $1, etc.
        I made 2 source lines because of code line length limits here. In actual code, I'd probably just have one line.
Re: Should I use v5.10 because of named groups?
by ikegami (Patriarch) on Apr 14, 2021 at 01:01 UTC

    I'm not using any of the features of the bundle, just wanted to state that version 5.10 is required.

    BEGIN { require 5.010; }
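To make the difference concrete: `require VERSION` only checks the interpreter version, while `use VERSION` (for 5.10 and later) also enables the matching feature bundle in its lexical scope. The sketch below probes this with string eval; `compiles` is a hypothetical helper written just for this illustration.

```perl
BEGIN { require 5.010; }    # version check only; no features are enabled

use strict;
use warnings;

# Hypothetical helper: reports whether a snippet compiles when wrapped in
# an anonymous sub. A string eval inherits the pragmas in effect here,
# which means strict and warnings but no feature bundle.
sub compiles {
    my ($code) = @_;
    local $@;
    my $ok = eval "sub { $code }; 1";
    return $ok ? 1 : 0;
}

my %probe = (
    "after 'require 5.010' only" => q{say 'hi'},
    "with 'use v5.10' in scope"  => q{use v5.10; say 'hi'},
);

for my $label (sort keys %probe) {
    my $result = compiles($probe{$label}) ? "compiles" : "does not compile";
    print "say $label: $result\n";
}
```

So if the goal is only to state a minimum version without switching anything on, the `require` form does exactly that.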

    Seeking work! You can reach me at ikegami@adaelis.com

      How something so obvious didn't cross my mind is something I'm still trying to figure out. But anyway, thank you for reminding me.

Re: Should I use v5.10 because of named groups?
by choroba (Cardinal) on Apr 14, 2021 at 08:01 UTC
    That's why I created Syntax::Construct. It makes it possible to state
    use Syntax::Construct qw( ?<> );

    I find it more user-friendly, as neither the author nor the user needs to remember which construct appeared in which version. Read the Description, where I tried to explain my reasoning.

    You can also use it just as a reference to see which version introduced what.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Should I use v5.10 because of named groups?
by haj (Vicar) on Apr 14, 2021 at 10:05 UTC
    would it be wise to use v5.10 pragma?

    In my opinion it is always wise to add a version declaration. I add it for the version I'm developing for. Or, more precisely, for the oldest version I'm going to run tests for. So, for standalone scripts this would always be a fairly recent version, and 5.10 would only appear if I'm contributing to an application which says it needs "5.10 or newer".

    So, in your case I would recommend including a version pragma, but probably not use 5.010;.

Re: Should I use v5.10 because of named groups?
by perlfan (Vicar) on Apr 14, 2021 at 03:29 UTC

      At first I didn't understand what you meant by linking to this other thread. But now I've put the pieces together:

      • It appeared right after this one;
      • The data pattern is pretty close to the one I describe here;
      • It was created by some anonymous monk.

      I'm not saying you're implying anything (you might just be pointing to something related that could help; in fact I believe that's the case), but I'd like to mention that I'm not that anonymous monk, as I'm aware it could be read as duplicating threads somehow, which is not cool.