Question: Capturing a repeated pattern

by robmderrick (Initiate)
on Apr 08, 2010 at 21:32 UTC
robmderrick has asked for the wisdom of the Perl Monks concerning the following question:

I think I need the extra room of a full post, rather than the chatterbox, to frame this question properly.

I have an input line that looks like:

somename   1000  0.24  280  2 2576.9  2731.9  12.0  4195.3

I am looking for a way to capture all nine elements with one pattern that lets me use a single subpattern to capture the last 8 elements.

For example, I tried this:

(@array) = $_ =~ /(^[a-z]\w*)\s+([\.\d]+)\s+([\.\d]+)\s+([\.\d]+)\s+([\.\d]+)\s+([\.\d]+)\s+([\.\d]+)\s+([\.\d]+)\s+([\.\d]+)\s*$/i;

... but that is ungainly.

I can cut a lot of that down with the following:

my $p = qr/\s+([\.\d]+)/;
(@array) = $_ =~ /(^[a-z]\w*)$p$p$p$p$p$p$p$p$/io;

... but I would sure like to know whether there is a way to use that one pattern 8 times, the way \d{8} repeats a digit match 8 times.

Does such an arcane method exist?

-- rob derrick

Re: Question: Capturing a repeated pattern
by kennethk (Abbot) on Apr 08, 2010 at 21:55 UTC
    The issue is that if the parentheses appear only once in the regular expression and you use a repetition to represent your match, like (\d){8}, you will repeatedly overwrite the same capture buffer, so only the last repetition survives. Probably the clearest solution is to use the repetition operator x when constructing the string for your regular expression:
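    To see the overwriting described above, here is a minimal sketch (the variable names are illustrative, not from the thread) that puts a single capture group under a {8} quantifier. Only the name and the last field come back, because every repetition refills the same group:

```perl
use strict;
use warnings;

my $line = 'somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3';

# One capture group inside a {8} quantifier: each repetition
# overwrites $2, so only the last field survives.
my @fields = $line =~ /^([a-z]\w*)(?:\s+([.\d]+)){8}$/i;

print join(' | ', @fields), "\n";    # prints "somename | 4195.3"
```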

    #!/usr/bin/perl
    use strict;
    use warnings;

    $_ = 'somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3';
    my $regex = '(^[a-z]\w*)' . '\s+([\.\d]+)' x 8 . '$';
    my (@array) = /$regex/io;
    print join "\n", @array;

    __END__
    somename
    1000
    0.24
    280
    2
    2576.9
    2731.9
    12.0
    4195.3

    Note as well that if you are operating on the special variable $_, there is no need to bind it to the regular expression with =~.
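    A short sketch of that note, assuming several input lines held in an array (the sample lines and variable names are hypothetical): a foreach loop aliases each element to $_, so the match needs no =~, and lines that do not fit the shape are skipped. The pattern text is built with x as above and compiled once with qr//.

```perl
use strict;
use warnings;

# Hypothetical sample input: only the first line has the expected shape.
my @lines = (
    'somename   1000  0.24  280  2 2576.9  2731.9  12.0  4195.3',
    'header line with words only',
);

# Build the pattern text with x (eight separate capture groups),
# then compile it once with qr//.
my $pattern = '^([a-z]\w*)' . '\s+([.\d]+)' x 8 . '\s*$';
my $re      = qr/$pattern/i;

my @matched;
for (@lines) {              # foreach aliases each element to $_
    my @fields = /$re/;     # no =~ needed: the match runs against $_
    next unless @fields;    # skip lines that do not fit the shape
    push @matched, [@fields];
}

print scalar(@matched), " line(s) matched, first name: ", $matched[0][0], "\n";
```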

    You can also construct it inside the regular expression itself, using the technique from the "A bit of magic: executing Perl code in a regular expression" section of perlretut.
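    As a hedged illustration of that perlretut technique, the (??{ ... }) construct below builds the eight repeated fields inside the regex itself. One caveat documented in perlre: captures made inside a (??{ ... }) subpattern are discarded once control returns to the main pattern, so this sketch only validates the line and then uses split to get the fields.

```perl
use strict;
use warnings;

my $line = 'somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3';

# (??{ ... }) runs Perl code during the match and uses its return value
# as a subpattern. Captures inside it do not survive, so it only checks
# that the line has a name followed by exactly eight numeric fields.
my $ok = $line =~ /^[a-z]\w*(??{ '\s+[.\d]+' x 8 })\s*$/i;

my @fields = $ok ? split(' ', $line) : ();
print "@fields\n";
```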

      Thanks Kenneth, and all the other responders. I just wanted to find out whether my suspicion was correct: that there was an easy way to do this that was neither long and ugly nor profoundly obfuscated. The "just use a split" answer would of course have been the right one if the input were uniform. Thanks all, I've got what I need. -- rob
Re: Question: Capturing a repeated pattern
by rubasov (Friar) on Apr 08, 2010 at 22:05 UTC
    I may not know your specific problem, but your regex seems like overkill. Isn't a simple split sufficient for you?
    my $str = 'somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3';
    my @array = split /\s+/, $str;
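    One detail worth knowing if lines may start with whitespace: split /\s+/ returns an empty leading field for such a line, while split ' ' (the special single-space pattern described in the split entry of perlfunc) skips leading whitespace. A minimal sketch with a made-up indented line:

```perl
use strict;
use warnings;

my $str = '  somename 1000 0.24';    # hypothetical line with leading spaces

my @by_regex = split /\s+/, $str;    # first element is an empty string
my @by_space = split ' ',   $str;    # leading whitespace is ignored

print scalar(@by_regex), " fields vs ", scalar(@by_space), " fields\n";
```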

    update: It seems to me you have two goals: to verify the format of your input and to split it into fields. I think it is much more readable in two separate steps:

    die "invalid input format: '$str'" if $str !~ / \A [a-z]\w* (?: \s+ [\d.]+ ){8} \z /ixo;
    my @array = split /\s+/, $str;
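    When lines come from a file they normally still carry their trailing newline, and \z (unlike $) will not match before it, so chomp first. Here is a minimal sketch of the same two steps over several lines; the sample data is hypothetical and an in-memory filehandle stands in for a real file. Lines that fail the check are skipped rather than passed to die.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a real input file.
my $data = "somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3\n"
         . "# a comment line\n"
         . "other 1 2 3\n";

open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @rows;
while (my $str = <$fh>) {
    chomp $str;    # \z, unlike $, does not match before a trailing "\n"
    next if $str !~ / \A [a-z]\w* (?: \s+ [\d.]+ ){8} \z /ix;   # step 1: validate
    push @rows, [ split ' ', $str ];                             # step 2: split
}
close $fh;

print scalar(@rows), " row(s) kept, last field: ", $rows[0][-1], "\n";
```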

      A split for that input line alone would work perfectly. But my input is quite varied, and for this purpose I only want to match the lines that exactly fit my pattern, not all the others that split would grab.

        You could always match first and then split:

        $s = 'somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3';;
        $s =~ m[\w+(?:\s+(?:\d+\.)?\d+){8}] and @a = split ' ', $s and shift @a and print "@a";;
        1000 0.24 280 2 2576.9 2731.9 12.0 4195.3

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Sorry, I updated the grandparent node before I saw your answer. And of course, if the separator part cannot easily be factored out of your regex, split won't suffice and you will need what kennethk recommended.
Re: Question: Capturing a repeated pattern
by LanX (Chancellor) on Apr 08, 2010 at 22:50 UTC
    In simple cases like this one, prefer simple approaches!

    Constructing regexes as concatenations of strings or qrs is simple and clear!
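    For instance, here is one hedged way to do the qr concatenation: a qr// object interpolates as a string with its own flags embedded, so repeating the field pattern as a list with ($field) x 8 and joining the pieces yields eight separate capture groups. The names are illustrative only.

```perl
use strict;
use warnings;

my $name  = qr/([a-z]\w*)/i;   # the /i flag travels with the qr object
my $field = qr/\s+([.\d]+)/;   # one field: whitespace, then digits and dots

# ($field) x 8 is a list of eight copies; join stringifies each qr object,
# so the final pattern contains 1 + 8 = 9 capture groups.
my $re = join '', '^', $name, ($field) x 8, '\s*$';

my $line  = 'somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3';
my @array = $line =~ /$re/;
print join(', ', @array), "\n";
```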

    But if you really need to "abuse" the full builtin regex power, you may wanna have a look at  "(?PARNO)" "(?-PARNO)" "(?+PARNO)" "(?R)" "(?0)" in perlre!

    This functionality was introduced for complicated recursions ...

    You can also try to generate the regex dynamically with embedded Perl code like (?{..}) or (??{..}) to achieve even more obfuscation... ;-)

    Cheers Rolf

    Update: example

    DB<1> $_=q(somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3)
    DB<2> $,=" | "
    DB<3> print /(^[a-z]\w*)(\s+[.\d]+)((?2))((?2))((?2))((?2))((?2))((?2))((?2))/
    somename |  1000 |  0.24 |  280 |  2 |  2576.9 |  2731.9 |  12.0 |  4195.3
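    A variation that combines two suggestions from this thread, offered as a sketch rather than as LanX's own code: define the field pattern once in group 2, build the remaining seven ((?2)) groups with x, and keep \s+ outside the captures, since in the pattern above group 2 includes the whitespace in each captured field. (?PARNO) needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # (?PARNO) recursion needs Perl 5.10 or later

my $line = 'somename 1000 0.24 280 2 2576.9 2731.9 12.0 4195.3';

# Group 2 holds the field pattern once; each ((?2)) re-runs it and
# captures into a new group. \s+ stays outside the captures.
my $re    = '^([a-z]\w*)\s+([.\d]+)' . '\s+((?2))' x 7 . '\s*$';
my @array = $line =~ /$re/i;

print join(' | ', @array), "\n";
```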

Node Type: perlquestion [id://833636]
Approved by kennethk