Pathologically Eclectic Rubbish Lister PerlMonks

### Repeating a capture group pattern within a pattern

by mldvx4 (Friar) on Jul 15, 2024 at 07:09 UTC

mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

I am looking to simplify a pattern. If I have a string my $x = "0.01 NaN 2.30 4.44"; then the following pattern finds the items present:

```perl
my $r1 = qr/([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)/x;
```

Notice that the same capture group criteria are repeated. I wonder how I may write that so it is simpler, shorter, and all on one line. Here is some pseudo-code to try to show what I am aiming for: my $r1 = qr/(?=([Na0-9\.\-\+]+)\s+){4}/

However, I've tried that and some permutations without luck:

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $x = "0.01 NaN 2.30 4.44";

# the following works as desired
my $r1 = qr/([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)/x;

my ($d, $e, $f, $g) = ($x =~ m/$r1/x );
print qq($d, $e, $f, $g\n);

# the following finds the first number twice
my $r2 = qr/(?=(([Na0-9\.\-\+]+)\s*)){4}/x;

($d, $e, $f, $g) = ($x =~ m/$r2/x );
print qq($d, $e, $f, $g\n);

# the following finds a null prior to the first item
my $r3 = qr/((?=([Na0-9\.\-\+]+)\s*){4})/x;

($d, $e, $f, $g) = ($x =~ m/$r3/x );
print qq($d, $e, $f, $g\n);

exit(0);
```

How can I write that pattern so that the pattern it contains is repeated but not locked into the values found in the very first match? Is this a case for using recursive patterns?

Re: Repeating a capture group pattern within a pattern
by Corion (Patriarch) on Jul 15, 2024 at 07:49 UTC

This might be cheating, but when retrieving multiple repeated matches, I often use /g after validating that the line looks somewhat valid:

```perl
use Carp qw(croak);

my $re4 = qr/\b([Na0-9\.\-\+]+)\b/; # capture a floating point number
my @vals = ($x =~ m/$re4/gx );
croak "Invalid line '$x'" if @vals != 4;
($d, $e, $f, $g) = @vals;
```

Often, I first identify the section without capturing and then parse it in a second step (but that's not what you wanted):

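```perl
use Carp qw(croak);

my $float = qr/\b([Na0-9\.\-\+]+)\b/;
# the outermost group is $1: four numbers, each followed by whitespace or end of string
croak "Invalid line '$x'" if $x !~ /((?:$float(?:\s+|$)){4})/;
print "Found numbers '$1'\n";
my @vals = $1 =~ /$float/g; # $float already captures, so no extra parentheses
```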

I did not find a way to capture the repeated values in one go.

Re: Repeating a capture group pattern within a pattern
by haukex (Archbishop) on Jul 15, 2024 at 10:39 UTC

Sounds like a case for recursive subpatterns.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

my $x = "0.01 NaN 2.30 4.44";

# the following works as desired
my $r1 = qr/([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)\s+
([Na0-9\.\-\+]+)/x;
my ($d, $e, $f, $g) = ($x =~ m/$r1/x );
print "good: $d, $e, $f, $g\n";

# recursive subpatterns
my $rx = qr{ ([Na0-9\.\-\+]+)
\s+ ((?1)) \s+ ((?1)) \s+ ((?1)) }x;
($d, $e, $f, $g) = $x =~ $rx or die;
print "good? $d, $e, $f, $g\n";

# or match / validate first, then split
my $ry = qr{ ([Na0-9\.\-\+]+) (?: \s+ (?1)){3} }x;
$x =~ $ry or die;
($d, $e, $f, $g) = split ' ', $&;
print "good? $d, $e, $f, $g\n";
```

Re: Repeating a capture group pattern within a pattern
by hippo (Archbishop) on Jul 15, 2024 at 09:21 UTC

If I were doing this for real I would use /g as Corion suggests. However, for interest's sake, here is one way to construct the pattern without repetition in the code.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my $x = "0.01 NaN 2.30 4.44";
my $r1 = join '\s+', ('([Na0-9\.\-\+]+)') x 4;
print "r1 is '$r1'\n";

my ($d, $e, $f, $g) = ($x =~ m/$r1/);
print qq($d, $e, $f, $g\n);
```

Note that you don't need all those backslashes so the inner character class can be shortened to just [Na0-9.+-] but it has no effect on the end result.
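One way to check that claim is to compare the two character classes over every printable ASCII character. This is only a sketch; the variable names ($escaped, $plain, @differ) are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The escaped class from the original post and the shortened form.
# Inside [...], "." and "+" are literal, and "-" is literal when it
# comes last, so none of the backslashes are required.
my $escaped = qr/\A[Na0-9\.\-\+]\z/;
my $plain   = qr/\A[Na0-9.+-]\z/;

# Collect every printable ASCII code point on which the classes disagree.
my @differ = grep {
    my $c = chr;
    ($c =~ $escaped) xor ($c =~ $plain);
} 32 .. 126;

print @differ ? "classes differ\n" : "classes agree\n";
```

If the two classes really are equivalent, @differ stays empty and the script reports that they agree.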

If this is an XY Problem, then perhaps if you explain what you are actually trying to do someone could suggest a better approach.

🦛

"If this is an XY Problem, then perhaps if you explain what you are actually trying to do someone could suggest a better approach."

Thanks. I am figuring out a way to parse the Finnish Meteorological Institute's abomination of a data file. It is like 10 pages of empty XML tags followed by gems like these:

```
<gml:DataBlock>
<gml:rangeParameters/>
<gml:doubleOrNilReasonTupleList>
1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.3 0.0 0.0 7
+16.7 8461717.0 NaN 7750386.0 2224256.0 38507.5 10.9 1016.4 7.7 21.1 1
+3.7 71.1 222.0 6.82 4.67 4.95 0.0 2.1 2.1 0.0 0.0 727.6 11081057.0 Na
+N 10146792.0 2372829.8 40682.2 11.2 1016.4 7.7 21.6 13.2 67.7 220.0 6
+.71 4.37 5.1 0.0 0.8 0.8 0.0 0.0 749.9 13780637.0 NaN 12614863.0 4011
+495.3 44407.4 11.3 1016.4 7.7 21.7 12.3 64.8 216.0 6.64 3.98 5.31 0.0
+ 0.0 0.0 0.0 0.0 693.8 16278537.0 NaN 14898300.0 5267137.5 47598.7 11
+.1 1016.1 7.7 22.1 9.8 56.4 224.0 6.72 4.7 4.81 0.0 0.0 0.0 0.0 0.0 6
+13.8 18488268.0 NaN 16917212.0 7048784.0 54656.4 11.3 1015.9 7.7 21.6
+ 10.3 59.0 223.0 6.52 4.48 4.74 0.0 0.0 0.0 0.0 0.0 508.5 20319020.0
+NaN 18588124.0 8487086.0 52728.9 11.1 1016.0 7.7 20.6 11.8 66.4 217.0
+ 6.22 3.81 4.93 0.0 0.2 0.0 0.0 0.2 387.0 21712000.0 NaN 19859482.0 9
+532787.0 45898.5 10.5 1016.1 7.7 19.9 12.7 71.8 219.0 5.97 3.74 4.64
+0.0 3.3 0.0 0.0 3.3 257.6 22639382.0 NaN 20706258.0 10155480.0 39930.
+0 9.9 1016.4 7.7 19.4 13.2 74.9 221.0 5.25 3.43 3.97 0.0 1.8 0.0 0.0
+1.8 140.2 23144020.0 NaN 21166678.0 10436559.0 36483.1 9.3 1016.4 7.7
+ 19.2 12.5 73.1 232.0 4.04 3.19 2.44 0.0 6.8 2.7 4.3 0.0 47.7 2331552
+6.0 NaN 21323276.0 10494672.0 38582.3 8.0 1016.3 7.7 19.2 11.2 68.6 2
+62.0 2.52 2.49 0.34 0.0 53.8 0.0 24.2 39.1 1.8 23322102.0 NaN 2132958
+4.0 10494672.0 43729.5 6.0 1016.5 7.7 19.0 11.0 68.8 247.0 2.3 2.08 0
+.98 0.0 100.0 0.0 99.5 100.0 0.1 23322252.0 NaN 21329518.0 10494672.0
+ 43367.8 3.7 1016.7 7.7 18.6 12.0 73.4 225.0 2.3 1.57 1.67 0.0 54.2 0
+.0 46.3 14.7 0.0 23322252.0 NaN 21329518.0 10494672.0 38219.0 3.6 101
+6.8 7.7 18.4 14.1 81.9 196.0 2.97 0.75 2.88 0.0 86.4 0.1 82.5 21.9 0.
+1 23322356.0 NaN 21329518.0 10494672.0 28237.8 4.4 1017.0 7.7 18.5 14
+.4 82.5 199.0 3.04 0.87 2.92 0.0 99.7 53.1 65.1 98.6 0.0 23322262.0 N
+aN 21329518.0 10494672.0 27807.3 5.0 1017.0 7.7 18.7 14.1 80.8 195.0
+2.52 0.57 2.46 0.0 100.0 71.6 23.1 99.9 0.1 23322488.0 NaN 21329518.0
+ 10494672.0 29676.3 4.6 1017.2 7.7 18.8 14.0 80.0 196.0 2.96 0.78 2.8
+5 0.18 100.0 100.0 100.0 99.7 0.4 23323732.0 NaN 21330926.0 10494672.
+0 7893.4 4.6 1017.1 7.7 18.6 14.2 81.3 155.0 2.05 -0.98 1.8 1.3 93.7
+58.4 63.3 57.6 15.0 23377624.0 NaN 21380274.0 10494672.0 29305.2 4.6
+1017.1 7.7 18.6 13.8 80.1 170.0 3.61 -0.55 3.56 1.3 92.8 89.9 30.2 1.
+7 72.1 23637418.0 NaN 21617728.0 10494761.0 31078.9 5.7 1017.3 7.7 18
+.9 14.9 83.3 178.0 2.97 -0.13 2.98 1.3 86.3 55.2 68.3 0.0 130.9 24108
+452.0 NaN 22048990.0 10494719.0 27161.4 5.4 1017.5 7.7 19.7 15.3 81.4
+ 197.0 2.29 0.48 2.24 1.3 57.2 22.3 44.9 0.0 302.4 25197118.0 NaN 230
+44054.0 10494678.0 28930.3 4.8 1017.7 7.7 20.4 14.6 76.5 198.0 1.82 0
+.28 1.78 1.3 66.0 10.1 62.2 0.0 413.4 26685568.0 NaN 24404956.0 10494
+752.0 34279.5 4.0 1017.5 7.7 20.7 13.8 72.6 161.0 2.67 -1.21 2.33 1.3
+ 84.8 0.0 84.8 0.0 429.6 28232080.0 NaN 25819190.0 10495451.0 38634.6
+ 4.6 1017.4 7.7 21.2 13.9 71.5 159.0 3.15 -1.39 2.81 1.3 97.7 0.2 97.
+7 0.0 444.8 29833140.0 NaN 27281036.0 10495527.0 39613.4 5.3 1017.6 7
+.7 22.0 13.7 68.4 161.0 2.93 -1.21 2.62 1.3 86.6 1.1 71.8 51.9 558.1
+31842300.0 NaN 29115360.0 10495682.0 42999.5 5.7 1017.4 7.7 23.6 11.8
+ 58.0 143.0 2.81 -1.89 2.04 1.3 5.5 0.0 4.9 0.6 628.6 34105344.0 NaN
+31181590.0 10496878.0 53270.7 5.5 1017.3 7.7 23.9 10.7 54.4 139.0 3.6
+8 -2.68 2.49 1.3 10.6 0.0 7.9 2.9 673.6 36530392.0 NaN 33394718.0 105
+04540.0 55330.6 6.5 1017.2 7.7 23.6 11.7 57.8 141.0 4.98 -3.24 3.81 1
+.3 37.6 0.0 5.4 34.0 671.6 38948084.0 NaN 35600932.0 10572389.0 53759
+.2 8.3 1017.3 7.7 23.0 12.4 61.2 145.0 5.0 -2.93 4.05 1.3 24.6 0.0 24
+.6 0.0 563.3 40976088.0 NaN 37452260.0 10577521.0 51063.5 8.4 1017.1
+7.7 22.9 13.0 63.6 145.0 4.66 -2.77 3.78 1.3 28.6 0.0 7.3 22.9 465.8
+42652852.0 NaN 38982672.0 10578041.0 48385.1 8.3 1017.0 7.7 22.8 12.7
+ 62.7 143.0 4.44 -2.77 3.46 1.3 38.8 0.0 34.2 6.9 350.8 43915868.0 Na
+N 40134736.0 10578632.0 49670.0 7.6 1016.7 7.7 22.4 12.8 64.2 138.0 4
+.17 -2.85 3.03 1.3 10.5 0.0 2.4 8.3 232.2 44751760.0 NaN 40897580.0 1
+0579198.0 48024.7 7.1 1016.5 7.7 21.7 14.3 71.1 128.0 3.16 -2.53 1.89
+ 1.3 0.9 0.0 0.0 0.9 131.4 45224988.0 NaN 41329512.0 10638878.0 40243
+.3 6.6 1016.6 7.7 21.2 14.9 74.9 124.0 2.97 -2.45 1.71 1.3 25.2 0.0 2
+.4 23.3 44.4 45384908.0 NaN 41475220.0 10681477.0 35910.2 4.9 1016.4
+7.7 20.9 15.2 77.1 119.0 2.62 -2.3 1.25 1.3 6.4 0.0 5.8 0.7 1.6 45390
+616.0 NaN 41480160.0 10681174.0 33633.9 4.5 1016.3 7.7 20.6 15.2 78.0
+ 77.0 3.04 -2.98 -0.49 1.3 1.5 0.0 1.0 0.4 0.1 45390552.0 NaN 4147990
+0.0 10681174.0 32898.9 4.4 1016.3 7.7 19.8 15.5 82.0 83.0 3.6 -3.54 -
+0.59 1.3 39.6 1.5 0.0 38.6 0.1 45390516.0 NaN 41479900.0 10681174.0 2
+8384.6 5.4 1016.2 7.7 19.8 15.2 80.9 80.0 3.99 -3.89 -0.88 1.3 79.8 0
+.0 0.1 79.8 0.0 45390256.0 NaN 41479900.0 10681174.0 29798.2 6.2 1016
+.1 7.7 19.9 14.5 78.0 83.0 4.55 -4.5 -0.65 1.3 72.8 1.8 5.5 70.7 0.1
+45390380.0 NaN 41479900.0 10681174.0 33125.9 6.9 1015.7 7.7 19.7 14.8
+ 79.8 79.0 4.93 -4.83 -1.06 1.3 82.5 4.1 0.9 81.6 0.1 45390640.0 NaN
+41479900.0 10681174.0 31134.5 7.6 1015.4 7.7 19.6 15.2 81.5 85.0 5.18
+ -5.15 -0.54 1.3 60.2 1.5 9.8 55.2 2.6 45399888.0 NaN 41488504.0 1068
+1174.0 29202.1 8.0 1015.0 7.7 19.7 16.0 84.4 90.0 5.29 -5.28 -0.08 1.
+3 72.2 0.4 15.8 66.8 32.6 45517296.0 NaN 41595768.0 10681174.0 25601.
+0 8.5 1014.9 7.7 19.9 16.2 84.5 91.0 5.63 -5.62 -0.07 1.3 62.2 0.1 3.
+6 60.7 99.1 45873920.0 NaN 41921652.0 10681188.0 25504.3 8.9 1014.7 7
+.7 20.5 16.2 82.2 97.0 5.27 -5.26 0.48 1.3 93.6 2.4 5.9 93.0 169.2 46
+483036.0 NaN 42477320.0 10681384.0 28098.3 8.7 1014.6 7.7 20.8 16.1 8
+0.8 100.0 5.36 -5.29 0.87 1.3 99.7 2.6 12.5 99.7 196.9 47191680.0 NaN
+ 43123740.0 10681074.0 29718.5 8.7 1014.2 7.7 21.7 16.3 78.2 101.0 5.
+17 -5.08 0.9 1.3 100.0 0.0 5.8 100.0 290.9 48239060.0 NaN 44080204.0
+10681266.0 32552.5 8.4 1014.1 7.7 22.3 16.4 76.7 102.0 5.69 -5.59 1.0
+4 1.3 100.0 0.0 0.3 100.0 344.6 49479400.0 NaN 45212932.0 10694658.0
+34180.5 9.1 1013.5 7.7 22.0 16.4 77.6 94.0 6.21 -6.2 0.29 1.3 100.0 0
+.0 14.7 100.0 258.3 50409360.0 NaN 46061948.0 10725906.0 33338.7 9.7
+1013.5 7.7 21.4 15.2 75.4 95.0 6.7 -6.69 0.48 1.3 100.0 0.0 79.6 100.
+0 115.0 50823272.0 NaN 46440372.0 10725795.0 35955.3 10.4 1013.1 7.7
+22.0 15.2 73.4 91.0 6.86 -6.86 0.07 1.3 99.5 0.0 78.3 97.8 235.6 5167
+1476.0 NaN 47214844.0 10727690.0 38242.3 10.8
</gml:doubleOrNilReasonTupleList>
</gml:DataBlock>
```

I've written to them several times over the years and they have neither fixed their data nor deigned to even reply.

That is pretty horrible, but at least it appears well-formed, in that it is a space-separated list of decimal numbers and NaNs. I would probably just split it and go from there, TBH.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my $str = '1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.
+3 0.0 0.0 716.7 8461717.0 NaN 7750386.0 2224256.0 38507.5 10.9 1016.4
+ 7.7 21.1 13.7 71.1 222.0 6.82 4.67 4.95 0.0 2.1 2.1 0.0 0.0 727.6 11
+081057.0 NaN 10146792.0 2372829.8 40682.2 11.2 1016.4 7.7 21.6 13.2 6
+7.7 220.0 6.71 4.37 5.1 0.0 0.8 0.8 0.0 0.0 749.9 13780637.0 NaN 1261
+4863.0 4011495.3 44407.4 11.3 1016.4 7.7 21.7 12.3 64.8 216.0 6.64 3.
+98 5.31 0.0 0.0 0.0 0.0 0.0 693.8 16278537.0 NaN 14898300.0 5267137.5
+ 47598.7 11.1 1016.1 7.7 22.1 9.8 56.4 224.0 6.72 4.7 4.81 0.0 0.0 0.
+0 0.0 0.0 613.8 18488268.0 NaN 16917212.0 7048784.0 54656.4 11.3 1015
+.9 7.7 21.6 10.3 59.0 223.0 6.52 4.48 4.74 0.0 0.0 0.0 0.0 0.0 508.5
+20319020.0 NaN 18588124.0 8487086.0 52728.9 11.1 1016.0 7.7 20.6 11.8
+ 66.4 217.0 6.22 3.81 4.93 0.0 0.2 0.0 0.0 0.2 387.0 21712000.0 NaN 1
+9859482.0 9532787.0 45898.5 10.5 1016.1 7.7 19.9 12.7 71.8 219.0 5.97
+ 3.74 4.64 0.0 3.3 0.0 0.0 3.3 257.6 22639382.0 NaN 20706258.0 101554
+80.0 39930.0 9.9 1016.4 7.7 19.4 13.2 74.9 221.0 5.25 3.43 3.97 0.0 1
+.8 0.0 0.0 1.8 140.2 23144020.0 NaN 21166678.0 10436559.0 36483.1 9.3
+ 1016.4 7.7 19.2 12.5 73.1 232.0 4.04 3.19 2.44 0.0 6.8 2.7 4.3 0.0 4
+7.7 23315526.0 NaN 21323276.0 10494672.0 38582.3 8.0 1016.3 7.7 19.2
+11.2 68.6 262.0 2.52 2.49 0.34 0.0 53.8 0.0 24.2 39.1 1.8 23322102.0
+NaN 21329584.0 10494672.0 43729.5 6.0 1016.5 7.7 19.0 11.0 68.8 247.0
+ 2.3 2.08 0.98 0.0 100.0 0.0 99.5 100.0 0.1 23322252.0 NaN 21329518.0
+ 10494672.0 43367.8 3.7 1016.7 7.7 18.6 12.0 73.4 225.0 2.3 1.57 1.67
+ 0.0 54.2 0.0 46.3 14.7 0.0 23322252.0 NaN 21329518.0 10494672.0 3821
+9.0 3.6 1016.8 7.7 18.4 14.1 81.9 196.0 2.97 0.75 2.88 0.0 86.4 0.1 8
+2.5 21.9 0.1 23322356.0 NaN 21329518.0 10494672.0 28237.8 4.4 1017.0
+7.7 18.5 14.4 82.5 199.0 3.04 0.87 2.92 0.0 99.7 53.1 65.1 98.6 0.0 2
+3322262.0 NaN 21329518.0 10494672.0 27807.3 5.0 1017.0 7.7 18.7 14.1
+80.8 195.0 2.52 0.57 2.46 0.0 100.0 71.6 23.1 99.9 0.1 23322488.0 NaN
+ 21329518.0 10494672.0 29676.3 4.6 1017.2 7.7 18.8 14.0 80.0 196.0 2.
+96 0.78 2.85 0.18 100.0 100.0 100.0 99.7 0.4 23323732.0 NaN 21330926.
+0 10494672.0 7893.4 4.6 1017.1 7.7 18.6 14.2 81.3 155.0 2.05 -0.98 1.
+8 1.3 93.7 58.4 63.3 57.6 15.0 23377624.0 NaN 21380274.0 10494672.0 2
+9305.2 4.6 1017.1 7.7 18.6 13.8 80.1 170.0 3.61 -0.55 3.56 1.3 92.8 8
+9.9 30.2 1.7 72.1 23637418.0 NaN 21617728.0 10494761.0 31078.9 5.7 10
+17.3 7.7 18.9 14.9 83.3 178.0 2.97 -0.13 2.98 1.3 86.3 55.2 68.3 0.0
+130.9 24108452.0 NaN 22048990.0 10494719.0 27161.4 5.4 1017.5 7.7 19.
+7 15.3 81.4 197.0 2.29 0.48 2.24 1.3 57.2 22.3 44.9 0.0 302.4 2519711
+8.0 NaN 23044054.0 10494678.0 28930.3 4.8 1017.7 7.7 20.4 14.6 76.5 1
+98.0 1.82 0.28 1.78 1.3 66.0 10.1 62.2 0.0 413.4 26685568.0 NaN 24404
+956.0 10494752.0 34279.5 4.0 1017.5 7.7 20.7 13.8 72.6 161.0 2.67 -1.
+21 2.33 1.3 84.8 0.0 84.8 0.0 429.6 28232080.0 NaN 25819190.0 1049545
+1.0 38634.6 4.6 1017.4 7.7 21.2 13.9 71.5 159.0 3.15 -1.39 2.81 1.3 9
+7.7 0.2 97.7 0.0 444.8 29833140.0 NaN 27281036.0 10495527.0 39613.4 5
+.3 1017.6 7.7 22.0 13.7 68.4 161.0 2.93 -1.21 2.62 1.3 86.6 1.1 71.8
+51.9 558.1 31842300.0 NaN 29115360.0 10495682.0 42999.5 5.7 1017.4 7.
+7 23.6 11.8 58.0 143.0 2.81 -1.89 2.04 1.3 5.5 0.0 4.9 0.6 628.6 3410
+5344.0 NaN 31181590.0 10496878.0 53270.7 5.5 1017.3 7.7 23.9 10.7 54.
+4 139.0 3.68 -2.68 2.49 1.3 10.6 0.0 7.9 2.9 673.6 36530392.0 NaN 333
+94718.0 10504540.0 55330.6 6.5 1017.2 7.7 23.6 11.7 57.8 141.0 4.98 -
+3.24 3.81 1.3 37.6 0.0 5.4 34.0 671.6 38948084.0 NaN 35600932.0 10572
+389.0 53759.2 8.3 1017.3 7.7 23.0 12.4 61.2 145.0 5.0 -2.93 4.05 1.3
+24.6 0.0 24.6 0.0 563.3 40976088.0 NaN 37452260.0 10577521.0 51063.5
+8.4 1017.1 7.7 22.9 13.0 63.6 145.0 4.66 -2.77 3.78 1.3 28.6 0.0 7.3
+22.9 465.8 42652852.0 NaN 38982672.0 10578041.0 48385.1 8.3 1017.0 7.
+7 22.8 12.7 62.7 143.0 4.44 -2.77 3.46 1.3 38.8 0.0 34.2 6.9 350.8 43
+915868.0 NaN 40134736.0 10578632.0 49670.0 7.6 1016.7 7.7 22.4 12.8 6
+4.2 138.0 4.17 -2.85 3.03 1.3 10.5 0.0 2.4 8.3 232.2 44751760.0 NaN 4
+0897580.0 10579198.0 48024.7 7.1 1016.5 7.7 21.7 14.3 71.1 128.0 3.16
+ -2.53 1.89 1.3 0.9 0.0 0.0 0.9 131.4 45224988.0 NaN 41329512.0 10638
+878.0 40243.3 6.6 1016.6 7.7 21.2 14.9 74.9 124.0 2.97 -2.45 1.71 1.3
+ 25.2 0.0 2.4 23.3 44.4 45384908.0 NaN 41475220.0 10681477.0 35910.2
+4.9 1016.4 7.7 20.9 15.2 77.1 119.0 2.62 -2.3 1.25 1.3 6.4 0.0 5.8 0.
+7 1.6 45390616.0 NaN 41480160.0 10681174.0 33633.9 4.5 1016.3 7.7 20.
+6 15.2 78.0 77.0 3.04 -2.98 -0.49 1.3 1.5 0.0 1.0 0.4 0.1 45390552.0
+NaN 41479900.0 10681174.0 32898.9 4.4 1016.3 7.7 19.8 15.5 82.0 83.0
+3.6 -3.54 -0.59 1.3 39.6 1.5 0.0 38.6 0.1 45390516.0 NaN 41479900.0 1
+0681174.0 28384.6 5.4 1016.2 7.7 19.8 15.2 80.9 80.0 3.99 -3.89 -0.88
+ 1.3 79.8 0.0 0.1 79.8 0.0 45390256.0 NaN 41479900.0 10681174.0 29798
+.2 6.2 1016.1 7.7 19.9 14.5 78.0 83.0 4.55 -4.5 -0.65 1.3 72.8 1.8 5.
+5 70.7 0.1 45390380.0 NaN 41479900.0 10681174.0 33125.9 6.9 1015.7 7.
+7 19.7 14.8 79.8 79.0 4.93 -4.83 -1.06 1.3 82.5 4.1 0.9 81.6 0.1 4539
+0640.0 NaN 41479900.0 10681174.0 31134.5 7.6 1015.4 7.7 19.6 15.2 81.
+5 85.0 5.18 -5.15 -0.54 1.3 60.2 1.5 9.8 55.2 2.6 45399888.0 NaN 4148
+8504.0 10681174.0 29202.1 8.0 1015.0 7.7 19.7 16.0 84.4 90.0 5.29 -5.
+28 -0.08 1.3 72.2 0.4 15.8 66.8 32.6 45517296.0 NaN 41595768.0 106811
+74.0 25601.0 8.5 1014.9 7.7 19.9 16.2 84.5 91.0 5.63 -5.62 -0.07 1.3
+62.2 0.1 3.6 60.7 99.1 45873920.0 NaN 41921652.0 10681188.0 25504.3 8
+.9 1014.7 7.7 20.5 16.2 82.2 97.0 5.27 -5.26 0.48 1.3 93.6 2.4 5.9 93
+.0 169.2 46483036.0 NaN 42477320.0 10681384.0 28098.3 8.7 1014.6 7.7
+20.8 16.1 80.8 100.0 5.36 -5.29 0.87 1.3 99.7 2.6 12.5 99.7 196.9 471
+91680.0 NaN 43123740.0 10681074.0 29718.5 8.7 1014.2 7.7 21.7 16.3 78
+.2 101.0 5.17 -5.08 0.9 1.3 100.0 0.0 5.8 100.0 290.9 48239060.0 NaN
+44080204.0 10681266.0 32552.5 8.4 1014.1 7.7 22.3 16.4 76.7 102.0 5.6
+9 -5.59 1.04 1.3 100.0 0.0 0.3 100.0 344.6 49479400.0 NaN 45212932.0
+10694658.0 34180.5 9.1 1013.5 7.7 22.0 16.4 77.6 94.0 6.21 -6.2 0.29
+1.3 100.0 0.0 14.7 100.0 258.3 50409360.0 NaN 46061948.0 10725906.0 3
+3338.7 9.7 1013.5 7.7 21.4 15.2 75.4 95.0 6.7 -6.69 0.48 1.3 100.0 0.
+0 79.6 100.0 115.0 50823272.0 NaN 46440372.0 10725795.0 35955.3 10.4
+1013.1 7.7 22.0 15.2 73.4 91.0 6.86 -6.86 0.07 1.3 99.5 0.0 78.3 97.8
+ 235.6 51671476.0 NaN 47214844.0 10727690.0 38242.3 10.8';

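# Note: a leading "+" in the listing above marks a line wrapped by the
# site's code display; the actual string is one long line.
my @fields = split / /, $str;
my $NaNcount = grep { $_ eq 'NaN' } @fields;

print "There are " . scalar @fields .
" fields in the line of which $NaNcount are NaN.\n";
```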

If you really only want the first four, then split / /, $str, 5 will bundle all the stuff you don't want into the unused 5th list item.
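A minimal sketch of that limit argument, using a short made-up sample in place of the full data line (the names $rest and the sample values are assumptions for illustration):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Short sample standing in for the full data line.
my $str = '1016.1 7.7 NaN 13.8 72.9 215.0 6.64';

# A limit of 5 yields at most five fields: the first four values plus
# one final element holding the unsplit remainder of the string.
my ($d, $e, $f, $g, $rest) = split / /, $str, 5;

print "$d, $e, $f, $g\n";    # 1016.1, 7.7, NaN, 13.8
print "rest: $rest\n";       # rest: 72.9 215.0 6.64
```

Because the remainder is never split, this avoids building a long list when only the leading fields matter.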

HTH.

🦛

Re: Repeating a capture group pattern within a pattern
by LanX (Saint) on Jul 15, 2024 at 11:14 UTC
"There is more than one way to do it" ™ depending on your use case.

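The "problem" is not that you can't repeat a pattern in Perl, but that a quantified (...) group keeps only the capture from its last iteration.

One way around this is to embed a code block, (?{ ... }), that stores the current capture each time the group matches.

Another is to generate a pattern that contains one explicit capture group per value.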

```
  DB<25> $_='1016.1 7.7 NaN -20.6 3.8 72.9 215.0'

  DB<26> $pat = qr(NaN|-?\d+\.\d)

  DB<27> x m/($pat)/g
0  1016.1
1  7.7
2  'NaN'
3  '-20.6'
4  3.8
5  72.9
6  215.0
  DB<28> x m/($pat)(?:\s|$)/g
0  1016.1
1  7.7
2  'NaN'
3  '-20.6'
4  3.8
5  72.9
6  215.0
  DB<29> x (m/($pat)(?:\s|$)/g)[0..3]
0  1016.1
1  7.7
2  'NaN'
3  '-20.6'

...

  DB<33> x m/(?:($pat)(?:\s|$)){4}/
0  '-20.6'
  DB<34> x m/(?:($pat)(?:\s|$)(?{push @a,$1})){4}/
0  '-20.6'
  DB<35> x @a
0  1016.1
1  7.7
2  'NaN'
3  '-20.6'
  DB<36>
...
  DB<47> $delim =  '(?:\s|$)'

  DB<48> p $explicit=  "($pat)$delim" x 4
((?^u:NaN|-?\d+\.\d))(?:\s|$)((?^u:NaN|-?\d+\.\d))(?:\s|$)((?^u:NaN|-?\d+\.\d))(?:\s|$)((?^u:NaN|-?\d+\.\d))(?:\s|$)
  DB<49> x m/$explicit/g
0  1016.1
1  7.7
2  'NaN'
3  '-20.6'
  DB<50>
```
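Outside the debugger, the code-block approach from DB<34> might be sketched as a standalone script like the one below. The names $line, $num and @seen are illustrative, the number pattern is loosened to \d+\.\d+ so values such as 6.64 would also pass, and \z replaces $ as the end-of-string test. One caveat: if the engine backtracks over an iteration, the earlier push is not undone, so this suits simple, unambiguous input.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line = '1016.1 7.7 NaN -20.6 3.8 72.9 215.0';
my $num  = qr/NaN|-?\d+\.\d+/;    # assumption: NaN or a signed decimal

# Package variable, so the embedded code block can reach it reliably.
our @seen = ();

# A quantified group keeps only its last capture in $1, so each value
# is pushed onto @seen as soon as its iteration has matched.
if ( $line =~ /^(?:($num)(?:\s+|\z)(?{ push @seen, $1 })){4}/ ) {
    print join(', ', @seen), "\n";    # 1016.1, 7.7, NaN, -20.6
}
```

For anything beyond a quick experiment, generating explicit capture groups or using /g is probably easier to reason about than code blocks inside the pattern.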

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

Re: Repeating a capture group pattern within a pattern
by talexb (Chancellor) on Jul 16, 2024 at 00:25 UTC

To me, the simpler solution would just be to split on a space, then use the regex on each of the four elements.

It's also possible that I'm missing something.
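A minimal sketch of that split-then-check idea, assuming exactly four fields are expected; the per-field pattern $field is a hypothetical stand-in and not taken from the thread:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $x = "0.01 NaN 2.30 4.44";

# Hypothetical per-field check: NaN or an optionally signed decimal.
my $field = qr/\A(?:NaN|[+-]?\d+(?:\.\d+)?)\z/;

# split ' ' splits on runs of whitespace and skips leading whitespace.
my @vals = split ' ', $x;

die "Expected 4 fields, got " . scalar(@vals) . "\n" unless @vals == 4;
for my $v (@vals) {
    die "Unexpected field '$v'\n" unless $v =~ $field;
}

my ($d, $e, $f, $g) = @vals;
print "$d, $e, $f, $g\n";    # 0.01, NaN, 2.30, 4.44
```

Checking each field separately keeps the pattern short and makes it easy to report exactly which value was malformed.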

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Repeating a capture group pattern within a pattern
by Anonymous Monk on Jul 18, 2024 at 05:46 UTC
You're overthinking it.

```perl
my $x = "0.01 NaN 2.30 4.44";
```

Match the form rather than the content of the data:

```perl
my $r1 = qr/(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;
```

Or ditch the pattern and split, as others suggest:

```perl
my ($d, $e, $f, $g) = split /\s+/, $x;
```
Re: Repeating a capture group pattern within a pattern
by WithABeard (Beadle) on Jul 25, 2024 at 11:08 UTC

Maybe I'm missing something, but this doesn't seem too difficult:

```
> perl -e 'my $x = "0.01 NaN 2.30 4.44";
my ($d, $e, $f, $g) = ($x =~ /([Na0-9\.\-\+]+\b)/g);
print "d: $d, e: $e, f: $f, g: $g";'
```

output:

```
d: 0.01, e: NaN, f: 2.30, g: 4.44
```

The /g flag makes it return a list of all matches.

I changed \s+ to \b (word boundary) since the last piece doesn't have a space after it.
