PerlMonks  

Repeating a capture group pattern within a pattern

by mldvx4 (Friar)
on Jul 15, 2024 at 07:09 UTC ( [id://11160609]=perlquestion )

mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

I am looking to simplify a pattern. If I have a string my $x = "0.01 NaN 2.30 4.44"; then the following pattern finds the items present:

my $r1 = qr/([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)/x;

Notice that the same capture group criteria are repeated. I wonder how I may write that so it is simpler, shorter, and all on one line. Here is some pseudo-code to try to show what I am aiming for: my $r1 = qr/(?=([Na0-9\.\-\+]+)\s+){4}/

However, I've tried that and some permutations without luck:

#!/usr/bin/perl
use strict;
use warnings;

my $x = "0.01 NaN 2.30 4.44";

# the following works as desired
my $r1 = qr/([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)/x;
my ($d, $e, $f, $g) = ($x =~ m/$r1/x );
print qq($d, $e, $f, $g\n);

# the following finds the first number twice
my $r2 = qr/(?=(([Na0-9\.\-\+]+)\s*)){4}/x;
($d, $e, $f, $g) = ($x =~ m/$r2/x );
print qq($d, $e, $f, $g\n);

# the following finds a null prior to the first item
my $r3 = qr/((?=([Na0-9\.\-\+]+)\s*){4})/x;
($d, $e, $f, $g) = ($x =~ m/$r3/x );
print qq($d, $e, $f, $g\n);

exit(0);

How can I write that pattern so that the pattern it contains is repeated but not locked into the values found in the very first match? Is this a case for using recursive patterns?

Replies are listed 'Best First'.
Re: Repeating a capture group pattern within a pattern
by Corion (Patriarch) on Jul 15, 2024 at 07:49 UTC

    This might be cheating, but when retrieving multiple repeated matches, I often use /g after validating that the line looks somewhat valid:

    my $re4 = qr/\b([Na0-9\.\-\+]+)\b/; # capture a floating point number
    my @vals = ($x =~ m/$re4/gx );
    croak "Invalid line '$x'" if @vals != 4;
    ($d, $e, $f, $g) = @vals;

    Often, I first identify the section without capturing and then parse it in a second step (but that's not what you wanted):

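    my $float = qr/\b([Na0-9\.\-\+]+)\b/;
    croak "Invalid line '$x'" if $x !~ /((?:$float(?:\s+|$)){4})/;
    print "Found numbers '$1'\n";
    my $numbers = $1;                   # copy $1 before the next match resets it
    my @vals = $numbers =~ /$float/g;   # $float already contains the capture group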

    I did not find a way to capture the repeated values in one go.
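One caveat with the \b anchors above: there is no word boundary between a space and a minus sign, so a field such as -0.98 would be captured without its sign. A minimal sketch of the same /g-then-count idea, assuming the fields are whitespace-delimited, uses the lookarounds (?<!\S) and (?!\S) instead. The helper name parse_four and the sample line are made up for illustration.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: return exactly four whitespace-delimited fields.
# (?<!\S) and (?!\S) anchor each field to whitespace or to the ends of
# the string, so a leading sign stays part of the captured field.
sub parse_four {
    my ($line) = @_;
    my @vals = $line =~ /(?<!\S)([Na0-9.+-]+)(?!\S)/g;
    croak "Invalid line '$line'" if @vals != 4;
    return @vals;
}

my @fields = parse_four("2.05 -0.98 NaN 1.3");
print join(', ', @fields), "\n";

# For comparison: with \b the match for the second field starts at the digit.
my @with_b = "2.05 -0.98" =~ /\b([Na0-9.+-]+)\b/g;
print join(', ', @with_b), "\n";
```

Like the original, this only checks that four plausible-looking fields were found; it does not reject extra junk elsewhere on the line.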

Re: Repeating a capture group pattern within a pattern
by haukex (Archbishop) on Jul 15, 2024 at 10:39 UTC

    Sounds like a case for recursive subpatterns.

    #!/usr/bin/env perl
    use warnings;
    use strict;

    my $x = "0.01 NaN 2.30 4.44";

    # the following works as desired
    my $r1 = qr/([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)\s+ ([Na0-9\.\-\+]+)/x;
    my ($d, $e, $f, $g) = ($x =~ m/$r1/x );
    print "good: $d, $e, $f, $g\n";

    # recursive subpatterns
    my $rx = qr{ ([Na0-9\.\-\+]+) \s+ ((?1)) \s+ ((?1)) \s+ ((?1)) }x;
    ($d, $e, $f, $g) = $x =~ $rx or die;
    print "good? $d, $e, $f, $g\n";

    # or match / validate first, then split
    my $ry = qr{ ([Na0-9\.\-\+]+) (?: \s+ (?1)){3} }x;
    $x =~ $ry or die;
    ($d, $e, $f, $g) = split ' ', $&;
    print "good? $d, $e, $f, $g\n";
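The reason (?1) is not "locked into" the first value is that it re-runs the pattern of group 1, whereas a backreference such as \1 must match the exact text group 1 captured. A small sketch of the difference, assuming Perl 5.10 or later (where recursive subpatterns are available) and using the unescaped form of the character class:

```perl
use strict;
use warnings;

my $x = "0.01 NaN 2.30 4.44";

# \1 is a backreference: it must match the same text that group 1 captured,
# so this only succeeds when two adjacent fields are identical.
my $same_text = $x =~ / ([Na0-9.+-]+) \s+ \1 (?!\S) /x ? 1 : 0;

# (?1) re-runs the *pattern* of group 1, so any field of that shape will do.
my $same_pattern = $x =~ / ([Na0-9.+-]+) \s+ (?1) (?!\S) /x ? 1 : 0;

print "backreference: $same_text, subpattern: $same_pattern\n";
```

Note that (?1) on its own does not capture; that is why the reply above wraps each call in an extra pair of parentheses, ((?1)).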
Re: Repeating a capture group pattern within a pattern
by hippo (Archbishop) on Jul 15, 2024 at 09:21 UTC

    If I were doing this for real I would use /g as Corion suggests. However, for interest's sake, here is one way to construct the pattern without repetition in the code.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $x = "0.01 NaN 2.30 4.44";
    my $r1 = join '\s+', ('([Na0-9\.\-\+]+)') x 4;
    print "r1 is '$r1'\n";
    my ($d, $e, $f, $g) = ($x =~ m/$r1/);
    print qq($d, $e, $f, $g\n);

    Note that you don't need all those backslashes so the inner character class can be shortened to just [Na0-9.+-] but it has no effect on the end result.
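A quick way to convince yourself of this is to run both classes over a few samples. The sketch below also compares them with a stricter field pattern; that stricter grammar (NaN, or an optionally signed decimal number) is an assumption for illustration and does not come from the thread.

```perl
use strict;
use warnings;

my $escaped = qr/^[Na0-9\.\-\+]+$/;
my $plain   = qr/^[Na0-9.+-]+$/;    # "-" is literal because it comes last

my @samples  = ('0.01', 'NaN', '-2.3', '+4', 'abc', '1e5', '');
my @disagree = grep { ($_ =~ $escaped) xor ($_ =~ $plain) } @samples;
print @disagree ? "classes differ on: @disagree\n" : "classes agree on all samples\n";

# Assumed field grammar: the character class is permissive and also
# accepts strings that are not numbers at all.
my $strict     = qr/^(?:NaN|[+-]?\d+(?:\.\d+)?)$/;
my @loose_only = grep { $_ =~ $plain && $_ !~ $strict }
                 ('0.01', 'NaN', '-2.3', 'aN', '1.2.3', '+-');
print "accepted only by the character class: @loose_only\n";
```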

    If this is an XY Problem, then perhaps if you explain what you are actually trying to do someone could suggest a better approach.


    🦛

      "If this is an XY Problem, then perhaps if you explain what you are actually trying to do someone could suggest a better approach."

      Thanks. I am figuring out a way to parse the Finnish Meteorological Institute's abomination of a data file. It is like 10 pages of empty XML tags followed by gems like these:

      <gml:DataBlock>
      <gml:rangeParameters/>
      <gml:doubleOrNilReasonTupleList>
      1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.3 0.0 0.0 716.7 8461717.0 NaN 7750386.0 2224256.0 38507.5 10.9 1016.4 7.7 21.1 13.7 71.1 222.0 6.82 4.67 4.95 0.0 2.1 2.1 0.0 0.0 727.6 11081057.0 NaN 10146792.0 2372829.8 40682.2 11.2 1016.4 7.7 21.6 13.2 67.7 220.0 6.71 4.37 5.1 0.0 0.8 0.8 0.0 0.0 749.9 13780637.0 NaN 12614863.0 4011495.3 44407.4 11.3 1016.4 7.7 21.7 12.3 64.8 216.0 6.64 3.98 5.31 0.0 0.0 0.0 0.0 0.0 693.8 16278537.0 NaN 14898300.0 5267137.5 47598.7 11.1 1016.1 7.7 22.1 9.8 56.4 224.0 6.72 4.7 4.81 0.0 0.0 0.0 0.0 0.0 613.8 18488268.0 NaN 16917212.0 7048784.0 54656.4 11.3 1015.9 7.7 21.6 10.3 59.0 223.0 6.52 4.48 4.74 0.0 0.0 0.0 0.0 0.0 508.5 20319020.0 NaN 18588124.0 8487086.0 52728.9 11.1 1016.0 7.7 20.6 11.8 66.4 217.0 6.22 3.81 4.93 0.0 0.2 0.0 0.0 0.2 387.0 21712000.0 NaN 19859482.0 9532787.0 45898.5 10.5 1016.1 7.7 19.9 12.7 71.8 219.0 5.97 3.74 4.64 0.0 3.3 0.0 0.0 3.3 257.6 22639382.0 NaN 20706258.0 10155480.0 39930.0 9.9 1016.4 7.7 19.4 13.2 74.9 221.0 5.25 3.43 3.97 0.0 1.8 0.0 0.0 1.8 140.2 23144020.0 NaN 21166678.0 10436559.0 36483.1 9.3 1016.4 7.7 19.2 12.5 73.1 232.0 4.04 3.19 2.44 0.0 6.8 2.7 4.3 0.0 47.7 23315526.0 NaN 21323276.0 10494672.0 38582.3 8.0 1016.3 7.7 19.2 11.2 68.6 262.0 2.52 2.49 0.34 0.0 53.8 0.0 24.2 39.1 1.8 23322102.0 NaN 21329584.0 10494672.0 43729.5 6.0 1016.5 7.7 19.0 11.0 68.8 247.0 2.3 2.08 0.98 0.0 100.0 0.0 99.5 100.0 0.1 23322252.0 NaN 21329518.0 10494672.0 43367.8 3.7 1016.7 7.7 18.6 12.0 73.4 225.0 2.3 1.57 1.67 0.0 54.2 0.0 46.3 14.7 0.0 23322252.0 NaN 21329518.0 10494672.0 38219.0 3.6 1016.8 7.7 18.4 14.1 81.9 196.0 2.97 0.75 2.88 0.0 86.4 0.1 82.5 21.9 0.1 23322356.0 NaN 21329518.0 10494672.0 28237.8 4.4 1017.0 7.7 18.5 14.4 82.5 199.0 3.04 0.87 2.92 0.0 99.7 53.1 65.1 98.6 0.0 23322262.0 NaN 21329518.0 10494672.0 27807.3 5.0 1017.0 7.7 18.7 14.1 80.8 195.0 2.52 0.57 2.46 0.0 100.0 71.6 23.1 99.9 0.1 23322488.0 NaN 21329518.0 10494672.0 29676.3 4.6 1017.2 7.7 18.8 14.0 80.0 196.0 2.96 0.78 2.85 0.18 100.0 100.0 100.0 99.7 0.4 23323732.0 NaN 21330926.0 10494672.0 7893.4 4.6 1017.1 7.7 18.6 14.2 81.3 155.0 2.05 -0.98 1.8 1.3 93.7 58.4 63.3 57.6 15.0 23377624.0 NaN 21380274.0 10494672.0 29305.2 4.6 1017.1 7.7 18.6 13.8 80.1 170.0 3.61 -0.55 3.56 1.3 92.8 89.9 30.2 1.7 72.1 23637418.0 NaN 21617728.0 10494761.0 31078.9 5.7 1017.3 7.7 18.9 14.9 83.3 178.0 2.97 -0.13 2.98 1.3 86.3 55.2 68.3 0.0 130.9 24108452.0 NaN 22048990.0 10494719.0 27161.4 5.4 1017.5 7.7 19.7 15.3 81.4 197.0 2.29 0.48 2.24 1.3 57.2 22.3 44.9 0.0 302.4 25197118.0 NaN 23044054.0 10494678.0 28930.3 4.8 1017.7 7.7 20.4 14.6 76.5 198.0 1.82 0.28 1.78 1.3 66.0 10.1 62.2 0.0 413.4 26685568.0 NaN 24404956.0 10494752.0 34279.5 4.0 1017.5 7.7 20.7 13.8 72.6 161.0 2.67 -1.21 2.33 1.3 84.8 0.0 84.8 0.0 429.6 28232080.0 NaN 25819190.0 10495451.0 38634.6 4.6 1017.4 7.7 21.2 13.9 71.5 159.0 3.15 -1.39 2.81 1.3 97.7 0.2 97.7 0.0 444.8 29833140.0 NaN 27281036.0 10495527.0 39613.4 5.3 1017.6 7.7 22.0 13.7 68.4 161.0 2.93 -1.21 2.62 1.3 86.6 1.1 71.8 51.9 558.1 31842300.0 NaN 29115360.0 10495682.0 42999.5 5.7 1017.4 7.7 23.6 11.8 58.0 143.0 2.81 -1.89 2.04 1.3 5.5 0.0 4.9 0.6 628.6 34105344.0 NaN 31181590.0 10496878.0 53270.7 5.5 1017.3 7.7 23.9 10.7 54.4 139.0 3.68 -2.68 2.49 1.3 10.6 0.0 7.9 2.9 673.6 36530392.0 NaN 33394718.0 10504540.0 55330.6 6.5 1017.2 7.7 23.6 11.7 57.8 141.0 4.98 -3.24 3.81 1.3 37.6 0.0 5.4 34.0 671.6 38948084.0 NaN 35600932.0 10572389.0 53759.2 8.3 1017.3 7.7 23.0 12.4 61.2 145.0 5.0 -2.93 4.05 1.3 24.6 0.0 24.6 0.0 563.3 40976088.0 NaN 37452260.0 10577521.0 51063.5 8.4 1017.1 7.7 22.9 13.0 63.6 145.0 4.66 -2.77 3.78 1.3 28.6 0.0 7.3 22.9 465.8 42652852.0 NaN 38982672.0 10578041.0 48385.1 8.3 1017.0 7.7 22.8 12.7 62.7 143.0 4.44 -2.77 3.46 1.3 38.8 0.0 34.2 6.9 350.8 43915868.0 NaN 40134736.0 10578632.0 49670.0 7.6 1016.7 7.7 22.4 12.8 64.2 138.0 4.17 -2.85 3.03 1.3 10.5 0.0 2.4 8.3 232.2 44751760.0 NaN 40897580.0 10579198.0 48024.7 7.1 1016.5 7.7 21.7 14.3 71.1 128.0 3.16 -2.53 1.89 1.3 0.9 0.0 0.0 0.9 131.4 45224988.0 NaN 41329512.0 10638878.0 40243.3 6.6 1016.6 7.7 21.2 14.9 74.9 124.0 2.97 -2.45 1.71 1.3 25.2 0.0 2.4 23.3 44.4 45384908.0 NaN 41475220.0 10681477.0 35910.2 4.9 1016.4 7.7 20.9 15.2 77.1 119.0 2.62 -2.3 1.25 1.3 6.4 0.0 5.8 0.7 1.6 45390616.0 NaN 41480160.0 10681174.0 33633.9 4.5 1016.3 7.7 20.6 15.2 78.0 77.0 3.04 -2.98 -0.49 1.3 1.5 0.0 1.0 0.4 0.1 45390552.0 NaN 41479900.0 10681174.0 32898.9 4.4 1016.3 7.7 19.8 15.5 82.0 83.0 3.6 -3.54 -0.59 1.3 39.6 1.5 0.0 38.6 0.1 45390516.0 NaN 41479900.0 10681174.0 28384.6 5.4 1016.2 7.7 19.8 15.2 80.9 80.0 3.99 -3.89 -0.88 1.3 79.8 0.0 0.1 79.8 0.0 45390256.0 NaN 41479900.0 10681174.0 29798.2 6.2 1016.1 7.7 19.9 14.5 78.0 83.0 4.55 -4.5 -0.65 1.3 72.8 1.8 5.5 70.7 0.1 45390380.0 NaN 41479900.0 10681174.0 33125.9 6.9 1015.7 7.7 19.7 14.8 79.8 79.0 4.93 -4.83 -1.06 1.3 82.5 4.1 0.9 81.6 0.1 45390640.0 NaN 41479900.0 10681174.0 31134.5 7.6 1015.4 7.7 19.6 15.2 81.5 85.0 5.18 -5.15 -0.54 1.3 60.2 1.5 9.8 55.2 2.6 45399888.0 NaN 41488504.0 10681174.0 29202.1 8.0 1015.0 7.7 19.7 16.0 84.4 90.0 5.29 -5.28 -0.08 1.3 72.2 0.4 15.8 66.8 32.6 45517296.0 NaN 41595768.0 10681174.0 25601.0 8.5 1014.9 7.7 19.9 16.2 84.5 91.0 5.63 -5.62 -0.07 1.3 62.2 0.1 3.6 60.7 99.1 45873920.0 NaN 41921652.0 10681188.0 25504.3 8.9 1014.7 7.7 20.5 16.2 82.2 97.0 5.27 -5.26 0.48 1.3 93.6 2.4 5.9 93.0 169.2 46483036.0 NaN 42477320.0 10681384.0 28098.3 8.7 1014.6 7.7 20.8 16.1 80.8 100.0 5.36 -5.29 0.87 1.3 99.7 2.6 12.5 99.7 196.9 47191680.0 NaN 43123740.0 10681074.0 29718.5 8.7 1014.2 7.7 21.7 16.3 78.2 101.0 5.17 -5.08 0.9 1.3 100.0 0.0 5.8 100.0 290.9 48239060.0 NaN 44080204.0 10681266.0 32552.5 8.4 1014.1 7.7 22.3 16.4 76.7 102.0 5.69 -5.59 1.04 1.3 100.0 0.0 0.3 100.0 344.6 49479400.0 NaN 45212932.0 10694658.0 34180.5 9.1 1013.5 7.7 22.0 16.4 77.6 94.0 6.21 -6.2 0.29 1.3 100.0 0.0 14.7 100.0 258.3 50409360.0 NaN 46061948.0 10725906.0 33338.7 9.7 1013.5 7.7 21.4 15.2 75.4 95.0 6.7 -6.69 0.48 1.3 100.0 0.0 79.6 100.0 115.0 50823272.0 NaN 46440372.0 10725795.0 35955.3 10.4 1013.1 7.7 22.0 15.2 73.4 91.0 6.86 -6.86 0.07 1.3 99.5 0.0 78.3 97.8 235.6 51671476.0 NaN 47214844.0 10727690.0 38242.3 10.8
      </gml:doubleOrNilReasonTupleList>
      </gml:DataBlock>

      I've written to them several times over the years, and they have neither fixed their data nor deigned to reply.

        That is pretty horrible, but at least it appears well-formed, in that it is a space-separated list of decimal numbers and NaNs. I would probably just split it and go from there, TBH.

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my $str = '1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.3 0.0 0.0 716.7 8461717.0 NaN 7750386.0 2224256.0 38507.5 10.9 1016.4 7.7 21.1 13.7 71.1 222.0 6.82 4.67 4.95 0.0 2.1 2.1 0.0 0.0 727.6 11081057.0 NaN 10146792.0 2372829.8 40682.2 11.2 1016.4 7.7 21.6 13.2 67.7 220.0 6.71 4.37 5.1 0.0 0.8 0.8 0.0 0.0 749.9 13780637.0 NaN 12614863.0 4011495.3 44407.4 11.3 1016.4 7.7 21.7 12.3 64.8 216.0 6.64 3.98 5.31 0.0 0.0 0.0 0.0 0.0 693.8 16278537.0 NaN 14898300.0 5267137.5 47598.7 11.1 1016.1 7.7 22.1 9.8 56.4 224.0 6.72 4.7 4.81 0.0 0.0 0.0 0.0 0.0 613.8 18488268.0 NaN 16917212.0 7048784.0 54656.4 11.3 1015.9 7.7 21.6 10.3 59.0 223.0 6.52 4.48 4.74 0.0 0.0 0.0 0.0 0.0 508.5 20319020.0 NaN 18588124.0 8487086.0 52728.9 11.1 1016.0 7.7 20.6 11.8 66.4 217.0 6.22 3.81 4.93 0.0 0.2 0.0 0.0 0.2 387.0 21712000.0 NaN 19859482.0 9532787.0 45898.5 10.5 1016.1 7.7 19.9 12.7 71.8 219.0 5.97 3.74 4.64 0.0 3.3 0.0 0.0 3.3 257.6 22639382.0 NaN 20706258.0 10155480.0 39930.0 9.9 1016.4 7.7 19.4 13.2 74.9 221.0 5.25 3.43 3.97 0.0 1.8 0.0 0.0 1.8 140.2 23144020.0 NaN 21166678.0 10436559.0 36483.1 9.3 1016.4 7.7 19.2 12.5 73.1 232.0 4.04 3.19 2.44 0.0 6.8 2.7 4.3 0.0 47.7 23315526.0 NaN 21323276.0 10494672.0 38582.3 8.0 1016.3 7.7 19.2 11.2 68.6 262.0 2.52 2.49 0.34 0.0 53.8 0.0 24.2 39.1 1.8 23322102.0 NaN 21329584.0 10494672.0 43729.5 6.0 1016.5 7.7 19.0 11.0 68.8 247.0 2.3 2.08 0.98 0.0 100.0 0.0 99.5 100.0 0.1 23322252.0 NaN 21329518.0 10494672.0 43367.8 3.7 1016.7 7.7 18.6 12.0 73.4 225.0 2.3 1.57 1.67 0.0 54.2 0.0 46.3 14.7 0.0 23322252.0 NaN 21329518.0 10494672.0 38219.0 3.6 1016.8 7.7 18.4 14.1 81.9 196.0 2.97 0.75 2.88 0.0 86.4 0.1 82.5 21.9 0.1 23322356.0 NaN 21329518.0 10494672.0 28237.8 4.4 1017.0 7.7 18.5 14.4 82.5 199.0 3.04 0.87 2.92 0.0 99.7 53.1 65.1 98.6 0.0 23322262.0 NaN 21329518.0 10494672.0 27807.3 5.0 1017.0 7.7 18.7 14.1 80.8 195.0 2.52 0.57 2.46 0.0 100.0 71.6 23.1 99.9 0.1 23322488.0 NaN 21329518.0 10494672.0 29676.3 4.6 1017.2 7.7 18.8 14.0 80.0 196.0 2.96 0.78 2.85 0.18 100.0 100.0 100.0 99.7 0.4 23323732.0 NaN 21330926.0 10494672.0 7893.4 4.6 1017.1 7.7 18.6 14.2 81.3 155.0 2.05 -0.98 1.8 1.3 93.7 58.4 63.3 57.6 15.0 23377624.0 NaN 21380274.0 10494672.0 29305.2 4.6 1017.1 7.7 18.6 13.8 80.1 170.0 3.61 -0.55 3.56 1.3 92.8 89.9 30.2 1.7 72.1 23637418.0 NaN 21617728.0 10494761.0 31078.9 5.7 1017.3 7.7 18.9 14.9 83.3 178.0 2.97 -0.13 2.98 1.3 86.3 55.2 68.3 0.0 130.9 24108452.0 NaN 22048990.0 10494719.0 27161.4 5.4 1017.5 7.7 19.7 15.3 81.4 197.0 2.29 0.48 2.24 1.3 57.2 22.3 44.9 0.0 302.4 25197118.0 NaN 23044054.0 10494678.0 28930.3 4.8 1017.7 7.7 20.4 14.6 76.5 198.0 1.82 0.28 1.78 1.3 66.0 10.1 62.2 0.0 413.4 26685568.0 NaN 24404956.0 10494752.0 34279.5 4.0 1017.5 7.7 20.7 13.8 72.6 161.0 2.67 -1.21 2.33 1.3 84.8 0.0 84.8 0.0 429.6 28232080.0 NaN 25819190.0 10495451.0 38634.6 4.6 1017.4 7.7 21.2 13.9 71.5 159.0 3.15 -1.39 2.81 1.3 97.7 0.2 97.7 0.0 444.8 29833140.0 NaN 27281036.0 10495527.0 39613.4 5.3 1017.6 7.7 22.0 13.7 68.4 161.0 2.93 -1.21 2.62 1.3 86.6 1.1 71.8 51.9 558.1 31842300.0 NaN 29115360.0 10495682.0 42999.5 5.7 1017.4 7.7 23.6 11.8 58.0 143.0 2.81 -1.89 2.04 1.3 5.5 0.0 4.9 0.6 628.6 34105344.0 NaN 31181590.0 10496878.0 53270.7 5.5 1017.3 7.7 23.9 10.7 54.4 139.0 3.68 -2.68 2.49 1.3 10.6 0.0 7.9 2.9 673.6 36530392.0 NaN 33394718.0 10504540.0 55330.6 6.5 1017.2 7.7 23.6 11.7 57.8 141.0 4.98 -3.24 3.81 1.3 37.6 0.0 5.4 34.0 671.6 38948084.0 NaN 35600932.0 10572389.0 53759.2 8.3 1017.3 7.7 23.0 12.4 61.2 145.0 5.0 -2.93 4.05 1.3 24.6 0.0 24.6 0.0 563.3 40976088.0 NaN 37452260.0 10577521.0 51063.5 8.4 1017.1 7.7 22.9 13.0 63.6 145.0 4.66 -2.77 3.78 1.3 28.6 0.0 7.3 22.9 465.8 42652852.0 NaN 38982672.0 10578041.0 48385.1 8.3 1017.0 7.7 22.8 12.7 62.7 143.0 4.44 -2.77 3.46 1.3 38.8 0.0 34.2 6.9 350.8 43915868.0 NaN 40134736.0 10578632.0 49670.0 7.6 1016.7 7.7 22.4 12.8 64.2 138.0 4.17 -2.85 3.03 1.3 10.5 0.0 2.4 8.3 232.2 44751760.0 NaN 40897580.0 10579198.0 48024.7 7.1 1016.5 7.7 21.7 14.3 71.1 128.0 3.16 -2.53 1.89 1.3 0.9 0.0 0.0 0.9 131.4 45224988.0 NaN 41329512.0 10638878.0 40243.3 6.6 1016.6 7.7 21.2 14.9 74.9 124.0 2.97 -2.45 1.71 1.3 25.2 0.0 2.4 23.3 44.4 45384908.0 NaN 41475220.0 10681477.0 35910.2 4.9 1016.4 7.7 20.9 15.2 77.1 119.0 2.62 -2.3 1.25 1.3 6.4 0.0 5.8 0.7 1.6 45390616.0 NaN 41480160.0 10681174.0 33633.9 4.5 1016.3 7.7 20.6 15.2 78.0 77.0 3.04 -2.98 -0.49 1.3 1.5 0.0 1.0 0.4 0.1 45390552.0 NaN 41479900.0 10681174.0 32898.9 4.4 1016.3 7.7 19.8 15.5 82.0 83.0 3.6 -3.54 -0.59 1.3 39.6 1.5 0.0 38.6 0.1 45390516.0 NaN 41479900.0 10681174.0 28384.6 5.4 1016.2 7.7 19.8 15.2 80.9 80.0 3.99 -3.89 -0.88 1.3 79.8 0.0 0.1 79.8 0.0 45390256.0 NaN 41479900.0 10681174.0 29798.2 6.2 1016.1 7.7 19.9 14.5 78.0 83.0 4.55 -4.5 -0.65 1.3 72.8 1.8 5.5 70.7 0.1 45390380.0 NaN 41479900.0 10681174.0 33125.9 6.9 1015.7 7.7 19.7 14.8 79.8 79.0 4.93 -4.83 -1.06 1.3 82.5 4.1 0.9 81.6 0.1 45390640.0 NaN 41479900.0 10681174.0 31134.5 7.6 1015.4 7.7 19.6 15.2 81.5 85.0 5.18 -5.15 -0.54 1.3 60.2 1.5 9.8 55.2 2.6 45399888.0 NaN 41488504.0 10681174.0 29202.1 8.0 1015.0 7.7 19.7 16.0 84.4 90.0 5.29 -5.28 -0.08 1.3 72.2 0.4 15.8 66.8 32.6 45517296.0 NaN 41595768.0 10681174.0 25601.0 8.5 1014.9 7.7 19.9 16.2 84.5 91.0 5.63 -5.62 -0.07 1.3 62.2 0.1 3.6 60.7 99.1 45873920.0 NaN 41921652.0 10681188.0 25504.3 8.9 1014.7 7.7 20.5 16.2 82.2 97.0 5.27 -5.26 0.48 1.3 93.6 2.4 5.9 93.0 169.2 46483036.0 NaN 42477320.0 10681384.0 28098.3 8.7 1014.6 7.7 20.8 16.1 80.8 100.0 5.36 -5.29 0.87 1.3 99.7 2.6 12.5 99.7 196.9 47191680.0 NaN 43123740.0 10681074.0 29718.5 8.7 1014.2 7.7 21.7 16.3 78.2 101.0 5.17 -5.08 0.9 1.3 100.0 0.0 5.8 100.0 290.9 48239060.0 NaN 44080204.0 10681266.0 32552.5 8.4 1014.1 7.7 22.3 16.4 76.7 102.0 5.69 -5.59 1.04 1.3 100.0 0.0 0.3 100.0 344.6 49479400.0 NaN 45212932.0 10694658.0 34180.5 9.1 1013.5 7.7 22.0 16.4 77.6 94.0 6.21 -6.2 0.29 1.3 100.0 0.0 14.7 100.0 258.3 50409360.0 NaN 46061948.0 10725906.0 33338.7 9.7 1013.5 7.7 21.4 15.2 75.4 95.0 6.7 -6.69 0.48 1.3 100.0 0.0 79.6 100.0 115.0 50823272.0 NaN 46440372.0 10725795.0 35955.3 10.4 1013.1 7.7 22.0 15.2 73.4 91.0 6.86 -6.86 0.07 1.3 99.5 0.0 78.3 97.8 235.6 51671476.0 NaN 47214844.0 10727690.0 38242.3 10.8';
        my @fields = split / /, $str;
        my $NaNcount = grep { $_ eq 'NaN' } @fields;
        print "There are " . scalar @fields . " fields in the line of which $NaNcount are NaN.\n";

        If you really only want the first four, then split / /, $str, 5 will bundle all the stuff you don't want into the unused 5th list item.
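As a short illustration of the limit argument, here is a sketch using only the first seven values from the sample data; the variable name $rest is made up for the example.

```perl
use strict;
use warnings;

my $str = '1016.1 7.7 20.6 13.8 72.9 215.0 6.64';   # first few values of the sample

# With a limit of 5, split stops after four separators; the fifth element
# holds the unsplit remainder of the string.
my ($d, $e, $f, $g, $rest) = split / /, $str, 5;

print "first four: $d, $e, $f, $g\n";
print "remainder:  $rest\n";
```

When split is assigned to a list of scalars without an explicit limit, Perl supplies a limit one larger than the number of variables, so the shorter my ($d, $e, $f, $g) = split / /, $str; also avoids splitting the whole line.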

        HTH.


        🦛

Re: Repeating a capture group pattern within a pattern
by LanX (Saint) on Jul 15, 2024 at 11:14 UTC
    "There is more than one way to do it" ™ depending on your use case.

    The "problem" is not that you can't repeat a pattern in Perl, but that only the last captures are kept for explicit (...) groups.

    One way is an embedded code block, (?{ ... }), that stores each capture as it is made.

    Another is to build a pattern with one explicit capture group per field.

    DB<25> $_='1016.1 7.7 NaN -20.6 3.8 72.9 215.0'

    DB<26> $pat = qr(NaN|-?\d+\.\d)

    DB<27> x m/($pat)/g
    0  1016.1
    1  7.7
    2  'NaN'
    3  '-20.6'
    4  3.8
    5  72.9
    6  215.0

    DB<28> x m/($pat)(?:\s|$)/g
    0  1016.1
    1  7.7
    2  'NaN'
    3  '-20.6'
    4  3.8
    5  72.9
    6  215.0

    DB<29> x (m/($pat)(?:\s|$)/g)[0..3]
    0  1016.1
    1  7.7
    2  'NaN'
    3  '-20.6'

    ...

    DB<33> x m/(?:($pat)(?:\s|$)){4}/
    0  '-20.6'

    DB<34> x m/(?:($pat)(?:\s|$)(?{push @a,$1})){4}/
    0  '-20.6'

    DB<35> x @a
    0  1016.1
    1  7.7
    2  'NaN'
    3  '-20.6'

    DB<36> ...

    DB<47> $delim = '(?:\s|$)'

    DB<48> p $explicit= "($pat)$delim" x 4
    ((?^u:NaN|-?\d+\.\d))(?:\s|$)((?^u:NaN|-?\d+\.\d))(?:\s|$)((?^u:NaN|-?\d+\.\d))(?:\s|$)((?^u:NaN|-?\d+\.\d))(?:\s|$)

    DB<49> x m/$explicit/g
    0  1016.1
    1  7.7
    2  'NaN'
    3  '-20.6'

    DB<50>
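The same three experiments can be written as a standalone script. This is only a sketch: it assumes Perl 5.18 or later (so that the embedded code block can see a lexical array), widens the field pattern slightly to allow more than one decimal digit, and anchors each match at the start of the line. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $line  = '1016.1 7.7 NaN -20.6 3.8 72.9 215.0';
my $pat   = qr/NaN|-?\d+(?:\.\d+)?/;   # assumed field grammar, wider than the session's
my $delim = qr/(?:\s|$)/;

# A quantified group keeps only the capture from its last iteration.
my ($last) = $line =~ /^(?:($pat)$delim){4}/;
print "quantified group kept: $last\n";

# An embedded code block can save each capture as it is made.
# Reset the array before every match: failed attempts may also push values.
my @seen;
$line =~ /^(?:($pat)$delim(?{ push @seen, $1 })){4}/
    or die "line does not start with four fields\n";
print "code block kept:       @seen\n";

# Building the pattern with "x" gives one capture group per field.
my $explicit = "($pat)$delim" x 4;
my @four = $line =~ /^$explicit/;
print "explicit groups kept:  @four\n";
```

Of the two working variants, the explicit-groups form is the more predictable one, since it does not depend on side effects that backtracking could repeat.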

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

Re: Repeating a capture group pattern within a pattern
by talexb (Chancellor) on Jul 16, 2024 at 00:25 UTC

    To me, the simpler solution would just be to split on a space, then use the regex on each of the four elements.
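A minimal sketch of that split-then-check approach follows. The per-field pattern (NaN, or an optionally signed decimal number) is an assumption about the data and does not come from the thread.

```perl
use strict;
use warnings;

my $x = "0.01 NaN 2.30 4.44";

my $field = qr/^(?:NaN|[+-]?\d+(?:\.\d+)?)$/;   # assumed field grammar

my @parts = split ' ', $x;
my @bad   = grep { $_ !~ $field } @parts;

die "expected 4 fields, got " . scalar(@parts) . "\n" if @parts != 4;
die "unexpected field(s): @bad\n" if @bad;

my ($d, $e, $f, $g) = @parts;
print "$d, $e, $f, $g\n";
```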

    It's also possible that I'm missing something.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Repeating a capture group pattern within a pattern
by Anonymous Monk on Jul 18, 2024 at 05:46 UTC
    You're overthinking it.
    my $x = "0.01 NaN 2.30 4.44";
    
    Match the form rather than the content of the data:
    my $r1 = qr/(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;
    
    Or ditch the pattern and split, as others suggest:
    my ($d, $e, $f, $g) = split /\s+/, $x;
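One detail worth knowing when choosing between split /\s+/ and the special split ' ' form used in other replies: they differ when the line starts with whitespace. The padded input below is a hypothetical example.

```perl
use strict;
use warnings;

my $padded = "  0.01 NaN 2.30 4.44";   # hypothetical input with leading spaces

my @by_regex = split /\s+/, $padded;   # leading separator yields an empty first field
my @by_space = split ' ',   $padded;   # the special ' ' form skips leading whitespace

printf "split /\\s+/ gives %d fields, first is '%s'\n", scalar @by_regex, $by_regex[0];
printf "split ' '   gives %d fields, first is '%s'\n", scalar @by_space, $by_space[0];
```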
    
Re: Repeating a capture group pattern within a pattern
by WithABeard (Beadle) on Jul 25, 2024 at 11:08 UTC

    Maybe I'm missing something, but this doesn't seem too difficult:

    > perl -e 'my $x = "0.01 NaN 2.30 4.44"; my ($d, $e, $f, $g) = ($x =~ /([Na0-9\.\-\+]+\b)/g); print "d: $d, e: $e, f: $f, g: $g";'

    output:

    d: 0.01, e: NaN, f: 2.30, g: 4.44

    The /g flag makes the match return a list of all matches.

    I changed \s+ to \b (word boundary) since the last piece doesn't have a space after it.

Node Type: perlquestion [id://11160609]
Approved by marto
Front-paged by Corion