Re^4: Variable matching on a regex

Because /g just repeats the regex, and I think that may not be possible in some circumstances.

It could be perfectly feasible to have a regex like:

      /^\w+\s+(?:(\d+)\s+){3}\w+$/

So I want a word, three groups of digits, and a word. But right now I don't know how to get the three values for the groups of numbers. So I usually do something like:

      /^\w+\s+(\d+)\s+(\d+)\s+(\d+)\s+\w+$/

that doesn't look so good. In extreme cases n can be far bigger than 3, and I have to capture the whole string of numbers and then split it. Also, if instead of exactly n repetitions I want to enforce a range ({3,10} for example), then I'm at a loss and have to implement it in two steps.

I know that may be even clearer than the regex I'm trying to write, but I'm just curious about it. I'd like my regex to fit the format of my input as closely as possible and to get the values straightforwardly. I don't know if it can be done, though. That's my question.
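For reference, here is a minimal sketch of both points. It first shows that a capture group inside a quantifier keeps only its last iteration, and then shows the two-step approach (capture the whole run, then split it) with a range on the count. The sample line and the helper name extract_numbers() are made up for illustration and are not part of any real input.

```perl
use strict;
use warnings;

# Hypothetical sample line: a word, a run of numbers, a word.
my $line = 'foo 11 22 33 bar';

# A capture group inside a quantifier keeps only its LAST iteration.
my ($last) = $line =~ /^\w+\s+(?:(\d+)\s+){3}\w+$/;
print "quantified group kept: $last\n";    # prints: quantified group kept: 33

# Two-step alternative: capture the whole run of numbers, then split it.
# extract_numbers() is a hypothetical helper; $min and $max bound the count.
sub extract_numbers {
    my ($text, $min, $max) = @_;
    return unless $text =~ /^\w+\s+((?:\d+\s+){$min,$max})\w+$/;
    return split ' ', $1;
}

my @nums = extract_numbers($line, 3, 10);
print "numbers: @nums\n";                  # prints: numbers: 11 22 33
```

The regex still validates the overall shape of the line, including the {3,10} range, and the split only has to separate values that are already known to be numbers.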

There's no real problem behind this, nor any real output either. It's just something I've run into several times, and I've never been happy with the solutions I've implemented.

Hope it makes more sense now,

Thanks

Re^5: Variable matching on a regex by Marshall (Canon) on Jun 17, 2010 at 17:17 UTC
I'm not sure that I understand all the questions. It appears to me that you've asked a couple, and this one is a bit different from the first. Note that \w characters are "a-zA-Z0-9_", meaning that any \d is also a \w.

Match global is great at repetitive pattern matching! The code below shows how to match a "word" followed by some numbers. Enforcing a minimum number of "numbers" after the "word" is easy: the two patterns below require at least one number and at least two numbers, respectively.

Enforcing a maximum is more difficult and I haven't come up with the right syntax. I suppose your intent is that jkl shouldn't appear, as there are 5 numbers after that "word". The code below shows the first 3 numbers after jkl instead of completely omitting that line, the way xyz was omitted because there aren't any numbers after that "word". I think there is some "look ahead" regex syntax that would solve this problem, but I'm not completely sure that is what you are asking about.

      #!/usr/bin/perl -w
      use strict;

      my $input = "abc 456 897 xyz www 789 jkl 0123 456 889 3 4 fhg 123";
      print "input=$input\n";

      # at least one and at most three numbers after each word
      my @nums = $input =~ m/([a-zA-Z]+(?:\s+\d+){1,3})/g;
      print "$_\n" foreach (@nums);
      #prints:
      #input=abc 456 897 xyz www 789 jkl 0123 456 889 3 4 fhg 123
      #abc 456 897
      #www 789
      #jkl 0123 456 889
      #fhg 123

      print "----\n";

      # at least two and at most three numbers after each word
      @nums = $input =~ m/([a-zA-Z]+(?:\s+\d+){2,3})/g;
      print "$_\n" foreach (@nums);
      #prints:
      #----
      #abc 456 897
      #jkl 0123 456 889
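One possible way to enforce the maximum with look-ahead, offered as a sketch and not as a definitive answer: add a negative look-ahead after the quantified group so that the match is rejected whenever another digit follows. The \b and the (?!\s*\d) are additions to the pattern above, and the assumption is that a word followed by more than three numbers should be dropped entirely.

```perl
use strict;
use warnings;

my $input = "abc 456 897 xyz www 789 jkl 0123 456 889 3 4 fhg 123";

# One to three numbers after a word. The negative look-ahead (?!\s*\d)
# fails the match if yet another digit follows, so backtracking cannot
# settle for the first three numbers of a longer run. The leading \b
# keeps the match from starting in the middle of a word.
my @groups = $input =~ m/\b([a-zA-Z]+(?:\s+\d+){1,3})(?!\s*\d)/g;
print "$_\n" foreach @groups;
#prints:
#abc 456 897
#www 789
#fhg 123
```

Note that the look-ahead uses \s* and not \s+. With \s+ the engine could give back the last digit of a number (for example "88" followed by "9") and still satisfy the look-ahead.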
Re^5: Variable matching on a regex by furry_marmot (Pilgrim) on Jun 17, 2010 at 20:01 UTC
I think you're confused on a number of points.

First off, you can't have a variable number of regex matches if you don't use /g. So if you want to go beyond hard-coding your regexes, you need to get over it.

Second, if you want to name your variables $d1, $d2, etc., you're just contradicting yourself again: you're asking how to know how many variables to create before you know how many matches you'll have. I suppose you could write a bunch of code to eval a string, but using an array is so simple.

Third, /g can be used in loop constructs, which allow you to examine your data as you're parsing it. Very simple parsers are very easy to write. For example:

      $s = 'abc 1 23 do 456 re 789 me 0123 456 2 23 456 789 0123 456';
      push @results, ("This has " . length($1) . " digits: $1") while $s =~ /(\d+)/g;
      print "$_\n" for @results;
      # Prints:
      # This has 1 digits: 1
      # This has 2 digits: 23
      # This has 3 digits: 456
      # This has 3 digits: 789, etc.

Or you can look for more complicated patterns:

      $s = '1 23 456 789 0123 456 2 23 456 789 0123 456';
      push @results, ("This looks like a word: $1") while $s =~ /((?:\b\d{1,2}\s+)+\b\d{3,})/g;
      print "$_\n" for @results;
      # Prints:
      # This looks like a word: 1 23 456
      # This looks like a word: 2 23 456

It's not really clear from what you've written what you're trying to do. But capturing a varying number of results is not hard if you get over the idea of using named scalars.

--marmot
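As a further illustration of /g in a loop, here is a rough sketch of a small scanner that walks one line with \G and the /c modifier, collecting a variable number of values into an array and checking the count afterwards. The helper name parse_line(), the sample line, and the choice of [a-zA-Z]+ for the surrounding words (instead of \w+) are assumptions made for the example.

```perl
use strict;
use warnings;

# parse_line() is a hypothetical helper. It returns the numbers found
# between a leading and a trailing word, or an empty list if the line
# does not fit that shape or the count falls outside $min..$max.
sub parse_line {
    my ($line, $min, $max) = @_;
    my @nums;
    pos($line) = 0;                              # start scanning at the beginning
    return unless $line =~ /\G[a-zA-Z]+\s+/gc;   # leading word
    while ($line =~ /\G(\d+)\s+/gc) {            # each number in turn
        push @nums, $1;
    }
    return unless $line =~ /\G[a-zA-Z]+\z/gc;    # trailing word, then end of string
    return unless @nums >= $min && @nums <= $max;
    return @nums;
}

my @nums = parse_line('start 10 20 30 40 end', 3, 10);
print scalar(@nums), " numbers: @nums\n";        # prints: 4 numbers: 10 20 30 40
```

Each match picks up where the previous one stopped (\G), and /c keeps that position when a match fails, so the trailing-word check starts right after the last number.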