RegEx Question

yoda54 has asked for the wisdom of the Perl Monks concerning the following question:

Re: RegEx Question by toolic (Bishop) on Sep 09, 2009 at 18:46 UTC
"match a range of numbers with each digit just matching once?"

Please clarify the question by providing some sample input and some desired output.

"I've tried something like ^0-9{1,1} but its not working."

Please use 'code' tags around your code, because it renders poorly without them. See Writeup Formatting Tips. Here is what your regular expression means, according to YAPE::Regex::Explain:

    use warnings;
    use strict;
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new('[^0-9]{1,1}')->explain();
    __END__
    The regular expression:

    (?-imsx:[^0-9]{1,1})

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      [^0-9]{1,1}            any character except: '0' to '9' (between 1
                             and 1 times (matching the most amount
                             possible))
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
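To make that explanation concrete, here is a minimal sketch showing that `[^0-9]{1,1}` simply matches one non-digit character anywhere in the string. The helper name `has_non_digit` and the sample strings are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains at least one
# character that is not a digit, which is all [^0-9]{1,1} tests for.
sub has_non_digit {
    my ($s) = @_;
    return $s =~ /[^0-9]{1,1}/ ? 1 : 0;
}

# Made-up sample strings, not taken from the original post.
for my $s ('12345', '12a45', 'abc') {
    printf "%-5s %s\n", $s, has_non_digit($s) ? 'matches' : 'does not match';
}
```

So the pattern says nothing about digits occurring once; it only detects the presence of a non-digit.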
Re^2: RegEx Question by yoda54 (Monk) on Sep 09, 2009 at 20:06 UTC
Thanks for the reply. I like to match permutations of 0-8 with each digits occurring once using all 8 digits. I've tried something like the following but I can't get it straight... :( `[0-8]{1, 1}`
Re^3: RegEx Question by CountZero (Bishop) on Sep 09, 2009 at 21:28 UTC
`[0-8]{1, 1}` matches one digit out of the set {0 1 2 3 4 5 6 7 8}, so it doesn't do what you need at all.

Actually, I am not sure that you can do this in one regex. I will show a simpler example using only permutations of the digits 0 - 3 (to make it easier to follow).

First you have to check that your string contains 4 digits in the range 0 - 3. That one is easy:

`/^[0-3]{4}$/`

If your string passes this test, then you check whether each digit occurs only once:

`/^(\d)(?!\d*\1)(\d)(?!\d*\2)(\d)(?!\d*\3)\d$/;`

This works by capturing each digit and using negative look-aheads to check that this digit does not occur again later in the string. The following program proves that it works:

    use warnings;
    use strict;
    for my $one (0 .. 3) {
        for my $two (0 .. 3) {
            for my $three (0 .. 3) {
                for my $four (0 .. 4) {
                    my $test = "$one$two$three$four";
                    print "$test\n"
                        if $test=~m/^[0-3]{4}$/
                        and $test=~m/^(\d)(?!\d*\1)(\d)(?!\d*\2)(\d)(?!\d*\3)\d$/;
                }
            }
        }
    }

Output:

    0123
    0132
    0213
    0231
    0312
    0321
    1023
    1032
    1203
    1230
    1302
    1320
    2013
    2031
    2103
    2130
    2301
    2310
    3012
    3021
    3102
    3120
    3201
    3210

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
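The same two-step idea can be stretched to the nine digits 0 - 8 by building the capture-and-look-ahead pattern in a loop instead of typing it out. This is only a sketch: the helper names `distinct_digits_regex` and `is_permutation_0_to_8` are hypothetical and do not appear anywhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build "(\d)(?!\d*\1)(\d)(?!\d*\2)...\d" for $n digits.
sub distinct_digits_regex {
    my ($n) = @_;
    my $pattern = '^';
    for my $group (1 .. $n - 1) {
        # Capture one digit, then refuse to match if it shows up again later.
        $pattern .= "(\\d)(?!\\d*\\$group)";
    }
    $pattern .= '\\d$';    # nothing follows the last digit, so no look-ahead
    return qr/$pattern/;
}

# Hypothetical helper: step 1 checks length and range, step 2 checks uniqueness.
sub is_permutation_0_to_8 {
    my ($s) = @_;
    return $s =~ /^[0-8]{9}$/ && $s =~ distinct_digits_regex(9) ? 1 : 0;
}

for my $s (qw(012345678 876543210 012345670 01234567)) {
    print "$s: ", is_permutation_0_to_8($s) ? 'ok' : 'not ok', "\n";
}
```

With nine digits the pattern needs backreferences `\1` through `\8`, which is why generating it is less error-prone than writing it by hand.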
Re^4: RegEx Question by ikegami (Patriarch) on Sep 10, 2009 at 00:46 UTC
Re^5: RegEx Question by CountZero (Bishop) on Sep 10, 2009 at 06:03 UTC
Re^4: RegEx Question by yoda54 (Monk) on Sep 09, 2009 at 21:34 UTC
Re^5: RegEx Question by GrandFather (Saint) on Sep 09, 2009 at 22:12 UTC
Re^3: RegEx Question by graff (Chancellor) on Sep 09, 2009 at 23:41 UTC
"I like to match permutations of 0-8 with each digits occurring once using all 8 digits."

Um... the way I learned it, "0-8" represents 9 digits. (1-8 would be 8 digits, as would 0-7.) Is there some digit between 0 and 8 that you intend to leave out, and if so, which one?

Anyway, if you had said "permutations of 0-8 with each digit occurring once, using all 9 digits", then I would understand that you are looking only for strings of nine digit characters, such that all nine characters are distinct, and none of them is the digit "9":

    #!/usr/bin/perl
    use strict;
    while (<DATA>) {
        chomp;
        if ( length() == 9 and not ( /[^0-8]/ or /(.).*\1/ )) {
            print "$_\n";
        }
        else {
            warn "rejected input: $_\n";
        }
    }
    __DATA__
    01234
    123456780
    1234567890
    223456781
    345678012
    234567890
    a12345678
    012345678
    456782011
    456782013

Update: I suspect that the regex used as the last stage in my conditional is a fairly expensive operation; for strings that actually meet the criteria (are not rejected), it has to do 8+7+6+...+1 (total of 36) character comparisons to finish. (There should be some sort of "O(...n...)" expression for that, but it escapes me.) So, it would most likely be better to use a split/hash solution, as suggested by others, especially if you'll be handling large quantities of input with a relatively high "success" rate. Something like this:

    while (<DATA>) {
        chomp;
        if ( length() == 9 and not /[^0-8]/ ) {
            my %c = map { $_ => undef } split //;
            if ( keys %c == 9 ) {
                print "$_\n";
                next;
            }
        }
        warn "rejected input: $_\n";
    }
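On the missing "O(...)" expression: the sum 8+7+...+1 is (n-1) + (n-2) + ... + 1 = n(n-1)/2 for a string of n distinct characters, so the backreference test grows roughly as O(n^2), assuming the regex engine tries every later position for each captured character. The hash test looks at each character once. A minimal sketch of both points follows; the sub names are hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the split/hash test described above.
sub is_permutation_of_0_to_8 {
    my ($s) = @_;
    return 0 if length($s) != 9 or $s =~ /[^0-8]/;
    my %seen;
    $seen{$_} = 1 for split //, $s;    # one hash store per character
    return keys(%seen) == 9 ? 1 : 0;
}

# Approximate worst-case comparison count for /(.).*\1/ on $n distinct
# characters: (n-1) + (n-2) + ... + 1 = n(n-1)/2.
sub pair_comparisons {
    my ($n) = @_;
    return $n * ($n - 1) / 2;
}

print "comparisons for 9 characters: ", pair_comparisons(9), "\n";
for my $s (qw(123456780 223456781 a12345678 456782013)) {
    print "$s: ", is_permutation_of_0_to_8($s) ? 'accepted' : 'rejected', "\n";
}
```

For nine-character inputs the difference is small in absolute terms; it would only start to matter for much longer strings or very large input volumes.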
Re: RegEx Question by Marshall (Canon) on Sep 10, 2009 at 06:55 UTC
Here is my 2 bits worth: Please show some simple input and output. I didn't completely understand your question. The below code shows some common techniques. If you want sequences of digits (a list of solutions) that don't repeat, the code is different, but not by much.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my @test =(112,1234,1424);
    foreach (@test) {
        if (is_num_repeated($_) ) {
            print "$_ FAILED digit is repeated\n";
        }
        else {
            print "$_ OK no digit repeated\n";
        }
    }
    foreach (@test) {
        print "no_repeat: $_ string before repeat is: ",
              first_non_repeating_digits($_),"\n";
    }

    sub is_num_repeated {
        my $num = shift;
        my @digits = split(//,$num);
        my %seen;
        foreach (@digits) {
            $seen{$_}++;
        }
        # this is grep in a scalar context...
        return (grep {$_ >1} values %seen);
    }

    sub first_non_repeating_digits {
        my $num = shift;
        my @digits = split(//,$num);
        my %seen;
        my $result;
        foreach (@digits) {
            return $result if ($seen{$_}++);
            $result .= $_;
        }
        return $result;
    }
    __END__
    112 FAILED digit is repeated
    1234 OK no digit repeated
    1424 FAILED digit is repeated
    no_repeat: 112 string before repeat is: 1
    no_repeat: 1234 string before repeat is: 1234
    no_repeat: 1424 string before repeat is: 142
Re: RegEx Question by grizzley (Chaplain) on Sep 10, 2009 at 07:27 UTC
Simple approach:

    D:\>perl -lne "print /^[0-8]{9}$/ && !/(.)(?=.*\1)/ ? 'ok' : 'not ok'"
    1234
    not ok
    1234567689
    not ok
    012345678
    ok
    018273645
    ok
    010101010
    not ok
    ^Z

And with one regexp:

    D:\>perl -lne "print /^(?:([0-8])(?!.*\1)){9}$/ ? 'ok' : 'not ok'"
    123456780
    ok
    123123123
    not ok
    123456781
    not ok
    ^Z
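The single regexp relies on `\1` always referring to the most recent capture of the repeated group, so each digit is compared only against what follows it. A minimal sketch that cross-checks it against a sort-based test is below, written with `.*` in the look-ahead so that it scans the whole rest of the string. The sub names and the sort-based reference are assumptions of this sketch, not something from the thread.

```perl
use strict;
use warnings;

# The single-regexp test: nine digits 0-8, none of which recurs later.
sub is_permutation_by_regex {
    my ($s) = @_;
    return $s =~ /^(?:([0-8])(?!.*\1)){9}$/ ? 1 : 0;
}

# Reference check (hypothetical): sort the characters and compare with
# the only string a valid permutation can sort to.
sub is_permutation_by_sort {
    my ($s) = @_;
    my @chars = split //, $s;
    return join('', sort @chars) eq '012345678' ? 1 : 0;
}

for my $s (qw(123456780 123123123 123456781 012345678 018273645 1234)) {
    my $by_regex = is_permutation_by_regex($s);
    my $by_sort  = is_permutation_by_sort($s);
    print "$s regex=$by_regex sort=$by_sort\n";
}
```

If the two columns ever disagree on an input, one of the two tests does not implement "each of 0-8 exactly once".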
Re^2: RegEx Question by Marshall (Canon) on Sep 10, 2009 at 08:01 UTC
"Simple approach:"

`D:\>perl -lne "print /^[0-8]{9}$/ && !/(.)(?=.*\1)/ ? 'ok' : 'not ok'"`

This is short and very clever, but "simple"? I think NOT. Also, I didn't see in the OP spec that this was an exact sequence of 9 digits.
Re^3: RegEx Question by grizzley (Chaplain) on Sep 10, 2009 at 12:11 UTC
Well, it's simpler than the second approach :> And a sequence of 9 or 8 digits is mentioned in the discussion above.

