Re: Regex to match 20 chars of some digits followed by some spaces
by tachyon (Chancellor) on Dec 19, 2003 at 03:07 UTC
I can't see any good reason to use Parse::RecDescent to parse fixed-width records. That seems like using an A-bomb to crack a walnut. Surely you would be better off unpacking the data into a structure and validating it from there?
In addition to the examples above, you can, for fun, autogenerate a regex that does the job you want. It's rather ugly, but it does work.
use strict;
use warnings;
my $re = '';
for ( reverse 1..20 ) {
    # N digits followed by 20-N spaces, as one alternative
    $re .= sprintf "\\d{%d} {%d}|", $_, 20-$_;
}
chop $re;    # drop the trailing '|'
$re = qr/^(?:$re)$/;
print $re, $/;
my @tests = (
    '01234567890123456789',    # OK
    '123' . ' ' x 17,          # OK
    '123' . ' ' x 16,          # NOK (only 19 chars)
    '123 123' . ' ' x 13,      # NOK
    ' ' x 20,                  # NOK
    '123c' . ' ' x 16,         # NOK
);
for (@tests) {
    print m/$re/ ? "'$_' #OK\n" : "'$_' #NOK\n";
}
__DATA__
(?-xism:^(?:\d{20} {0}|\d{19} {1}|\d{18} {2}|\d{17} {3}|\d{16} {4}|\d{15} {5}|\d{14} {6}|\d{13} {7}|\d{12} {8}|\d{11} {9}|\d{10} {10}|\d{9} {11}|\d{8} {12}|\d{7} {13}|\d{6} {14}|\d{5} {15}|\d{4} {16}|\d{3} {17}|\d{2} {18}|\d{1} {19})$)
'01234567890123456789' #OK
'123                 ' #OK
'123                ' #NOK
'123 123             ' #NOK
'                    ' #NOK
'123c                ' #NOK
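The same alternation can also be built with join and map, which avoids the trailing '|' and the chop. A minimal sketch, assuming a field width of 20; $width, $alternation and $field_re are just illustrative names:

```perl
use strict;
use warnings;

my $width = 20;

# One alternative per possible digit count:
# \d{20} {0}|\d{19} {1}|...|\d{1} {19}
my $alternation = join '|',
    map { sprintf '\d{%d} {%d}', $_, $width - $_ } reverse 1 .. $width;

# Anchor the whole alternation so the field must be exactly $width chars
my $field_re = qr/^(?:$alternation)$/;

for my $field ( '01234567890123456789', '123' . ' ' x 17, '123c' . ' ' x 16 ) {
    printf "'%s' %s\n", $field, $field =~ $field_re ? 'OK' : 'NOK';
}
```

Changing $width is then the only edit needed for a field of a different size.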
document : checkpoint address report(s?) doctrailer
report : report1 | report2 | report3 | ...
report1 : lt[100] report1_cost_centre(s?) lt[200]
report1_cost_centre : lt[300] report1_txn(s?) lt[400]
report1_txn : lt[500] lt[600] page_break(?) lt[700]
lt : "<LT$arg[0]>" lt_data
lt_data : /[^\\]*/ lt_end {$return = $item{__PATTERN1__}}
...
but for 15 different reports, with hundreds of lt records and lots of options, repetitions and alternations.
+++++++++++++++++
#!/usr/bin/perl
use warnings;use strict;use brain;
use strict;
use warnings;
my $str = 'first name       EOF'              # first_name: 20 chars
        . 'last name        EOF'              # last_name:  20 chars
        . 'address field              EOF';   # address:    30 chars
my @rec_def = (
    [ 'first_name', 20 ],
    [ 'last_name', 20 ],
    [ 'address', 30 ],
);
sub parse_fixed_width {
    my ( $record, $rec_def ) = @_;
    my %struct;
    my $offset = 0;
    for my $rec (@$rec_def) {
        # each field is the next $rec->[1] characters of the record
        $struct{$rec->[0]} = substr $record, $offset, $rec->[1];
        $offset += $rec->[1];
    }
    # reject records that are longer or shorter than the layout
    return length($record) == $offset ? \%struct : '';
}
use Data::Dumper;
print Dumper parse_fixed_width( $str, \@rec_def );
__DATA__
$VAR1 = {
'first_name' => 'first name       EOF',
'address' => 'address field              EOF',
'last_name' => 'last name        EOF'
};
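The same record layout could drive an unpack template instead of repeated substr calls. A sketch under that assumption: parse_with_unpack is a hypothetical name, the sample record is made up, and the 'a' format is used (rather than 'A') so the padding is kept, matching what substr returns.

```perl
use strict;
use warnings;

my @rec_def = (
    [ 'first_name', 20 ],
    [ 'last_name',  20 ],
    [ 'address',    30 ],
);

sub parse_with_unpack {
    my ( $record, $rec_def ) = @_;
    # 'a20 a20 a30': fixed-width fields, padding kept ('A' would strip it)
    my $template = join ' ', map { "a$_->[1]" } @$rec_def;
    my $total    = 0;
    $total += $_->[1] for @$rec_def;
    return '' unless length($record) == $total;    # same length check as above
    my %struct;
    @struct{ map { $_->[0] } @$rec_def } = unpack $template, $record;
    return \%struct;
}

my $str = sprintf '%-20s%-20s%-30s', 'Ada', 'Lovelace', '12 St James Square';
my $rec = parse_with_unpack( $str, \@rec_def );
print "[$rec->{last_name}]\n";
```

Validation of each field (digits only, and so on) can then run on the hash values.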
Re: Regex to match 20 chars of some digits followed by some spaces
by Zaxo (Archbishop) on Dec 19, 2003 at 03:32 UTC
# Given $record
my %record;
@record{ qw/account address info/ }
= unpack 'A20 A42 A255', $record; # adjust widths to suit
# ($record{'account'}) = $record{'account'} =~ /^(\d[\d ]*)$/
($record{'account'}) = $record{'account'} =~ /^(\d+)$/
or die 'Bad Account ID'; # detaints, too
# verify the rest
The unpack width enforces the field width you expect. If spaces can't occur between digits, it becomes even simpler: the matching regex is then just /^(\d+)$/, because the 'An' unpack template (here A20) reads a space-padded field of n bytes and strips the trailing spaces before the regex sees them. In the commented-out regex, [\d ] is a character class of digits and spaces.
Update: Simplified the code to agree with leriksen's spec.
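A small sketch showing why /^(\d+)$/ is enough after unpack: 'A20' removes the trailing padding (it also strips trailing nulls), while spaces between digits survive and still make the regex fail. The record contents are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical record: a 20-char account field followed by other data
my $record = '123' . ( ' ' x 17 ) . 'rest of the record';

my %record;
# 'A20' takes 20 bytes and strips the trailing spaces
( $record{account} ) = unpack 'A20', $record;

# With the padding gone, "digits only" is all that is left to check
my ($account) = $record{account} =~ /^(\d+)$/
    or die "Bad Account ID\n";
print "account: $account\n";
```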
Re: Regex to match 20 chars of some digits followed by some spaces
by Roger (Parson) on Dec 19, 2003 at 05:16 UTC
Hi leriksen,
You were so close to getting it right; you only need to extend the regexp a little with the match-time interpolation technique.
use strict;
use warnings;
while (<DATA>) {
chomp;
print m/(\d{1,20})(??{' ' x (20 - length($1))})/ ?
"match\n" : "not match\n";
}
__DATA__
"     123451234512345"
" 123451234512345    "
"123451234512345     "
"1234512345          "
"123 451 2345        "
"                    "
And the output is exactly as expected:
not match
not match
match
match
not match
not match
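The pattern above is not anchored, so it would also accept a valid-looking run of digits and spaces inside a longer string. A sketch of the same match-time interpolation anchored to a whole field, assuming the field arrives without the surrounding quotes (is_account_field is a hypothetical name):

```perl
use strict;
use warnings;

# True if $field is 1-20 digits padded with spaces to exactly 20 chars.
# (??{ ... }) runs during the match, so length $1 is the number of digits
# captured so far, and the returned string of spaces is matched literally.
sub is_account_field {
    my ($field) = @_;
    return $field =~ /^(\d{1,20})(??{ ' ' x (20 - length $1) })$/ ? 1 : 0;
}

for my $field ( '123451234512345' . ' ' x 5, '     123451234512345', '123 451 2345' . ' ' x 8 ) {
    print is_account_field($field) ? "match\n" : "not match\n";
}
```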
Re: Regex to match 20 chars of some digits followed by some spaces
by leriksen (Curate) on Dec 19, 2003 at 03:02 UTC
Some colleagues were first to the punch.
mildside has
m/^\d((?<=\d)\d(?=([ \d]|$))|(?<=[\d ]) (?=( |$))){19}$/
another is
m/^(?=\d*(?:\d ) *(?!\d)$)[0-9 ]{20}$/
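A quick harness to compare the two patterns on a few made-up sample fields. One difference it shows: the second pattern's lookahead contains (?:\d ), so it only accepts fields with at least one trailing space; a field of 20 digits fails it, while mildside's pattern accepts it.

```perl
use strict;
use warnings;

my %patterns = (
    mildside => qr/^\d((?<=\d)\d(?=([ \d]|$))|(?<=[\d ]) (?=( |$))){19}$/,
    another  => qr/^(?=\d*(?:\d ) *(?!\d)$)[0-9 ]{20}$/,
);

my @fields = (
    '01234567890123456789',    # 20 digits, no padding
    '123' . ' ' x 17,          # digits then spaces
    '123 123' . ' ' x 13,      # a digit after a space
    ' ' x 20,                  # no digits at all
);

for my $name ( sort keys %patterns ) {
    for my $field (@fields) {
        printf "%-8s '%s' %s\n", $name, $field,
            $field =~ $patterns{$name} ? 'match' : 'no match';
    }
}
```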
Re: Regex to match 20 chars of some digits followed by some spaces
by blokhead (Monsignor) on Dec 19, 2003 at 03:05 UTC
The contents of {..} in your regex aren't interpolated; you actually need them to be (re)interpolated at the time of each possible match. You can do this with (??{ code }), which is a bit of an ugly hack... I have no idea whether Parse::RecDescent will like this; presumably it just evals the regex, so it may work.
my $regex = qr/\[(\d{1,20})(??{ " {" . (20 - length $1) . "}" })\]/;
while (<DATA>) {
print /$regex/ ? "yes\n" : "no\n";
}
__DATA__
[12345678901234567890]
[123                 ]
[234223423           ]
[23409234329c        ]
I don't know of a good way to do this without extended regex features (or multiple regexes). If there were some way to do this in general, I'd have to get to work on some regex abuse a la Abigail. There have been a few times when something like this would have been handy!
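The "multiple regexes" route needs no extended features at all. A sketch, assuming the field has already been isolated (valid_account is a hypothetical name): the first regex fixes the length and the allowed characters, the second fixes their order. \z is used instead of $ so a trailing newline is not silently accepted.

```perl
use strict;
use warnings;

# Two simple regexes instead of one clever one.
sub valid_account {
    my ($field) = @_;
    return $field =~ /^[\d ]{20}\z/    # exactly 20 digits or spaces ...
        && $field =~ /^\d+ *\z/        # ... with all the digits first
        ? 1
        : 0;
}

for my $field ( '123' . ' ' x 17, '234223423' . ' ' x 11, '23409234329c' . ' ' x 8 ) {
    print "[$field] ", valid_account($field) ? "yes\n" : "no\n";
}
```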
Re: Regex to match 20 chars of some digits followed by some spaces
by sauoq (Abbot) on Dec 19, 2003 at 06:34 UTC
perl -nle 'print "match" if /^\d(?:\d| (?![^ ])){19}$/'
Matches 1 digit followed by 19 digits or spaces-not-followed-by-a-non-space.
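(?![^ ]) is a double negative, "not followed by a non-space", and unlike a positive (?= ) it also succeeds at the very end of the string, where there is no character left to look at. A small sketch contrasting the two (sauoq_match and with_positive_lookahead are hypothetical names):

```perl
use strict;
use warnings;

# sauoq's pattern: a space is allowed only if no non-space follows it,
# which includes the position at the end of the string.
sub sauoq_match {
    my ($field) = @_;
    return $field =~ /^\d(?:\d| (?![^ ])){19}$/ ? 1 : 0;
}

# With a positive lookahead instead, the last space fails because
# nothing follows it, so every padded field is rejected.
sub with_positive_lookahead {
    my ($field) = @_;
    return $field =~ /^\d(?:\d| (?= )){19}$/ ? 1 : 0;
}

my $padded = '123' . ' ' x 17;
printf "(?![^ ]): %d  (?= ): %d\n",
    sauoq_match($padded), with_positive_lookahead($padded);
```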
-sauoq
"My two cents aren't worth a dime.";
Re: Regex to match 20 chars of some digits followed by some spaces
by ysth (Canon) on Dec 19, 2003 at 06:19 UTC
/\d(?!.{0,18} \d)[\d ]{19}/
Matches 20 digits and spaces, beginning with a digit, where there are no digits following spaces.
Update: had 0,19, meant 19
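Without anchors, this pattern is free to start anywhere, so it also finds a valid-looking 20 characters inside a longer or dirtier string. That may be what you want inside a parser (see the /\G$regex/gc point made further down), but for validating an isolated field a sketch with ^ and $ added looks like this (the sample strings are made up):

```perl
use strict;
use warnings;

my $ysth     = qr/\d(?!.{0,18} \d)[\d ]{19}/;      # as posted, unanchored
my $anchored = qr/^\d(?!.{0,18} \d)[\d ]{19}$/;    # whole-field version

for my $field ( '123' . ' ' x 17, 'x123' . ' ' x 17 ) {
    printf "'%s' unanchored: %s, anchored: %s\n", $field,
        ( $field =~ $ysth     ? 'yes' : 'no' ),
        ( $field =~ $anchored ? 'yes' : 'no' );
}
```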
I have actually never used Parse::RecDescent, but I was assuming it works on an input buffer, applying the supplied regex as something like /\G$regex/gc. So I didn't supply a beginning anchor, and an ending anchor is not usable. I haven't got around to checking my assumption yet... and you know what they say when you assUme.
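A rough sketch of that assumption in plain Perl; it imitates the idea, not Parse::RecDescent's actual internals. Two account fields sit back to back in one made-up buffer, and each match starts where the previous one stopped: \G supplies the start anchor, while an end anchor such as $ breaks the match because the next field follows immediately.

```perl
use strict;
use warnings;

# Two 20-character account fields back to back (made-up sample data)
my $buffer = '123' . ( ' ' x 17 ) . '4567' . ( ' ' x 16 );

# \G pins each match to pos($buffer); /gc keeps pos() when a match fails
my @accounts;
while ( $buffer =~ /\G(\d{1,20})(??{ ' ' x (20 - length $1) })/gc ) {
    push @accounts, $1;
}
print "parsed: @accounts\n";

# Adding $ breaks the first field: the match cannot end at position 20
# because the second field is still there.
my $first_with_end = $buffer =~ /^(\d{1,20})(??{ ' ' x (20 - length $1) })$/ ? 1 : 0;
print "with \$ anchor: $first_with_end\n";
```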
(YAWTDI) Regex to match 20 chars of some digits followed by some spaces
by Zaxo (Archbishop) on Dec 19, 2003 at 08:05 UTC
No extraction of data this time, just a little check on $record:
print substr( $record, 0, 20) =~ /^\d+ *$/ ? 'OK' : 'NOK';
I like to keep the regexen as simple as possible.
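A sketch of that check against longer records (the records are made up, and account_ok is a hypothetical name). substr returns fewer than 20 characters when the record is short, so this version adds a length test; that test is an addition to the one-liner above.

```perl
use strict;
use warnings;

# Validate the first 20 characters of a record
sub account_ok {
    my ($record) = @_;
    my $account = substr $record, 0, 20;
    # Digits first, then optional spaces; the length test guards against
    # records shorter than 20 characters
    return length($account) == 20 && $account =~ /^\d+ *$/ ? 'OK' : 'NOK';
}

print account_ok( '123' . ' ' x 17 . 'Smith, J.' ), "\n";
print account_ok( '123 45' . ' ' x 14 . 'Smith, J.' ), "\n";
```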
Re: Regex to match 20 chars of some digits followed by some spaces
by BrowserUk (Patriarch) on Dec 19, 2003 at 06:53 UTC
m[^ \d (?: (?<! \x20 ) \d | \x20 ){19} $]x
Which says that the entire string must consist of a digit followed by 19 ((digits not preceded by spaces) or spaces).
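my @t = map "$_\n",    # test strings, newline included as if read from a file
    '     123451234512345', ' 123451234512345    ', '123451234512345     ',
    '1234512345          ', '123 451 2345        ', ' ' x 20;
print m[^ \d (?: (?<!\x20) \d | \x20 ){19} $]x
    ? 'Yes:' . $_
    : ' No:' . $_
  for @t;
 No:     123451234512345
 No: 123451234512345
Yes:123451234512345
Yes:1234512345
 No:123 451 2345
 No: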
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
Re: Regex to match 20 chars of some digits followed by some spaces
by Enlil (Parson) on Dec 19, 2003 at 07:24 UTC
In the spirit of TMTOWTDI:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
chomp;
print m/\d(?:(?:\d| (?!\d))){19}/ ?
"$_ matches\n" : "$_ does not match\n";
}
__DATA__
"01234567890123456789"
"123"
"123 123             "
"123c                "
"                    "
"11232424525252423   "
enlil
Re: Regex to match 20 chars of some digits followed by some spaces
by duff (Parson) on Dec 19, 2003 at 03:06 UTC
Sounds like you just need to move the checking for digits bit into your program logic and out of your rule. I.e., match 20 chars and then in your code, check that you got the requisite number of digits. Something like:
account: /\d[ \d]{18} /
I interpret your example to mean that you always want at least one digit and at least one space. Of course, that will let things like "12 56 90123456789 " through, but that's where one of those nifty code blocks after the rule comes in :)
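Outside Parse::RecDescent, the same two-stage idea (a loose regex to grab 20 characters, then a check in code) might look like the sketch below. check_account is a hypothetical name; inside a real grammar the second step would live in the rule's action block instead. Like the rule above, it requires at least one trailing space.

```perl
use strict;
use warnings;

# Stage 1 (what the rule would match): a digit, 18 digits-or-spaces, a space
my $loose = qr/\d[ \d]{18} /;

# Stage 2 (what the action would check): no digit may follow a space
sub check_account {
    my ($text) = @_;
    my ($field) = $text =~ /^($loose)/ or return;
    return $field =~ / \d/ ? undef : $field;
}

for my $text ( '123' . ' ' x 17, '12 56 90123456789' . ' ' x 3 ) {
    my $field = check_account($text);
    printf "'%s' => %s\n", $text, defined $field ? 'accepted' : 'rejected';
}
```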
Re: Regex to match 20 chars of some digits followed by some spaces
by Chmrr (Vicar) on Dec 19, 2003 at 14:21 UTC
Yet another way to do it:
/^\d+ *(?<=^.{20})$/
That is, one or more digits, followed by zero or more spaces, and only then do we check that the total came to 20 characters. Probably not as efficient (it involves more backtracking), but easier for my eyes to understand.
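A quick check of the lookbehind doing the length test, using made-up fields of 19, 20 and 21 characters; only the 20-character one should match:

```perl
use strict;
use warnings;

# Digits, then spaces, then a lookbehind that only succeeds if exactly
# 20 characters lie between the start of the string and this point
my $chmrr = qr/^\d+ *(?<=^.{20})$/;

for my $n ( 16, 17, 18 ) {
    my $field = '123' . ' ' x $n;    # 19, 20 and 21 characters
    printf "%2d chars: %s\n", length $field,
        $field =~ $chmrr ? 'match' : 'no match';
}
```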