 
PerlMonks  

Re: Regex to match 20 chars of some digits followed by some spaces

by tachyon (Chancellor)
on Dec 19, 2003 at 03:07 UTC ([id://315733])


in reply to Regex to match 20 chars of some digits followed by some spaces

I can't see any good reason to use Parse::RecDescent to parse fixed-width records. That seems like using an A-bomb to crack a walnut. Surely you would be better off unpacking the data into a structure and validating it from there?
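A minimal sketch of the unpack-then-validate approach, assuming a hypothetical record layout (a 20-column code of digits padded with spaces, followed by a 10-column name). `parse_record` and `code_is_valid` are illustrative names, not from the thread:

```perl
use strict;
use warnings;

# Hypothetical layout for illustration: columns 1-20 hold a code
# (digits, then space padding), columns 21-30 hold a name.
sub parse_record {
    my ($line) = @_;
    # 'a20' keeps the code exactly as stored, trailing spaces included;
    # 'A10' strips trailing spaces from the name.
    my ( $code, $name ) = unpack 'a20 A10', $line;
    return { code => $code, name => $name };
}

# Validation happens on the unpacked structure, not during parsing:
# exactly 20 characters, one or more digits, then only spaces.
sub code_is_valid {
    my ($rec) = @_;
    return length( $rec->{code} ) == 20 && $rec->{code} =~ /^\d+ *\z/;
}

for my $line ( '123' . ' ' x 17 . 'widget', '12 3' . ' ' x 16 . 'gadget' ) {
    my $rec = parse_record($line);
    printf "%s: %s\n", $rec->{name}, code_is_valid($rec) ? 'valid' : 'invalid';
}
```

Keeping parsing and validation separate means each field can get its own simple check instead of one grammar rule that does both jobs.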

In addition to the examples above, you can, for fun, autogenerate a regex that does the job you want. It's rather ugly, but it does work.

```perl
use strict;
use warnings;

# Build one alternative per split point: 20 digits + 0 spaces,
# 19 digits + 1 space, ..., 1 digit + 19 spaces.
my $re = '';
for ( reverse 1 .. 20 ) {
    $re .= sprintf "\\d{%d} {%d}|", $_, 20 - $_;
}
chop $re;    # drop the trailing '|'
$re = qr/^(?:$re)$/;
print $re, $/;    # perl 5.14+ prints (?^:...) rather than (?-xism:...)

my @tests = (
    '01234567890123456789',    # OK
    '123' . ' ' x 17,          # OK
    '123 ',                    # NOK (too short)
    '123 123' . ' ' x 13,      # NOK
    ' ' x 20,                  # NOK
    '123c' . ' ' x 16,         # NOK
);
for (@tests) { print m/$re/ ? "'$_' #OK\n" : "'$_' #NOK\n" }

__DATA__
(?-xism:^(?:\d{20} {0}|\d{19} {1}|\d{18} {2}|\d{17} {3}|\d{16} {4}|\d{15} {5}|\d{14} {6}|\d{13} {7}|\d{12} {8}|\d{11} {9}|\d{10} {10}|\d{9} {11}|\d{8} {12}|\d{7} {13}|\d{6} {14}|\d{5} {15}|\d{4} {16}|\d{3} {17}|\d{2} {18}|\d{1} {19})$)
'01234567890123456789' #OK
'123                 ' #OK
'123 ' #NOK
'123 123             ' #NOK
'                    ' #NOK
'123c                ' #NOK
```
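For comparison, a shorter pattern is possible (a sketch, not from the original post): a lookahead enforces the 20-character total, so the main pattern only has to say "digits, then spaces". It anchors with `\z` rather than `$`, so unlike the generated regex it also rejects a trailing newline.

```perl
use strict;
use warnings;

# Alternative to the generated 20-way alternation: the lookahead checks
# the total length up front, then the main pattern requires one or more
# digits followed only by spaces.
my $re = qr/^(?=.{20}\z)\d+ *\z/;

my @tests = (
    '01234567890123456789',    # OK
    '123' . ' ' x 17,          # OK
    '123 ',                    # NOK, too short
    '123 123' . ' ' x 13,      # NOK, digits after a space
    ' ' x 20,                  # NOK, no digits
    '123c' . ' ' x 16,         # NOK, non-digit
);
for (@tests) {
    print m/$re/ ? "'$_' #OK\n" : "'$_' #NOK\n";
}
```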

cheers

tachyon

Re: Re: Regex to match 20 chars of some digits followed by some spaces
by leriksen (Curate) on Dec 19, 2003 at 03:20 UTC
    Not all the text I am parsing is fixed-width, just this bit.

    The rest is somewhat like this (deboned to protect the client):

```
document : checkpoint address report(s?) doctrailer
report : report1 | report2 | report3 | ...
report1 : lt[100] report1_cost_centre(s?) lt[200]
report1_cost_centre : lt[300] report1_txn(s?) lt[400]
report1_txn : lt[500] lt[600] page_break(?) lt[700]
lt : "<LT$arg[0]>" lt_data
lt_data : /[^\\]*/ lt_end {$return = $item{__PATTERN1__} ...
```
    ...but for 15 different reports, with hundreds of lt records and lots of options, repeats and alternations.

    +++++++++++++++++
    #!/usr/bin/perl
    use warnings;use strict;use brain;

      Ah, that makes more sense. If you have to deal with fixed-width records, you may find this sub handy:

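```perl
use strict;
use warnings;
use Data::Dumper;

# Each field is padded to its full width, with 'EOF' in its last three columns.
my $str = sprintf '%-17sEOF%-17sEOF%-27sEOF', 'first name', 'last name', 'address field';

# [ field name, width in characters ]
my @rec_def = (
    [ 'first_name', 20 ],
    [ 'last_name',  20 ],
    [ 'address',    30 ],
);

sub parse_fixed_width {
    my ( $record, $rec_def ) = @_;
    my %struct;
    my $offset = 0;
    for my $rec (@$rec_def) {
        $struct{ $rec->[0] } = substr $record, $offset, $rec->[1];
        $offset += $rec->[1];
    }
    # Only return the structure if the record is exactly as long as the definition.
    return length($record) == $offset ? \%struct : '';
}

$Data::Dumper::Sortkeys = 1;
print Dumper parse_fixed_width( $str, \@rec_def );

__DATA__
$VAR1 = {
          'address' => 'address field              EOF',
          'first_name' => 'first name       EOF',
          'last_name' => 'last name        EOF'
        };
```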
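      If you would rather let `unpack` do the slicing, the same `[ name, width ]` definitions can generate the template. This is a hypothetical variant (`parse_with_unpack` is an illustrative name, not part of the post above); note that it uses `A` fields, which strip trailing spaces, whereas the `substr` version keeps them:

```perl
use strict;
use warnings;

# Hypothetical variant of parse_fixed_width: builds an unpack template
# such as 'A20 A20 A30' from the same [ name, width ] pairs.
sub parse_with_unpack {
    my ( $record, $rec_def ) = @_;

    # Total width of the record definition.
    my $width = 0;
    $width += $_->[1] for @$rec_def;

    # Same rule as parse_fixed_width: reject records of the wrong length.
    return '' unless length($record) == $width;

    # 'A' strips trailing spaces (and NULs) from each field on unpack;
    # use 'a' instead to keep the padding exactly as stored.
    my $template = join ' ', map { "A$_->[1]" } @$rec_def;

    my %struct;
    @struct{ map { $_->[0] } @$rec_def } = unpack $template, $record;
    return \%struct;
}

my @rec_def = ( [ 'first_name', 20 ], [ 'last_name', 20 ], [ 'address', 30 ] );
my $str = sprintf '%-20s%-20s%-30s', 'Jane', 'Doe', '1 Main St';
my $rec = parse_with_unpack( $str, \@rec_def );
print join( '|', @{$rec}{qw(first_name last_name address)} ), "\n";
```

      With the sample record above this prints `Jane|Doe|1 Main St`.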

      cheers

      tachyon
