http://www.perlmonks.org?node_id=268198


in reply to Parse mailing addresses with a regex

Hi,

You only have to translate your remarks into regex-speak. For clarity, it helps to use the /x modifier, which lets you spread the pattern over several lines, one field per line. Here is my result, with your remarks kept as comments:

while (my $line = <DATA>) {

    #if ($line =~ m!^(\d+)\s+(([A-Za-z]+\s+[A-Za-z].\s+[A-Za-z]+)|([A-Za-z]+\s+[A-Za-z]+) )!) {
    if ($line =~ m!
            ^(\d+)\s+
            ([^\d]+)
            ((?:\w+\s)+)
            (\w+)\s
            (\w\w)\s
            (\d{5})\s
            ([\d\-]+)\s
            (.*)\s$
        !x) {

        print "\n";
        print "\$1: $1 \n";
        print "\$2: $2 \n";
        print "\$3: $3 \n";
        print "\$4: $4 \n";
        print "\$5: $5 \n";
        print "\$6: $6 \n";
        print "\$7: $7 \n";

        $custNum      = $1;  # First number field.
        $custName     = $2;  # Name styles can vary so match everything between two numbers.
        $custStreet   = $3;  # Street is everything after name and before CITY.
        $custCity     = $4;  # City is after address and before the TWO char state identifier.
        $custState    = $5;  # State is after address and before FIVE digit zip number.
        $custZip      = $6;  # Zip is before telephone number and after State id.
        $custTel      = $7;  # Telephone no. is after zip and before comments field.
        $custComments = $8;  # Last remaining part after telephone number.
    }
}
__DATA__
141 Martha Lynn Costello 11750 Old Mill Drive Media PA 19063 610-555-1212 no detail
178 Edgar Jones Jr. 18013 Highfield Road Ashton Ma 20861 323-774-1339 no detail
161 Joyce W. Whang 18 Long Point Lane Media PA 19063 610-891-2344 no detail
188 Alex Smith 1979 Biltmore St NW Apt B Washington DC 20009 202-913-6685 no detail

produces
# perl re

$1: 141
$2: Martha Lynn Costello
$3: 11750 Old Mill Drive
$4: Media
$5: PA
$6: 19063
$7: 610-555-1212

$1: 178
$2: Edgar Jones Jr.
$3: 18013 Highfield Road
$4: Ashton
$5: Ma
$6: 20861
$7: 323-774-1339

$1: 161
$2: Joyce W. Whang
$3: 18 Long Point Lane
$4: Media
$5: PA
$6: 19063
$7: 610-891-2344

$1: 188
$2: Alex Smith
$3: 1979 Biltmore St NW Apt B
$4: Washington
$5: DC
$6: 20009
$7: 202-913-6685

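One caveat: with the pattern above, $2 and $3 each keep a trailing space, because [^\d]+ and (?:\w+\s)+ both swallow the separator that follows the field. Below is a minimal sketch of one way around that, assuming the same eight-field layout. It uses lazy quantifiers and a list-context match assigned to a hash slice. The sub name parse_customer and the hash keys are made up for illustration and are not part of the code above. Like the original, it still assumes a one-word city.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one record line into a hash reference.
# Returns nothing when the line does not fit the assumed layout.
sub parse_customer {
    my ($line) = @_;

    # In list context a successful match returns the captured groups.
    my @fields = $line =~ m{
        ^ (\d+)             \s+   # customer number
        (\D+?)              \s+   # name: non-digits, separator left out
        (\d+ (?:\s\w+)+?)   \s+   # street: house number plus following words
        (\w+)               \s+   # city, one word as in the original pattern
        (\w\w)              \s+   # two-character state
        (\d{5})             \s+   # five-digit zip
        ([\d-]+)            \s+   # telephone
        (.*?)               \s* \z   # comments, trailing whitespace dropped
    }x or return;

    my %cust;
    @cust{qw(num name street city state zip tel comments)} = @fields;
    return \%cust;
}

my $c = parse_customer(
    "141 Martha Lynn Costello 11750 Old Mill Drive Media PA 19063 610-555-1212 no detail\n"
);
print "[$c->{name}] [$c->{street}] [$c->{city}]\n" if $c;
```

The brackets in the print statement are only there to make any stray whitespace visible. Returning a hash reference instead of setting eight separate variables is a design choice of this sketch; the $cust... variables from the code above would work just as well with the same pattern.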
Greetings, tos