PerlMonks  

Grab zip codes out of an HTML page

by Falkkin (Chaplain)
on Feb 24, 2001 at 08:56 UTC

Takes in an HTML page and extracts everything that looks like a ZIP code (that is, 5 digits in a row). Prints them out, separated by commas. I'm sure there's a better way to do this, but thought I'd share. :)
cat zips.html | perl -ne 's/(\d{5,})//g; print "$1," if $1'
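
One thing worth checking with the one-liner above: after `s///g` finishes, `$1` holds only the capture from the last successful match on the line, so a line with several ZIP-like numbers prints just one of them. The sketch below contrasts that with a match in list context, which returns every capture. The helper names `last_capture_after_subst` and `all_captures` and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the same substitution as the one-liner on a
# copy of the line and returns whatever $1 holds afterwards.
sub last_capture_after_subst {
    my ($line) = @_;
    return undef unless $line =~ s/(\d{5,})//g;
    my $last = $1;    # capture from the last successful match only
    return $last;
}

# Hypothetical helper: a plain match with /g in list context returns
# every captured run of five or more digits.
sub all_captures {
    my ($line) = @_;
    my @found = $line =~ /(\d{5,})/g;
    return @found;
}

my $line = "Offices in 02139, 94103 and 60611";
print "last only: ", last_capture_after_subst($line), "\n";   # last only: 60611
print "all: ", join(",", all_captures($line)), "\n";          # all: 02139,94103,60611
```

Nothing needs to be deleted from the line to collect the matches, so the list-context form also leaves the input untouched.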

Replies are listed 'Best First'.
Re: Grab zip codes out of an HTML page
by damian1301 (Curate) on Feb 24, 2001 at 09:31 UTC
    Wouldn't the s/(\d{5,})//g; match five or more consecutive digits? So if a webpage contains 123456778990, it would match all of the digits and return them in $1. A better solution would be to omit the comma, which results in this:

    s/(\d{5})//g;

    But since you're not actually making a substitution in that code, you should just use a match.

    m/(\d{5})(-\d{4})?/g;

    That way you don't have to needlessly delete anything, and it's much tidier :). Also, now you can catch the full ZIP code for better accuracy (e.g. 12345-1234). Hope I helped.

    UPDATE: Thanks albannach for pointing out my typing mistake and for suggesting the (-\d{4})? part :)

    Almost a Perl hacker.
    Dave AKA damian

    I encourage you to email me
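
As a rough sketch of the match-based approach suggested in the reply above, the following collects both plain ZIP codes and ZIP+4 codes from a string. The helper name `extract_zips` and the sample text are hypothetical. The `(?<!\d)` and `(?!\d)` guards are an added assumption, not part of the thread: without them, `\d{5}` would still pick five-digit slices out of a longer run such as 123456778990.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: returns the ZIP and ZIP+4
# strings found in $text, in order of appearance.
sub extract_zips {
    my ($text) = @_;
    my @zips;
    # Assumption: a ZIP code must not be preceded or followed by another
    # digit, so longer digit runs are skipped entirely.
    while ($text =~ /(?<!\d)(\d{5})(?:-(\d{4}))?(?!\d)/g) {
        # $2 is defined only when the optional -NNNN suffix matched.
        push @zips, defined $2 ? "$1-$2" : $1;
    }
    return @zips;
}

my $text = "Cambridge, MA 02139-4307; San Francisco, CA 94103; order no. 123456778990";
print join(",", extract_zips($text)), "\n";   # 02139-4307,94103
```

Any five-digit number that stands alone still counts as a match here, so prices, IDs, or similar figures of that length in the page would need further filtering.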
