Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Grab zip codes out of an HTML page

by Falkkin (Chaplain)
on Feb 24, 2001 at 08:56 UTC ( #60611=snippet: print w/ replies, xml ) Need Help??

Description: Takes in an HTML page and extracts everything that looks like a ZIP code (that is, 5 digits in a row.) Prints them out, separated by commas. I'm sure there's an better way to do this, but thought I'd share. :)
cat zips.html | perl -ne 's/(\d{5,})//g; print "$1," if $1'
Comment on Grab zip codes out of an HTML page
Download Code
Re: Grab zip codes out of an HTML page
by damian1301 (Curate) on Feb 24, 2001 at 09:31 UTC
    Wouldn't the s/(\d{5,})//g; match 5 or more instances of consecutive numbers? So if, in a webpage, there is 123456778990, it would match all of the numbers and return them in $1 A better solution to this would be to omit the comma and result with this

    s/(\d{5})//g;

    But, since your not actually making a substitution in that code, you should just use a match.

    m/(\d{5})(-\d{4})?/g;

    That way you don't have to falsely delete anything and its much more tidier :). Also, now you can catch the full zip code for better accuracy (eg. 12345-1234). Hope I helped.

    UPDATE:Thanks albannach for pointing out my typing mistake and for suggesting the (-\d{4})? part :)

    Almost a Perl hacker.
    Dave AKA damian

    I encourage you to email me

Back to Snippets Section

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: snippet [id://60611]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (3)
As of 2014-07-24 02:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (156 votes), past polls