Find rental listings in Craig's list within certain distance

by johnnywang (Priest)
on Apr 18, 2005 at 07:16 UTC (#448749, CUFP)

Here's a short script that finds all apartment rentals on Craig's list for the San Francisco Bay Area within a given distance (in miles) of an address. Some notes:
  • You can easily add real-estate listings, room-to-share, etc., by changing the base URL in the code.
  • You can also use Craig's list for other metropolitan areas, again by changing the base URL. I think this only works for the US: the script uses Google Maps to find the latitude/longitude, and Google Maps doesn't seem to have GIS data for other regions.
  • The script takes two arguments: the address (quote it) and the distance in miles. You can easily change the unit by passing the appropriate string to Geo::Distance.
  • I'm printing to stdout; you could just as well write HTML files so that the links are clickable.
  • Like other scraping code, this will need to be updated if either Craig's list or Google Maps changes its HTML.
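To make the note about units concrete, here is a minimal sketch of a great-circle distance with a selectable unit, using only core Perl. It is not Geo::Distance's implementation: the name `haversine_distance`, the radius table, and the sample coordinates are all assumptions for illustration. The argument order (unit, then longitude/latitude pairs) simply mirrors the call made in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Math::Trig qw(deg2rad);

# Mean Earth radius per unit (assumed round values, for illustration only).
my %EARTH_RADIUS = (
    mile      => 3958.8,
    kilometer => 6371.0,
);

# Great-circle distance via the haversine formula.
# Argument order mirrors the script: unit, lon1, lat1, lon2, lat2.
sub haversine_distance {
    my ($unit, $lon1, $lat1, $lon2, $lat2) = @_;
    my $radius = $EARTH_RADIUS{$unit}
        or die "Unknown unit '$unit'\n";

    my $dlat = deg2rad($lat2 - $lat1);
    my $dlon = deg2rad($lon2 - $lon1);
    my $h = sin($dlat / 2) ** 2
          + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dlon / 2) ** 2;

    return 2 * $radius * atan2(sqrt($h), sqrt(1 - $h));
}

# Hypothetical coordinates (longitude, latitude): roughly San Francisco and Oakland.
my @sf      = (-122.4194, 37.7749);
my @oakland = (-122.2712, 37.8044);

for my $unit (qw(mile kilometer)) {
    printf "%.1f %ss\n", haversine_distance($unit, @sf, @oakland), $unit;
}
```

With Geo::Distance itself, switching units should only require changing the first argument of the `distance` call; check the module's documentation for the exact unit names it accepts.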
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use Geo::Distance;
use URI::Escape;

unless (@ARGV==2){
    print "Usage: need to pass the address (in quotes), and the distance.\n";
    exit 0;
}
my $address = $ARGV[0];
my $dist = $ARGV[1];
my $num_pages = 10; # number of craiglist pages to process, be nice!

print "\nFinding apartment listings within $dist miles from $address\n";

my $agent = WWW::Mechanize->new();
my $google_reg = qr{<point lat="([-\d\.]+)" lng="([-\d\.]+)"/>};

# first, find long/lat for the address using google maps.
my $googlemap = "".uri_escape($address);
$agent->get($googlemap);
my ($my_lat,$my_long) = $agent->content() =~ /$google_reg/i;
print "You're at longitude=$my_long, latitude=$my_lat\n\n";

# now crawl craig's list for listings
my $geo = new Geo::Distance();
my $site = "";
my $url = "$site/apa/";
process_page($_) foreach $url, map{$url."index${_}00.html"}(1..$num_pages);
exit(0);

sub process_page{
    my $u = shift;
    my ($price,$title,$ul,$lat,$long);
    $agent->get($u);
    my @links = $agent->find_all_links(url_regex=> qr{/apa/(\d+)\.html}i);
    foreach my $link(@links){
        if($link->text() =~ /^\s*\$([\d,]+)\s+/i){
            $price = $1;
            $title = $link->text();
            $ul = $site.$link->url();
            $agent->get($ul);
            $agent->follow_link( text => "google map");
            ($lat,$long) = $agent->content() =~ /$google_reg/i;
            my $d = $geo->distance('mile',$my_long,$my_lat => $long, $lat);
            if($d <= $dist){
                print "price: \$$price\n";
                print "distance: $d miles\n";
                print "title: $title\n";
                print "url: $ul\n\n";
            }
        }
    }
}
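As the notes suggest, the print statements in process_page could emit HTML instead of plain text. The following is only a rough sketch of that idea: the helpers `escape_html` and `listing_to_html`, the hash keys, and the sample listing are hypothetical and not part of the original script. Escaping is done by hand so that the example needs no non-core modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Turn one listing (hash ref with price, distance, title, url) into a list item.
sub listing_to_html {
    my ($listing) = @_;
    return sprintf qq{<li><a href="%s">%s</a> (\$%s, %.1f miles)</li>},
        escape_html($listing->{url}),
        escape_html($listing->{title}),
        escape_html($listing->{price}),
        $listing->{distance};
}

# Hypothetical sample data; a real run would collect these inside process_page.
my @listings = (
    {
        price    => '1,500',
        distance => 2.34,
        title    => '$1500 2br <sunny> flat',
        url      => 'http://example.com/apa/123.html',
    },
);

print "<ul>\n";
print "  ", listing_to_html($_), "\n" for @listings;
print "</ul>\n";
```

Redirecting the output to a file (for example `> listings.html`) would then give a page of clickable links.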

Replies are listed 'Best First'.
Re: Find rental listings in Craig's list within certain distance
by kesterkester (Hermit) on Apr 18, 2005 at 17:53 UTC

    This is fantastic! A nice job tying together several modules to make a neat tool.

Node Type: CUFP [id://448749]
Approved by thor
Front-paged by kesterkester