http://www.perlmonks.org?node_id=19756

Speedfreak has asked for the wisdom of the Perl Monks concerning the following question:

Hej All,

One day I am going to get this nailed but until then, can someone please guide me in the ways...

Here's the problem:

I have a CSV file which contains a list of places plus their position on the planet, with latitude and longitude stored as decimals.

Now, the principle is to select one of these locations and return a list of places within a defined square area of a certain size.

The first problem is that I have to open this CSV file and search it for the selected place to get its Lat/Lon pair. I then have to re-search it for all the places whose position falls within the search area.

My theory was to create a 2-dimensional array (sort of rows/cols, if you will) by reading lines in, splitting them by the delimiter and putting the values in the second dimension. I would then index a line by the first.

However, reading the Perl docs and the books I have, I can't seem to find how to do this.

However I do it, I need to complete the following steps:

1) Open the CSV file and pull it into an array.
2) Go through each element of the array to find a matching place name and then get its Lat/Lon from a column on that row.
3) Re-search the array for places whose position lies within a predefined area around the chosen place.

Can anyone advise me on this? I am completely lost. I don't want to have to parse the file from disk twice, and putting it into an array and searching that twice seems to me to take less disk overhead than using a simple SQL query.

- Jed

  • Comment on Reading file into an array and working with it.

Replies are listed 'Best First'.
Re: Reading file into an array and working with it.
by httptech (Chaplain) on Jun 25, 2000 at 21:10 UTC
    What I would do is use Text::CSV to do the actual parsing, then push each array returned as an array reference onto a master array of locations:
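    use Text::CSV;

    my @locations;
    my $csv = Text::CSV->new();
    open (FILE, "locations.csv") or die "Couldn't open location file: $!";
    while (<FILE>) {
        chomp;                              # drop the trailing newline before parsing
        $csv->parse($_) or next;            # parse() returns false on a malformed line
        push(@locations, [$csv->fields]);   # one array reference per row
    }
    close FILE;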

      Might be worth pointing out that on CPAN there is now a Text::CSV_XS which, as its name implies, does the parsing in C code. It is therefore much faster than the pure Perl implementation in Text::CSV.

      --
      <http://www.dave.org.uk>

      European Perl Conference - Sept 22/24 2000
      <http://www.yapc.org/Europe/>
        Hmm, I wonder if DBD::CSV could use it as a backend... and whether it would speed DBD::CSV up, because it is pretty slow.
Re: Reading file into an array and working with it.
by chromatic (Archbishop) on Jun 25, 2000 at 21:17 UTC
    Anytime you find yourself iterating through an array, looking for a specific value, you should stop and ask "Would a hash be better here?"

    What I would do:

    1. Open the file.
    2. For each line, split it into $name and $location.
    3. Put the data into two hashes -- one in the format $name => $location, the other $location => $name.
    4. Close the file.
    5. Look up the location by name from the first hash.
    6. Write a couple of loops to cover the coordinates of the predefined area around the place (add to and subtract from the Lat/Lon values).
    7. Look up names (if they exist) by location in the second hash.
    You might also look at the DBM file modules included with Perl, like DB_File, GDBM_File, NDBM_File, and ODBM_File. Also be aware that parsing many CSV files with a regex (even in a split statement) is tricky, so Text::CSV may come in handy.
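
The numbered steps above can be sketched roughly as follows. This is only an illustration under some assumptions that are not in the original post: the sample data is made up, the coordinates are whole degrees (decimal values would need rounding to a grid first), and the name places_near is hypothetical. The second hash holds a list of names per location, since two places may share a grid cell.

```perl
use strict;
use warnings;

# Hypothetical sample data: "name,lat,lon" in whole degrees
my @lines = (
    "Applegate,39,-121",
    "Arbuckle,39,-122",
    "Lakeport,39,-123",
    "Hico,32,-99",
);

my (%location_of, %names_at);
for my $line (@lines) {
    my ($name, $lat, $lon) = split /,/, $line;
    $location_of{$name} = [$lat, $lon];         # name => location
    push @{ $names_at{"$lat,$lon"} }, $name;    # location => names
}

# Walk every grid cell in the square around the named place and
# collect whatever names are stored there.
sub places_near {
    my ($name, $range) = @_;
    my $loc = $location_of{$name} or return;
    my ($lat0, $lon0) = @$loc;
    my @found;
    for my $lat ($lat0 - $range .. $lat0 + $range) {
        for my $lon ($lon0 - $range .. $lon0 + $range) {
            push @found, @{ $names_at{"$lat,$lon"} || [] };
        }
    }
    my @others = grep { $_ ne $name } @found;
    return sort @others;
}

print join(", ", places_near("Arbuckle", 1)), "\n";
```

The loops touch (2 * range + 1) squared cells regardless of how many places are in the file, which is the point of using a hash here; for a large search area on a small file, a plain scan may be just as quick.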
Re: Reading file into an array and working with it.
by davorg (Chancellor) on Jun 25, 2000 at 21:16 UTC

    Sounds like you need to look at perldoc perllol and perldoc perldsc, both of which cover this in some detail.

    In summary you'd do something like this (assuming your delimiter is a tab):

    my @data;
    open(DATA, $file) || die "Can't open $file: $!\n";
    while (<DATA>) {
        chomp;                       # strip the newline so the last field is clean
        push @data, [split /\t/];    # one array reference per line
    }
    close(DATA);

    You would then access the various elements like this:

    my $town = 'London';
    my ($lat, $long);
    foreach (@data) {
        if ($_->[0] eq $town) {
            ($lat, $long) = ($_->[1], $_->[2]);
            last;    # stop at the first match
        }
    }

    However, if this is how you are going to be using the data, then you might be better off building a hash of arrays, where the key to the hash is the town name and the value is a two element list containing lat and long. You would construct that something like this:

    my %data;
    open(DATA, $file) || die "Can't open $file: $!\n";
    while (<DATA>) {
        chomp;
        my ($town, @vals) = split(/\t/);
        $data{$town} = \@vals;    # town => [lat, long]
    }
    close(DATA);

    you could then get the lat and long for a particular town like this:

    my ($lat, $long) = @{$data{$town}};
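
For the second half of the question (finding everything inside a square around the chosen town), one possible sketch on top of the same hash of arrays is shown here. The sample coordinates, the sub name towns_within and the "half side in degrees" parameter are all assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as %data above: town => [lat, long]
my %data = (
    London    => [51.51, -0.13],
    Watford   => [51.66, -0.40],
    Brighton  => [50.82, -0.14],
    Edinburgh => [55.95, -3.19],
);

# Return the other towns whose lat and long both lie within
# $half_side degrees of the chosen town.
sub towns_within {
    my ($data, $town, $half_side) = @_;
    my $pos = $data->{$town} or return;    # unknown town: empty list
    my ($lat, $long) = @$pos;
    my @near = grep {
        $_ ne $town
            && abs($data->{$_}[0] - $lat)  <= $half_side
            && abs($data->{$_}[1] - $long) <= $half_side
    } keys %$data;
    return sort @near;
}

print join(", ", towns_within(\%data, 'London', 1)), "\n";
```

This reads the file once and scans the hash once per query, so nothing is parsed from disk twice. It treats a degree of longitude as the same size everywhere, which is only a rough approximation away from the equator.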

    Hope this helps

Re: Reading file into an array and working with it.
by Ovid (Cardinal) on Jun 25, 2000 at 22:35 UTC
    Well, my answer is probably overkill, but I found the problem so much fun that I wrote the code for it. This may not be the most efficient way of getting your answer, but it's what I came up with. Any optimization advice would be great!

    Here's the sample data file I created:

    Mt. Wrangell,AK,62N,144W
    Hico,TX,32N,99W
    Neotsu,OR,45N,124W
    Applegate,CA,39N,121W
    Arbuckle,CA,39N,122W
    Lakeport,CA,39N,123W
    Hot Springs,CA,40N,121W
    Here's the program:
    #!/usr/bin/perl -w
    use strict;

    my $data = "test.txt";               # here is the raw latitude/longitude data
    my ($xlat, $xlon) = qw(39N 122W);    # here's what we'll search for, +/- $variance
    my (%lat_lon, @lat, @lon, @final_lat, @final_lon, %dups);
    my $variance = 1;                    # change this to the degree variance desired

    open (DATA, "<$data") || die "Can't open $data for reading: $!\n";
    while (<DATA>) {
        chomp;
        my ($city, $state, $lat, $lon) = split /,/;
        $lat_lon{$lat}{$lon}->[0] = $city;
        $lat_lon{$lat}{$lon}->[1] = $state;
    }
    close (DATA) || die "Can't close $data: $!\n";

    # find all lats which are +/- $variance of target lat
    foreach my $lat_key (keys %lat_lon) {
        my ($lat, $lat_NS, $xlat_NS);
        $lat = $1, $lat_NS = $2 if $lat_key =~ /^(\d{1,3})([NS])$/o;
        $xlat_NS = $2 if $xlat =~ /^(\d{1,3})([NS])$/o;
        if ($xlat_NS eq $lat_NS) {
            push (@lat, $lat_key) if ($1 <= $lat + $variance) && ($1 >= $lat - $variance);
        }
    }

    # find all lons which are +/- $variance of target lon
    foreach my $good_lat (@lat) {
        foreach my $lon_key (keys %{$lat_lon{$good_lat}}) {
            my ($lon, $lon_WE, $xlon_WE);
            $lon = $1, $lon_WE = $2 if $lon_key =~ /^(\d{1,3})([WE])$/o;
            $xlon_WE = $2 if $xlon =~ /^(\d{1,3})([WE])$/o;
            if ($xlon_WE eq $lon_WE) {
                push (@lon, $lon_key) if ($1 <= $lon + $variance) && ($1 >= $lon - $variance);
            }
        }
    }

    # remove duplicate latitudes and longitudes
    foreach (@lat) {
        push (@final_lat, $_) unless $dups{$_}++;
    }
    foreach (@lon) {
        push (@final_lon, $_) unless $dups{$_}++;
    }
    The two arrays, @final_lat and @final_lon, will contain all of the unique latitudes and longitudes which are within $variance of your target latitude and longitude.

    Incidentally, the information that you were looking for regarding multi-dimensional arrays is found in Programming Perl, by O'Reilly Books, Second Edition, starting on page 257. You'll want to read through that to see what I was doing with a hash of hashes of arrays.

    Cheers!

    Update: As mentioned in previous answers, you'll want to check out Text::CSV. My example above will fail if you have a data field with an embedded comma:

    "Some city, County", State, 45N, 120W
    The quotes will also cause problems if you're trying to eliminate them.
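
To show what handling that quoted line looks like, here is a small sketch. It swaps in the core module Text::ParseWords rather than Text::CSV, purely so the example runs without installing anything from CPAN; it is a stand-in for illustration, not a replacement for the recommendation above.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = '"Some city, County", State, 45N, 120W';

# parse_line(DELIMITER, KEEP, LINE): the delimiter is a regex, and
# KEEP = 0 strips the surrounding quotes from quoted fields.
my @fields = parse_line(',\s*', 0, $line);

print scalar(@fields), " fields\n";
print "city:  $fields[0]\n";
print "state: $fields[1]\n";
```

One caveat: parse_line also treats an apostrophe as a quote character, so a place name containing one would likely be misparsed. A dedicated CSV module is the safer choice for real data.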
Re: Reading file into an array and working with it.
by merlyn (Sage) on Jun 26, 2000 at 02:47 UTC
Re: Reading file into an array and working with it.
by lhoward (Vicar) on Jun 26, 2000 at 02:27 UTC
    You should really consider using a database for this kind of thing if you're going to be doing it more than occasionally. I implemented a "location search" on a website that I developed using this methodology and it works great.
    1. find latitude/longitude of the location the searcher is interested in
    2. determine the latitude/longitude of the "bounding box" that bounds an area N miles from the latitude/longitude of the search location. Be sure to take the curvature of the earth into account when doing this computation.
    3. select all locations of interest from the DB with a simple SQL query:
       select * from LOCATION where LATITUDE between LATMIN and LATMAX and LONGITUDE between LONMIN and LONMAX
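
Step 2 might look something like the sketch below. It is an approximation under stated assumptions, not the code from the site mentioned above: one degree of latitude is taken as roughly 69 miles, a degree of longitude is scaled by the cosine of the latitude, and the sub name bounding_box is hypothetical. It does not handle boxes that cross a pole or the 180 degree meridian.

```perl
use strict;
use warnings;

my $MILES_PER_DEGREE = 69.0;            # approximate length of one degree of latitude
my $PI               = 4 * atan2(1, 1);

# Return (LATMIN, LATMAX, LONMIN, LONMAX) for a box reaching $miles
# north, south, east and west of the given point (decimal degrees).
sub bounding_box {
    my ($lat, $lon, $miles) = @_;
    my $dlat = $miles / $MILES_PER_DEGREE;
    # a degree of longitude shrinks by cos(latitude) away from the equator
    my $dlon = $miles / ($MILES_PER_DEGREE * cos($lat * $PI / 180));
    return ($lat - $dlat, $lat + $dlat, $lon - $dlon, $lon + $dlon);
}

my ($latmin, $latmax, $lonmin, $lonmax) = bounding_box(39.0, -122.0, 25);
printf "LATITUDE between %.4f and %.4f and LONGITUDE between %.4f and %.4f\n",
    $latmin, $latmax, $lonmin, $lonmax;
```

The four values can then be bound into the between clauses of the query in step 3. Because the box is square, the corners are farther than N miles from the centre, so a true radius search would need a distance check on the rows that come back.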