PerlMonks  

Selecting a particular record from groups of records

by msexton (Initiate)
on Sep 20, 2012 at 13:08 UTC ( #994658=perlquestion )
msexton has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a question, but don't know what to Google for help or even which part of the Perl manual to read.

Basically I have a file that consists of a number of records. Each record has a date/time entry, a latitude, a longitude, a water depth and a number of flags. The file comes from a multibeam bathymetry export. The problem is that the date/time stamp is the same on a number of records and I want to choose one of these depending on one of the flags.

So I may have 30 records with the same date/time stamp (the number of records is not fixed for each date/time). Each of these records has a particular latitude, longitude and water depth. One of the QC flags, however, determines the record I want: the one with the smallest value of that flag for that date/time. Unfortunately this value is not fixed. It is just the minimum value for that date/time. I wish to output that record.

I then move on to the next date/time group (around 30 records, but it could be slightly higher or lower) and repeat the process.

This continues until the end of the file.

I really only need to be pushed in the right direction (I hope). I don't have a sample of the data with me, but can provide some tomorrow when I am at work.

Any guidance would be much appreciated.

Thanks

Mike Sexton

Re: Selecting a particular record from groups of records
by Athanasius (Monsignor) on Sep 20, 2012 at 13:27 UTC

    In Perl, the rule is: When in doubt, use a hash!

    So, create an empty hash. Then read the data file, one record at a time: if its date/time is not already a key in the hash, store whatever values you need against this new key. If the date/time is already in the hash, compare the QC flags — if the new record has a smaller value, substitute the new record’s details for those currently stored against that date/time.

    When all records have been read, output the hash contents, sorted by their date/time keys.
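    The steps above can be sketched roughly as follows. This is only an illustration under assumptions, since the real file layout has not been posted: it assumes whitespace-separated fields in the order date, time, lat, lon, depth, QC value, and it assumes ISO-style dates so that a plain string sort of the keys is chronological. The sub name `min_qc_by_timestamp`, the column index and the sample lines are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical layout: date time lat lon depth qc (whitespace-separated).
# Returns a hash ref mapping "date time" to the line with the smallest QC value.
sub min_qc_by_timestamp {
    my ($lines, $qc_col) = @_;
    my %best;    # timestamp => { qc => value, line => original line }
    for my $line (@$lines) {
        my @f = split ' ', $line;
        next unless @f > $qc_col;    # skip blank or short lines
        my $stamp = "$f[0] $f[1]";
        my $qc    = $f[$qc_col];
        # Store the record if the timestamp is new or this QC value is smaller.
        if (!exists $best{$stamp} || $qc < $best{$stamp}{qc}) {
            $best{$stamp} = { qc => $qc, line => $line };
        }
    }
    return { map { $_ => $best{$_}{line} } keys %best };
}

# Made-up sample lines, for illustration only.
my @records = (
    '2012-09-20 01:00:00 -35.10 150.20 -200 4',
    '2012-09-20 01:00:00 -35.11 150.21 -210 2',
    '2012-09-20 01:00:01 -35.12 150.22 -220 7',
);

my $best = min_qc_by_timestamp(\@records, 5);
print "$best->{$_}\n" for sort keys %$best;
```

    Because the comparison is a strict `<`, the first record seen wins when two records in a group share the same minimum value. If the dates are not in a string-sortable format, the final sort would need a custom comparison.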

    Hope that helps,

    Athanasius <°(((><contra mundum

Re: Selecting a particular record from groups of records
by Anonymous Monk on Sep 20, 2012 at 13:37 UTC

    I think it would be better to look at your file first. However, I can make one recommendation: use the date/time as a hash key, and store references to nested arrays (an array of arrays) as the values.

Re: Selecting a particular record from groups of records
by jethro (Monsignor) on Sep 20, 2012 at 13:52 UTC

    Is your data sorted for date/time, i.e. are all records with the same date/time together? You implied it but I want to make sure I understand you correctly.

    In that case you can just write a loop that reads each record and checks two things. If the date/time is the same as the previous record's, compare QC values: if this record's value is lower than that of the temporary record, replace the temporary record with this record. If the date/time is new, print the temporary record (which holds the lowest record of the previous date/time) and replace the temporary record with this record. After the loop ends, print the temporary record one last time so the final group is not lost. That's all.

    If your data is not sorted by date/time, then you need to store the smallest record for each date/time in a hash and output those records after you have read the whole file. A drawback is that you need to hold as many records in memory as there are distinct date/times, but it doesn't sound as if that is a problem in your case.

    PS: A flag is a data item that is either on or off, so technically there is no smallest value for a flag. You can easily confuse programmers with misnomers like this.
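    A minimal sketch of that single-pass loop, assuming records with the same date/time really are adjacent in the file. The field layout (date, time, lat, lon, depth, QC value), the sub name `min_qc_streaming` and the sample lines are assumptions made for the example, not the real export format.

```perl
use strict;
use warnings;

# Assumes records sharing a timestamp are adjacent in the input.
# Returns an array ref of the minimum-QC line for each group, in input order.
sub min_qc_streaming {
    my ($lines, $qc_col) = @_;
    my @out;
    my ($cur_stamp, $cur_qc, $cur_line);    # the "temporary record"
    for my $line (@$lines) {
        my @f = split ' ', $line;
        next unless @f > $qc_col;           # skip blank or short lines
        my ($stamp, $qc) = ("$f[0] $f[1]", $f[$qc_col]);
        if (defined $cur_stamp && $stamp eq $cur_stamp) {
            # Same group: keep whichever record has the lower QC value.
            ($cur_qc, $cur_line) = ($qc, $line) if $qc < $cur_qc;
        }
        else {
            # New group: emit the previous group's winner, then start over.
            push @out, $cur_line if defined $cur_stamp;
            ($cur_stamp, $cur_qc, $cur_line) = ($stamp, $qc, $line);
        }
    }
    push @out, $cur_line if defined $cur_stamp;    # emit the final group
    return \@out;
}

# Made-up sample lines, for illustration only.
my @records = (
    '2012-09-20 01:00:00 -35.10 150.20 -200 4',
    '2012-09-20 01:00:00 -35.11 150.21 -210 2',
    '2012-09-20 01:00:01 -35.12 150.22 -220 7',
);

print "$_\n" for @{ min_qc_streaming(\@records, 5) };
```

    Only one record is held at a time, so memory use does not grow with the file. In a real script the `push` calls could be replaced by `print` while reading the file line by line.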

Re: Selecting a particular record from groups of records
by pvaldes (Chaplain) on Sep 21, 2012 at 16:57 UTC
    use strict;
    use warnings;
    use List::Util qw(min);

    my %seahash;
    my $date;
    my $time;
    my $coolflag;

    while (<DATA>) {
        next if /^date\/time/;
        chomp;
        my ($date, $time, $lat, $lon, $depth, $flag, $coolflag, $otherflag) = split;
        # you could need the other variables later, so I save it
        # print $date, " ",$time, " ", $coolflag,"\n"; # checkpoint
        push @{$seahash{$date . " " . $time}}, $coolflag;
    }

    while (my ($timestamp, $flags) = each (%seahash)) {
        print $timestamp, " ", min(@{$flags}),"\n"
    }

    __DATA__
    date/time lat, lon, depth, flag, flag, flag
    21-sep-2012 01:00:00 0 0 -200 a 1 c
    21-sep-2012 01:00:00 0 0 -100 a 9 c
    20-sep-2012 02:20:00 0 0 -1500 a 2 c
    20-sep-2012 02:20:00 0 0 -500 a 4 c
    20-sep-2012 02:20:00 0 0 -400 a 3 c
    20-sep-2012 02:20:00 0 0 -300 a 3 c
    10-Aug-2012 02:20:00 0 0 -200 a 3 c
    10-Aug-2012 02:20:00 0 0 -200 a 1 c
    10-Aug-2012 02:20:00 0 0 -200 a 7 c
