http://www.perlmonks.org?node_id=636953


in reply to Pulling out oldest entries from a text file

Update: Seems I'm too slow today...

Here is my quick hack..

#!/usr/bin/perl -w
use strict;

my $entries;
while ( my $line = <DATA> ) {
    # Skip lines that do not contain a group and a date.
    next unless $line =~ /\d?\W*(gr\d)\W*(\d*-\d\d-\d\d)/;
    my $group = $1;
    my $date  = $2;
    $date =~ s/-//g;    # 2003-03-02 becomes 20030302, so dates compare as numbers

    # Keeps the entry with the largest (most recent) date for each group.
    if ( !defined( $entries->{$group} ) || ( $entries->{$group}->{date} < $date ) ) {
        $entries->{$group}->{date}  = $date;
        $entries->{$group}->{entry} = $line;
    }
}

foreach ( keys( %{$entries} ) ) {
    print "entry: $entries->{$_}->{entry}";
}

__DATA__
item group entry_date
34 gr1 2003-03-02
12 gr1 1990-03-14
39 gr3 2002-04-11
66 gr4 2006-03-16
32 gr3 1998-02-13
90 gr1 2004-06-15
55 gr4 1999-06-15
etc ...


2nd Update: On the other hand, my code is the only one so far which will not get confused by misformatted lines .. :-)

3rd Update:
Seems I'm bored..
I just did some benchmarking..
I created some testdata with the code below:
#!/usr/bin/perl -w
open F, ">testdata";
for ( 0..1000000 ){
    print F "$_ gr".int(rand(10))." ". (1990+int(rand(25))) . '-0'. (int(rand(10))) . '-' . (10 + int(rand(20)) )."\n";
}
close F;

After this I took some measurements:

my code:
time ./latestentries.pl
entry: 15970 gr5 2014-09-29
entry: 79485 gr8 2014-09-29
entry: 135788 gr7 2014-09-29
entry: 221 gr2 2014-09-29
entry: 18669 gr9 2014-09-29
entry: 46760 gr1 2014-09-29
entry: 4960 gr3 2014-09-29
entry: 9486 gr0 2014-09-29
entry: 19710 gr4 2014-09-29
entry: 56757 gr6 2014-09-29

real 0m8.689s
user 0m8.617s
sys 0m0.060s
-------------------
anno's code:
micha@laptop ~/prog/perl/test $ time perl test-anno.pl
962757, gr0, 2014-09-29
964472, gr1, 2014-09-29
984704, gr2, 2014-09-29
980128, gr3, 2014-09-29
985851, gr4, 2014-09-29
931318, gr5, 2014-09-29
976880, gr6, 2014-09-29
988367, gr7, 2014-09-29
992654, gr8, 2014-09-29
962175, gr9, 2014-09-29

real 0m4.556s
user 0m4.424s
sys 0m0.036s
-------------------
and duff's entry:
micha@laptop ~/prog/perl/test $ time perl test-duff.pl
100154 gr5 1990-00-10
5654 gr8 1990-00-10
2318 gr7 1990-00-10
9789 gr2 1990-00-10
19151 gr9 1990-00-10
91314 gr1 1990-00-10
124846 gr3 1990-00-10
14858 gr0 1990-00-10
175946 gr4 1990-00-10
95691 gr6 1990-00-10

real 0m3.497s
user 0m3.452s
sys 0m0.036s

The winner is duff.. :-)
He's the only one who looks for the oldest entry, AND he wrote the fastest code...
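
Since the original question asks for the oldest entry per group, here is a minimal sketch of that variant. It is not duff's or anno's code: the helper name oldest_per_group and the stricter regex are my own assumptions, and it relies on the dates being zero-padded YYYY-MM-DD so that a plain string comparison with lt orders them correctly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a hash ref that maps
# each group to the date and full line of its oldest entry.
sub oldest_per_group {
    my @lines = @_;
    my %oldest;
    for my $line (@lines) {
        # Skip anything that does not look like "item grN YYYY-MM-DD".
        next unless $line =~ /^\s*\d+\W+(gr\d+)\W+(\d{4}-\d\d-\d\d)\s*$/;
        my ( $group, $date ) = ( $1, $2 );

        # Zero-padded ISO-style dates sort correctly as plain strings,
        # so "lt" is enough and no date parsing is needed.
        if ( !exists $oldest{$group} || $date lt $oldest{$group}{date} ) {
            $oldest{$group} = { date => $date, entry => $line };
        }
    }
    return \%oldest;
}

# Sample data taken from the post above, including the header line
# and a trailing junk line that should both be ignored.
my @sample = (
    "item group entry_date\n",
    "34 gr1 2003-03-02\n",
    "12 gr1 1990-03-14\n",
    "39 gr3 2002-04-11\n",
    "66 gr4 2006-03-16\n",
    "32 gr3 1998-02-13\n",
    "90 gr1 2004-06-15\n",
    "55 gr4 1999-06-15\n",
    "etc ...\n",
);

my $oldest = oldest_per_group(@sample);
print "entry: $oldest->{$_}{entry}" for sort keys %$oldest;
```

With the sample data this should print the lines for 12 gr1 1990-03-14, 32 gr3 1998-02-13 and 55 gr4 1999-06-15. Swapping lt for gt would give the latest entry per group instead, which is what my quick hack above actually computes.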