PerlMonks  

Get most recent data based on a date from an array of hashes.

by Anonymous Monk
on Jan 18, 2022 at 17:35 UTC ( #11140571=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm trying to get the most recent data set from this array, but "sort" is not working; it prints all the data instead.
I only want to get the most recent:

{ 'Color' => 'green', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '08-06-2022' }
This is the most recent one, based on 'Date' => '08-06-2022'.
Any suggestions?

Test code:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $data = [
  { 'Color' => 'green', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '08-06-2022' },
  { 'Color' => 'black', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '01-05-2019' },
  { 'Color' => 'blue',  'Step' => 'Platform', 'acc' => '1111', 'Date' => '10-11-2020' },
  { 'Color' => 'white', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '01-03-2022' },
  { 'Color' => 'red',   'Step' => 'Platform', 'acc' => '1111', 'Date' => '03-21-2021' },
];

my @filtered = sort { $a->{Date} cmp $b->{Date} } @$data;

print Dumper @filtered;

Thanks for looking!

Replies are listed 'Best First'.
Re: Get most recent data based on a date from an array of hashes.
by tybalt89 (Prior) on Jan 18, 2022 at 18:18 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::AllUtils qw( max_by );

    my $data = [
      { 'Color' => 'green', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '08-06-2022' },
      { 'Color' => 'black', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '01-05-2019' },
      { 'Color' => 'blue',  'Step' => 'Platform', 'acc' => '1111', 'Date' => '10-11-2020' },
      { 'Color' => 'white', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '01-03-2022' },
      { 'Color' => 'red',   'Step' => 'Platform', 'acc' => '1111', 'Date' => '03-21-2021' },
    ];

    my $mostrecent = max_by {
      my @fields = split /-/, $_->{Date};
      join '', @fields[2,0,1]
      } @$data;

    use Data::Dump 'dd';
    dd $mostrecent;

    Outputs:

    { acc => 1111, Color => "green", Date => "08-06-2022", Step => "Platform" }
Re: Get most recent data based on a date from an array of hashes.
by davido (Cardinal) on Jan 18, 2022 at 20:54 UTC

    I would approach it by transforming the date field into ISO8601 format, holding onto it as a sort key, and then sorting based on that format. Here's one way of doing that:

    use strict;
    use warnings;
    use Data::Dumper;

    my $data = [
        { 'Color' => 'green', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '08-06-2022' },
        { 'Color' => 'black', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '01-05-2019' },
        { 'Color' => 'blue',  'Step' => 'Platform', 'acc' => '1111', 'Date' => '10-11-2020' },
        { 'Color' => 'white', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '01-03-2022' },
        { 'Color' => 'red',   'Step' => 'Platform', 'acc' => '1111', 'Date' => '03-21-2021' },
    ];

    my @sorted =
        map  { $_->[1] }                       # Drop the date key, keeping only the payload
        sort { $a->[0] cmp $b->[0] }           # Sort based on date key
        map  { [to_iso8601($_->{Date}), $_] }  # Temporarily store tuples of transformed date key, payload
        @$data;

    print Dumper \@sorted;

    sub to_iso8601 {
        my $date = shift;

        # Extract the components of a mm-dd-yyyy format
        my %components;
        @components{qw(month day year)} = split /-/, $date;

        # Enforce month and date must be 2 digits, possibly adding a leading 0.
        $components{$_} =~ s/^(\d{1})$/0$1/ for qw(day month);

        # Return them as a string with the components rearranged as yyyy-mm-dd
        # (ISO8601)
        return join '-', @components{qw(year month day)};
    }

    My own paranoia about date manipulation is that I should be using a module for date comparisons and transformations, so this approach sort of goes against my own instinct to not try to do date manipulation myself. If there's any possibility at all that dates aren't as easy to transform as I've done here, use a module for that.
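If the hand-rolled split feels too fragile, one option is to let the core Time::Piece module do the conversion. This is only a minimal sketch, assuming every date really is in mm-dd-yyyy form; the helper name to_iso8601_tp is made up for illustration and is not part of the code above.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: convert mm-dd-yyyy to yyyy-mm-dd via Time::Piece.
# strptime should die if the string cannot be parsed with this format.
sub to_iso8601_tp {
    my ($date) = @_;
    return Time::Piece->strptime($date, '%m-%d-%Y')->ymd;
}

print to_iso8601_tp('08-06-2022'), "\n";
```

It could be dropped into the Schwartzian Transform in place of to_iso8601; whether the extra parsing cost matters depends on the size of the list.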

    This approach uses a Schwartzian Transform to augment the data we're sorting with a sort key. We create a tuple of [$sort_key, $payload], sort those tuples, then retain only the payload from the sorted tuples. And this approach defines a to_iso8601 function that converts mm-dd-yyyy format (or even m-d-yyyy) to yyyy-mm-dd format to use as a sort key.


    Dave

      Technically, if you only need to transform the date into a sortable key, you can just fudge the maths behind it to make it faster.

      Say, the original date is MM-DD-YYYY (one of the most useless date formats available to modern computing), you could do something like this:

      my $timestamp = '03-21-2021';
      my ($month, $day, $year) = split/\-/, $timestamp;
      my $sortkey = $day + ($month * 100) + $year * 10000;

      Now that the thing is an integer, you can compare numerically, which is faster than a string compare.
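As a rough sketch of how that integer key could be used to pick the newest record in a single pass, without sorting at all: the helper name date_key and the trimmed-down records are assumptions for illustration, and reduce comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical helper: mm-dd-yyyy -> integer yyyymmdd
sub date_key {
    my ($month, $day, $year) = split /-/, shift;
    return $year * 10000 + $month * 100 + $day;
}

my @records = (
    { Color => 'green', Date => '08-06-2022' },
    { Color => 'black', Date => '01-05-2019' },
    { Color => 'white', Date => '01-03-2022' },
);

# Single pass: keep whichever record has the larger numeric key.
my $most_recent =
    reduce { date_key($a->{Date}) >= date_key($b->{Date}) ? $a : $b } @records;

print "$most_recent->{Color} $most_recent->{Date}\n";
```

This recomputes the key on every comparison; for long lists it may be worth caching the keys first.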

      perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'
Re: Get most recent data based on a date from an array of hashes.
by choroba (Archbishop) on Jan 18, 2022 at 17:46 UTC
    Dates can be compared as strings if they follow the YYYY-MM-DD pattern (compulsory xkcd reference).
    sub mdy2ymd {
        my ($mdy) = @_;
        $mdy =~ /(..)-(..)-(....)/ and return "$3-$1-$2";
    }

    my @filtered = sort { mdy2ymd($a->{Date}) cmp mdy2ymd($b->{Date}) } @$data;

    For longer lists, you might want to use the Schwartzian transform so Perl doesn't have to convert each date several times.

    Better yet, store the dates directly in the YYYY-MM-DD format and you can sort them the way you wanted.
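To illustrate that last point, here is a small sketch with the dates already stored as YYYY-MM-DD (the records are trimmed-down, made-up versions of the original data). With that format the original plain cmp sort orders them chronologically.

```perl
use strict;
use warnings;

my @records = (
    { Color => 'green', Date => '2022-08-06' },
    { Color => 'blue',  Date => '2020-10-11' },
    { Color => 'white', Date => '2022-01-03' },
);

# Plain string comparison is enough for zero-padded YYYY-MM-DD dates.
my @sorted = sort { $a->{Date} cmp $b->{Date} } @records;

print "$_->{Date} $_->{Color}\n" for @sorted;
print "most recent: $sorted[-1]{Color}\n";
```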

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      I tried, but still getting all the data instead of the most recent:
      sub mdy2ymd {
          my ($mdy) = @_;
          $mdy =~ s/^(\d{1,2})\D(\d{1,2})\D(\d{4}|\d{4}\s+).*/$1\-$2\-$3/;
          return "$3-$1-$2";
      }

      my @filtered = sort { mdy2ymd($a->{Date}) cmp mdy2ymd($b->{Date}) } @$data;

      print Dumper @filtered;

        Your sort call just reorders your list; it does nothing to "filter" the contents. You want to take the last item off the generated list.

        Also it doesn't make much sense to use s/// to modify the string to M-D-Y to then return a different string Y-M-D. Better would be to either use just a match m// and return the new string, or to actually switch things around with the s/// and return the modified string.

        ## Either
        $mdy =~ m{^(...blahblah...)};
        return "$3-$1-$2";

        ## Or . . .
        $mdy =~ s{^(...blahblah...)}{$3-$1-$2};
        return $mdy;

        ## Or maybe (with a new enough perl)
        return $mdy =~ s{...}{$3-$1-$2}r;

        Edit: Misread the order of comparison being used in the parent sample code.
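Putting the two points together, a minimal sketch might look like the following. It assumes the dates are always two-digit month, two-digit day, and four-digit year, uses a plain match rather than s///, and takes the last element of the ascending sort; the records are trimmed to Color and Date for brevity.

```perl
use strict;
use warnings;

my $data = [
    { Color => 'green', Date => '08-06-2022' },
    { Color => 'black', Date => '01-05-2019' },
    { Color => 'blue',  Date => '10-11-2020' },
    { Color => 'white', Date => '01-03-2022' },
    { Color => 'red',   Date => '03-21-2021' },
];

# Match only, then build the sortable Y-M-D string from the captures.
# An unparseable date yields '' and therefore sorts first.
sub mdy2ymd {
    my ($mdy) = @_;
    return $mdy =~ /^(\d\d)-(\d\d)-(\d{4})$/ ? "$3-$1-$2" : '';
}

# sort only reorders; every record is still in @sorted ...
my @sorted = sort { mdy2ymd($a->{Date}) cmp mdy2ymd($b->{Date}) } @$data;

# ... so pick the last one to get the most recent.
my $most_recent = $sorted[-1];

print scalar(@sorted), " records sorted; most recent is $most_recent->{Color} ($most_recent->{Date})\n";
```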

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

Re: Get most recent data based on a date from an array of hashes.
by johngg (Canon) on Jan 19, 2022 at 00:30 UTC

    You can use Time::Piece (which is core) to get the epoch seconds from the date. The epoch key/value pair can then be added to the hash ref. for sort'ing numerically and then delete'ed from the hash ref. afterwards. You can then take the last element from the sorted items to get the latest date.

    use strict;
    use warnings;

    use Time::Piece;
    use Data::Dumper;

    my $data = [
        { Color => q{green}, Step => q{Platform}, acc => q{1111}, Date => q{08-06-2022}, },
        { Color => q{black}, Step => q{Platform}, acc => q{1111}, Date => q{01-05-2019}, },
        { Color => q{blue},  Step => q{Platform}, acc => q{1111}, Date => q{10-11-2020}, },
        { Color => q{white}, Step => q{Platform}, acc => q{1111}, Date => q{01-03-2022}, },
        { Color => q{red},   Step => q{Platform}, acc => q{1111}, Date => q{03-21-2021}, },
    ];

    my @filtered = (
        map  { delete $_->{ epoch }; $_; }
        sort { $a->{ epoch } <=> $b->{ epoch } }
        map  {
            $_->{ epoch } =
                Time::Piece->strptime( $_->{ Date }, q{%m-%d-%Y} )->epoch();
            $_;
        }
        @{ $data }
    )[ -1 ];

    print Data::Dumper->Dumpxs( [ \ @filtered ], [ qw{ *filtered } ] );

    Produces

    @filtered = ( { 'Color' => 'green', 'Date' => '08-06-2022', 'Step' => 'Platform', 'acc' => '1111' } );

    I hope this is helpful.

    Update: Expanded wording re. use of the temporary epoch key/value pair.

    Cheers,

    JohnGG

      my @filtered = ( ... )[ -1 ];

      Since the @filtered array cannot be more than a single element, wouldn't it be better to make this a scalar?
          my $most_recent = ( ... )[ -1 ];

      This would have the possibly desirable side effect that if the input array were empty, $most_recent would be undefined/false. (It must be defined/true otherwise because it would be a reference.)
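A small sketch of that behaviour, using made-up records with ISO-style dates so that a plain cmp sort is valid: slicing [-1] from an empty list leaves the scalar undefined, while a non-empty list yields a hash reference.

```perl
use strict;
use warnings;

my @empty;
my @records = (
    { Color => 'blue',  Date => '2020-10-11' },
    { Color => 'green', Date => '2022-08-06' },
);

# Slicing [-1] out of an empty list gives undef in scalar assignment.
my $none = ( sort { $a->{Date} cmp $b->{Date} } @empty )[-1];
my $some = ( sort { $a->{Date} cmp $b->{Date} } @records )[-1];

print defined $none ? "got a record\n" : "no records\n";
print "most recent: $some->{Color}\n" if $some;
```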


      Give a man a fish:  <%-{-{-{-<

Re: Get most recent data based on a date from an array of hashes.
by haukex (Bishop) on Jan 18, 2022 at 17:46 UTC

    One way might be to parse the dates via Time::Piece, store the result as a new value in each hash, and then sort based on that, e.g.

    $_->{TimePiece} = Time::Piece->strptime($_->{Date}, '%m-%d-%Y') for @$data;
    my @filtered = sort { $a->{TimePiece} <=> $b->{TimePiece} } @$data;

    Fetching the most recent result out of the array and removing the TimePiece key from the hashes again is left as an exercise for the reader.
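One possible way to finish that exercise is sketched below. It is not haukex's code: it sorts in descending order so the newest record comes first, then deletes the temporary TimePiece key from every hash. The records are trimmed to Color and Date for brevity.

```perl
use strict;
use warnings;
use Time::Piece;

my $data = [
    { Color => 'green', Date => '08-06-2022' },
    { Color => 'black', Date => '01-05-2019' },
    { Color => 'blue',  Date => '10-11-2020' },
];

# Attach a parsed Time::Piece object to each record as a temporary key.
$_->{TimePiece} = Time::Piece->strptime($_->{Date}, '%m-%d-%Y') for @$data;

# Time::Piece overloads <=>, so the objects compare chronologically.
# Descending order puts the newest record first.
my ($most_recent) = sort { $b->{TimePiece} <=> $a->{TimePiece} } @$data;

# Remove the temporary key again so the hashes are left as they were.
delete $_->{TimePiece} for @$data;

print "$most_recent->{Color} $most_recent->{Date}\n";
```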

Re: Get most recent data based on a date from an array of hashes.
by kcott (Bishop) on Jan 19, 2022 at 06:45 UTC

    TMTOWTDI,

    I removed the superfluous, identical Step and acc key/value pairs throughout. I moved the wanted element to the centre of the @$data array. You'll need Perl v5.14 or later to use s///r. There are similarities to other solutions using a Schwartzian Transform.

    #!/usr/bin/env perl

    use 5.014;
    use warnings;

    my $data = [
        { Color => 'black', Date => '01-05-2019' },
        { Color => 'blue',  Date => '10-11-2020' },
        { Color => 'green', Date => '08-06-2022' },
        { Color => 'white', Date => '01-03-2022' },
        { Color => 'red',   Date => '03-21-2021' },
    ];

    my @sorted =
        map $_->[1],
        sort { $b->[0] <=> $a->[0] }
        map [$_->{Date} =~ s/(..)-(..)-(....)/$3$1$2/r, $_],
        @$data;

    use Data::Dump;
    dd $sorted[0];

    Output:

    { Color => "green", Date => "08-06-2022" }

    — Ken

Re: Get most recent data based on a date from an array of hashes.
by Marshall (Canon) on Jan 19, 2022 at 10:48 UTC
    I work with GMT time zone dates with the YYYY-MM-DD HH:MM:SS format often. String compare with this is fine. The key is that the leading zeroes are absolutely necessary. This is actually what a date looks like in an SQLite DB.
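A quick sketch of why the leading zeroes matter, using two made-up dates: without padding, a string sort puts October before August; after padding with sprintf the order is chronological.

```perl
use strict;
use warnings;

my @raw = ('2022-8-6', '2022-10-1');

# Unpadded: '1' sorts before '8', so October lands before August.
my @unpadded_sorted = sort @raw;

# Zero-pad month and day, then sort again as plain strings.
my @padded = map {
    my ($y, $m, $d) = split /-/;
    sprintf '%04d-%02d-%02d', $y, $m, $d;
} @raw;
my @padded_sorted = sort @padded;

print "unpadded: @unpadded_sorted\n";
print "padded:   @padded_sorted\n";
```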

    I see many fine posts with sorting techniques if @$data is large.

    Not considered yet is what happens if more than one thing happened on the "most recent date"? In my data sets, measured to the second, I would allow for this possibility (and it very likely could indeed happen). Code below shows just one way.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;
    use Data::Dump qw(dump dd);

    my $data = [
      { 'Color' => 'green',   'Step' => 'Platform', 'acc' => '1111', 'Date' => '08-06-2022' },
      { 'Color' => 'black',   'Step' => 'Platform', 'acc' => '1111', 'Date' => '01-05-2019' },
      { 'Color' => 'reddish', 'Step' => 'Platform', 'acc' => '1111', 'Date' => '03-21-2021' },
      { 'Color' => 'blue',    'Step' => 'Platform', 'acc' => '1111', 'Date' => '10-11-2020' },
      { 'Color' => 'white',   'Step' => 'Platform', 'acc' => '1111', 'Date' => '08-06-2022' },
      { 'Color' => 'red',     'Step' => 'Platform', 'acc' => '1111', 'Date' => '03-21-2021' },
    ];

    @$data = sort { my $A = $a->{Date};
                    my $B = $b->{Date};
                    $A =~ s/(\d+)-(\d+)-(\d+)/$3-$1-$2/;
                    $B =~ s/(\d+)-(\d+)-(\d+)/$3-$1-$2/;
                    $B cmp $A
                  } @$data;

    my $most_recent_href = $data->[0];
    dd $most_recent_href;

    #see if there are others on same date as most recent??
    foreach my $href (@$data[1..@$data-1])
    {
        if ($href->{Date} eq $most_recent_href->{Date})
        {
            dd $href;
        }
        else {last;}
    }
    __END__
    { acc => 1111, Color => "green", Date => "08-06-2022", Step => "Platform" }
    { acc => 1111, Color => "white", Date => "08-06-2022", Step => "Platform" }

      Note that max_by() can also return multiple max's if it is used in list context.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use List::AllUtils qw( max_by );

      my $data = [
        { acc => 1111, Color => "green",   Date => "08-06-2022", Step => "Platform" },
        { acc => 1111, Color => "black",   Date => "01-05-2019", Step => "Platform" },
        { acc => 1111, Color => "reddish", Date => "03-21-2021", Step => "Platform" },
        { acc => 1111, Color => "blue",    Date => "10-11-2020", Step => "Platform" },
        { acc => 1111, Color => "white",   Date => "08-06-2022", Step => "Platform" },
        { acc => 1111, Color => "red",     Date => "03-21-2021", Step => "Platform" },
      ];

      my @mostrecent = max_by {join '', (split /-/, $_->{Date})[2,0,1] } @$data;

      use Data::Dump 'dd';
      dd @mostrecent;

      Outputs:

      (
        { acc => 1111, Color => "green", Date => "08-06-2022", Step => "Platform" },
        { acc => 1111, Color => "white", Date => "08-06-2022", Step => "Platform" },
      )
        This is way ++cool.
        I wasn't aware of List::AllUtils or List::UtilsBy in the List:: menagerie

        List::Util
        List::MoreUtils
        List::AllUtils
        List::SomeUtils
        List::UtilsBy

Node Type: perlquestion [id://11140571]
Approved by davido
Front-paged by Corion