PerlMonks  

goto HACK

by Anonymous Monk
on Sep 25, 2018 at 07:27 UTC (#1222945=perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to build a hash with unixtime keys and eliminate duplicates without losing entries. Since the exact time is not too important I came up with this hack that increments keys until they're unique. But this is so wrong! I need some help figuring out the right way with an array:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @times = qw(1000 1000 1000 1010 1010 1010);
    my $hash = {};
    my $seen = {};

    for my $time (@times) {
        if ($seen->{$time}) {
            HACK:
            $time++;
            if ($seen->{$time}) {
                goto HACK
            }
            else {
                $seen->{$time}++
            }
        }
        else {
            $seen->{$time}++
        }
        $hash->{$time}->{one} = 1;
        $hash->{$time}->{two} = 2;
    }
    print Dumper $hash;
Thank you for your time.

Replies are listed 'Best First'.
Re: goto HACK
by dave_the_m (Monsignor) on Sep 25, 2018 at 07:37 UTC
        use strict;
        use warnings;
        use Data::Dumper;

        my @times = qw(1000 1000 1000 1010 1010 1010);
        my $hash = {};

        for my $time (@times) {
            $time++ while exists $hash->{$time};
            $hash->{$time}->{one} = 1;
            $hash->{$time}->{two} = 2;
        }
        print Dumper $hash;
    Note that $time++ modifies the element of @times, since 'for' makes $time an alias of each element. If that's undesirable, copy $time to another var at the top of the loop, and use that instead.

    Dave.
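
A minimal sketch of the copy-first variant described above, which leaves the input array untouched. The sub name build_unique_hash and the variable $key are made up for illustration; the payload mirrors the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the hash without modifying the caller's array.
sub build_unique_hash {
    my ($times) = @_;                    # array reference
    my %hash;
    for my $time (@$times) {             # $time aliases the caller's elements
        my $key = $time;                 # work on a copy, not on the alias
        $key++ while exists $hash{$key}; # bump until the key is free
        $hash{$key} = { one => 1, two => 2 };
    }
    return \%hash;
}

my @times = qw(1000 1000 1000 1010 1010 1010);
my $hash  = build_unique_hash(\@times);

print join(' ', sort { $a <=> $b } keys %$hash), "\n";
print "@times\n";                        # the input list is unchanged
```

Without the copy, the same loop would rewrite the elements of @times in place.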

      Note that $time++ modifies the element of @times, since 'for' makes $time an alias of each element.

      That's kinda creepy! I assumed declaring the var was making a copy of the element that wouldn't change the array. I can't believe I never noticed that... and the same with hash values, but not keys?

        Yes, the keys function returns temporary copies of the keys while values returns the actual values. So
            $_++ for @a;            # modifies each element of @a
            $_++ for values %hash;  # modifies every value of %hash
            $_++ for keys %hash;    # doesn't do anything useful

        Dave.
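
The three cases above can be checked with a small self-contained script. This is only an illustrative sketch; the sample data (@a, %hash and their contents) is invented.

```perl
use strict;
use warnings;

my @a    = (1, 2, 3);
my %hash = (x => 10, y => 20);

$_++ for @a;             # foreach aliases each element, so @a changes
$_++ for values %hash;   # values returns aliases to the stored values
$_++ for keys %hash;     # keys returns copies; the hash itself is unaffected

print "@a\n";
print join(', ', map { "$_ => $hash{$_}" } sort keys %hash), "\n";
```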

Re: goto HACK
by Eily (Monsignor) on Sep 25, 2018 at 08:29 UTC

    I need to build a hash with unixtime keys and eliminate duplicates without losing entries
    Why do you need to remove duplicates in the first place? Is this so that you can use them as hash keys? One simple way to do that is to append a number to each key to make it unique:
        use Data::Dump qw(pp);

        my @array = qw<1000 1000 1000 1010 1010 1010 1020 1022 1023>;
        my %out;
        while (my ($pos, $val) = each @array) {
            $out{$val.':'.$pos} = { One => 1, Two => 1 };
        }
        pp \%out;
        __END__
        {
          "1000:0" => { One => 1, Two => 1 },
          "1000:1" => { One => 1, Two => 1 },
          "1000:2" => { One => 1, Two => 1 },
          "1010:3" => { One => 1, Two => 1 },
          "1010:4" => { One => 1, Two => 1 },
          "1010:5" => { One => 1, Two => 1 },
          "1020:6" => { One => 1, Two => 1 },
          "1022:7" => { One => 1, Two => 1 },
          "1023:8" => { One => 1, Two => 1 },
        }
    You can still access the timestamp by splitting the keys.
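
As a rough sketch of that splitting step: the sample hash below is a shortened, invented stand-in for %out, and the field names timestamp, position and data are made up for illustration.

```perl
use strict;
use warnings;

# Shortened stand-in for the %out hash built with "timestamp:position" keys.
my %out = (
    '1000:0' => { One => 1, Two => 1 },
    '1000:1' => { One => 1, Two => 1 },
    '1010:2' => { One => 1, Two => 1 },
);

my @entries;
for my $key (keys %out) {
    # Split only on the first colon: timestamp on the left, position on the right.
    my ($timestamp, $position) = split /:/, $key, 2;
    push @entries, { timestamp => $timestamp, position => $position, data => $out{$key} };
}

# Sorting by position restores the original input order.
@entries = sort { $a->{position} <=> $b->{position} } @entries;
printf "%d (position %d)\n", $_->{timestamp}, $_->{position} for @entries;
```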

    Here's another method, where you remove precision from the timestamp to replace it with an index (in my example below, with a precision of 10, I remove the last digit to replace it with a value between 0 and 9).

        my %out;
        my %index;
        my $precision = 10;
        for my $val (@array) {
            my $rounded = $precision*(int($val/$precision));
            die "Too many duplicates" if exists $index{$rounded} and $index{$rounded} == $precision;
            my $timestamp = $rounded + $index{$rounded}++;
            $out{$timestamp} = { One => 1, Two => 1 };
        }
        pp \%out;
        __END__
        {
          1000 => { One => 1, Two => 1 },
          1001 => { One => 1, Two => 1 },
          1002 => { One => 1, Two => 1 },
          1010 => { One => 1, Two => 1 },
          1011 => { One => 1, Two => 1 },
          1012 => { One => 1, Two => 1 },
          1020 => { One => 1, Two => 1 },
          1021 => { One => 1, Two => 1 },
          1022 => { One => 1, Two => 1 },
        }
    It won't work when too many timestamps round down to the same value, though (e.g., if you remove the last two digits, you can't have the same value more than 100 times), and it will change a value even if it was already unique (1022 and 1023 became 1021 and 1022). On the other hand, it's a single loop rather than two nested loops.

Re: goto HACK
by choroba (Bishop) on Sep 25, 2018 at 08:31 UTC
    In a game I wrote years ago, I used a similar trick.
    $when += 0.001 while exists $calendar{$when};

    The point was to keep the events in the right order. Depending on the density of the events, you can easily schedule an event to a completely different time when adding 1. Adding rand .05 would be another option.
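
A small sketch of how that one-liner might be used, assuming events are stored in a %calendar hash keyed by time. The schedule helper and the event names are invented for illustration. Note that each collision moves the key by 0.001, so after 1000 collisions on the same second the key would spill into the next one.

```perl
use strict;
use warnings;

my %calendar;

# Hypothetical helper: store $event at $when, nudging by 0.001 on collision.
sub schedule {
    my ($when, $event) = @_;
    $when += 0.001 while exists $calendar{$when};
    $calendar{$when} = $event;
    return $when;
}

schedule(1000, 'first');
schedule(1000, 'second');
schedule(1010, 'third');
schedule(1000, 'fourth');

# A numeric sort keeps the events in order; int() recovers the original second.
my @sorted  = sort { $a <=> $b } keys %calendar;
my @order   = map { $calendar{$_} } @sorted;
my @seconds = map { int($_) } @sorted;

print "@order\n";
print "@seconds\n";
```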

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: goto HACK
by cavac (Curate) on Sep 25, 2018 at 09:30 UTC

    If you only have timestamps, you can just count them:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Data::Dumper;

        my @times = qw(1000 1000 1000 1010 1010 1010);
        my %data;
        foreach my $key (@times) {
            $data{$key}++;
        }
        print Dumper(\%data);

    Results:

        $ perl timehash1.pl
        $VAR1 = {
                  '1000' => 3,
                  '1010' => 3
                };

    On the other hand, if you have data associated with the timestamps, you can just push that data into an array associated with that specific timestamp inside the hash. Sounds complicated, but it isn't:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Data::Dumper;

        # Flat list of timestamp => data pairs
        my @times = (
            1000 => 'hello',
            1000 => 'world',
            1000 => 'foo',
            1010 => 'bar',
            1010 => 'baz',
            1010 => ['hello world', 'hallo welt'],
        );
        my %data;
        while(@times) {
            # Consume the list two elements at a time
            my ($key, $val) = (shift @times, shift @times);
            push @{$data{$key}}, $val;
        }
        print Dumper(\%data);

    And here is the result:

        $ perl timehash2.pl
        $VAR1 = {
                  '1010' => [
                              'bar',
                              'baz',
                              [
                                'hello world',
                                'hallo welt'
                              ]
                            ],
                  '1000' => [
                              'hello',
                              'world',
                              'foo'
                            ]
                };

    Edit: If you want to avoid temporary variables, you can also make the main loop more compact (and perhaps slightly faster), at the cost of readability:

    while(@times) { push @{$data{shift @times}}, shift @times; }
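
If emptying @times while looping is not wanted, one possible alternative (not from the post above) is to walk the list in pairs with pairs from List::Util, which needs List::Util 1.29 or newer. A minimal sketch with shortened sample data:

```perl
use strict;
use warnings;
use List::Util qw(pairs);   # pairs() needs List::Util 1.29 or newer

my @times = (
    1000 => 'hello',
    1000 => 'world',
    1010 => 'bar',
);

my %data;
for my $pair (pairs @times) {
    my ($key, $val) = @$pair;     # each pair is a two-element array reference
    push @{ $data{$key} }, $val;
}

print "$_: @{ $data{$_} }\n" for sort { $a <=> $b } keys %data;
```

Unlike the shift-based loops, this leaves @times intact.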
    "For me, programming in Perl is like my cooking. The result may not always taste nice, but it's quick, painless and it gets food on the table."
Re: goto HACK
by bliako (Priest) on Sep 25, 2018 at 11:50 UTC

    I find increasing the resolution of the timestamp by making it a real (i.e. not an integer), as suggested in choroba's reply above, a very good solution: essentially, each integer timestamp gets a large number of sub-timestamps to accommodate duplicates of that integer timestamp (large but finite, bounded by the increment you choose and by floating-point precision). Additionally, sorting the keys numerically is fast, I guess faster than sorting them as strings. And you can recover your original integer timestamp quickly too, with int.

    Adding a random number in (0,1) to the integer timestamp is also a good solution for creating non-duplicate keys, and you do not need to keep track of the last real key inserted for that duplicate in order to increment it. However, you still have to check the hash for an already existing key each time, because it is entirely possible to get duplicate random numbers.

    In similar situations I create an array as the value of each potentially duplicate timestamp key and then keep pushing duplicates in there till the cows come home (also suggested in cavac's reply above). Added benefits: you keep your hash small by branching (i.e. pushing items into one array per key), and the arrays are already in the order your data came in, if that's what you want.
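
A minimal sketch of the random-offset idea with the collision check included. The helper name random_unique_key is made up for illustration. Unlike the increment approach, duplicates of the same second end up in random order relative to each other.

```perl
use strict;
use warnings;

# Hypothetical helper: draws $time + rand() until the key is not yet in the hash.
sub random_unique_key {
    my ($hash, $time) = @_;
    my $key;
    do { $key = $time + rand() } while exists $hash->{$key};
    return $key;
}

my %hash;
for my $time (qw(1000 1000 1000 1010 1010 1010)) {
    $hash{ random_unique_key(\%hash, $time) } = { one => 1, two => 2 };
}

# int() recovers the original integer timestamp from each key.
my @seconds = map { int($_) } sort { $a <=> $b } keys %hash;
print "@seconds\n";
```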

Re: goto HACK
by dsheroh (Prior) on Sep 26, 2018 at 07:27 UTC
    I need some help figuring out the right way with an array
    Your instinct is correct, and a hash of arrays (HoA) is the "best" way to go about this:
        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Data::Dumper;

        my @times = qw(1000 1000 1000 1010 1010 1010);
        my $hash = {};
        my $ord;

        for my $time (@times) {
            my $event = { one => 1, two => 2, ord => $ord++ };
            push @{$hash->{$time}}, $event;
        }
        print Dumper $hash;
    Since all of your data items were identical, I also added $ord to show that they do indeed preserve their order.
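
To read such a hash of arrays back in time order, something along these lines should work. It is only a sketch: the sample structure is a trimmed, hand-written version of what the script above builds, and @timeline is an invented name.

```perl
use strict;
use warnings;

# Trimmed stand-in for the HoA built above (only the ord field is kept).
my $hash = {
    1010 => [ { ord => 3 }, { ord => 4 }, { ord => 5 } ],
    1000 => [ { ord => 0 }, { ord => 1 }, { ord => 2 } ],
};

# Flatten to a time-ordered list of [timestamp, event] pairs.
my @timeline;
for my $time (sort { $a <=> $b } keys %$hash) {
    push @timeline, map { [ $time, $_ ] } @{ $hash->{$time} };
}

print join(' ', map { "$_->[0]#$_->[1]{ord}" } @timeline), "\n";
```

Within each timestamp the events come out in the order they were pushed, so no extra sort key is needed.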

Node Type: perlquestion [id://1222945]
Approved by haukex
Front-paged by haukex