PerlMonks  

Allocation of anonymous arrays

by OwlHoot (Novice)
on Feb 07, 2014 at 09:18 UTC ( [id://1073836]=perlquestion )

OwlHoot has asked for the wisdom of the Perl Monks concerning the following question:

In some code I reviewed for a colleague was a hash defined something like the following (a simplified version) :

my %fred = (
    [1, 2, 3] => [1, 1, 0],
    [3, 4, 5] => [0, 1, 0],
    [0, 2, 4] => [1, 2, 1],
    :::
);

The code accesses this using the array references as a key, and as the "key" arrays are all distinct these references must be unique. So no problem there.

But the code later uses another hash, constructed by calling reverse() on the one above. It seems to me that this may be problematic if any of the anonymous "value" arrays are equal, which in the circumstances they may well be: perl may be clever enough to spot identical value arrays and share one copy among them, in which case reverse() would silently drop the duplicate entries, keeping an arbitrary one of each set.

Even if perl does not do that today, who is to say it won't start being done in some future version?

My colleague who wrote the code assures me this problem won't arise because (in his words) "the hash values are references to arrays of values, not the array values themselves". But I'm not convinced: if more than one identical value array shares the same location, then of course their references will be the same.

Any ideas?

Regards

John R Ramsden

Replies are listed 'Best First'.
Re: Allocation of anonymous arrays
by dsheroh (Monsignor) on Feb 07, 2014 at 10:53 UTC
    Your coworker is either using some extremely deep magic here or (much more likely) that assignment doesn't do what he thinks it does.

    Hash keys are strings, not scalars. So, when you try to use an array ref as a hash key, it gets stringified and your hash key is the resulting string (e.g., the literal text "ARRAY(0xdeadbeef)"), not the original reference. If he really wants 'random' unique keys for the hash, he could just as well use [] for all the keys. (Or, really, if you're going to use random garbage keys for your hash, you may as well use an array instead, since you won't be able to do key-based lookups anyhow.)
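The stringification described above can be checked directly. This is only a minimal sketch; the variable names are made up for illustration and do not come from the colleague's code.

```perl
use strict;
use warnings;

my $aref = [1, 2, 3];
my %h    = ($aref => 'value');

my ($key) = keys %h;

# The key is the stringified reference, not the reference itself.
my $key_is_plain_string = ref($key) eq '' ? 1 : 0;
my $key_matches_string  = $key eq "$aref" ? 1 : 0;

print "key: $key\n";
print "ref(\$key) is empty: $key_is_plain_string\n";
print "same text as \"\$aref\": $key_matches_string\n";
```

The key compares equal to "$aref" only because $aref is still alive here; with a literal [1, 2, 3] in key position there is nothing left to compare against.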

    As for your actual question, meditate upon this:

    $ perl -e 'print [] . "\n" . [] . "\n" . [] . "\n";'
    ARRAY(0x9e557ec)
    ARRAY(0x9e6ef80)
    ARRAY(0x9e6efbc)
    Three references to empty anonymous arrays in a single statement, yet each is unique.
Re: Allocation of anonymous arrays
by Athanasius (Archbishop) on Feb 07, 2014 at 09:46 UTC

    Hello OwlHoot, and welcome to the Monastery!

    Consider:

    my @foo = (42, 43, 45);
    my @bar = (42, 43, 45);

    Would Perl see that both arrays contain the same values, and therefore “optimise” the storage by having @foo and @bar refer to the same set of storage locations? No, because these arrays must be allowed to change independently. For example, incrementing $foo[0] should have no impact on the value of $bar[0].

    Now, an anonymous array is just an array which is accessed by a reference rather than a name. Two anonymous arrays which happen to share the same data cannot be optimised to refer to common storage, because either is free to change independently of the other as the script runs.
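A small sketch of the same point for anonymous arrays, assuming nothing beyond core Perl and Scalar::Util (the variable names are illustrative): two literals with identical contents get different addresses, and changing one leaves the other alone.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $first  = [1, 1, 0];
my $second = [1, 1, 0];   # same contents, separate array

my $distinct = refaddr($first) != refaddr($second) ? 1 : 0;

$first->[2] = 99;         # modify one array only

print "distinct storage: $distinct\n";
print "first:  @$first\n";
print "second: @$second\n";
```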

    So, I think your colleague is correct. But — please explain the motivation for using array references as hash keys in this way?!?

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks for reply Athanasius (and to everyone who has replied so promptly)

      With reference to your first example, in theory I would have thought perl (or "a Similar Language") could indeed have @foo and @bar refer to the same storage initially. Then as soon as any code wanted to amend one of them, or even define a reference to them, perl could create a copy and start using that.

      (Granted this probably wouldn't be a good idea in practice because most arrays defined explicitly will usually be amended, or references to them defined, during the subsequent running of the program.)

      Literal array references not initially assigned to a variable seem even more suitable for this "copy on write" treatment, as they are not amended or pointed to explicitly at first, and quite likely never will be. But I can see that in principle one could have something like:

      my %fred = (
          [1, 2, 3] => [1, 1, 0],
          [3, 4, 2] => [1, 0, 1],
          [2, 2, 1] => [1, 1, 0],
      );
      my $i = 0;
      foreach my $ref (keys %fred) {
          $ref->[2] = ++$i;
      }

      So I'm still not entirely convinced one way or the other!

      As for why my colleague chose this system, it is a test script in which each key array represents a test case in compact form and the value represents outcome flags. (I would have represented these in a rather different way, all in one array for example, but each to their own.)

      Regards

      John R Ramsden

        So I'm still not entirely convinced one way or the other!

        Sorry guys (you and your colleague), you are arguing about silly things. Of course your colleague is right: Perl will never optimize away arrays with the same content. It would be silly for perl to run through all arrays to see whether they have the same content. But your colleague is completely wrong to use array references as keys.

        Have you actually tried to run the code you gave above? If you run it with "use strict;" you'll get the error "Can't use string ("ARRAY(0xc80de8)") as an ARRAY ref while "strict refs" in use". This is because the key is now just a string, and the array you supplied when you created the hash is already gone. Without strict refs, perl would instead create a new NAMED array, and the name of that array would be ARRAY(0xc80de8) or whatever unique string identified the original array.
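The error quoted above can be reproduced without killing the script by wrapping the dereference in eval. This is a sketch with a one-entry hash standing in for the colleague's data:

```perl
use strict;
use warnings;

my %fred = ([1, 2, 3] => [1, 1, 0]);
my ($key) = keys %fred;

# Under "strict refs", dereferencing the string key dies; eval traps it.
my $ok    = eval { my $x = $key->[0]; 1 };
my $error = $ok ? '' : $@;

print "dereference failed: $error";

# The value side is still a real reference and works as expected.
my $flag = $fred{$key}->[0];
print "first flag: $flag\n";
```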

        ... perl ... could indeed have @foo and @bar refer to the same storage initially. Then as soon as any code wanted to amend one of them, or even define a reference to them, perl could create a copy and start using that.

        (Granted this probably wouldn't be a good idea ...)

        You're right: Perl could do that, and it's a bad idea, and so Perl doesn't do that.

        ... each key array represents a test case in compact form ...

        The representational form is compact, indeed: it is the nullity; it has ceased to be; bereft of life, it rests in peace. As soon as the anonymous array constructor  [ ... ] finishes its job, the reference it returns is immediately converted into a string and ceases to exist as a reference. Because the referent can no longer be accessed in any way whatsoever (because it has no reference), it is marked for garbage collection (its reference count is zero) and, in the fullness of time, it softly and silently vanishes away.

        Oops, I meant:
        foreach my $ref (keys %fred) { $fred{$ref}->[2] = ++$i; }
Re: Allocation of anonymous arrays
by Discipulus (Canon) on Feb 07, 2014 at 09:38 UTC
    As I understand this code, you are using the string representation of an anonymous array as the key, not the reference. The values are anonymous arrays, and I don't think they can be mixed up even if you use these alien key names. The whole thing makes little sense to me.

    L*
    @widowzDoubleQuotation> perl -MData::Dumper -e "%h = ([1, 2, 3] => [1, 1, 0],[3, 4, 5] => [1, 1, 0],[1, 1, 0] => [1, 2, 1],[0, 2, 4] => [1, 2, 1],); print Dumper \%h; print map {qq!$_ is a !.ref($_).qq!\n!} keys %h"
    __OUTPUT__
    $VAR1 = {
      'ARRAY(0x1d46694)' => [ 1, 2, 1 ],
      'ARRAY(0x1ca21d4)' => [ 1, 2, 1 ],
      'ARRAY(0x1ca15f4)' => [ 1, 1, 0 ],
      'ARRAY(0x6eb01c)' => [ 1, 1, 0 ]
    };
    ARRAY(0x1d46694) is a
    ARRAY(0x1ca21d4) is a
    ARRAY(0x1ca15f4) is a
    ARRAY(0x6eb01c) is a
    ## ref() returns an empty string, i.e. each key is not a ref but a bare string
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Allocation of anonymous arrays
by shmem (Chancellor) on Feb 07, 2014 at 12:56 UTC
    The code accesses this using the array references as a key, and as the "key" arrays are all distinct these references must be unique. So no problem there.

    No problem except that the keys of that hash aren't anonymous arrays. They are strings; keys of hashes are always strings. You cannot get back at the anonymous array from its string representation, which is there only as a label. Instead of the hex number attached to it (which actually is the address of a C structure), it could just as well have the md5 checksum over the array members attached.

    my %fred = (
        [1, 2, 3] => [1, 1, 0],
        [3, 4, 5] => [0, 1, 0],
        [0, 2, 4] => [1, 2, 1],
    );
    for my $k (keys %fred) {
        print "$k element 0: '",$k->[0],"'\n";
        print "array: (",join(",",@$k),")\n";
    }
    __END__
    ARRAY(0x17ba658) element 0: ''
    array: ()
    ARRAY(0x17b3b80) element 0: ''
    array: ()
    ARRAY(0x1796998) element 0: ''
    array: ()

    You might suspect that the anonymous arrays went out of scope after the keys were generated out of them, so storing the arrays somewhere would keep them alive. That's true, but even so, the original arrays are not accessible via their string representation:

    my @ary = ([1, 2, 3],[3, 4, 5],[0, 2, 4]);
    my %fred = (
        $ary[0] => [1, 1, 0],
        $ary[1] => [0, 1, 0],
        $ary[2] => [1, 2, 1],
    );
    for my $k (keys %fred) {
        print "$k element 0: '",$k->[0],"'\n";
        print "array: (",join(",",@$k),")\n";
    }
    __END__
    ARRAY(0x19cca60) element 0: ''
    array: ()
    ARRAY(0x19ccb80) element 0: ''
    array: ()
    ARRAY(0x19af998) element 0: ''
    array: ()

    If you want a reversible hash which allows you to get at the values of the arrays identified by their stringy names, you need to construct two hashes: one to map the strings to the arrays, and one with those strings as both keys and values:

    my @ary = (
        [1, 2, 3], [1, 1, 0],
        [3, 4, 5], [0, 1, 0],
        [0, 2, 4], [1, 2, 1],
    );
    my (@arystrings, %aryhash);
    for my $ary ( @ary ) {
        $aryhash{$ary} = $ary;
        push @arystrings, scalar $ary; # same string as the key above
    }
    my %fred = @arystrings; # treats @arystrings as (key,value,key,value,...) list
    print_stuff();
    %fred = reverse %fred; # reverse hash
    print_stuff();

    sub print_stuff {
        for my $k (keys %fred) {
            print "$k element 0: '",$aryhash{$k}->[0],"'\n";
            print "array: (",join(",",@{$aryhash{$k}}),")\n";
        }
    }
    __END__
    ARRAY(0x25e6998) element 0: '1'
    array: (1,2,3)
    ARRAY(0x2603b80) element 0: '3'
    array: (3,4,5)
    ARRAY(0x260f948) element 0: '0'
    array: (0,2,4)
    ARRAY(0x260f8d0) element 0: '0'
    array: (0,1,0)
    ARRAY(0x260f9c0) element 0: '1'
    array: (1,2,1)
    ARRAY(0x2603a60) element 0: '1'
    array: (1,1,0)

    Note that keys produces the keys of a hash in random order.

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Allocation of anonymous arrays (ref addr)
by Anonymous Monk on Feb 07, 2014 at 09:36 UTC

    Any ideas?

    An explanation of what problem you're trying to solve would be helpful

    At first look (at this late hour), the way that hash is populated/used looks kinda silly and pointless :)

    Perl will happily reuse reference addresses ... since the hash keys are strings, as soon as the references are gone, their refaddrs are free for reuse by perl

    If you want unique keys you're better off using a UUID or some such like Session::Token - Portable, secure, efficient, simple random session token generation that satisfies those OWASP recommendations

      fw() for 1 .. 10;

      sub fw {
          my %uniq;
          for my $ix ( 0 .. 100 ){
              for my $key ( wf() ){
                  my $count = $uniq{$key}++;
                  if( $count > 1 ){
                      my $keycount = keys %uniq;
                      my $buckets = %uniq;
                      print "started repeating at iteration $ix after only $keycount in $buckets buckets \n";
                      return;
                  }
              }
          }
      }

      sub wf {
          my %f;
          $f{ [] } = [0];
          $f{ [] } = [1];
          $f{ [] } = [2];
          $f{ [] } = [3];
          return keys %f;
      }
      __END__
      started repeating at iteration 8 after only 26 in 20/32 buckets
      started repeating at iteration 4 after only 14 in 13/32 buckets
      started repeating at iteration 5 after only 17 in 14/32 buckets
      started repeating at iteration 5 after only 17 in 13/32 buckets
      started repeating at iteration 5 after only 19 in 15/32 buckets
      started repeating at iteration 5 after only 18 in 16/32 buckets
      started repeating at iteration 6 after only 18 in 16/32 buckets
      started repeating at iteration 7 after only 20 in 15/32 buckets
      started repeating at iteration 6 after only 18 in 14/32 buckets
      started repeating at iteration 7 after only 26 in 21/32 buckets
        Sorry for my ignorance, Anonymous, but what are you demonstrating here? Can you explain your code?

        There are no rules, there are no thumbs..
        Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Allocation of anonymous arrays
by LanX (Saint) on Feb 07, 2014 at 15:23 UTC
    References and their stringifications are unique as long as the data structures they point to have not been released.
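Applied to the original question, a rough sketch (the 'case A' style labels are invented here so that the keys are ordinary strings): while all the value arrays are alive, reverse() loses no entries even when two of them hold identical contents, but the reversed keys are only strings.

```perl
use strict;
use warnings;

my %fred = (
    'case A' => [1, 1, 0],
    'case B' => [0, 1, 0],
    'case C' => [1, 1, 0],   # same contents as case A, different array
);

my %reversed = reverse %fred;

my $before      = keys %fred;
my $after       = keys %reversed;
my $string_keys = grep { ref($_) eq '' } keys %reversed;

print "entries before: $before, after: $after\n";
print "keys that are plain strings: $string_keys\n";
```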

    But the deeper problem, as others have already mentioned, must be stressed:

    Perl has no way to get a reference from its stringification!

    It's a one way street...

    Then reversing this hash is like treating a marijuana addict with cocaine.

    This whole technique is pointless as long as you don't manually store a lookup-hash to be able to transform key-string to ref.

    DB<107> $aref=[1,2,3]; $lookup{$aref}=$aref
     => [1, 2, 3]
    DB<108> \%lookup
     => { "ARRAY(0x8ffd450)" => [1, 2, 3] }

    Tell your colleague there is no way to use literal arrays here because the information gets lost.¹

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    PS: as a side note, Python allows other data-types to be keys, but only if they are immutable ... like literal strings are.

    update

    ¹) As long as he doesn't use a tied hash from a fancy CPAN module

      This whole technique is pointless as long as you don't manually store a lookup-hash to be able to transform key-string to ref.

      But then, in the example given, you have to have the value of  $aref (i.e., the reference) in order to stringize it and use it to look up the value of  $aref, which also seems pointless.

        There are different shades of pointless in this discussion. :)

        You don't have to keep all $arefs after construction.

        DB<112> for $aref ( [1,2,3],[4,5,6],[7,8,9] ) { $lookup{$aref} = $aref; $hash{$aref} = [ reverse @$aref ]; }
         => ""
        DB<113> \%hash
         => {
          "ARRAY(0xa547e40)" => [9, 8, 7],
          "ARRAY(0xa5c2e68)" => [6, 5, 4],
          "ARRAY(0xa5c3188)" => [3, 2, 1],
        }
        DB<114> \%lookup
         => {
          "ARRAY(0xa547e40)" => [7, 8, 9],
          "ARRAY(0xa5c2e68)" => [4, 5, 6],
          "ARRAY(0xa5c3188)" => [1, 2, 3],
        }
        DB<115> print "@{$lookup{$_}}\n" for keys %hash
        7 8 9
        4 5 6
        1 2 3

        Hiding all of this behind a tied hash should be feasible (depending on implementation details of Tie::Hash, i.e. on when, where and how "stringization" happens).

        Anyway I didn't try to find such implementations on CPAN.
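One existing implementation of this idea is the core module Tie::RefHash, which keeps references intact when they are used as hash keys. The following is only a sketch of how the colleague's data might look with it, not something taken from the thread:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %fred, 'Tie::RefHash';

# With the tie in place the keys stay real array references.
$fred{ [1, 2, 3] } = [1, 1, 0];
$fred{ [3, 4, 5] } = [0, 1, 0];

my $all_refs = 1;
my @sums;
for my $case (keys %fred) {
    $all_refs = 0 unless ref($case) eq 'ARRAY';
    my $sum = 0;
    $sum += $_ for @$case;       # dereferencing the key works here
    push @sums, $sum;
    print "case (@$case) => flags (@{ $fred{$case} })\n";
}
```

Note that a lookup still needs the same reference, not another array with equal contents, so this preserves the keys but does not give content-based lookup.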

        One use case could be to implement sets of complex data structures including set operations

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        updates
        corrected link to Tie::Hash
Re: Allocation of anonymous arrays
by sundialsvc4 (Abbot) on Feb 07, 2014 at 15:20 UTC

    No, Perl won’t put two arrays into the same storage just because (at the moment ...) their values are identical.  But ... in practice, it sure is easy to come into a situation where you think that it did ... where you think that you’re just changing a value “over here,” and ... (gasp!) ... a value “over there” just changed, too!

    What will actually turn out to have happened, in those cases, is that you had a reference to the same list of values in two or more places ... that you thought that you were telling Perl to move (to duplicate) an array or a hash (something that requires to be “referenced” ...).   But what Perl was actually doing was creating references.   It seemed to work, until you changed what you thought was an independent, isolated value, and saw that the changes had apparently propagated.   Easy to do.   And the [erroneous ...] explanation, that the optimizer did something wrong, is also an intuitively-appealing assumption.
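A minimal sketch of the situation described above (variable names are illustrative): taking a reference gives a second route to the same array, while list assignment makes an independent shallow copy.

```perl
use strict;
use warnings;

my @original = (1, 2, 3);

my $alias = \@original;   # a second way to reach the same array
my @copy  = @original;    # an independent (shallow) copy

$alias->[0] = 'changed';  # also changes @original, but not @copy

print "original: @original\n";
print "copy:     @copy\n";
```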

    Perl tries hard to be a “DWIM = Do What I Mean, TMTOWTDI = There’s More Than One Way To Do It™” language, and so its very-flexible syntax is occasionally misleading.   If you are used to dealing with strongly-typed and/or compiled languages where there’s really only one way to do it and where any deviations from that will be caught for you at compile time ... Perl isn’t like that.   Its design is not like that, which is neither right nor wrong.

      In fact, perl frequently has what we used to know, when dealing with the Navy at Ferranti - Training Simulator Group, as the 'NWITIAF' (Not What I Thought I Asked For) factor :-)

      A user level that continues to overstate my experience :-))
Re: Allocation of anonymous arrays
by sundialsvc4 (Abbot) on Feb 07, 2014 at 17:11 UTC

    Even though you sometimes have a value that is the “total key to” a particular value, suitable for use as a hash key that leads to exactly one value ... it might be almost as good to formulate a hash key that potentially leads to a list of values, through which an appropriate accessor method will have to search.   A well-made accessor makes that easy / invisible.   For that matter, you could have several hashrefs and multiple references.   Basically, it is like the indexes that point into a DB table.
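One way to read the suggestion above, as a hedged sketch: build a string key from the array's contents and let each key lead to a list of records, with small accessors hiding the details. The names add_record and find_flags, and the choice of a comma-joined key, are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical index: a content-based string key maps to a list of records.
my %index;

sub add_record {
    my ($case, $flags) = @_;
    my $key = join ',', @$case;                 # e.g. "1,2,3"
    push @{ $index{$key} }, { case => $case, flags => $flags };
    return;
}

sub find_flags {
    my ($case) = @_;
    my $key = join ',', @$case;
    # Return the flags of every record stored under this key.
    return map { $_->{flags} } @{ $index{$key} || [] };
}

add_record([1, 2, 3], [1, 1, 0]);
add_record([1, 2, 3], [0, 1, 0]);
add_record([3, 4, 5], [1, 2, 1]);

my @found = find_flags([1, 2, 3]);
print scalar(@found), " result(s) for 1,2,3\n";
```

Because the key is derived from the contents, a fresh [1, 2, 3] finds the stored records, which a stringified reference never could.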

Node Type: perlquestion [id://1073836]
Approved by baxy77bax
Front-paged by vinoth.ree