PerlMonks  

Re^3: Using hashes for set operations...

by John M. Dlugosz (Monsignor)
on May 23, 2011 at 17:15 UTC (#906322)


in reply to Re^2: Using hashes for set operations...
in thread Using hashes for set operations...

> In Perl stringification of references has no performance penalty. It's just a string with the reference address¹ and the type (this includes package name if blessed).
> perl -e '$p=[];print $p'
> ARRAY(0x928c880)
> (I think you are confusing with other language like JS², where the whole data is dumped)

No, I'm thinking that it is pointless to compare references since any two copies will test as unequal. Instead, you must manually write something that stringifies (or hashes, in the other sense of the word) to a canonical form in order to then test for equivalence.

I guess that depends on what the user intends, so the FAQ should point out that using a reference (or object) as a hash key stringifies it as you show, and therefore performs the same test as == against the reference itself (the address).
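
A minimal sketch of that point, using only core Perl (the variable names are illustrative): a reference used as a hash key is stringified, so a hash lookup distinguishes exactly what == distinguishes on the references themselves.

```perl
use strict;
use warnings;

my $p    = [1, 2, 3];
my $same = $p;           # second copy of the same reference
my $twin = [1, 2, 3];    # separate array with equal contents

my %seen = ( $p => 1 );  # the key is the string "ARRAY(0x...)"

my $same_addr  = ( $p == $same )     ? 1 : 0;   # numeric compare of addresses
my $same_str   = ( "$p" eq "$same" ) ? 1 : 0;   # string compare, as hash keys do
my $twin_addr  = ( $p == $twin )     ? 1 : 0;
my $same_found = exists $seen{$same} ? 1 : 0;
my $twin_found = exists $seen{$twin} ? 1 : 0;

print "same ref:  == $same_addr, eq $same_str, in hash $same_found\n";
print "twin data: == $twin_addr, in hash $twin_found\n";
```

The copy of the reference matches on all three tests; the separately built array with equal contents matches on none.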

I think intersection and friends should be like sort, in that they can take a piece of code that is used to determine what is meant by equivalence in this particular case. That's easy to call but can be inefficient; and just like you use the whatever maneuver with sort to cache the keys, you could do the same with intersection. But the eventual module can have that built-in, as your ideas directly incorporate that kind of keying. Then the user needs to provide code to produce a canonical key of one item, as opposed to comparing the equivalence of two parameters.

But back to the underlying code: If I want two ad-hoc uses of [qw/1 2 3/] to be considered the same, stringifying the reference won't do it. It needs to call a function to generate the string key from the contents. And we suppose that this is expensive, so only call it once per value in each input list.

The user wants to find the intersection of two lists, so he would be told to pass @set1 and @set2, and optionally a &func, which defaults to built-in stringification. Prepare your internal %set1 from @set1 and func(each element), and arrange the code (at least in the case where a func is passed -- it could have different implementations) to not need to call func again on some value but to always keep it with the key.
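
A rough sketch of that interface, not an existing module: `intersection` and `$by_contents` are hypothetical names, the key function is passed as a code reference, and the NUL separator assumes that no list element contains a NUL byte.

```perl
use strict;
use warnings;

# Hypothetical helper: intersection of two lists, where $key_of maps
# an element to its canonical string key.
sub intersection {
    my ($set1, $set2, $key_of) = @_;
    $key_of ||= sub { "$_[0]" };    # default: built-in stringification

    # Call the key function once per element of the first list and
    # keep the element alongside its key.
    my %set1;
    for my $item (@$set1) {
        my $key = $key_of->($item);
        $set1{$key} = $item unless exists $set1{$key};
    }

    # Once per element of the second list as well; delete() makes sure
    # each common key is reported only once.
    my @common;
    for my $item (@$set2) {
        my $key = $key_of->($item);
        push @common, delete $set1{$key} if exists $set1{$key};
    }
    return @common;
}

my @a = ( [qw/1 2 3/], [qw/4 5/] );
my @b = ( [qw/4 5/], [qw/1 2 3/], [qw/6/] );

# Canonical key built from the contents; counts how often it is called.
my $calls = 0;
my $by_contents = sub { $calls++; join "\x00", @{ $_[0] } };

my @by_ref  = intersection( \@a, \@b );                 # compares addresses
my @by_data = intersection( \@a, \@b, $by_contents );   # compares contents

printf "by reference: %d, by contents: %d, key calls: %d\n",
    scalar @by_ref, scalar @by_data, $calls;
```

With the default key nothing matches, because every anonymous array has its own address. With the contents-based key two lists match, and the key function runs five times: once for each element of the two input lists.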


Re^4: Using hashes for set operations...
by LanX (Canon) on May 23, 2011 at 20:15 UTC
    > No, I'm thinking that it is pointless to compare references since any two copies will test as unequal.

    You're thinking of nested structures; I'm thinking of objects. If you have instances representing something like "Employees", you don't wanna treat twins as identical.

    > so do the same test as == against the reference itself (the address).

    Actually it's eq, think about the way scalars are compared.

    > The user wants to find the intersection of two lists, so he would be told to pass @set1 and @set2, and optionally a &func, which defaults to built-in stringification.

    I was already meditating about this. I also like the Python approach (where sets are a built-in datatype), which makes the hash function operate on the basis of an "equality" method of "hashable" objects (though I don't know how this is implemented efficiently). IIRC it's possible in Perl to overload the way objects are stringified.
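
A small sketch of overloaded stringification with the core `overload` pragma. The `Employee` class and its `id` field are made up for illustration; the assumption is that an id, not the stored data, decides identity.

```perl
use strict;
use warnings;

package Employee;    # hypothetical class for illustration

# Stringify to a key based on identity; with fallback => 1, eq is
# derived from the stringified form.
use overload
    '""'     => sub { "Employee#" . $_[0]{id} },
    fallback => 1;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

package main;

my $alice  = Employee->new( id => 1, name => 'Smith' );
my $reload = Employee->new( id => 1, name => 'Smith' );  # same employee, separate object
my $twin   = Employee->new( id => 2, name => 'Smith' );  # different employee, same name

# Hash keys go through the overloaded stringification.
my %staff = ( $alice => $alice );

my $reload_found = exists $staff{$reload} ? 1 : 0;
my $twin_found   = exists $staff{$twin}   ? 1 : 0;

print "key: $alice, reload found: $reload_found, twin found: $twin_found\n";
```

The second object for the same employee lands on the same key, while the twin with identical data but a different id stays distinct.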

    >If I want two ad-hoc uses of [qw/1 2 3/] to be considered the same, stringifying the reference won't do it.

    IMHO sets of "deeply compared" nested structures are better done with nested hashes (a kind of tree search, one level per level of nesting).
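
One possible reading of that idea, sketched for flat lists only: each list is stored as a path through nested hashes, one hash level per list position. `tree_add`, `tree_has` and the end marker are hypothetical, and the marker assumes no element is that exact string.

```perl
use strict;
use warnings;

# Marks the node where a complete list ends (assumed never to be an element).
my $END = "\x00end";

# Store one list as a path through the tree.
sub tree_add {
    my ($tree, @items) = @_;
    my $node = $tree;
    for my $item (@items) {
        $node->{$item} ||= {};
        $node = $node->{$item};
    }
    $node->{$END} = 1;
    return;
}

# Walk the tree one level per element; a mere prefix does not count.
sub tree_has {
    my ($tree, @items) = @_;
    my $node = $tree;
    for my $item (@items) {
        return 0 unless exists $node->{$item};
        $node = $node->{$item};
    }
    return exists $node->{$END} ? 1 : 0;
}

my %tree;
tree_add( \%tree, @$_ ) for [qw/1 2 3/], [qw/4 5/];

my @common = grep { tree_has( \%tree, @$_ ) } [qw/4 5/], [qw/1 2/], [qw/1 2 3/];
print scalar(@common), " lists in common\n";
```

Here `[qw/1 2/]` is only a prefix of a stored list, so it is not reported as a member.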

    > And we suppose that this (the key function) is expensive, so only call it once per value in each input list.

    agreed.

    BTW: interesting read

    Cheers Rolf

    update: fixed unescaped brackets

      I'm thinking X, you're thinking Y: more food for the FAQ. There are different problems to be solved.

      Stringification on an object: The overloaded stringify function might not be what you want for this specific call to intersection. Just like you can sort different ways, you want to identify matches some custom way for this operation alone.

      I'll read those links and reply more later. (Oh, that's not a link, it's an unescaped [).

        > The overloaded stringify function might not be what you want for this specific call to intersection.

        Sure but you can't solve all possible tasks at the same time. The stringification of an objectref is IMHO a reasonable default, which should be configurable of course.

        Cheers Rolf
