PerlMonks  

Re^2: To Hash or to Array--Uniqueness is the question.

by Moron (Curate)
on Dec 02, 2005 at 16:19 UTC (#513630=note)


in reply to Re: To Hash or to Array--Uniqueness is the question.
in thread To Hash or to Array--Uniqueness is the question.

A simple hash can be made to maintain order. Since the foregoing example needs zero extra lines (just a couple of inline additions), the overhead of interfacing to a module seems far too much. One should not reach for the nearest module the minute the going gets slightly non-trivial; that step belongs later in the thought process, perhaps. It also seems appropriate to encourage people to think rather than be mentally lazy.

my %h = ();
while (<$fh>) {
    $h{ $_ } ||= $.;    # remember the line number where each key first appears
}
close $fh;
print sort { $h{$a} <=> $h{$b} } keys %h;    # order keys by first appearance
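For anyone who wants to run the technique, here is a minimal self-contained sketch. It assumes one key per input line and substitutes an in-memory filehandle (opening a scalar reference, supported since Perl 5.8) for the original $fh. The sample data is the list used later in this thread.

```perl
use strict;
use warnings;

# Sample input: one key per line. An in-memory filehandle stands in
# for the real file that $fh would normally point at (assumption).
my $data = join '', map { "$_\n" } qw(one foo bar one baz two quack baz);
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %h;
while (<$fh>) {
    # $. is the current input line number (starting at 1, so never false);
    # ||= keeps the first line number seen for each key.
    $h{$_} ||= $.;
}
close $fh;

# Inside sort, $a and $b hold the two keys being compared,
# so the lookup is $h{$a}, which sorts by first appearance.
my @ordered = sort { $h{$a} <=> $h{$b} } keys %h;
print @ordered;    # each key still carries its trailing newline
```

This prints one, foo, bar, baz, two, quack on separate lines: duplicates are dropped and first-seen order is recovered from the stored line numbers.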

-M

Free your mind


Re^3: To Hash or to Array--Uniqueness is the question.
by Nkuvu (Priest) on Dec 02, 2005 at 17:10 UTC
    On the contrary, you're sorting the keys of the hash, but you've lost the order of the input. Given the sample data I used in my script (one foo bar one baz two quack baz), the sort method prints bar, baz, one, quack, foo, two. Of course, if one wanted sorted order, the array would be pointless. But if one needs the order of the input data, an array (or a module) is, as far as I know, necessary.
      No, he's using the values of the hash to store the intended order, and sorting on the values instead of the keys. The downside is that instead of just keeping the order around, we're spending time sorting to get the order back.
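To keep the order around instead of recovering it by sorting, a common core-Perl idiom pairs a %seen hash with an array of first-seen keys. A minimal sketch using the sample data from the reply above; it makes one pass over the input and needs no sort.

```perl
use strict;
use warnings;

my @values = qw(one foo bar one baz two quack baz);

# %seen counts how often each key has been met. The first time,
# $seen{$_} is undef, so !$seen{$_}++ is true and grep keeps the key;
# later occurrences see a true count and are filtered out.
my %seen;
my @order = grep { !$seen{$_}++ } @values;

print join(', ', @order), "\n";    # one, foo, bar, baz, two, quack
```

The trade-off is two structures instead of one, but input order is preserved directly rather than reconstructed.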

        Bah, this is what I get for being lazy. ;)

        Instead of creating a file so that I could open it via the $fh filehandle, I used the following snippet:

        # sort method
        my @values = qw( one foo bar one baz two quack baz );
        my %h = ();
        foreach (@values) { $h{ $_ } ||= $.; }
        print "\n\nSort method:\n";
        print (join ", ", sort { $h{a} cmp $h{b} } keys %h);

        And totally missed the fact that Moron was using $. as the hash value. Thanks for pointing out what I should have already noticed.
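Nothing in the snippet above reads a filehandle, so $. stays undefined and every stored value is undef. In addition, $h{a} and $h{b} look up the literal keys "a" and "b" rather than $h{$a} and $h{$b}, so the comparator sees only equal values and the output effectively follows the hash's internal key order. A minimal corrected sketch, swapping in an explicit counter for $.:

```perl
use strict;
use warnings;

my @values = qw(one foo bar one baz two quack baz);

# No file is being read, so $. is never updated; use an explicit counter.
# The counter starts at 0, which is false, so test with exists rather than ||=.
my %h;
my $order = 0;
for my $key (@values) {
    $h{$key} = $order++ unless exists $h{$key};
}

# $a and $b are the keys being compared; sort numerically on their positions.
my @sorted = sort { $h{$a} <=> $h{$b} } keys %h;
print "Sort method:\n", join(', ', @sorted), "\n";
```

With this change the sort method prints one, foo, bar, baz, two, quack, matching the input order.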

Re^3: To Hash or to Array--Uniqueness is the question.
by shemp (Deacon) on Dec 04, 2005 at 00:08 UTC
    While I agree that this solution works, it only works in the narrow case where you are reading the keys from a file and get $. effectively for free (not really free, but Perl updates $. anyway).

    In general, though, this solution does not carry over directly. If the hash is populated some other way than by reading keys from a file, $. is not available.

    Sure, you could pick some other variable as the order counter for the keys, something like:

    my %h;
    my $h_order = 0;
    # get_key_from_somewhere() is a placeholder for whatever produces the keys
    while ( defined( my $key = get_key_from_somewhere() ) ) {
        if ( ! exists $h{$key} ) {
            $h{$key} = $h_order++;    # record first-seen position
        }
    }

    But then $h_order has to be passed around with %h if any other values are going to be assigned, not to mention that the extra logic must be repeated wherever assignments occur. If %h is passed to some function whose internals I don't control, the order is lost.

    I still stand by my Tie::IxHash solution, because with it, other code that uses %h doesn't need to know anything special about it; it just works the way it was set up to.
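Tie::IxHash itself comes from CPAN, not the core distribution. To illustrate the idea described above using only core Perl, here is a minimal sketch of an insertion-ordered tied hash. The class name OrderedHash is hypothetical, this is not Tie::IxHash's code, and it implements only the basic tied-hash methods described in perltie.

```perl
use strict;
use warnings;

# Hypothetical minimal insertion-ordered hash, for illustration only.
package OrderedHash;

sub TIEHASH {
    my ($class) = @_;
    # data: key => value; keys: keys in insertion order; idx: key => position
    return bless { data => {}, keys => [], idx => {} }, $class;
}

sub STORE {
    my ($self, $key, $value) = @_;
    unless (exists $self->{data}{$key}) {
        # New key: append it and remember where it sits.
        $self->{idx}{$key} = scalar @{ $self->{keys} };
        push @{ $self->{keys} }, $key;
    }
    $self->{data}{$key} = $value;
}

sub FETCH  { my ($self, $key) = @_; return $self->{data}{$key} }
sub EXISTS { my ($self, $key) = @_; return exists $self->{data}{$key} }

sub DELETE {
    my ($self, $key) = @_;
    return unless exists $self->{data}{$key};
    my $pos = delete $self->{idx}{$key};
    splice @{ $self->{keys} }, $pos, 1;
    # Every later key moves down one slot, so refresh their positions (O(n)).
    $self->{idx}{ $self->{keys}[$_] } = $_ for $pos .. $#{ $self->{keys} };
    return delete $self->{data}{$key};
}

sub CLEAR {
    my ($self) = @_;
    %{ $self->{data} } = ();
    @{ $self->{keys} } = ();
    %{ $self->{idx} }  = ();
}

# Iteration walks the keys array, so keys/each/values follow insertion order.
sub FIRSTKEY { my ($self) = @_; return $self->{keys}[0] }

sub NEXTKEY {
    my ($self, $lastkey) = @_;
    return $self->{keys}[ $self->{idx}{$lastkey} + 1 ];
}

sub SCALAR { my ($self) = @_; return scalar @{ $self->{keys} } }

package main;

tie my %h, 'OrderedHash';
$h{$_} = 1 for qw(one foo bar one baz two quack baz);

# Code that only sees %h uses plain keys and still gets insertion order.
my @keys = keys %h;
print join(', ', @keys), "\n";    # one, foo, bar, baz, two, quack
```

Callers treat %h as an ordinary hash; the ordering logic lives entirely in the tie class, which is the property argued for above.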

    As for the overhead of tying, it is rather small: perhaps a 5-10% decrease in performance compared with an untied hash. But the untied hash doesn't provide the behavior you need, so once you implement that behavior yourself, you give up the performance you gained by not using the module.
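Tie overhead depends on the Perl version, the tie class, and the workload, so it is worth measuring rather than estimating. A rough sketch using only core modules: Benchmark, and Tie::StdHash from Tie::Hash, which adds tie dispatch with no ordering logic and so isolates the cost of tying itself. Tie::IxHash would add its own bookkeeping on top. The key count and the store-then-read workload are arbitrary assumptions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Tie::Hash;    # core; defines Tie::StdHash, a tied hash with no extra logic

my @keys = map { "key$_" } 1 .. 1_000;

# Workload: store every key, then read every value back.
sub workload {
    my ($h) = @_;
    $h->{$_} = 1 for @keys;
    my $sum = 0;
    $sum += $h->{$_} for @keys;
    return $sum;
}

# Sanity check: plain and tied hashes produce the same result.
tie my %check, 'Tie::StdHash';
my $plain_sum = workload({});
my $tied_sum  = workload(\%check);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    plain => sub { my %h; workload(\%h) },
    tied  => sub { tie my %h, 'Tie::StdHash'; workload(\%h) },
});
```

The rate table printed by cmpthese shows the relative cost on the machine at hand; no particular percentage is implied here.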


    I use the most powerful debugger available: print!
      I agree that if $. isn't available you have to use a separate variable, which is hardly a big issue. But my criticism of tying was more about avoiding over-engineering than anything else. In addition, non-core modules are effectively prohibited in every large corporate production environment I've ever worked in. Why 'effectively' rather than explicitly is another tale.

      -M


        ... non-core modules are effectively prohibited for all the large corporate production environments ... Why 'effectively' rather than explicitly is another tale.

        I'd like to hear it.

        Cheers, Sören

        Ouch, that's got to suck, not being able to use non-core modules. I definitely understand where you're coming from about not just reaching for some module to do the work for you. I used to be strongly against reaching for whatever module was handy, but I think I've become much more in favor of not reinventing the wheel. I must have been in a strange mood when I wrote my original response to you; I used to get pretty annoyed when someone answered a complex question with 'use such and such module'.

