PerlMonks  

An irrational coding choice found

by Lady_Aleena (Deacon)
on Mar 18, 2012 at 22:13 UTC (#960334=perlmeditation)

Have you ever made a coding choice that later you can not rationalize? For several months I have been trying to rationalize my eschewing arrays of hashes in favor of hashes of hashes. I know that I got a handle on hashes of hashes first, since that structure was one of the first concepts I learned when I dove in here years ago. Gradually the feeling has risen that my sad devotion to HoH is more than likely holding me back. I know that there are places where I should be using AoH instead of HoH, though I cannot point to a specific example. I have to make sense of AoH before I can put them to use, but whenever I try, I get frustrated, end up with another HoH, and move on to other things. I need a new approach to breaking me of this bad habit.
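For readers in the same position, here is a minimal sketch of the same three records stored as an HoH and as an AoH. The record names and fields (`alpha`, `size`, `color`) are hypothetical, chosen only to show how lookup and iteration differ between the two shapes.

```perl
use strict;
use warnings;

# HoH: each record is filed under a unique key (a hypothetical name).
my %by_name = (
    alpha => { size => 3, color => 'red'   },
    beta  => { size => 1, color => 'green' },
    gamma => { size => 2, color => 'blue'  },
);

# AoH: the same records kept in a fixed order; the name is an ordinary field.
my @in_order = (
    { name => 'alpha', size => 3, color => 'red'   },
    { name => 'beta',  size => 1, color => 'green' },
    { name => 'gamma', size => 2, color => 'blue'  },
);

# Looking up one record by its key is direct with the HoH ...
print "beta is $by_name{beta}{color}\n";

# ... while the AoH is simply walked in the order it was built.
for my $rec (@in_order) {
    print "$rec->{name}: size $rec->{size}\n";
}

# Sorting by a field works on both; only the access syntax differs.
my @aoh_names = map { $_->{name} }
                sort { $a->{size} <=> $b->{size} } @in_order;
my @hoh_names = sort { $by_name{$a}{size} <=> $by_name{$b}{size} } keys %by_name;
print "AoH by size: @aoh_names\n";    # beta gamma alpha
print "HoH by size: @hoh_names\n";    # beta gamma alpha
```

Roughly: the HoH is convenient when you arrive with a unique key in hand, the AoH when the records form a sequence you walk through.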

Have a cookie and a very nice day!
Lady Aleena

Re: An irrational coding choice found
by BrowserUk (Pope) on Mar 18, 2012 at 22:31 UTC

    First, you should ignore AoH versus HoH, and instead concentrate on array versus hash.

    If the 'keys' to your data are integers, contiguous, and smallish (or can be arranged to be smallish by the subtraction of some constant), use an array.

    Because:

    • arrays use less memory;
    • arrays are faster;

    Viz.:

    $t=time; $a[ $_ ] = $_ for 1 .. 1e6; print time() - $t; print total_size \@a;;
    0.208095073699951
    32388768

    $t=time; $h{ $_ } = $_ for 1 .. 1e6; print time() - $t; print total_size \%h;;
    0.586999893188477
    112277576

    So, about 65% less time and about 70% less space used. (That's a single specific example, but quite typical.)

    Wherever in a nested structure (be it AoH, HoA, or AoHoA vs. HoHoH) the keys lend themselves to the use of arrays, your code will, in most cases, benefit from using them.
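The "subtraction of some constant" mentioned above can be sketched as follows, assuming the natural keys are years in a known, contiguous range. The base year, the variable names, and the placeholder counts are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data: one count per year, where the years form a small,
# contiguous range starting at a known constant.
my $BASE_YEAR = 1990;

my @count_by_year;    # index 0 corresponds to $BASE_YEAR
my %count_by_key;     # the equivalent hash, for comparison

for my $year (1990 .. 2012) {
    my $count = $year % 7;                          # placeholder values
    $count_by_year[ $year - $BASE_YEAR ] = $count;  # subtract the constant
    $count_by_key{$year} = $count;
}

# Reading back: apply the same offset to turn a year into an index.
my $wanted = 2001;
my $idx    = $wanted - $BASE_YEAR;
print "array: $count_by_year[$idx]\n";
print "hash:  $count_by_key{$wanted}\n";

# One slot per year and no gaps, so no memory goes to empty slots.
print 'slots: ', scalar(@count_by_year), "\n";    # 23
```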


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      True, but do not forget the distribution of the keys and the size of your data set. Size may matter too! Note the following example, with a single entry and a simple key:

      $ perl -MDevel::Size=total_size -E'$key=123;$foo{$key}=1;$foo[$key]=1;say total_size\%foo;say total_size\@foo'
      205
      1080
      $ perl -MDevel::Size=total_size -E'$key=123456;$foo{$key}=1;$foo[$key]=1;say total_size\%foo;say total_size\@foo'
      208
      987744
      $ perl -MDevel::Size=total_size -E'$key=123456789;$foo{$key}=1;$foo[$key]=1;say total_size\%foo;say total_size\@foo'
      211
      987654408

      (I consider 123 to be small). Making the "simple" key smaller might be possible in many cases, but the calculation method to getting it to "smallish" will probably defeat the gain over hashes.
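The same effect can be observed without Devel::Size (a CPAN module, not part of core Perl): assigning to one high index makes the array grow to cover every index below it, while the hash still holds a single key. A minimal sketch, reusing the key 123456 from the one-liners above:

```perl
use strict;
use warnings;

my $key = 123_456;

my (%foo, @foo);
$foo{$key} = 1;    # the hash stores exactly one key/value pair
$foo[$key] = 1;    # the array must grow to cover indices 0 .. $key

print 'hash keys:     ', scalar(keys %foo), "\n";    # 1
print 'array length:  ', scalar(@foo), "\n";         # 123457

# Every slot below $key is allocated but empty, so only one holds a value.
my $defined = grep { defined } @foo;
print "defined slots: $defined\n";                    # 1
```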

      To me the most important reasons to use arrays are:

      • Data must stay in original order
      • Data is not guaranteed to be unique
      • The "target" API works only with lists/arrays
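The first two points can be illustrated with a short sketch, assuming a hypothetical list of readings where both order and repeats matter. Storing the values directly as hash keys merges the duplicates and loses the arrival order:

```perl
use strict;
use warnings;

# Hypothetical readings: the order matters and values repeat.
my @readings = qw(low high low low medium high);

# An array keeps every value, in the order it arrived.
my @log = @readings;
print "array: @log\n";             # low high low low medium high

# Using the values as hash keys merges the duplicates, and keys() returns
# them in no particular order, so they are sorted here just for display.
my %seen;
$seen{$_}++ for @readings;
my @distinct = sort keys %seen;
print "hash:  @distinct\n";        # high low medium

my $lost = scalar(@readings) - scalar(keys %seen);
print "duplicates lost: $lost\n";  # 3
```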

      Enjoy, Have FUN! H.Merijn
        Making the "simple" key smaller might be possible in many cases, but the calculation method to getting it to "smallish" will probably defeat the gain over hashes.

        The subtraction of a simple constant makes almost no difference:

        $t=time; $a[ $_-123 ] = $_ for 123 .. 123+1e6; print time() - $t; print total_size \@a;;
        0.252718925476074
        32388792

        $t=time; $h{ $_ } = $_ for 123 .. 123+1e6; print time() - $t; print total_size \%h;;
        0.625
        112278277
        To me the most important reasons to use arrays are: 1) Data must stay in original order 2) Data is not guaranteed to be unique 3) The "target" API works only with lists/arrays

        Of those, the first two are moot. If the data can be stored in an array, then it can also be stored in a hash whilst meeting both of those criteria.

        That is, it is the keys of a hash that must be unique, and that is easily achieved by incrementing an integer variable as you build the hash. Once you've done that, retrieval in insertion order is just a matter of iterating over the keys in ascending numeric order.
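A minimal sketch of the counter-keyed hash described above, using the same kind of hypothetical input. The keys still have to be sorted numerically to get insertion order back, because a hash returns its keys in no particular order:

```perl
use strict;
use warnings;

# Hypothetical input: ordered, with repeated values.
my @readings = qw(low high low low medium high);

# Key each value by an incrementing counter, so every entry gets a unique
# key even when the values themselves repeat.
my %by_seq;
my $seq = 0;
$by_seq{ $seq++ } = $_ for @readings;

# Hash keys come back in no particular order; sorting them numerically
# recovers the insertion order.
my @restored = map { $by_seq{$_} } sort { $a <=> $b } keys %by_seq;
print "@restored\n";    # low high low low medium high
```

Sorting the keys costs O(n log n) on every ordered traversal, and the result is in effect an array with extra overhead, which is why an array is usually the better fit where such integer keys arise naturally.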

        As for your third criterion: if the APIs don't accept hashes, then there is no choice.



      one nitpick (probably too obvious to mention, but I'm a simpleton...): You can save time using a hash if you need to access a single element of a hash, instead of searching the array for your elements. Also, hash insertion is easy, as opposed to splicing into a sorted array (assuming you had to keep an array sorted). Whether it saves time really comes down to how you intend to use your data.
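A minimal sketch of both points in this nitpick, using hypothetical user records: finding one record in an array means scanning for it (here with `first` from the core List::Util module), while the hash goes straight to it; and inserting into a sorted array needs a position search plus `splice`, while a hash insertion is a single assignment.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical user records, stored both as an array and as a hash keyed by id.
my @users = (
    { id => 'ann', shell => '/bin/bash' },
    { id => 'bob', shell => '/bin/zsh'  },
    { id => 'cat', shell => '/bin/sh'   },
);
my %user_by_id;
$user_by_id{ $_->{id} } = $_ for @users;

# Array: finding one record means scanning until it turns up (O(n)).
my $found = first { $_->{id} eq 'bob' } @users;
print "scan:   $found->{shell}\n";      # /bin/zsh

# Hash: the key leads straight to the record (O(1) on average).
print "lookup: $user_by_id{bob}{shell}\n";

# Inserting into a sorted array: find the first element that sorts after
# the new one, then splice the new element in at that position.
my @sorted_ids = qw(ann cat);
my $new        = 'bob';
my $pos        = first { $sorted_ids[$_] gt $new } 0 .. $#sorted_ids;
$pos = @sorted_ids unless defined $pos;   # append if nothing sorts after it
splice @sorted_ids, $pos, 0, $new;
print "sorted: @sorted_ids\n";          # ann bob cat

# Inserting into the hash is a single assignment.
$user_by_id{dan} = { id => 'dan', shell => '/bin/ksh' };
```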

        New (or rather, newly identified) software pattern: Simpleton Object.

        one nitpick (probably too obvious to mention ...

        It's not so much "too obvious" as not really applicable to the OP's question.

        I wasn't trying to define the absolute criteria for when to use an array or a hash.

        Just clarify the reasoning for trading the habitual use of hashes for arrays where that is otherwise a practical proposition.

        In order for that trade to even begin to be a practical proposition, the "natural keys" to the data have to be small, near-contiguous integers, or otherwise be simply and quickly transformable into a range of small, near-contiguous integers. If that is not the case, no such trade is possible and the OP's question is moot.

        You can save time using a hash if you need to access a single element of a hash, instead of searching the array for your elements.

        In order to use the lookup abilities of an associative array, you must start with the key and be either:

        • looking for the associated value; or
        • checking for the existence of that key.

        For the very specific subset of cases where the array-instead-of-hash trade is possible, that means you would be starting with an index into the array, and either looking to retrieve the value at that index or testing whether there is any value at that index.

        In both cases, if the initial trade is feasible then the lookup in the array will be substantially faster than lookup in the hash.
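A minimal sketch of the two operations listed above, assuming hypothetical data whose natural keys are small integers. For the array, `defined` is used as the "is anything there?" test, since perlfunc strongly discourages calling `exists` on array elements:

```perl
use strict;
use warnings;

# Hypothetical data whose natural keys are small integers, stored both ways.
my @by_index = ('zero', undef, 'two', 'three');    # slot 1 deliberately empty
my %by_key   = (0 => 'zero', 2 => 'two', 3 => 'three');

# 1. Retrieve the value associated with a given index/key.
my $i = 2;
print "array: $by_index[$i]\n";    # two
print "hash:  $by_key{$i}\n";      # two

# 2. Test whether anything is stored at a given index/key.
for my $probe (1, 3, 9) {
    my $in_array = defined($by_index[$probe]) ? 'yes' : 'no';
    my $in_hash  = exists($by_key{$probe})    ? 'yes' : 'no';
    print "index $probe: array $in_array, hash $in_hash\n";
}
```

One difference to keep in mind: `defined` cannot distinguish a slot that holds `undef` from one that was never assigned, whereas `exists` on a hash can tell a key stored with an `undef` value from a missing key.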



Re: An irrational coding choice found
by GrandFather (Cardinal) on Mar 18, 2012 at 22:38 UTC

    It's not HoH versus AoH so much as arrays versus hashes. There are places where arrays make sense and places where hashes make sense. Sometimes the choice is hard because neither is quite right, but most times asking yourself a few simple questions will show the way:

    • The data naturally comes as key/value pairs: hash
    • The data naturally comes as a series of values: array
    • I need to keep the elements in their original order: array
    • Key information must be unique: hash

    It is easy, although not as common as you might guess, to have incompatible answers to the questions, but very often the choice of appropriate type is pretty obvious. Note though that making the wrong choice often leads to very bad code as you try and work around the issues created.
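When the answers do conflict, for example data that must stay in order but whose entries must also be unique, a common Perl idiom is to use both: an array for the order and a hash to track what has already been seen. A minimal sketch with hypothetical input:

```perl
use strict;
use warnings;

# Hypothetical input where order matters *and* entries must be unique.
my @requests = qw(index about index contact about faq);

# The hash enforces uniqueness; the array preserves first-seen order.
# $seen{$_}++ returns the old count, so only the first occurrence passes.
my %seen;
my @unique_in_order = grep { !$seen{$_}++ } @requests;

print "@unique_in_order\n";    # index about contact faq
```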

    True laziness is hard work
Re: An irrational coding choice found
by JavaFan (Canon) on Mar 19, 2012 at 00:46 UTC
    I need a new approach to breaking me of this bad habit.
    I bet that in the majority of your cases, it doesn't make any difference whether you're using a hash or an array. Only if 1) this access is a bottleneck, 2) the current performance is unacceptable, and 3) changing it to an array makes the performance acceptable, have you made the wrong choice.

    Otherwise, it isn't worth fretting about.
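If you do want to check the third condition, one way is to benchmark both variants on representative data before changing anything. Below is a minimal sketch using the core Benchmark module; the workload (10,000 integer keys) and the iteration count are placeholders, not measurements of any real program.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical workload: fill and then read back 10_000 integer-keyed entries.
my $n = 10_000;

my $results = timethese(
    50,                                   # iterations of each sub
    {
        array => sub {
            my @arr;
            $arr[$_] = $_ for 0 .. $n - 1;
            my $sum = 0;
            $sum += $arr[$_] for 0 .. $n - 1;
        },
        hash => sub {
            my %hsh;
            $hsh{$_} = $_ for 0 .. $n - 1;
            my $sum = 0;
            $sum += $hsh{$_} for 0 .. $n - 1;
        },
    },
    'none',                               # suppress per-test output
);

# Print a table comparing the rates of the two variants.
cmpthese($results);
```

Whether the access is a bottleneck at all is better answered by profiling the real program, for instance with the CPAN module Devel::NYTProf.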

      I concur; there is value to be had from learning *how* to get that last little squinch of performance out of a bit of code, but the actual doing-so should be saved for when you need it.

      The choice of an array-vs-hash for me frequently comes down to readability for myself and my teammates--someone else who isn't as skilled with Perl as I am will have to read this, so it needs to make sense. For the way we work, that usually means "hashes."

Re: An irrational coding choice found
by Ratazong (Prior) on Mar 19, 2012 at 10:29 UTC

    Coming back to your general question (and not the specific array vs. hashes-topic):

    Have you ever made a coding choice that later you can not rationalize?

    That is a common effect. In most SW development process models this is addressed by documenting your SW design. By documenting all (important) design choices, you can find out later why you chose which way. However, programmers tend to like coding and to dislike documentation, so the reasons for a decision are often forgotten. In the real world, therefore, this approach is less useful than in theory.

    When I code, I often just "try whether things work" without thinking too much about which is the best way (unless I estimate that performance, in runtime or memory, will be critical). Later I often find myself wondering why I didn't use another approach, and the only rationalization is "it seemed to work that way before". However, there is nothing wrong with that approach. Modern SW development methodology even encourages it: if the old decision is not good (enough), just refactor.

    So the answer to your question is: yes. However, that is nothing uncommon, and nothing to worry about too much.

    Rata
Re: An irrational coding choice found
by davies (Vicar) on Mar 19, 2012 at 11:25 UTC

    In one of my earliest nodes, Tk: pack, grid or place?, I asked for advice along similar lines. I found PodMaster's advice worth following: "Use what fits your brain/needs, and don't worry about it". If you follow the links in the update, you will see that I came across a situation where my tool of choice wasn't ideal. That's the point at which it is worth worrying about your choices - when they start causing you problems. As TGI said to you in Re^3: Seeing Perl in a new light: Epilog, "don't borrow trouble".

    Regards,

    John Davies

Re: An irrational coding choice found
by ruzam (Curate) on Mar 19, 2012 at 14:08 UTC

    I've made coding choices that I can't rationalize later. But I find it's often more insidious than that.

    Having made a coding choice for a particularly complicated problem, I'll come back to the code later to do more work on it (add features, fix bugs, etc.) only to find I completely don't understand why I wrote it the way I did in the first place. With fresh ideas on how the code 'should' have been written, I begin a long process of refactoring the whole mess into something that makes more sense. It isn't until after I've spent considerable time unravelling large portions of code that I come to understand why I wrote it the way I did the first time. Sometimes I get lucky and my 'fresh ideas' make for better code. More often than not, the original code stands and I've wasted time going down a dead end.

      only to find I completely don't understand why I wrote it the way I did in the first place.

      Now that's a good time for a comment or two. At least the second time around :)

      Of course, when you're in the coding zone, you don't always recognise the things that'll throw you later.

      And some would no doubt use that as a justification for commenting everything, every time. But yah boo sucks to that.



Re: An irrational coding choice found
by Anonymous Monk on Mar 21, 2012 at 02:20 UTC
    Have you ever made a coding choice that later you can not rationalize?

    I once wrote a very large program in C++, does that count?

Re: An irrational coding choice found
by Anonymous Monk on Apr 15, 2012 at 07:42 UTC

    Have you ever made a coding choice that later you can not rationalize?

    All the time, until I started using version control and reviewing my choices on a weekly, even daily, basis (refactoring).

    It's like when you first start writing code: you end up with 10-20 line if-blocks, 3-5-8 indentation levels deep, and it feels very comfortable because it's all very fresh in your mind.

    It's not until you're forced to revisit that code a year later that you realize how difficult it is to read, and start turning each long if-block (or indentation level) into a function (refactoring).

    schwern has a slideshow about skimmable code in which he tackles refactoring WWW::Mechanize.

    There is also a shorter version of the skimmable code talk; in the longer version he subliminally introduces Method::Signatures.

Node Type: perlmeditation [id://960334]
Approved by GrandFather
Front-paged by Old_Gray_Bear