PerlMonks  

Hash vs constant vs package vs other for data structure

by oldtechaa (Beadle)
on Mar 27, 2017 at 18:38 UTC ( [id://1186120]=perlquestion )

oldtechaa has asked for the wisdom of the Perl Monks concerning the following question:

I'm using an AoAoA currently for a data structure. An example of its use would be like this: $notes[$x][$y][0] = ... As you can see, the first two dimensions refer to the location of an object and the third refers to properties of that object such as flags and other data. My problem is that although the indices always refer to the same data member, it's not very readable or maintainable if you forget the index number for the data you want.

A couple solutions I've thought of are below:

  • Use an AoAoH instead, and refer to each data member by name
  • Use constant declarations to name each index
  • Use upper-case variables to show their status as constant names for indices
  • Use a package with setter and getter functions or public data members and use an AoA with the package objects as the contents
  • Get other ideas from PM

What do you think? I don't want to use any external CPAN distributions and I'd like the solution to not require too much boilerplate. Thanks for your time.

Replies are listed 'Best First'.
Re: Hash vs constant vs package vs other for data structure
by haukex (Archbishop) on Mar 27, 2017 at 18:50 UTC
    1. Use an AoAoH instead, and refer to each data member by name
    2. Use constant declarations to name each index; Use upper-case variables to show their status as constant names for indices
    3. Use a package with setter and getter functions or public data members and use an AoA with the package objects as the contents

    The first is probably the way I would go. The second is probably mostly useful if performance is a concern; also it would of course mean you don't have to change your data structure. The third OO approach might be useful if your objects not only have properties, but also methods.
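A minimal sketch of the second option, using the core `constant` pragma so that no CPAN module is needed. The property names (FLAGS, PITCH, DURATION) are hypothetical; the thread never lists the real ones.

```perl
use strict;
use warnings;

# Hypothetical property names; the thread never lists the real ones.
use constant {
    FLAGS    => 0,
    PITCH    => 1,
    DURATION => 2,
};

my @notes;
my ($x, $y) = (3, 5);

# Same AoAoA as before, but the third index is now named.
$notes[$x][$y][FLAGS]    = 0b0101;
$notes[$x][$y][DURATION] = 4;

printf "flags=%d duration=%d\n",
    $notes[$x][$y][FLAGS], $notes[$x][$y][DURATION];
```

A misspelled constant name is a compile-time error under `use strict`, which is the typo protection this option gives over plain hash keys.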

    The only thing to take into consideration with the first solution is that you don't have automatic protection against typos in property names like you do in the other two solutions. One possible solution is locked hashes, like I showed here, but note also what I mentioned lower down in that same thread (here): they might be deprecated someday.
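A rough sketch of the locked-hash idea with the core `Hash::Util` module. The property names and the deliberate typo are made up for illustration, and this is not necessarily the code from the linked post.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my @notes;
my ($x, $y) = (3, 5);

# Hypothetical property names, chosen only for illustration.
my %note = (flags => 0, pitch => 60);
lock_keys(%note);              # only 'flags' and 'pitch' are legal keys now
$notes[$x][$y] = \%note;

$notes[$x][$y]{pitch} = 62;    # fine: existing key

# A misspelled key dies instead of silently creating a new entry.
my $ok = eval { $notes[$x][$y]{ptich} = 64; 1 };
print "typo caught: $@" unless $ok;
```

The check happens at run time, so it only catches a typo on a code path that is actually executed.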

      I don't have methods, so I don't believe I'll go the OO route. It seems like the constant or uppercase methods perform better, are just as readable, and don't make me change my data structure. What are the advantages of the hash method?
        What are the advantages of the hash method?

        Two that I can think of off the top of my head are that you can have sparse property sets (e.g. if one object has {foo=>1,bar=>2} and the other has {quz=>1,baz=>2}, whereas you'd need four array elements to cover that, a bit of a waste), and that textual serializations of the data would be self-documenting. But if neither of those is a concern to you, then at the moment I can't think of major disadvantages to using constants for the array indices.

        perl -MData::Dumper -E '$A[3][5][0]="foo"; die Dumper \@A'

        output:

        $VAR1 = [ undef, undef, undef, [ undef, undef, undef, undef, undef, [ 'foo' ] ] ];
        Long? Just try 300 by 500!

        perl -MData::Dumper -E '$A{3}{5}{0}="foo"; die Dumper \%A'

        output:

        $VAR1 = { '3' => { '5' => { '0' => 'foo' } } };

        smaller, but slower to iterate through if you have lots of entries. Still you can use an intermediate variable that acts like a pointer:

        perl -MData::Dumper -E '$A{3}{5}{0}="foo"; $v = $A{3}{5}; $v->{1}="bar"; die Dumper \%A'

        if you have fixed dimensions... you can also try: $NUM = $x + $y*$WIDTH so if you have 300 pixels wide, (x,y)=(3,5) becomes 3+5*300 = 1503

        but check if you will not run above your maxint: 718414 with that method
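The single-index mapping above can be sketched as follows, assuming a fixed width of 300. The helper names `to_index` and `to_coords` are hypothetical, and storing the result in a hash is one possible choice rather than something the post prescribes.

```perl
use strict;
use warnings;

my $WIDTH = 300;    # fixed grid width, as in the example above

# Map (x, y) to a single index and back.
sub to_index  { my ($x, $y) = @_; return $x + $y * $WIDTH }
sub to_coords { my ($n) = @_; return ($n % $WIDTH, int($n / $WIDTH)) }

my %notes;                            # a hash keeps the storage sparse
$notes{ to_index(3, 5) }[0] = 'foo';

my ($x, $y) = to_coords(1503);
print to_index(3, 5), " -> ($x, $y)\n";
```

The mapping is only reversible while every x stays below `$WIDTH`.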

Re: Hash vs constant vs package vs other for data structure
by AppleFritter (Vicar) on Mar 27, 2017 at 18:43 UTC

    Given this...

    the first two dimensions refer to the location of an object and the third refers to properties of that object such as flags and other data

    ...I'd definitely do this:

    Use an AoAoH instead, and refer to each data member by name

    YMMV, of course.

Re: Hash vs constant vs package vs other for data structure
by Laurent_R (Canon) on Mar 27, 2017 at 19:57 UTC
    The best way to store your data really depends on what you're doing with it afterwards. Also a top level hash might be better if your data points are sparse.

    Assuming that you'll never use the horizontal coordinate (abscissa) without the vertical coordinate (ordinate), you might even store them as a concatenated value in a hash, thereby simplifying your data structure by removing one level of nested-ness:

    my %notes; # notes is now a hash
    #...
    $notes{"$x;$y"}[0] = ...
    or:
    $notes{"$x;$y"}{...} = ...
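A small sketch of the concatenated-key idea, assuming a hypothetical `flags` property. It shows that a single occupied location costs one hash entry and that testing a location does not create anything.

```perl
use strict;
use warnings;

my %notes;
my ($x, $y) = (800, 1200);

# One hash level keyed by "x;y"; the property name is hypothetical.
$notes{"$x;$y"}{flags} = 1;

# exists() on the single level does not autovivify anything.
print "occupied\n" if exists $notes{"800;1200"};
print "empty\n"    unless exists $notes{"1;2"};
print scalar(keys %notes), " entry\n";
```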
      This is an interesting technique. It would certainly work and I do have a sparse data set, but it doesn't seem clearer. What are the performance impacts of this?
        What are the performance impacts of this?
        It would have to be measured, i.e. bench-marked, with real data.

        However, my gut feeling is that removing one level of nested-ness is likely to speed up things a bit, but probably not by a large margin. I doubt that you really care about the difference for what you're doing. So, don't worry too much about performance, unless you really have to.

        The hash solution (especially with concatenated keys) is very likely to use far less memory, at least with sparse data. Suppose you've got only one data point with coordinates (800, 1200). With an array of arrays, Perl has to extend the outer array to 801 slots and the inner array at index 800 to 1201 slots, and a fully populated grid of that size would need on the order of 800 * 1200 slots; that's quite a lot of memory for just one data piece. But with a hash you need to allocate only one or two hash entries; even considering that a hash entry uses more memory than an array entry, there is a significant win here.

        but it doesn't seem clearer

        Granted, but it makes things simpler (and easier) if you need to traverse your entire data structure. You essentially get a better data abstraction if you think in terms of "location", rather than "x-y coordinates".
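To illustrate the traversal point, here is a sketch that walks only the occupied locations of a hash keyed by "x;y". The sample data and the `flags` property are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data keyed by "x;y".
my %notes = (
    '3;5'      => { flags => 1 },
    '800;1200' => { flags => 0 },
    '3;2'      => { flags => 1 },
);

# Visit only the occupied locations, ordered by x and then by y.
my @visited;
for my $key (sort {
        my ($ax, $ay) = split /;/, $a;
        my ($bx, $by) = split /;/, $b;
        $ax <=> $bx || $ay <=> $by;
    } keys %notes)
{
    my ($x, $y) = split /;/, $key;
    push @visited, "($x,$y)";
}
print "@visited\n";
```

With nested arrays the equivalent loop would have to skip every undefined row and cell along the way.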

        I should probably note here that since I do have a sparse data set, the array members are undefined until needed.
Re: Hash vs constant vs package vs other for data structure
by Anonymous Monk on Mar 27, 2017 at 19:35 UTC
    "As you can see, the first two dimensions refer to the location of an object and the third refers to properties of that object such as flags and other data."

    However you later state: "I don't have methods, so I don't believe I'll go the OO route."

    Well, too late. You already have and now you are trying to change the rules. Instead you should store the fact that an object is on the canvas and its location. Use an array to store this info like so:

    my @widgets = (
        { object => $object, x => $x_location, y => $y_location },
        { object => $object, x => $x_location, y => $y_location },
        { object => $object, x => $x_location, y => $y_location },
    );
    Note that those variable names are just placeholders for the real variables and their data. This way you only have to be concerned with the grid points that actually have data.
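A sketch of how such a list might be queried, with placeholder strings standing in for the real objects. The helper `widgets_at` is hypothetical and not part of the suggestion above.

```perl
use strict;
use warnings;

# Placeholder objects and coordinates, for illustration only.
my @widgets = (
    { object => 'note A', x => 3,   y => 5    },
    { object => 'note B', x => 800, y => 1200 },
);

# Find whatever sits at a given grid point with a linear scan.
sub widgets_at {
    my ($x, $y) = @_;
    return grep { $_->{x} == $x && $_->{y} == $y } @widgets;
}

my ($hit) = widgets_at(800, 1200);
print $hit->{object}, "\n";
```

The lookup is a linear scan, so it trades the direct `[$x][$y]` access of the grid for storage that only grows with the number of objects.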
Re: Hash vs constant vs package vs other for data structure
by Anonymous Monk on Mar 27, 2017 at 21:13 UTC
    This thread seems like a lot of premature optimization. If you don't have an actual performance problem (too much memory used, or some operation is taking too long), then don't worry about it, and definitely don't spend any time re-engineering your existing code.
      It actually was originally intended just to find a way to make my 3D array more readable. Optimization is a side-point that I feel must be taken into consideration when choosing a better solution.
Re: Hash vs constant vs package vs other for data structure
by BrowserUk (Patriarch) on Mar 29, 2017 at 13:22 UTC

    FWIW: my preference would be for the use of enum for the constants, and stick with the AoAoA (unless sparsity is required):

    use enum qw[ X Y ];

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
      This was another option I didn't add to the list, but sparse data support would be nice.
Re: Hash vs constant vs package vs other for data structure
by Anonymous Monk on Mar 27, 2017 at 19:04 UTC
    "the first two dimensions refer to the location of an object ..."

    Why? This seems to be the root of your problem.

      This array is holding the data from a custom GTK widget and I need to know every point in a grid on the widget to store data from. Obviously, a two-dimensional access is the easiest way of managing it.
        This makes no sense ...

Approved by AppleFritter
Front-paged by Corion