Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^3: Range overlap (with a lot of ranges)

by salva (Abbot)
on May 24, 2010 at 14:42 UTC ( #841398=note: print w/replies, xml ) Need Help??


in reply to Re^2: Range overlap (with a lot of ranges)
in thread Range overlap (with a lot of ranges)

you could make an array based structure (AoA) where first element is the start number of a range and 2nd element is size or end of that range

That's not very inefficient as it would use a lot of memory unnecessarily. It's better to just use a pair of arrays (@start, @end):

DB<2> use Devel::Size 'total_size' DB<3> total_size([map [rand, rand], 1..1000]) DB<4> use Devel::Size 'total_size' DB<5> x total_size([map [rand, rand], 1..1000]) 0 224176 DB<6> x total_size([[map rand, 1..1000], [map rand, 1..1000]]) 0 80464

Replies are listed 'Best First'.
Re^4: Range overlap (with a lot of ranges)
by Marshall (Abbot) on May 24, 2010 at 16:00 UTC
    Well of course two @vars will take less memory space than a AoA because there is at least an additional array, the array of pointers to array in the AoA!

    So why does this make sense?

    One basic reason why data is collected into combined structures like an array of pointers to array, is so that it is clear what data "goes with" with other data. This is a huge thing in software engineering!

    Another reason is that when we say sort a complex data structure that invovles pointers, we just adjust the pointer values to a single array instead of doing copies of the actual data or having to modify multiple data structures. We can do manipulations on a single "thing", again a huge plus.

    I suggested an AoA, but actually the most analogous thing to a 'C' 2-D array uses even more memory, a Perl AoH, not a Perl AoA!

    I suggested the AoA instead of AoH because the Op is "trying to get the job done" and there isn't going to be a huge amount of code. I think this could be just one page of Perl code. I don't see a need to "super optimize" this.

    Normally the rough priority of coding, in my mind is: 1) clarity, 2) clarity, 3) speed, 4) memory usage. There of course can be reasons and very good reasons to violate this priority scheme. The speed and memory usage actually get better (become less important) according to Moore's Law. The clarity of the code is fixed when you write it. It won't get any more clear even if CPU MIP's and memory prices drop dramatically. If the code will run easily within memory budget (in this case yes) and the main objective is clear code that works (in this case yes), then optimizing the memory usage is not the right thing to do.

      My top priority when coding is that: the program should work and produce the desired result in the expected time frame within the available resources. Style is just second.

      The OP is talking about big datasets and concerned about performance, so in that case memory usage matters.

      Besides that, talking about clarity without showing actual code is pointless

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://841398]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (6)
As of 2019-11-20 15:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Strict and warnings: which comes first?



    Results (98 votes). Check out past polls.

    Notices?