http://www.perlmonks.org?node_id=337760


in reply to Re: Re: Data structure challenge
in thread Data structure challenge

Okay. Forget the bareword SPARSE; it is just a filehandle, opened (R/W) on an in-memory file, per perlfunc:open (5.8.x):

File handles can be opened to "in memory" files held in Perl scalars via:
    open($fh, '>', \$variable) || ..

My inspiration was that if you seek past the end of an in-memory file, the underlying string is extended without being initialised. Of course it mustn't be initialised if it is going to emulate a file: when you seek to a position in a random-access file, you don't want everything preceding that position to be overwritten.

So, a pure-Perl way of allocating a buffer without initialisation.
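A minimal sketch of the trick (assuming Perl 5.8+; the variable names and the 1 MB figure are my own illustration, not from the original post):

    use strict;
    use warnings;

    # Open a read/write handle on an in-memory "file" backed by $buffer.
    my $buffer = '';
    open my $fh, '+>', \$buffer
        or die "Cannot open in-memory file: $!";

    # Seek 1 MB past the start and write a single byte; the backing
    # string is extended past the 1 MB mark in one step, with no
    # per-byte initialisation loop in Perl code.
    seek $fh, 1_048_576, 0 or die "seek failed: $!";
    print {$fh} "\0";

    printf "Buffer is now %d bytes long\n", length $buffer;
    close $fh;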


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Re: Re: Re: Re: Data structure challenge
by theorbtwo (Prior) on Mar 21, 2004 at 11:21 UTC

    It all comes down to what you mean by "constant time", I think. On the one hand, the time taken to access an element is not always the same: accessing an element beyond the internal size of the string is slower than if the string has been pre-extended.

    OTOH, it is still O(1) on the size of the pseudoarray.
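    As a sketch of the pseudoarray idea (this particular use of vec() is my own illustration, not code from the thread): each element occupies a fixed-width field in one scalar, so a lookup is a single offset calculation no matter how many elements the "array" holds.

        # Store 32-bit values at fixed-width offsets within one scalar.
        my $pseudoarray = '';
        vec($pseudoarray, $_, 32) = $_ * $_ for 0 .. 9;
        print vec($pseudoarray, 5, 32), "\n";   # prints 25

        # The first write past the string's current length forces Perl
        # to grow (and copy) the string -- the slower case noted above.
        vec($pseudoarray, 1_000_000, 32) = 42;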


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      I agree completely. I seriously doubt that there is any advantage, in terms of (real-time) performance, in using either the scalar-as-packed-integer-array or the uninitialised-buffer-as-packed-array over using a hash.

      I've always had problems with big-O notation. In part this comes down to the fact that O is measured in some theoretical unit that is completely divorced from any real-time unit of measurement.

      It's perfectly possible to have two algorithms for the same task, one defined as O(1) and the other as O(N), where the former is much slower in real time than the latter. I think this is why I tend to confuse myself when assessing algorithms. My natural instinct is to conclude that the fastest real-time implementation should have a 'lower' big-O rating than a slower implementation, but that simply isn't the case. For example, a brute-force combinatorial solution to a problem written in C or assembler can outstrip an 'intelligent' algorithm written in Perl, even though the former rates as O(n*m) and the latter O(n) or even O(log n).

      Regardless, if my reinterpretation of Abigail's challenge--a pure-Perl way to allocate an uninitialised buffer--was correct, and he didn't correct me, then I think that Perl 5.8.x's 'memory files' meet the challenge.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
        I wanted to try to clarify the O() ("big Oh") vs. real-world aspect of this topic, and your post in particular, for the benefit of the other monks -- let me know if I hit the mark (I'm sure someone will let me know where I tripped up :)
        For example, a brute force combinatorial solution to a problem written in C or assembler can outstrip an 'intelligent' algorithm written in perl, even though the former rates as O(n*m) and the latter O(n) or even O(log n).
        O() is concerned with the asymptotic behavior of an algorithm, that is, "in the limit" as the "size" of the input approaches infinity. There is an implied (and generally unspecified) constant associated with any O() statement:
        The running time of FOO is O(n).
        translates roughly to
        For sufficiently large n, the running time of FOO is very close to C*n, where C is an arbitrary constant.
        Since the theory is concerned with n larger than anything practical, it doesn't matter what C turns out to be.
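        For reference, the textbook definition being paraphrased (standard notation, not from the thread) is:

            f(n) = O(g(n))  iff  there exist constants C > 0 and n_0
            such that f(n) <= C * g(n) for all n >= n_0.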

        But in practice, for reasonable inputs (e.g., inputs that the algorithm can finish before the predicted end of the Universe), C can compete with the O() term. If FOO is O(n), and n == 100, but C == 100, then FOO, for practical purposes, appears to be O(n^2), since C*n == n*n when n == C. It is only when n approaches ~1000 that FOO appears to behave more like O(n).
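        A toy illustration of that crossover (the costs are hypothetical unit counts, chosen only to make the point):

            # Compare an O(n) cost with a large constant (C == 100)
            # against an O(n^2) cost with constant 1. The quadratic
            # term only dominates once n grows well past C.
            for my $n (10, 100, 1_000, 10_000) {
                my $linear    = 100 * $n;   # C * n, with C == 100
                my $quadratic = $n ** 2;    # n squared
                printf "n=%6d  100*n=%10d  n^2=%12d  -> %s\n",
                    $n, $linear, $quadratic,
                    $linear <= $quadratic ? "O(n) cheaper" : "O(n^2) cheaper";
            }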

        In summary, for practical problems, the constant C is not arbitrary, and cannot be dismissed without scrutiny.

        In Theory, Theory is sufficient. In Practice, it's not.

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of