
Re: Re: Assining data to an array is slow..

by smferris (Beadle)
on Mar 01, 2001 at 04:30 UTC ( [id://61476] )

in reply to Re: Assining data to an array is slow..
in thread Assining data to an array is slow..

I understand that not assigning the data back to an array isn't useful. But it is still parsing the row, correct? My point was that parsing 2.1 million rows is fast, but storing the results slows it down considerably.

I think what's taking the time is the deletion and re-creation of the memory structure on each iteration of the loop. That seems unnecessary to me, as the successive iterations (in this case) are always going to be of identical size.

Given the above, I was hoping for..

a) That the memory used by the unpack itself could be reused, rather than having to copy it to a perl structure.
b) That I could predefine the size of @data and not have it destroyed with each iteration.
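
For what it's worth, wish (b) can at least be expressed in Perl: an array declared outside the loop can be pre-extended by assigning to `$#data`. The following is only a sketch. The in-memory `@rows` is a made-up stand-in for the real file, and whether pre-extending actually saves any time here is an assumption that would have to be measured.

```perl
use strict;
use warnings;

my $template = 'a9 a40 a15 a15 a15 a2 a9 a9 a9';   # 123 bytes per row

# Hypothetical stand-in for the input file: three rows padded to 123 chars.
my @rows = map { sprintf '%-123s', "row $_" } 1 .. 3;

my @data;        # declared once, outside the loop
$#data = 8;      # pre-extend to 9 slots (indices 0..8)

for my $row (@rows) {
    @data = unpack($template, $row);   # assigns into the same array each time
}

print scalar(@data), " fields in the last row\n";
```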

Of course.. I'm not a seasoned programmer, and this entire thread may just be a waste of everyone's time, in which case I apologize. 8)

I just think that if unpack has to put the values into its own array (it has to, or how does it know what to send back?), then assigning them to a perl data type shouldn't take at least 6 times as long. Of course, I really don't know what perl actually does behind the scenes to store data in memory.

Shawn M Ferris
Oracle DBA

Re: Re: Re: Assining data to an array is slow..
by chromatic (Archbishop) on Mar 01, 2001 at 05:18 UTC
    The key to understanding Perl as it works is 'context'. Every operation takes place in some sort of context -- that means, its results will be evaluated as a single item or as a list of items. (There is also 'void' context and boolean context, which are more or less official but not germane to the discussion.)

    For example, evaluating an array in scalar context produces the number of elements in the array. In list context, it produces the elements themselves:

    my @list = (1, 2, 3);        # list context, @list -> (1, 2, 3)
    my $num = @list;             # scalar context, $num -> 3
    my @second_list = @list;     # list context, @second_list -> (1, 2, 3)

    Perl the interpreter is smart enough not to do more work than it has to (in most cases), so it usually determines the context of an operation before performing the operation to weasel out of extra work or to produce the right results for the context. You can do the same if you use wantarray().
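
    As a rough illustration of wantarray(), here is a minimal sketch. The sub name `note_context` and the `@seen` array are made up for the example; wantarray itself returns true in list context, false but defined in scalar context, and undef in void context.

```perl
use strict;
use warnings;

my @seen;   # records the context of each call, in order

# Hypothetical helper: notes which context it was called in.
sub note_context {
    my $want = wantarray;
    push @seen, defined($want) ? ($want ? 'list' : 'scalar') : 'void';
    return;
}

my @as_list   = note_context();   # list context
my $as_scalar = note_context();   # scalar context
note_context();                   # void context

print "@seen\n";   # prints "list scalar void"
```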

    This is important because unpack performs differently in scalar and in list context. Its perldoc page says that in scalar context, it returns just the first value. In list context, it returns all values.
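
    A small sketch of that difference, using a shortened three-field template and a made-up row (neither is from the original code):

```perl
use strict;
use warnings;

# Hypothetical fixed-width row: 9 + 40 + 15 characters.
my $row      = sprintf '%-9s%-40s%-15s', 'ID0000001', 'Some Name', 'Some City';
my $template = 'a9 a40 a15';

my $first  = unpack($template, $row);   # scalar context: first value only
my @fields = unpack($template, $row);   # list context: every value

print "scalar context gave: '$first'\n";
print "list context gave ", scalar(@fields), " values\n";
```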

    In your first code snippet, it's evaluated in scalar context (more properly void, but we'll keep this simple). Perl can tell that you don't care about the return values, so it only has to unpack the first bit of data. It ignores the rest. (Since it's in void context, it may *completely* ignore the *entire* string, but I haven't looked at the source.)

    This means the first snippet isn't doing as much work as the second, even in the unpack statement itself. Put aside the array assignment for the moment -- besides that, the two snippets aren't doing an equal amount of work!

    To find out how much work the unpack would do in list context, put it in list context:

    while ($row = <FH>) {
        () = unpack("a9 a40 a15 a15 a15 a2 a9 a9 a9", $row);
    }

    This will be a more meaningful benchmark.
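
    One way to run the comparison is the core Benchmark module. This is only a sketch under stated assumptions: the row is a dummy 123-character string rather than real data from the file, the case names are invented, and the iteration count is arbitrary. Timings will vary by machine and perl version, so none are shown.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my $template = 'a9 a40 a15 a15 a15 a2 a9 a9 a9';
my $row      = 'x' x 123;   # dummy row; the field widths sum to 123

# Hypothetical case names, one per way of calling unpack.
my %cases = (
    void_context => sub {
        no warnings 'void';            # silence "Useless use of unpack"
        unpack($template, $row);
        return;
    },
    list_discard => sub {
        () = unpack($template, $row);  # list context, results thrown away
        return;
    },
    array_assign => sub {
        my @data = unpack($template, $row);   # list context, results stored
        return;
    },
);

timethese(200_000, \%cases);
```

    Comparing list_discard against array_assign isolates the cost of the array assignment itself, since both make unpack split the whole row.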

    Besides all that, Perl handles memory internally via a reference-like mechanism. None of this tedious copying-the-contents-of-one-location-to-another jive you get in C. So the overhead is creating an array structure and populating it with the things unpack returns anyway. It's a whole lot smarter about these things than C.

    In short, don't worry about memory management in Perl for now.

      Ahhh... That's much clearer. I was guessing that unpack, called in either context, still split the entire row. The statement "In scalar context, it returns merely the first value produced." is a bit misleading. (To me, anyway.)

      I read it to say that it still splits the entire row but returns only the first value. Had this been the case, the memory to store each value would already have been allocated. And if each value is stored very quickly inside unpack, why would it take longer to copy it to a new structure? That isn't what happens, and now I understand what is really going on. 8)

      As always, this is a great resource for those of us who haven't read the source to perl. (Do you blame me? It's huge! ;) Even if I did, I'm not certain I'd understand it anywhere near as well as you all do.)

      Thanks for the clarification. I really appreciate it!

      Shawn M Ferris
      Oracle DBA