PerlMonks  

Tie::File - sorting array adds empty lines

by svenXY (Deacon)
on Sep 11, 2008 at 10:31 UTC (#710580=perlquestion)
svenXY has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have experienced some strange behaviour with Tie::File.
The following code ties an array to a file, pushes some elements, sorts them, and unties. After each run, an extra empty line has been added to the file. Those empty lines add up and clutter the file.

#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

tie my @tied_array, 'Tie::File', 'tiefile' or die "Could not tie to tiefile: $!";
push(@tied_array, $_) for (1..3);
print "\@tied_array has " . scalar @tied_array . " elements before sorting\n";
@tied_array = sort {uc($a) cmp uc($b)} @tied_array;
print "\@tied_array has " . scalar @tied_array . " elements after sorting\n";
untie @tied_array;

prints:
~/dev/perl$ perl test_tie_file.pl
@tied_array has 3 elements before sorting
@tied_array has 4 elements after sorting
~/dev/perl$ perl test_tie_file.pl
@tied_array has 7 elements before sorting
@tied_array has 8 elements after sorting
~/dev/perl$ perl test_tie_file.pl
@tied_array has 11 elements before sorting
@tied_array has 12 elements after sorting
~/dev/perl$ cat tiefile
1
1
1
2
2
2
3
3
3
~/dev/perl$

Sure, I could get rid of those lines easily*, but I find it strange that they are added in the first place. Can anyone shed some light on this?

Regards,
svenXY

* Update: @existing_packages = grep {!/^$/} @existing_packages; does the trick
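
For anyone who wants to try that cleanup in isolation, here is a minimal sketch. It assumes Tie::File and File::Temp are installed (both ship with Perl); the temporary file, its contents, and the name `@records` are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical sample file: two real records and two stray empty lines.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "1\n\n2\n\n";
close $fh or die "Could not close $path: $!";

tie my @records, 'Tie::File', $path or die "Could not tie to $path: $!";
my $with_blanks = @records;

# Keep only non-empty records. Tie::File strips the record separator,
# so an empty line shows up as an empty string.
@records = grep { !/^$/ } @records;
my $without_blanks = @records;
untie @records;

print "$with_blanks records before, $without_blanks after\n";
```

This only removes the symptom; the empty record comes back the next time the array is sorted in place.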

Re: Tie::File - sorting array adds empty lines
by ikegami (Pope) on Sep 11, 2008 at 20:09 UTC

    I can replicate your results
    with Perl 5.8.8 and Tie::File 0.97 and
    with Perl 5.10.0 and Tie::File 0.97_02.

    sort is optimized to sort in place when the source and destination are the same.

    >perl -MO=Concise -e"@a = sort @a" 2>&1 | find "sort"
    7  <@> sort lK/INPLACE ->8

    >perl -MO=Concise -e"@b = sort @a" 2>&1 | find "sort"
    7  <@> sort lK ->8

    I don't know if it's a bug in Tie::File when dealing with sort's optimization or a bug in sort's optimization when dealing with tied arrays, but the bug can be avoided by avoiding the optimization:

    @tied_array = map $_, sort { uc($a) cmp uc($b) } @tied_array;

    Update: Better yet,

    @tied_array = ((), sort {uc($a) cmp uc($b)} @tied_array);
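
    A self-contained way to try this kind of workaround is sketched below. It assumes Tie::File and File::Temp are available (both ship with Perl); the temporary file, the sample values, and the names `@lines` and `@sorted` are illustrative only. It uses a third variant of the same idea: sort into a plain array first, so that the assignment to the tied array is an ordinary list assignment rather than an in-place sort.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical scratch file so the example does not touch real data.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh or die "Could not close $path: $!";

tie my @lines, 'Tie::File', $path or die "Could not tie to $path: $!";
push @lines, qw(b C a);
my $before = @lines;

# Sort into a plain array, then assign. Source and destination of the
# sort differ, so the in-place optimization is not applied.
my @sorted = sort { uc($a) cmp uc($b) } @lines;
@lines = @sorted;

my $after   = @lines;
my $content = join ',', @lines;
untie @lines;

print "before=$before after=$after content=$content\n";
```

    The cost is one extra in-memory copy of the records, which may matter for very large files.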
Re: Tie::File - sorting array adds empty lines (followup)
by ikegami (Pope) on Sep 11, 2008 at 20:29 UTC

    For what it's worth, I can't replicate it with a different type of tied array.

    use strict;
    use warnings;

    use Tie::File  qw( );
    use Tie::Array qw( );

    for my $module ('Tie::File', 'Tie::StdArray') {
        print("$module:\n");
        tie my @array, $module, 'tiefile';
        push @array, "[$_]" for 0..2;
        print(scalar(@array), "\n");
        @array = sort @array;
        print(scalar(@array), "\n");
        print("\n");
    }

    Tie::File:
    3
    4

    Tie::StdArray:
    3
    3
Re: Tie::File - sorting array adds empty lines
by wojtyk (Friar) on Sep 11, 2008 at 22:44 UTC
    I actually spent a few hours looking into it out of curiosity. It's a rather unusual bug. Like ikegami said, it only occurs in the optimized case of a tied array where the sort is of this format: @a = sort @a.

    From what I can tell of gdb stumblings through Perl_pp_sort(), in the particular event of a tied array in the above format, the code branch at line 1716 of pp_sort.c will be followed (code below):

    if (av && !sorting_av) {
        /* simulate pp_aassign of tied AV */
        ...
        av_extend(av, max);
        ...
    }

    When the av_extend is called, max has the correct value that was returned from FETCHSIZE. However, the code that deals with tied arrays at the top of av_extend ends up pushing max+1 onto the stack prior to the EXTEND call:

    Perl_av_extend(pTHX_ AV *av, I32 key)
    {
        MAGIC * const mg = SvTIED_mg((SV*)av, PERL_MAGIC_tied);
        if (mg) {
            ...
            PUSHs(SvTIED_obj((SV*)av, mg));
            PUSHs(sv_2mortal(newSViv(key+1)));
            PUTBACK;
            call_method("EXTEND", G_SCALAR|G_DISCARD);
            ...

    I haven't the foggiest why this is, as I'm no Perl internals expert. But the result appears to be an off-by-one in the module's implementation of the EXTEND.

    I think the reason it doesn't affect many other modules is that the bulk of modules that use tied arrays (that I've tested at least) have EXTEND as a no-op function ({}). Tie::File, on the other hand, actually uses the EXTEND to determine the number of records in the file. Because of this, you always end up with an extra empty record (which in this case is a newline, since that is the default record separator) because of the off-by-one.

      Thanks. Submitting bug report for Tie::File. (Upd: CPAN RT bug #39196 )

      "But the result appears to be an off-by-one in the module's implementation of the EXTEND."

      It's not an off-by-one error, at least not on the module's behalf.
      EXTEND is used to expand the internal buffer.
      STORESIZE is used to actually change the visible size of the array.
      Tie::File incorrectly treats EXTEND as STORESIZE.
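
      The difference between the two interpretations of EXTEND can be sketched with two toy classes. Both class names are hypothetical, and neither is Tie::File's actual code; they only assume Tie::StdArray from the core Tie::Array module. The direct EXTEND(4) call stands in for the call perl makes during the in-place sort, as described above, with a count one larger than the real size.

```perl
use strict;
use warnings;
use Tie::Array;    # provides Tie::StdArray

my %size;          # visible size of each array after the EXTEND call

# Hypothetical class: EXTEND is only a hint and changes nothing visible.
package HintOnlyArray;
our @ISA = ('Tie::StdArray');
sub EXTEND { }

# Hypothetical class: EXTEND is treated as if it were STORESIZE.
package ResizingArray;
our @ISA = ('Tie::StdArray');
sub EXTEND { my ($self, $count) = @_; $self->STORESIZE($count) }

package main;

for my $class (qw(HintOnlyArray ResizingArray)) {
    tie my @array, $class;
    push @array, 1 .. 3;

    # Simulated hint: "the array may grow to 4 entries".
    tied(@array)->EXTEND(4);

    $size{$class} = scalar @array;
    print "$class: $size{$class} elements\n";
}
```

      The hint-only class still reports 3 elements, while the resizing class reports 4, with an undefined trailing element. In a file-backed array that trailing element would correspond to an empty record.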
