Re^4: Write large array to file, very slow

"Twice as fast" seems like a lot for in-memory operations when there are also disk accesses. I'm sure there are plenty of things to consider (hardware, data size), but with the following code I couldn't get past a difference of about 5% (although I did notice that trying this with stupidly big files made my computer crash :P):

use v5.20;
use strict;
use warnings;

use Benchmark qw( cmpthese );
use Data::Dump qw( pp );

my $size = 10;
my $length = 1E6;

my @data = ('X' x $length, ) x $size;

sub write_copy 
{
  open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
  $| = 0;
  my $data = shift;
  for (@$data)
  {
    print $fh "$_\n";
  }
}

sub write_simple
{
  local $\ = "\n";
  open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
  $| = 0;
  my $data = shift;
  for (@$data)
  {
    print $fh $_;
  }
}

cmpthese( -15,
          {
            copy => sub { write_copy(\@data); },
            simple => sub { write_simple(\@data); },
          }
        );
__END__
         Rate   copy simple
copy   27.3/s     --    -5%
simple 28.8/s     5%     --
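
For anyone wanting to check that the two subs really write the same bytes, here is a minimal sketch. It assumes in-memory filehandles (opening a reference to a scalar) instead of tmp.txt, and a tiny stand-in data set; the variable names are only for illustration and are not part of the benchmark above.

```perl
use strict;
use warnings;

my @data = ('X' x 5) x 3;   # small stand-in for the benchmark data

# Style 1 ("copy"): interpolation builds a new "$_\n" string per element.
my $copy = '';
{
    open my $fh, '>', \$copy or die "Can't open in-memory handle: $!";
    print {$fh} "$_\n" for @data;
    close $fh;
}

# Style 2 ("simple"): print appends the output record separator $\ itself.
my $simple = '';
{
    open my $fh, '>', \$simple or die "Can't open in-memory handle: $!";
    local $\ = "\n";
    print {$fh} $_ for @data;
    close $fh;
}

my $verdict = $copy eq $simple ? "identical output\n" : "outputs differ\n";
print $verdict;
```

Since the output is the same either way, any timing difference should come from how the string handed to print is built, not from what ends up on disk.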

Re^5: Write large array to file, very slow by hippo (Bishop) on Aug 20, 2018 at 18:11 UTC
"Twice as fast" seems like a lot for in memory operations when there are also disk accesses.

Yes, I thought so too. Looks like my data set was so large it ate into swap. :-) Re-running with a smaller data set still shows quite a decent speed up, however. Here's my bench and results:

#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my $size = 50_000_000;
my @big = (rand () x $size);

cmpthese (10, {
    'interp' => 'interp ()',
    'Eily'   => 'eily ()',
    'OFS'    => 'ofs ()',
});

exit;

sub interp {
    open FH, '>', 'mergedlogs.txt' or die "can't open mergedlogs.txt: $!";
    local $| = 0;
    foreach (@big) {
        print FH "$_\n";
    }
    close FH;
}

sub eily {
    my $output_file = "mergedlogs.txt";
    open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
    local $| = 0;
    local $\ = "\n";
    foreach (@big) {
        print $output_fh $_;
    }
    close $output_fh;
}

sub ofs {
    my $output_file = "mergedlogs.txt";
    open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
    local $| = 0;
    local $\ = "\n";
    local $, = "\n";
    print $output_fh @big;
    close $output_fh;
}

       s/iter interp   Eily    OFS
interp   1.83     --   -35%   -35%
Eily     1.20    53%     --    -1%
OFS      1.19    54%     1%     --
Re^6: Write large array to file, very slow by Eily (Monsignor) on Aug 21, 2018 at 08:29 UTC
I get this (perl v5.26.2):

       s/iter interp   Eily    OFS
interp   2.98     --    -5%    -7%
Eily     2.83     5%     --    -2%
OFS      2.77     8%     2%     --

my @big = (rand () x $size);

What do you expect @big to contain after this though? It looks like you wanted to make an array of random numbers. But rand is only called once, so you just have one repeated value. Also, x is tricky (I'd even say un-perl-like) because its behaviour depends on how its left operand is written (in parentheses or not), in a way that no other operator in perl does. So the first thing I did was check how many elements are in @big: just 1, a single string made of 50 000 000 copies of the random value. This means that you are only writing one item, and neither the for loop nor the use of $, has much of an effect (if any) here. So finding a significant difference between Eily and OFS would have been worrying.
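
To make the x behaviour concrete, here is a small sketch with a deliberately tiny $size. The array names are made up for the illustration, and the map version is only my guess at what the benchmark was meant to build.

```perl
use strict;
use warnings;

my $size = 5;

# Parentheses around the whole expression only: the left operand of x is a
# plain scalar, so rand() is called once and its string form is repeated.
# The result is ONE element holding one long string.
my @one_string = (rand() x $size);

# Parentheses around the left operand itself: x repeats the list, so you get
# $size elements, but they are all copies of the same number.
my @same_value = (rand()) x $size;

# map calls rand() once per element: $size independent random numbers.
my @all_random = map { rand() } 1 .. $size;

printf "%-11s %d element(s)\n", 'one_string', scalar @one_string;
printf "%-11s %d element(s)\n", 'same_value', scalar @same_value;
printf "%-11s %d element(s)\n", 'all_random', scalar @all_random;
```

With 50 000 000 elements the map version needs far more memory than the one-string version, so the data size would probably have to come down before re-running the benchmark.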
Re^7: Write large array to file, very slow by hippo (Bishop) on Aug 21, 2018 at 08:52 UTC
D'oh! One of these days I'm going to learn to check the output file instead of blithely assuming that the data is what I think it is. Thanks for the cluebat.


	PerlMonks