 
PerlMonks  

Re^4: Write large array to file, very slow

by Eily (Monsignor)
on Aug 20, 2018 at 16:14 UTC


in reply to Re^3: Write large array to file, very slow
in thread Write large array to file, very slow

"Twice as fast" seems like a lot for in memory operations when there are also disk accesses. I'm sure there are plenty of things to consider (hardware and data size, for example), but with the following code I couldn't get past a difference of about 5% (although I did notice that trying it with stupidly big files made my computer crash :P):

use v5.20;
use strict;
use warnings;
use Benchmark qw( cmpthese );
use Data::Dump qw( pp );

my $size = 10;
my $length = 1E6;
my @data = ('X' x $length, ) x $size;

sub write_copy {
    open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
    $| = 0;
    my $data = shift;
    for (@$data) {
        print $fh "$_\n";
    }
}

sub write_simple {
    local $\ = "\n";
    open my $fh, ">", "tmp.txt" or die "Can't open output file $!";
    $| = 0;
    my $data = shift;
    for (@$data) {
        print $fh $_;
    }
}

cmpthese(
    -15,
    {
        copy   => sub { write_copy(\@data); },
        simple => sub { write_simple(\@data); },
    }
);
__END__
         Rate   copy simple
copy   27.3/s     --    -5%
simple 28.8/s     5%     --

Replies are listed 'Best First'.
Re^5: Write large array to file, very slow
by hippo (Bishop) on Aug 20, 2018 at 18:11 UTC
    "Twice as fast" seems like a lot for in memory operations when there are also disk accesses.

    Yes, I thought so too. Looks like my data set was so large it ate into swap. :-)

    Re-running with a smaller data set still shows quite a decent speed up, however. Here's my bench and results:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    my $size = 50_000_000;
    my @big = (rand () x $size);

    cmpthese (10, {
        'interp' => 'interp ()',
        'Eily'   => 'eily ()',
        'OFS'    => 'ofs ()',
    });

    exit;

    sub interp {
        open FH, '>', 'mergedlogs.txt' or die "can't open mergedlogs.txt: $!";
        local $| = 0;
        foreach (@big) {
            print FH "$_\n";
        }
        close FH;
    }

    sub eily {
        my $output_file = "mergedlogs.txt";
        open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
        local $| = 0;
        local $\ = "\n";
        foreach (@big) {
            print $output_fh $_;
        }
        close $output_fh;
    }

    sub ofs {
        my $output_file = "mergedlogs.txt";
        open my $output_fh, ">", $output_file or die "Can't open $output_file: $!";
        local $| = 0;
        local $\ = "\n";
        local $, = "\n";
        print $output_fh @big;
        close $output_fh;
    }

           s/iter interp   Eily    OFS
    interp   1.83     --   -35%   -35%
    Eily     1.20    53%     --    -1%
    OFS      1.19    54%     1%     --

      I get this (perl v5.26.2):

             s/iter interp   Eily    OFS
      interp   2.98     --    -5%    -7%
      Eily     2.83     5%     --    -2%
      OFS      2.77     8%     2%     --

      my @big = (rand () x $size);

      What do you expect @big to contain after this, though? It looks like you wanted to make an array of random numbers, but rand is only called once, so you just have one repeated value. Also, x is tricky (I'd even say un-perl-like) because its behaviour depends on its operands in a way that no other operator in Perl does: it repeats a list only when the left operand is in parentheses and the expression is in list context; otherwise it repeats a string.

      So the first thing I did was check how many elements are in @big: 1, a single string holding 50 000 000 copies of the random value. This means that you are writing just one item, and neither the for loop nor the use of $, has much effect (if any) here. So finding a significant difference between Eily and OFS would have been worrying.
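To make the point about x concrete, here is a small sketch comparing three spellings. It is not part of the benchmark in this thread: the variable names and the deliberately tiny $size are my own assumptions, chosen so the result is easy to inspect.

```perl
use strict;
use warnings;

my $size = 5;    # tiny on purpose; the thread used 50_000_000

# Left operand is NOT a parenthesized list, so x repeats a string:
# rand is called once and its value is concatenated into ONE element.
my @string_rep = (rand() x $size);

# Left operand in parentheses, in list context, so x repeats a list:
# rand is still called only once, so every element holds the same value.
my @list_rep = (rand()) x $size;

# map evaluates its block once per element, so rand is called $size times.
my @randoms = map { rand() } 1 .. $size;

printf "string_rep: %d element(s)\n", scalar @string_rep;
printf "list_rep:   %d element(s)\n", scalar @list_rep;
printf "randoms:    %d element(s)\n", scalar @randoms;
```

With $size set to 5 this reports 1, 5 and 5 elements respectively. If the intent was an array of independent random numbers, the map form is probably the one that was wanted.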

        D'oh! One of these days I'm going to learn to check the output file instead of blithely assuming that the data is what I think it is. Thanks for the cluebat.
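In the spirit of checking the output rather than assuming, here is a minimal sketch that confirms the three writing styles discussed above produce identical bytes before any timing is done. The sub names and the three-line sample data are hypothetical, and it writes to in-memory filehandles instead of mergedlogs.txt so that nothing touches the disk.

```perl
use strict;
use warnings;

my @data = map { "line $_" } 1 .. 3;    # hypothetical sample data

# Newline added by string interpolation.
sub via_interp {
    my ($lines) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "Can't open in-memory file: $!";
    print $fh "$_\n" for @$lines;
    close $fh;
    return $buf;
}

# Newline appended by the output record separator $\ after each print.
sub via_ors {
    my ($lines) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "Can't open in-memory file: $!";
    local $\ = "\n";
    print $fh $_ for @$lines;
    close $fh;
    return $buf;
}

# One print call: $, goes between the items and $\ after the last one.
sub via_ofs {
    my ($lines) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "Can't open in-memory file: $!";
    local $\ = "\n";
    local $, = "\n";
    print $fh @$lines;
    close $fh;
    return $buf;
}

my %writers   = (interp => \&via_interp, ors => \&via_ors, ofs => \&via_ofs);
my $reference = via_interp(\@data);

printf "%d element(s) in \@data\n", scalar @data;
for my $name (sort keys %writers) {
    my $out = $writers{$name}->(\@data);
    printf "%-6s %s\n", $name, $out eq $reference ? 'matches' : 'DIFFERS';
}
```

Printing the element count first would have caught the one-element @big straight away; the comparison afterwards only shows that the variants are interchangeable for non-empty input.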
