PerlMonks  

Re: What's the most efficient way to write out many lines of data?

by flounder99 (Friar)
on Jul 09, 2003 at 20:34 UTC ( [id://272818]=note )


in reply to What's the most efficient way to write out many lines of data?

Without seeing your code I can't be sure, but I think your problem is not Perl but I/O. I wrote this test program to create a file with 554,152 100-character records. It then reopens the file and splits each record into 10-character comma-delimited fields using a regexp, which I thought would be slow.
use Time::HiRes qw/ gettimeofday /;
use strict;

my $starttime = gettimeofday;
open OUTFILE, ">file.txt" or die $!;
for (1 .. 554152) {
    print OUTFILE "X"x100, "\n";
}
close OUTFILE;
print "Creating file took ", gettimeofday - $starttime, " seconds\n";

$starttime = gettimeofday;
open INFILE, "<file.txt" or die $!;
open OUTFILE, ">file1.txt" or die $!;
while (<INFILE>) {
    chomp;
    print OUTFILE join (",", /(.{,10})/g), "\n"
}
print "Splitting file using regexp took ", gettimeofday - $starttime, " seconds\n";

__OUTPUT__
Creating file took 8.515625 seconds
Splitting file using regexp took 1.5 seconds
This is on Win2k/ActivePerl 806 using a fast P4 with a 10k rpm hard drive and 1 GB RAM, so the read probably comes entirely from the disk cache. Check your code and make sure you aren't opening the output file for every line. I've seen people do that, and it slows things to a crawl.
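To make the "opening the output file for every line" trap concrete, here is a minimal sketch of both patterns. The sub names, file names, and sample records are invented for illustration and do not come from the original poster's code.

```perl
use strict;
use warnings;

my @records = map { "record $_" } 1 .. 1000;    # made-up sample data

# Slow pattern: reopen (in append mode) and close the file for every record.
sub write_reopening {
    my ($path, $records) = @_;
    unlink $path;    # start from an empty file
    for my $rec (@$records) {
        open my $out, '>>', $path or die "Can't append to $path: $!";
        print {$out} $rec, "\n";
        close $out or die "Can't close $path: $!";
    }
}

# Preferred pattern: open once, print inside the loop, close once.
sub write_once {
    my ($path, $records) = @_;
    open my $out, '>', $path or die "Can't write $path: $!";
    print {$out} $_, "\n" for @$records;
    close $out or die "Can't close $path: $!";
}

write_reopening('reopen_demo.txt', \@records);
write_once('once_demo.txt', \@records);
```

Both subs should produce the same file contents. The difference is that the first pays for an open and a close on every record, which is the kind of overhead that can dominate a run with hundreds of thousands of lines.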

--

flounder

Replies are listed 'Best First'.
Re: Re: What's the most efficient way to write out many lines of data?
by hagen (Friar) on Jul 10, 2003 at 04:43 UTC
    Thanks for the interesting example... I had to try it as the times you quoted seemed very fast. But on my P4 with 512 MB RAM and <who knows?> disc speed, your program creates the file in around 3.5 secs!

    However the "converted file" only contained "\n"s.

    I'm no guru, but I worked out that (I think) you need a split before the join, or you won't have a list that join requires.

    print OUTFILE join (",",  (split /(XXXXXXXXXX)/)), "\n";

    worked, sort of, for me - I couldn't get your /(.{,10})/ pattern to work, although I think I understand what it's trying to match - any 10 chars exactly and reflect those in the stream as well as the other characters - which in this case aren't any. The resulting file simply had "\n"s as the split didn't seem to find a match.

    This then took around 20 seconds...

    $ perl file.pl
    Creating file took 3.40489602088928 seconds
    Splitting file using regexp took 19.5581229925156 seconds

    This results in lines containing collections of 10 sets of the following ",XXXXXXXXXX," so that 2 commas appear between adjacent groups of X's and at the beginning and end of each line.
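The doubled commas come from how split treats a capturing pattern. A small sketch, using a 30-character stand-in line instead of the 100-character records in the thread:

```perl
use strict;
use warnings;

my $line = 'X' x 30;    # shortened stand-in for one chomped record

# Every run of ten X's is a "separator", so the pieces between separators
# are empty strings. The capturing parentheses make split return the
# separators themselves as extra fields.
my @fields = split /(XXXXXXXXXX)/, $line;

print scalar(@fields), " fields\n";
print join(',', @fields), "\n";
```

Each empty piece still gets its own comma from join, which is where the leading comma and the pairs of commas between groups come from. Note that split drops trailing empty fields, so in this sketch no comma appears after the last group.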

      D'oh!

      You'd think I'd at least look at the output file! I changed the regex line to

      print OUTFILE join (",", /(.{1,10})/g), "\n"
      and I got the results:
      Creating file took 9.90625 seconds
      Splitting file using regexp took 12 seconds
      a lot slower but nowhere near 16 minutes.
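A small sketch of what the corrected pattern does, using a made-up 26-character record so that the last chunk comes out short. As far as I know, the Perl releases of that time did not treat `{,10}` as a quantifier at all (without a lower bound the braces were taken literally), which would explain the empty output; newer releases may behave differently.

```perl
use strict;
use warnings;

my $record = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';    # 26 characters of sample data

# In list context, m//g returns every captured chunk of 1 to 10 characters.
my @chunks = $record =~ /(.{1,10})/g;

print join(',', @chunks), "\n";    # ABCDEFGHIJ,KLMNOPQRST,UVWXYZ
```

Because the quantifier is greedy, every chunk except possibly the last is exactly 10 characters long.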

      --

      flounder

        Yup! That did it.

        However this is an idiom I'm not familiar with - I've only got the Camel's head here at work (Perl in a Nutshell) and the join entry doesn't expand on it. It's a great example of DWIM!

        With my use of the explicit split, as I said, the split file had lines like:-

        ,XXXXXXXXXX,,XXXXXXXXXX,,XXXXXXXXXX,,...

        however, when I changed it to your use - even using my beginner's pattern of /(XXXXXXXXXX)/g - it DWIM'd!

        If you (or another brother) have time I'd appreciate an "expanded" Perl baby-talk explanation - i.e. put in all the bits that aren't necessary so we can see what's been left out.

        I (think I) understand the default $_ as being the input, but it's the use of a pattern to generate list context that's new to me.

        Thanks in advance

        hagen
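One way to spell out the idiom hagen asks about, as a sketch rather than an authoritative answer: join takes a list, so its arguments are evaluated in list context, and a match with the /g modifier in list context returns all of its captures. A bare pattern with no `=~` matches against `$_`. The variable names below are invented for illustration.

```perl
use strict;
use warnings;

$_ = 'X' x 25;    # stand-in for one chomped input line

# Compact form from the thread: the match runs against $_ in list context.
my $compact = join(',', /(.{1,10})/g);

# Spelled out: name the target, collect the captures, then join them.
my @fields   = ($_ =~ m/(.{1,10})/g);
my $expanded = join(',', @fields);

# Spelled out further: in scalar context, //g finds one match per iteration
# and remembers where it stopped, so a loop can collect $1 each time.
my @looped;
while ($_ =~ m/(.{1,10})/g) {
    push @looped, $1;
}
my $by_loop = join(',', @looped);

print "$compact\n$expanded\n$by_loop\n";
```

All three forms should print the same line. The compact version just leaves out the explicit `$_ =~`, the `m`, and the intermediate array.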
