As has been pointed out before, the fundamental cause of the slowness of your original code is that you are doing an open/write/close operation for each input line. That's no good -- open and close are slow things to do, and you pay for them once per word instead of once per file.
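To make the cost concrete, here is a rough sketch of the open/write/close-per-line pattern. It is my own stand-in, not your actual code: the helper name append_word(), the scratch directory, and the fixed word list are all assumptions made for illustration.

```perl
#!/usr/bin/perl -w

use strict;
use File::Temp qw(tempdir);

# Scratch directory so the sketch doesn't litter the current one.
my $dir = tempdir( CLEANUP => 1 );

# Hypothetical stand-in for the slow approach: every word pays for
# its own open and close. append_word() is my name, not the OP's.
sub append_word {
    my ($word) = @_;
    my $len  = length($word) - 1;    # newline not counted
    my $file = "$dir/$len.words";

    open(my $fh, '>>', $file) or die "Can't append to $file: $!\n";
    print {$fh} $word;
    close($fh) or die "Can't close $file: $!\n";

    return $file;
}

# Four words means four opens and four closes, even though only
# three distinct files are involved.
append_word("$_\n") foreach qw(a aa aaa aa);
```

With 100 words that is 100 opens and 100 closes for at most 10 files, which is the overhead the rest of this post is about avoiding.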

Here's something else you can consider -- anonymous open filehandles aggregated and managed within a container.

Huh?

Check it out...

I created 2 programs: make_words.pl, which creates some number of words (well, strings of "a" of length between 1 and 10), and file_words.pl, which puts them into files named LENGTH.words. I run them together with: ./make_words.pl 100 | ./file_words.pl (The 100 is the number of "words" I want to generate.)


make_words.pl

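#!/usr/bin/perl -w

use strict;

# Number of words to generate, taken from the command line.
my $count = shift @ARGV;
die "usage: $0 COUNT\n" unless defined $count && $count =~ /^\d+$/;

foreach ( 1 .. $count ) {
    # Random length between 1 and 10 inclusive.
    my $len = (int(10 * rand())) + 1;
    print (('a' x $len), "\n");
}

exit 0;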

file_words.pl

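#!/usr/bin/perl -w

use strict;
# use Data::Dumper;

# Index = word length; value = the open filehandle for LENGTH.words.
my @filehandles = ();

while ( my $word = <> ) {

    # The newline is kept for printing, so don't count it here.
    my $len = length($word) - 1;

#    print Dumper(\@filehandles);

    unless ( $filehandles[$len] ) {

        # First word of this length: open its file, once.
        open($filehandles[$len], '>', "$len.words") or do {
            warn "Can't open $len.words for writing: $!\n";
            # open() may have left a handle in the slot even though it
            # failed; clear it so the next word of this length retries.
            undef $filehandles[$len];
            next;
        };
    }
    print {$filehandles[$len]} $word;
}

foreach ( @filehandles ) {
    close($_) if $_;
}

exit 0;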

make_words.pl is too simple for analysis. (Right?)

In file_words.pl, I create an array that is meant to store my anonymous filehandles. Of course, there isn't anything in it at the beginning of the program.

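Looping through all the words, I determine the length of the word in question. (Why bother chomp-ing? We'll use the newline later.) Subtracting 1 accounts for that newline, so this assumes every input line ends with one, which make_words.pl guarantees.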

Then, I want to write the word to the file. If there isn't already a filehandle for words of that length, the next thing to do is to open an appropriately-named file for writing. If there already is one, I skip the open. Either way, I then print the word to that filehandle.

Notice that I'm using a scalar as my filehandle. Perl will autovivify an anonymous filehandle and assign it to that scalar, assuming it's an assignable scalar... such as what can be found in an array element. (The array is my aggregator -- I aggregate [collect] my filehandles within it.)
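If you want to see the autovivification on its own, here is a minimal sketch, separate from file_words.pl. The scratch directory and the choice of slot 3 are just assumptions for the demo.

```perl
#!/usr/bin/perl -w

use strict;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );    # scratch space for the demo

my @filehandles = ();

# Slot 3 starts out undefined...
print "before: ", ( defined $filehandles[3] ? "defined" : "undef" ), "\n";

# ...and open() fills it in with a reference to a fresh, anonymous glob.
open($filehandles[3], '>', "$dir/3.words")
    or die "Can't open $dir/3.words: $!\n";

print "after:  ", ref($filehandles[3]), "\n";

# The slots below index 3 were never touched, so they are still undef.
my $open = grep { defined } @filehandles;
print "open handles: $open of ", scalar(@filehandles), " slots\n";

close($filehandles[3]) or die "Can't close: $!\n";
```

On a Perl with filehandle autovivification, I'd expect this to report the slot as undef before the open and as a GLOB reference afterwards, with the lower slots left empty. That sparseness is why the closing loop has to skip undefined elements.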

At the end of the program, I go through my array and close any filehandles that are stored within it. I don't just want to close every element in the array, as some of them might be "undef" values.

Note that if you uncomment the 2 commented lines, you'll get some Data::Dumper output that shows you the gradual population of elements within the @filehandles array with anon filehandles.

Note that older versions of Perl 5 (those before 5.6) won't support this kind of filehandle autovivification of empty assignable scalars. (In which case, you can still use this technique, but with minor modifications. But ask me about this later if you like.)
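For the curious, one possible modification is to create the anonymous glob yourself with gensym from the Symbol module and hand that to open. Treat this as a sketch of one way to do it rather than the definitive workaround: the helper handle_for() and the fixed sample word list are illustrative assumptions, and I use the two-argument open because that is what older Perls understand.

```perl
#!/usr/bin/perl -w

use strict;
use Symbol qw(gensym);

my @filehandles = ();

# Return the handle for words of length $len, opening it on first use.
# handle_for() is an illustrative helper, not part of file_words.pl.
sub handle_for {
    my ($len) = @_;

    unless ( $filehandles[$len] ) {
        my $fh = gensym();    # build the anonymous glob by hand
        open($fh, "> $len.words") or do {
            warn "Can't open $len.words for writing: $!\n";
            return;
        };
        $filehandles[$len] = $fh;    # stored only after a successful open
    }
    return $filehandles[$len];
}

# Fixed sample input so the sketch runs on its own; it writes
# 1.words, 2.words and 3.words into the current directory.
foreach my $word ( map { "$_\n" } qw(a aa aaa aa) ) {
    my $fh = handle_for( length($word) - 1 ) or next;
    print {$fh} $word;
}

foreach ( @filehandles ) {
    close($_) if $_;
}
```

To use it as a drop-in for file_words.pl, swap the fixed list for the while ( my $word = <> ) loop. The array-of-handles idea stays exactly the same; only the way each handle comes into existence changes.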

Here's a bit of a screencapture of the whole procedure...

rdice@tanru:~/tmp/test$ rm *.words; ./make_words.pl 100 | ./file_words.pl
rdice@tanru:~/tmp/test$ wc -l *.words
      8 1.words
     11 10.words
      8 2.words
     15 3.words
      7 4.words
     11 5.words
     10 6.words
      9 7.words
     10 8.words
     11 9.words
    100 total

Cheers,
Richard


In reply to Re: Re: Re: Extreme Example of TMTOWTDI by Dice
in thread Extreme Example of TMTOWTDI by Cody Pendant
