PerlMonks  

Increase speed script

by marto9 (Beadle)
on Jul 11, 2008 at 13:05 UTC

marto9 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm a beginner in Perl and I wrote some code to clean log files, but it isn't fast enough. So I was wondering if a more advanced user could change it a little. Thx in advance
#!/usr/bin/perl
use warnings;
use strict;
use Term::ReadKey;

print "[-] Enter the filename: ";
chomp($file = <STDIN>);
open(LIST,$file);
@file1 = <LIST>;
close(LIST);

if (-e "cleaned.txt") {
    print "[-] Can cleaned.txt be deleted? ";
    $answer = <stdin>;
    if ($answer =~ /yes/i) {
        unlink "cleaned.txt";
    } else {
        &exit
    }
}

print "[-] Cleaning...\n";
foreach $data (@file1) {
    ($w1) = split(/:/,$data);
    open(OUTPUT,">>cleaned.txt");
    print OUTPUT $w1."\n" unless ($data =~ /-/);
    close(OUTPUT);
}
&exit;

sub exit() {
    ReadMode 2;
    print "\n[-] Press ENTER to exit...";
    <STDIN>;
    exit;
}
edit: Sorry people. I just saw that I posted this in the wrong section. Can a mod please move this.

Replies are listed 'Best First'.
Re: Increase speed script
by roboticus (Chancellor) on Jul 11, 2008 at 13:21 UTC
    marto9:

    I believe that the primary speed problem is due to you opening and closing the output file in your cleaning loop. You should open the file before starting the loop, then write out all your data, and close at the end:

    print "[-] Cleaning...\n";
    open(OUTPUT,">>cleaned.txt") or die "Can't open cleaned.txt!";
    foreach $data (@file1) {
        ($w1) = split(/:/,$data);
        print OUTPUT $w1."\n" unless ($data =~ /-/);
    }
    close(OUTPUT) or die "close error";

    A couple of minor items:

    • Always check your file operations (the or die "msg" bits above) to ensure that they were successful.
    • If cleaned.txt is a "valuable" file, you should probably delete it only after successfully building your new one. You could, for example, build the file as cleaned.txt.tmp. Then, only after the final close is successful, you could rename cleaned.txt to cleaned.txt.bak and then rename cleaned.txt.tmp to cleaned.txt.
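    The temp-file idea in the second item could look roughly like the sketch below. The helper name replace_safely and the .tmp/.bak suffix handling are illustrative assumptions, not code from this thread; the point is only that the existing file is left alone until the new one has been written and closed successfully.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): write the lines in @$lines to
# "$target.tmp", and only swap it into place once the close has succeeded.
sub replace_safely {
    my ($target, $lines) = @_;
    my $tmp = "$target.tmp";
    my $bak = "$target.bak";

    open(my $out, '>', $tmp) or die "Can't open $tmp: $!";
    print {$out} "$_\n" for @$lines;
    close($out) or die "Can't close $tmp: $!";

    # The old file is touched only after the new one is safely on disk.
    if (-e $target) {
        rename($target, $bak) or die "Can't rename $target to $bak: $!";
    }
    rename($tmp, $target) or die "Can't rename $tmp to $target: $!";
    return;
}

# Example call (commented out so that loading this file has no side effects):
# replace_safely('cleaned.txt', \@cleaned_lines);
```

    If anything dies before the renames, cleaned.txt is still the previous version and only a stray cleaned.txt.tmp is left behind.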

    ...roboticus

    UPDATE: Fixed a bit of awkward grammar, added some formatting for readability.

      Thx roboticus! That sped up my script A LOT. :) And I also added some "or die" code to it.
Re: Increase speed script
by karavelov (Monk) on Jul 11, 2008 at 16:57 UTC
    If the files to be cleaned are quite big, it is better to process them line by line. I have in mind something like
    open LIST,$file;
    open OUTPUT,">>",'cleaned.txt';
    print "[-] Cleaning...\n";
    while (<LIST>) {
        ($w1) = split(/:/,$_);
        # test the current line ($_); $data is not set in this loop
        print OUTPUT $w1."\n" unless /-/;
    }
    close LIST;
    close OUTPUT;
      Ty for your advice. But could you pls tell me why it's better to use the while loop instead of the for loop?
        Because the for version reads the whole file into memory (the @file1 = <LIST> line) before walking thru the lines. This is usually slower than going line by line thru the file (as the while does) because:
        1. it fills many pages of memory with the contents of the file, potentially swapping stuff out -- instead of just allocating a couple of pages for a file buffer;
        2. then it walks thru those pages, potentially using many cache lines -- instead of pulling the file buffer into the same cache lines over and over, leaving the other cache lines free for other stuff;
        3. it goes to the disk all at once, blocking until it has read everything -- instead of going to the disk only when it has already processed the last chunk of information, and then blocking for less time.
        []s, HTH, Massa
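    Putting the suggestions in this thread together, a strict-clean, line-by-line version might look like the sketch below. The sub name clean_log and its two filename parameters are assumptions made for illustration. It also differs from the original in two labeled ways: it chomps each line before splitting (so a line without a colon does not produce an extra blank line), and it opens the output with '>' (truncate) rather than deleting the file and appending.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): write the first ':'-separated
# field of each line in $in_name to $out_name, skipping lines containing '-'.
sub clean_log {
    my ($in_name, $out_name) = @_;

    open(my $in,  '<', $in_name)  or die "Can't open $in_name: $!";
    open(my $out, '>', $out_name) or die "Can't open $out_name: $!";

    # while (<$in>) reads one line per iteration, so only the current
    # line (plus the I/O buffer) is held in memory at any time.
    while (my $line = <$in>) {
        next if $line =~ /-/;
        chomp $line;
        my ($first) = split /:/, $line;
        $first = '' unless defined $first;   # empty input line
        print {$out} "$first\n";
    }

    close($in);
    close($out) or die "Can't close $out_name: $!";
    return;
}

# Example call (commented out so that loading this file has no side effects):
# clean_log('access.log', 'cleaned.txt');
```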
Re: Increase speed script
by kyle (Abbot) on Jul 11, 2008 at 17:06 UTC

    The suggestion from roboticus is a good one.

    I'd like to add to that a more general suggestion. When you have a performance problem, there are tools available to help tell where the problem is. Using Devel::DProf or Devel::Profile, you can get profiling data on a per-sub basis. For a tutorial, see Profiling your code.

    For something like what you have, where there aren't many subs, just a short run of top-level statements, line-level profilers such as Devel::NYTProf and Devel::SmallProf are more appropriate tools.

    Try them out. You may be surprised at what they show (I usually am).
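    Devel::NYTProf is typically run as perl -d:NYTProf yourscript.pl, followed by nytprofhtml to build a report. As a cruder complement to a profiler (this is plain wall-clock timing, not profiling), the sketch below uses the core Time::HiRes module to compare reopening the output file for every line against opening it once. The helper seconds_for, the sample data, and the file names are all made up for illustration, and the numbers will vary by machine and filesystem.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();
use File::Temp qw(tempdir);

# Hypothetical helper: return the wall-clock seconds taken by a code ref.
sub seconds_for {
    my ($code) = @_;
    my $start = Time::HiRes::time();
    $code->();
    return Time::HiRes::time() - $start;
}

my $dir   = tempdir(CLEANUP => 1);
my @lines = map { "entry$_" } 1 .. 5_000;   # made-up sample data

# Variant 1: reopen the output file for every line, as in the original script.
my $per_line = seconds_for(sub {
    for my $line (@lines) {
        open(my $out, '>>', "$dir/slow.txt") or die "open: $!";
        print {$out} "$line\n";
        close($out) or die "close: $!";
    }
});

# Variant 2: open once, write everything, close once.
my $once = seconds_for(sub {
    open(my $out, '>>', "$dir/fast.txt") or die "open: $!";
    print {$out} "$_\n" for @lines;
    close($out) or die "close: $!";
});

printf "open per line: %.4fs\nopen once:     %.4fs\n", $per_line, $once;
```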
