
Controlling file size when writing

by Vautrin (Hermit)
on Mar 02, 2004 at 18:28 UTC ( #333339=perlquestion )

Vautrin has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that logs its progress to a file. This is important for catching errors; however, when I leave the script running for long periods, the logs can get very big (I just restarted the script and deleted a 3GB log). Is there any way to tell Perl that, when a file you are writing to gets over x lines long, it should delete a line for every line it inserts?


Replies are listed 'Best First'.
Re: Controlling file size when writing
by Limbic~Region (Chancellor) on Mar 02, 2004 at 18:51 UTC
    Yes - probably multiple ways.

    One way to do it would be to use Tie::File. This will treat the log like an array. You can then shift off one line before using push to add a line. I do have to caution you that this will slow things down and may use quite a bit of memory.
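A minimal sketch of the Tie::File approach (the file name and line cap are made up for illustration; Tie::File ships with Perl):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

my $logfile   = 'spider.log';   # illustrative name
my $max_lines = 1000;           # illustrative cap

# The file is presented as an array: one element per line.
tie my @log, 'Tie::File', $logfile
    or die "Cannot tie $logfile: $!";

push @log, "new log entry";
shift @log while @log > $max_lines;   # drop oldest lines over the cap

untie @log;
```

Note that shifting off the first line forces Tie::File to shuffle every remaining line's offset, which is exactly the slowness and memory use cautioned about above.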

    Another solution involves writing to a logging daemon instead of directly to the log. The daemon can then monitor size and buffer input while rolling the log. I believe you might find something on CPAN for this, but am not sure.

    Cheers - L~R

    Update: After having a discussion with borisz via /msg it appears that rotatelogs is very similar to my second solution. It does not require Apache. The difference is that it uses time-based rotation instead of size. This may not be what you want, but if you can't find a canned solution on CPAN, it is something.

Re: Controlling file size when writing
by TomDLux (Vicar) on Mar 02, 2004 at 20:09 UTC

    Many of the logging modules, such as Log::Log4perl, include provisions to rotate logs when they reach a certain size.

    You say running the script for long periods results in large log files ... for what value of "long period"? If it's several weeks, the normal log-module/rotatelogs practice of rotating logs daily and keeping the last four or five days will be quite sufficient and more economical. If "long periods" means more than ten minutes, perhaps you should re-evaluate what is logged and what is not. Obviously you need different levels of logging when things are running well and when you are tracking down a bug; I suggest using a logging module.
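    For the size-based rotation mentioned above, a Log::Log4perl configuration along these lines (file name, size, and count are all illustrative) delegates the rotation to Log::Dispatch::FileRotate, a separate CPAN module:

```
log4perl.rootLogger                = INFO, LOGFILE
log4perl.appender.LOGFILE          = Log::Dispatch::FileRotate
log4perl.appender.LOGFILE.filename = spider.log
log4perl.appender.LOGFILE.mode     = append
log4perl.appender.LOGFILE.size     = 10485760
log4perl.appender.LOGFILE.max      = 5
log4perl.appender.LOGFILE.layout   = Log::Log4perl::Layout::PatternLayout
log4perl.appender.LOGFILE.layout.ConversionPattern = %d %p %m%n
```

    Here `size` is a byte threshold and `max` is how many rotated files to keep, so disk use is bounded at roughly size x (max + 1).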

    Update: Have the server log whether each URL was successfully processed or not. Then you can go back and run the script manually with mega- mega- mega-MEGA debugging detail on the problem URL ... while the ordinary instances generate a few MB. You might have problem URLs saved to a special file, or the script could send you email with the details.


      Well, the process is multithreaded. So even though over a day I might generate 100 MB at most at the top logging level, after 40 - 50 forks we're talking about 4GB - 5GB a day. This has compounded the problem because I'm trying to keep the logs sorted and rotated. And, even though I can turn down the detail, the bugs I am finding require a high level of detail for testing.

      (The script is a web spider. Most of the bugs I encounter with it involve bizarre / broken HTML in web pages. Problem is that in order to figure out just what is going on I want to log lots of info if there are any anomalies. The problem becomes how to do that without being too processor intensive)
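      One way to keep 40-50 forks from fighting over a single sorted log is to give each process its own file and merge or rotate them afterwards; a sketch, with the directory and naming scheme made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use POSIX qw(strftime);

my $dir = 'logs';                       # illustrative directory
mkdir $dir unless -d $dir;

# One file per process: date plus PID keeps forks from interleaving.
my $logfile = sprintf '%s/spider.%s.%d.log',
    $dir, strftime('%Y%m%d', localtime), $$;

open my $log, '>>', $logfile or die "Cannot open $logfile: $!";
$log->autoflush(1);                     # lines hit disk immediately

print {$log} "[$$] fetched http://example.com/ OK\n";
close $log;
```

      A nightly cron job can then merge, sort, and compress the per-PID files, so no single writer ever holds a multi-gigabyte log open.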


        And this is why you need production and dev boxes. On the production box, you only need enough logging to know when something breaks. On your dev box, you can re-harvest the offending pages and see the errors in all their glory, including the ability to add instrumentation to the code on the fly.

        If you cannot have the two boxes, perhaps you can run a second instance of your spider manually, when needed. I bet this is less resource-intensive than managing such huge logs.

        That being said, take a look at Logfile::Rotate.

        Best regards

        -lem, but some call me fokat

Re: Controlling file size when writing
by borisz (Canon) on Mar 02, 2004 at 18:42 UTC
    No. I use rotatelogs to write my logs and a cron job that gzips all except the newest logfile. rotatelogs is part of Apache.
Re: Controlling file size when writing
by zentara (Archbishop) on Mar 03, 2004 at 16:28 UTC
    Here is an example of a simple way to limit the log to n number of lines.

        #!/usr/bin/perl
        use strict;
        use Fcntl qw(:flock :seek);

        my $stuff_from_myform = 'something to add';
        my $MyFile = "$0.zzz";

        open MYFILE, "+< $MyFile" or die "Cannot Open $MyFile: $!";
        flock MYFILE, LOCK_EX
            or die "Unable to acquire exclusive lock for $MyFile: $!";

        my @data = ( <MYFILE>, "$stuff_from_myform\n" );
        shift @data while @data > 10;    # limit to 10 lines

        seek MYFILE, 0, SEEK_SET;
        truncate MYFILE, 0;
        print MYFILE @data;
        close MYFILE;

    I'm not really a human, but I play one on earth. flash japh

Approved by broquaint