
Re: Memory Management Problem

by swngnmonk (Pilgrim)
on Nov 21, 2003 at 07:02 UTC

in reply to Memory Management Problem


Can you provide a little more information about the contents of the Bench file, and what print_report() is doing?

From what I gather, the bench file is simply a list of absolute file paths on the filesystem (since you're using a find call to populate %today). What exactly are you trying to track?

Another question - have you confirmed what your find command returns on your machine? On my box (Red Hat 9), that call to find (assuming $search_files is a scalar holding a text match of some kind) would return every file on the filesystem. Are you sure you're getting the correct results?

Now that I think about it, I've got an idea on a general approach, assuming you've got access to the standard Unix utils - use sort, uniq, and diff, and parse the output of the diff. e.g.

    `cat benchmark_files|sort|uniq -c > benchmark_counted`;
    `find / $search_files -print |sort | uniq -c > todays_find`;
    open IN, "diff benchmark_counted todays_find|" or die "$!";
    while (<IN>) {
        ## parse diff output into %yesterday and %today
        ## an exercise for the reader
    }
    close IN;

By using the unix tools, you've now got the same output as you had after the call to _scan_system(). Note - diff will flag identical lines with different counts (that's what the -c option to uniq does) - you'd have to account for that when parsing the diff output.
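For what it's worth, the parsing step could be sketched roughly like this. It assumes diff's default output format, where lines found only in the first file start with "< " and lines found only in the second start with "> ", and that each of those lines is uniq -c output (a count, then the path). The sub name parse_diff_lines is made up for illustration, and storing the counts as hash values is a guess at what %yesterday and %today need to hold.

```perl
use strict;
use warnings;

# Hypothetical helper: takes lines of default-format diff output and
# returns two hash refs (entries only in the first file, entries only in
# the second) plus an array ref of paths that appear on both sides.
# Hash values are the uniq -c counts.
sub parse_diff_lines {
    my @lines = @_;
    my (%yesterday, %today);
    for my $line (@lines) {
        chomp $line;
        # "< " marks a line from the first file, "> " one from the second.
        # Hunk headers (e.g. "3c3") and "---" separators do not match.
        next unless $line =~ /^([<>]) \s*(\d+) (.*)$/;
        my ($side, $count, $path) = ($1, $2, $3);
        if ($side eq '<') { $yesterday{$path} = $count }
        else              { $today{$path}     = $count }
    }
    # A path on both sides is the same file with a different count.
    my @changed = grep { exists $today{$_} } keys %yesterday;
    return (\%yesterday, \%today, \@changed);
}
```

You would feed it the lines read from the diff pipe, e.g. parse_diff_lines(<IN>) in place of the while loop. Paths containing newlines would break it.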

This assumes, of course, that the real memory hog is %yesterday, before a pile of keys are deleted in building %today. If I'm wrong, and at the end of processing %yesterday and %today are both too big for print_report() to handle, you may well need to look at some kind of BerkeleyDB-type solution, but realize it's going to slow things down by a lot.
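If it does come to that, a tied hash keeps the code looking like ordinary hash access while the data lives on disk. A minimal sketch, using SDBM_File only because it ships with Perl: it is not Berkeley DB, and it has a small per-record size limit, so for long paths you would probably want DB_File or similar instead. The temp directory, file name, and sample keys are made up.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Illustrative location only; a real script would pick a fixed path.
my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/yesterday";

# After the tie, %yesterday reads and writes go to disk, not RAM.
tie my %yesterday, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $dbfile: $!";

$yesterday{'/etc/passwd'} = 1;
$yesterday{'/etc/hosts'}  = 1;

# Same delete-as-you-go pattern as with an in-memory hash.
delete $yesterday{'/etc/passwd'} if exists $yesterday{'/etc/passwd'};

my $remaining = scalar keys %yesterday;
untie %yesterday;
```

Every access now goes through the DBM layer, which is where the slowdown comes from.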

I hope this helps - sort/diff/uniq can be a great way to reduce the load on perl when processing large files.

Replies are listed 'Best First'.
Re: Re: Memory Management Problem
by thospel (Hermit) on Nov 21, 2003 at 19:51 UTC

      Doh. Point made. :)

      let that line read:

      `sort benchmark_files | uniq -c > benchmark_counted`;

      And all can be right in the world.

        Should I tell him that sort has a -u option (though it doesn't add counts) ? Naaah, find doesn't generate duplicates anyways unless some target names are reachable from multiple starting directories.
