PerlMonks
What's eating all your disk space?

by hawson (Monk)
on Jul 12, 2000 at 03:25 UTC (#22130=sourcecode)

Category: Utility Scripts
Author/Contact Info jesse_becker@yahoo.com
Description: I'm constantly having to clean out space on lots of computers, and looking at several screens of 'du' output hurts. So I wrote this little script to parse and format the output from 'du'. I know, I know, it's not strictly Perl, but monks should be aware that there are things that exist outside these cloistered walls. N.B. Since this is meant to be used in a pipe, it's usually all on a single line, and without comments.
du -sk . * | perl -e '
  $sum=<>;     # Get the total space used from the first line
               # This is so we don't run 'du' twice
  while (<>) {
    ($size, $inode)=split;
    $inode .= "/" if (-d $inode);
    printf("%30s | %5d | %5.2f%%\n",$inode,$size,$size/$sum*100);
  }'
| sort -rn -k 3 | head

RE: What's eating all your disk space?
by ivey (Beadle) on Jul 12, 2000 at 22:09 UTC
      A very handy little script, I will use it a lot

      Orthanc

      I like this script a lot; very handy. It was taking too long on some of my larger directory trees, though, so I took the liberty of speeding it up. The following does the sorting in Perl, and also calculates the sum internally to eliminate the '.' from the du call. This saves du from having to walk the directory tree twice (once for '.' and once for the individual '*' arguments) and sped things up a lot for me.
      #! /usr/bin/env perl
      open(DU, "du -sk *|") || die "Can't exec du: $!\n";
      while (<DU>) {
          ($size, $inode) = split;
          chop($size);
          $sum += $size;
          push @entries, { size => $size, inode => $inode };
      }
      close(DU);
      @entries = sort { $b->{size} <=> $a->{size} } @entries;
      foreach $e (@entries[0 .. 10]) {
          printf("%30s | %5d | %2.2f%%\n", $e->{inode}, $e->{size}, $e->{size}/$sum*1000);
      }
      Thanks for a cool script!

        I believe your script above has issues:

        • Why did you chop $size? You are only taking off the last digit. If you meant to take off the metric notation (G, M, or k) you could just s/G|M|k//o
        • STRICT and WARNINGS!
        • You have a percentage multiplied by 1000. I believe you meant 100.
        • If you want the metric notations (G,M, or k), you will have to do some funny math to get them all to the same measurement (kilobytes).
        J. J. Horner
        Linux, Perl, Apache, Stronghold, Unix
        jhorner@knoxlug.org http://www.knoxlug.org/
        
        I think that you meant $sum*100 in the end...
        :)

        Here is a small improvement. It adds a '/' to directory names,
        . . .
        ($size, $inode) = split;
        $inode .= "/" if (-d $inode);
        chop($size);
        . . .
        Also, the printf should be changed. Instead of "%2.2f" you probably meant "%5.2f". The first number is the minimum field width, including the period.
        I thought about building the sorting but decided against it ("One tool does one thing"). I figure that I'll leave sorting to 'sort'. ;-)

        As to 'du' walking the tree twice, I looked at that as well. In my tests, it looked like the results were cached somewhere, and thus 'du -sk . *' is quite fast. A prior version of the script did something horrible along the lines of: du -sk * | perl -e '$sum=`du -sk .`; while(<>) {....}', so this is an improvement already.

        This is pretty quick, and I use it on 40GB raid arrays. :-)

Re: What's eating all your disk space? -- duke!
by gremio (Acolyte) on Jul 14, 2001 at 22:26 UTC
    Hi Hawson,

    I have to do the same kind of task routinely, and I try to farm as much of it off to the users themselves as possible, so I wrote this to help both of us out.

    It does close to the same thing as du (though any help on figuring out how du actually comes up with its numbers would be appreciated!), and is pretty handy for not having to dig through directory trees doing du's over and over again in subdirectories. Though it's noticeably slower than du on large directories, I find it's actually faster overall because I have to run it only once, even for several levels of nested dirs.

    It also displays age, which can be very useful to determine what needs killing, and I find novices have little problem understanding the output. YMMV, but I hope you like it.

    I call it "duke" --Gremio

      If your $size is an integer multiple of your $blksize, you'll overstate $size by $blksize in

      $size = $blksize*(1+(int($size / $blksize)));
Re: What's eating all your disk space?
by csh (Novice) on Jan 26, 2004 at 21:10 UTC
    A quick rewrite:
    use strict;
    use IO::File;

    my $size;
    my $inode;
    my $sum = 0;
    my @entries;
    my $e;
    my $percent = 0;
    my $remsum = 0;
    my $counter = 0;
    my $du = new IO::File;

    if (@ARGV) {
        chdir "$ARGV[0]" or die "cannot change to [ $ARGV[0] ]\n";
    }
    $du->open("du -sk *|") or die "cannot open du program and pipe";
    while (<$du>) {
        ($size, $inode) = split;
        $inode .= "/" if (-d $inode);
        $sum += $size;
        push @entries, { size => $size, inode => $inode };
    }
    @entries = sort { $b->{size} <=> $a->{size} } @entries;
    $du->close;
    foreach $e (@entries) {
        $percent = $e->{size}/$sum*100;
        if ($counter < 10) {
            printf("%30s | %5d | %5.2f%%\n", $e->{inode}, $e->{size}, $percent);
        } else {
            $remsum += $e->{size};
        }
        $counter++;
    }
    if ($remsum > 0) {
        printf("%30s | %5d | %5.2f%%\n", "REMAINING FILES", $remsum, $remsum/$sum*100);
    }
    Edits: * moved sort out of loop
Re: What's eating all your disk space?
by chibiryuu (Beadle) on Apr 19, 2005 at 03:07 UTC

    I once wrote a script much like the one above.  du -b "$@" | sort -n is nice, but du -h is nice too, so I had a shell script for du -b "$@" | perl -pe's/ome/complicated/regex' | sort -n for a while.

    Recently, I rewrote it in pure Perl.

    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    my %conf = (a => 0, c => 0, s => 0, x => 0);
    my @dirs = ();
    while (defined ($_ = shift)) {
        if ($_ eq "--") {push @dirs, @ARGV; last}
        elsif (/^-(.*)$/s) {
            for (split //, $1) {
                if ($_ eq "a" and !$conf{s}) {$conf{a} = 1}
                elsif ($_ eq "c") {$conf{c} = 1}
                elsif ($_ eq "s" and !$conf{a}) {$conf{s} = 1}
                elsif ($_ eq "x") {$conf{x} = 1}
                else {
                    print STDERR "$0 [-a] [-c] [-s] [-x] [--] ...\n";
                    exit 1;
                }
            }
        }
        else {push @dirs, $_}
    }
    s/\/*$//s for @dirs;
    @dirs = qw(.) unless @dirs;

    my %spec = (no_chdir => 1);
    if ($conf{a}) {
        $spec{wanted} = sub {
            stat;
            my $s = -f _ ? -s _ : 0;
            $File::Find::name =~ /^\Q$dirs[0]\E\/?(.*)$/s;
            my @a = split /\//, $1;
            for (unshift @a, $dirs[0]; @a; pop @a) {
                $_{join "/", @a} += $s;
            }
        };
    } elsif ($conf{s}) {
        $spec{wanted} = sub {
            stat;
            $_{$dirs[0]} += -f _ ? -s _ : 0;
        };
    } else {
        $spec{wanted} = sub {
            stat;
            my $s = -f _ ? -s _ : 0;
            $File::Find::name =~ /^\Q$dirs[0]\E\/?(.*)$/s;
            my @a = split /\//, $1;
            ! -d _ and pop @a;
            for (unshift @a, $dirs[0]; @a; pop @a) {
                $_{join "/", @a} += $s;
            }
        };
    }
    if ($conf{x}) {
        $spec{preprocess} = sub {
            my $dev = (lstat $File::Find::dir)[0];
            grep {$dev == (lstat "$File::Find::dir/$_")[0]} @_;
        };
    }
    while (@dirs) {
        find(\%spec, $dirs[0] eq "" ? "/" : $dirs[0]);
        $_{""} += $_{$dirs[0]} if $conf{c};
        shift @dirs;
    }
    $_{$_} < 1024 ** 1 ? printf "%s %-6.6sB %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 0), $_ :
    $_{$_} < 1024 ** 2 ? printf "%s %-6.6sK %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 1), $_ :
    $_{$_} < 1024 ** 3 ? printf "%s %-6.6sM %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 2), $_ :
    $_{$_} < 1024 ** 4 ? printf "%s %-6.6sG %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 3), $_ :
    $_{$_} < 1024 ** 5 ? printf "%s %-6.6sT %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 4), $_ :
    $_{$_} < 1024 ** 6 ? printf "%s %-6.6sP %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 5), $_ :
    $_{$_} < 1024 ** 7 ? printf "%s %-6.6sE %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 6), $_ :
    $_{$_} < 1024 ** 8 ? printf "%s %-6.6sZ %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 7), $_ :
                         printf "%s %-6.6sY %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 8), $_
        for sort {$_{$a} <=> $_{$b} or $a eq "" ? 1 : $a cmp $b} keys %_;
    • Not so good:
      • I don't implement qw(-D --exclude-from -l -L --max-depth -S -X) like the real du does.
      • I probably shouldn't do my own argument parsing, and I hope there's a better way to do the printing at the end.
    • That being said,
      • it seems to work well, and
      • I like the output format.
        0 0.0000B /usr/src/linux
        298763905 284.92M /usr/src/linux-2.6.11.6
        306941731 292.72M /usr/src/linux-2.6.11-morph6
        306986302 292.76M /usr/src/linux-2.6.11-morph5
        912691938 870.41M 

    You like?

Re: What's eating all your disk space?
by chanio (Priest) on Aug 18, 2005 at 21:49 UTC
    I want to thank you all for these valuable pieces of code. The whole page is very useful.

    Perhaps it could help others if I show the way I used it (based on the main node).

    ##[ my_df.sh ]##
    cd ~
    du -sk . * .* | perl -e '          ## ADDED .files
      $sum=<>;
      while (<>) {
        ($size, $inode) = split;
        $inode .= "/" if (-d $inode);
        ## /_SHOW SIZE IN Kb + GRAPHIC: _\
        printf("%25s | %4d Kb |%6.2f%% [%+11s]\n", $inode, int($size/1024),
               $size/$sum*100, "<" . ("=" x int($size/$sum*10)))
            unless ($inode =~ /\.\./);
        ## EVERY LINE LOOKS LIKE THIS:
        ## Documents/ | 710 Kb | 52.20% [     <=====]
      }' | sort -rn -k 3 | head | xmessage -center -file -
    The output shows up in an xmessage window (come on, burn me). But it could work just as well with a Tk one, or with a pure-Perl way of doing it all, who doubts that :) .

    { \ ( ' v ' ) / }
    ( \ _ / ) _ _ _ _ ` ( ) ' _ _ _ _
    ( = ( ^ Y ^ ) = ( _ _ ^ ^ ^ ^
    _ _ _ _ \ _ ( m _ _ _ m ) _ _ _ _ _ _ _ _ _ ) c h i a n o , a l b e r t o
    Wherever I lay my KNOPPIX disk, a new FREE LINUX nation could be established
Re: What's eating all your disk space? (Treemaps)
by jimX11 (Friar) on Aug 19, 2005 at 01:02 UTC

    Treemaps!! First heard about them at OSCON 05 when Tim O'Reilly mentioned them in a keynote.

    The paragraph below is from Treemaps for space-constrained visualization of hierarchies

    During 1990, in response to the common problem of a filled hard disk, I became obsessed with the idea of producing a compact visualization of directory tree structures. Since the 80 Megabyte hard disk in the HCIL was shared by 14 users it was difficult to determine how and where space was used. Finding large files that could be deleted, or even determining which users consumed the largest shares of disk space were difficult tasks.

    Just today I found Treemap on CPAN.

    I wonder how the Perl source code could be treemapped? The Linux kernel is.
