What's eating all your disk space?

http://www.perlmonks.org?node_id=22130

Category:	Utility Scripts
Author/Contact Info	jesse_becker@yahoo.com
Description:	I'm constantly having to clean out space on lots of computers, and looking at several screens of 'du' output hurts. So I wrote this little script to parse and format the output from 'du'. I know, I know, it's not strictly perl, but monks should be aware that there are things that exist outside these cloistered walls. N.B. Since this is meant to be used in a pipe, it's usually all on a single line, and without comments.
    du -sk . * | perl -e '
        $sum = <>;   # Get the total space used from the first line.
                     # This is so we do not run du twice.
        while (<>) {
            ($size, $inode) = split;
            $inode .= "/" if (-d $inode);
            printf("%30s | %5d | %5.2f%%\n", $inode, $size, $size/$sum*100);
        }' | sort -rn -k 3 | head
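The pipeline relies on a bare `split`, which breaks every line on any whitespace, so a name such as "My Documents" would lose everything after the first space. A minimal sketch of the parsing step that keeps such names whole, assuming the usual `du -sk` layout of a size, whitespace, then the name (the function name `parse_du_line` is made up for illustration):

```perl
use strict;
use warnings;

# Split one line of `du -sk` output into (size_in_kb, name).
# Limiting the split to two fields keeps names that contain spaces intact.
sub parse_du_line {
    my ($line) = @_;
    chomp $line;
    my ($size, $name) = split ' ', $line, 2;
    return ($size, $name);
}

my ($size, $name) = parse_du_line("1234\tMy Documents\n");
print "$size KB in '$name'\n";
```

Inside the one-liner the same idea would be `($size, $inode) = split " ", $_, 2; chomp $inode;`.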


Replies are listed 'Best First'.

Re: What's eating all your disk space?
by csh (Novice) on Jan 26, 2004 at 21:10 UTC


use strict;
use IO::File;

my $size;
my $inode;
my $sum = 0;
my @entries;
my $e;
my $percent = 0;
my $remsum = 0;
my $counter = 0;

my $du = IO::File->new;

if (@ARGV) {
    chdir "$ARGV[0]" or die "cannot change to [ $ARGV[0] ]\n";
}

$du->open("du -sk *|") or
    die "cannot open du program and pipe: $!\n";

while (<$du>) {
    ($size, $inode)=split;
    $inode .= "/" if (-d $inode);
    $sum += $size;
    push @entries, { size => $size, inode => $inode };
}

@entries = sort { $b->{size} <=> $a->{size} } @entries;

$du->close;

foreach $e (@entries) {
    $percent = $e->{size}/$sum*100;

    if ($counter < 10) {
        printf(
            "%30s | %5d | %5.2f%%\n",
            $e->{inode},
            $e->{size}, 
            $percent);
    }
    else {
        $remsum += $e->{size};
    }
    $counter++;
}

if ($remsum > 0) {
    printf(
        "%30s | %5d | %5.2f%%\n",
        "REMAINING FILES",
        $remsum,
        $remsum/$sum*100);
}

RE: What's eating all your disk space?
by ivey (Beadle) on Jul 12, 2000 at 22:09 UTC

-- michael d. ivey, ivey@gweezlebur.com

RE: RE: What's eating all your disk space?

by knight (Friar) on Jul 25, 2000 at 19:22 UTC

    #! /usr/bin/env perl
    open(DU, "du -sk *|") || die "Can't exec du:  $!\n";
    while (<DU>) {
       ($size, $inode)=split;
       chop($size);
       $sum += $size;
       push @entries, { size => $size, inode => $inode };
    }
    close(DU);
    @entries = sort { $b->{size} <=> $a->{size} } @entries;
    foreach $e (@entries[0 .. 10]) {
       printf("%30s | %5d | %2.2f%%\n",$e->{inode},$e->{size},$e->{size}/$sum*1000);
    }

RE: RE: RE: What's eating all your disk space?

by jjhorner (Hermit) on Jul 25, 2000 at 19:36 UTC

I believe your script above has issues:

- Why did you chop $size? You are only taking off the last digit. If you meant to take off the metric notation (G, M, or k), you could just s/G|M|k//o.
- STRICT and WARNINGS!
- You have a percentage multiplied by 1000. I believe you meant 100.
- If you want the metric notations (G, M, or k), you will have to do some funny math to get them all to the same measurement (kilobytes).
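The "funny math" for mixed suffixes could look roughly like the sketch below. It assumes binary (1024-based) units, accepts only the k, M and G suffixes mentioned above, and treats a bare number as kilobytes, as `du -k` prints them; the function name `to_kb` is hypothetical.

```perl
use strict;
use warnings;

# Multipliers relative to one kilobyte; assumes binary (1024-based) units.
my %kb_per_unit = (k => 1, K => 1, M => 1024, G => 1024 ** 2);

# Convert a size such as "512", "12k", "3.5M" or "2G" to kilobytes.
# A bare number is assumed to already be in kilobytes (as with du -k).
sub to_kb {
    my ($text) = @_;
    my ($number, $unit) = $text =~ /^([\d.]+)([kKMG]?)$/
        or die "unrecognised size: $text\n";
    return $number * ($unit ? $kb_per_unit{$unit} : 1);
}

printf "%s -> %d KB\n", $_, to_kb($_) for qw(512 12k 3.5M 2G);
```

With every size in the same unit, the sort and the percentage calculation work unchanged.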

J. J. Horner
Linux, Perl, Apache, Stronghold, Unix
jhorner@knoxlug.org http://www.knoxlug.org/

RE: RE: RE: RE: What's eating all your disk space?

by hawson (Monk) on Jul 27, 2000 at 06:32 UTC

RE: RE: RE: What's eating all your disk space?

by fundflow (Chaplain) on Jul 25, 2000 at 19:45 UTC

$sum*100

.
.
.
  ($size, $inode)=split;
  $inode .= "/" if (-d $inode);
  chop($size);
.
.
.

RE: RE: RE: What's eating all your disk space?

by hawson (Monk) on Jul 27, 2000 at 06:48 UTC

As to 'du' walking the tree twice, I looked at that as well. In my tests, it looked like the results were cached somewhere, and thus 'du -sk . *' is quite fast. A prior version of the script did something horrible along the lines of: du -sk * | perl -e '$sum=`du -sk .`; while(<>) {....}', so this is an improvement already.

This is pretty quick, and I use it on 40GB raid arrays. :-)

RE: What's eating all your disk space?

by orthanc (Monk) on Jul 25, 2000 at 17:30 UTC

Orthanc

Re: What's eating all your disk space? -- duke!
by gremio (Acolyte) on Jul 14, 2001 at 22:26 UTC


Re: Re: What's eating all your disk space? -- duke!

by bikeNomad (Priest) on Jul 14, 2001 at 23:18 UTC

  $size = $blksize*(1+(int($size / $blksize)));
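The expression above rounds a size up to whole blocks, but as written it adds a full extra block when $size is already an exact multiple of $blksize (4096 becomes 8192 with 4096-byte blocks). Whether that is intended is not clear from what survives of the thread. A minimal sketch of a variant that leaves exact multiples alone, assuming both values are non-negative integers in bytes (the function name `round_up_to_block` is made up):

```perl
use strict;
use warnings;

# Round a byte count up to a whole number of blocks.
# Exact multiples of the block size are returned unchanged.
sub round_up_to_block {
    my ($size, $blksize) = @_;
    return $blksize * int(($size + $blksize - 1) / $blksize);
}

printf "%5d -> %5d\n", $_, round_up_to_block($_, 4096) for (0, 1, 4096, 4097);
```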

Re: What's eating all your disk space?
by chibiryuu (Beadle) on Apr 19, 2005 at 03:07 UTC

I once wrote a script much like the one above. du -b "$@" | sort -n is nice, but du -h is nice too, so I had a shell script for du -b "$@" | perl -pe's/ome/complicated/regex' | sort -n for a while.

Recently, I rewrote it in pure Perl.

#!/usr/bin/perl -w
use strict;
use File::Find;
my %conf = (a => 0, c => 0, s => 0, x => 0);
my @dirs = ();
while (defined ($_ = shift)) {
    if ($_ eq "--") {push @dirs, @ARGV; last}
    elsif (/^-(.*)$/s) {
        for (split //, $1) {
            if ($_ eq "a" and !$conf{s}) {$conf{a} = 1}
            elsif ($_ eq "c") {$conf{c} = 1}
            elsif ($_ eq "s" and !$conf{a}) {$conf{s} = 1}
            elsif ($_ eq "x") {$conf{x} = 1}
            else {
                print STDERR "$0 [-a] [-c] [-s] [-x] [--] ...\n";
                exit 1;
            }
        }
    }
    else {push @dirs, $_}
}
s/\/*$//s for @dirs;
@dirs = qw(.) unless @dirs;
my %spec = (no_chdir => 1);
if ($conf{a}) {
    $spec{wanted} = sub {
        stat;
        my $s = -f _ ? -s _ : 0;
        $File::Find::name =~ /^\Q$dirs[0]\E\/?(.*)$/s;
        my @a = split /\//, $1;
        for (unshift @a, $dirs[0]; @a; pop @a) {
            $_{join "/", @a} += $s;
        }
    };
}
elsif ($conf{s}) {
    $spec{wanted} = sub {
        stat;
        $_{$dirs[0]} += -f _ ? -s _ : 0;
    };
}
else {
    $spec{wanted} = sub {
        stat;
        my $s = -f _ ? -s _ : 0;
        $File::Find::name =~ /^\Q$dirs[0]\E\/?(.*)$/s;
        my @a = split /\//, $1;
        ! -d _ and pop @a;
        for (unshift @a, $dirs[0]; @a; pop @a) {
            $_{join "/", @a} += $s;
        }
    };
}
if ($conf{x}) {
    $spec{preprocess} = sub {
        my $dev = (lstat $File::Find::dir)[0];
        grep {$dev == (lstat "$File::Find::dir/$_")[0]} @_;
    };
}
while (@dirs) {
    find(\%spec, $dirs[0] eq "" ? "/" : $dirs[0]);
    $_{""} += $_{$dirs[0]} if $conf{c};
    shift @dirs;
}
$_{$_} < 1024 ** 1 ? printf "%s |%-6.6sB| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 0), $_ :
$_{$_} < 1024 ** 2 ? printf "%s |%-6.6sK| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 1), $_ :
$_{$_} < 1024 ** 3 ? printf "%s |%-6.6sM| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 2), $_ :
$_{$_} < 1024 ** 4 ? printf "%s |%-6.6sG| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 3), $_ :
$_{$_} < 1024 ** 5 ? printf "%s |%-6.6sT| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 4), $_ :
$_{$_} < 1024 ** 6 ? printf "%s |%-6.6sP| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 5), $_ :
$_{$_} < 1024 ** 7 ? printf "%s |%-6.6sE| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 6), $_ :
$_{$_} < 1024 ** 8 ? printf "%s |%-6.6sZ| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 7), $_ :
                     printf "%s |%-6.6sY| %s\n", $_{$_}, sprintf("%6.6f", "$_{$_}" / 1024 ** 8), $_
    for sort {$_{$a} <=> $_{$b} or $a eq "" ? 1 : $a cmp $b} keys %_;

Not so good:
- I don't implement qw(-D --exclude-from -l -L --max-depth -S -X) like the real du does.
- I probably shouldn't do my own argument parsing, and I hope there's a better way to do the printing at the end.
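On the argument parsing point, the core Getopt::Std module already handles bundled single-letter flags and the `--` terminator. A rough sketch for the same four flags follows; it is not a drop-in replacement. It rejects `-a` together with `-s` instead of letting the first one win, and it stops at the first non-option argument. The function name `parse_args` is hypothetical.

```perl
use strict;
use warnings;
use Getopt::Std qw(getopts);

# Parse the four flags used by the script above; returns (\%conf, @dirs).
sub parse_args {
    local @ARGV = @_;                       # getopts works on @ARGV
    my %conf = (a => 0, c => 0, s => 0, x => 0);
    getopts('acsx', \%conf)
        or die "usage: $0 [-a] [-c] [-s] [-x] [--] ...\n";
    die "-a and -s cannot be combined\n" if $conf{a} && $conf{s};
    my @dirs = @ARGV ? @ARGV : ('.');       # default to the current directory
    return (\%conf, @dirs);
}

my ($conf, @dirs) = parse_args(qw(-ac /usr/src /tmp));
print join(' ', map { "$_=$conf->{$_}" } sort keys %$conf), " dirs=@dirs\n";
```

In a real script the call would simply be `parse_args(@ARGV)`.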

That being said:
- it seems to work well, and
- I like the output format.

0 |0.0000B| /usr/src/linux
298763905 |284.92M| /usr/src/linux-2.6.11.6
306941731 |292.72M| /usr/src/linux-2.6.11-morph6
306986302 |292.76M| /usr/src/linux-2.6.11-morph5
912691938 |870.41M|
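As for a better way to do the printing, the nine-way conditional at the end of the script can be folded into a loop over the unit letters. A minimal sketch that keeps the same six-character number field; the function name `human` is made up, and sizes are assumed to be non-negative byte counts:

```perl
use strict;
use warnings;

# Format a byte count with a binary unit suffix (B, K, M, ...).
sub human {
    my ($bytes) = @_;
    my @units = qw(B K M G T P E Z Y);
    my $i     = 0;
    my $value = $bytes;
    # Divide by 1024 until the value fits the unit or the units run out.
    while ($value >= 1024 && $i < $#units) {
        $value /= 1024;
        $i++;
    }
    # Keep six characters of the number, as the original "%-6.6s" does.
    return sprintf("%-6.6s%s", sprintf("%6.6f", $value), $units[$i]);
}

print "$_ ", human($_), "\n" for (0, 1023, 1024, 298763905);
```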

You like?

Re: What's eating all your disk space? (Treemaps)
by jimX11 (Friar) on Aug 19, 2005 at 01:02 UTC

Treemaps!! First heard about them at OSCON 05 when Tim O'Reilly mentioned them in a keynote.

The paragraph below is from Treemaps for space-constrained visualization of hierarchies

During 1990, in response to the common problem of a filled hard disk, I became obsessed with the idea of producing a compact visualization of directory tree structures. Since the 80 Megabyte hard disk in the HCIL was shared by 14 users it was difficult to determine how and where space was used. Finding large files that could be deleted, or even determining which users consumed the largest shares of disk space were difficult tasks.

Just today I found treemap on the CPAN.

Wonder how the Perl source code could be treemapped? The Linux kernel is.

Re: What's eating all your disk space?
by chanio (Priest) on Aug 18, 2005 at 21:49 UTC

Perhaps, it could help others if I show the way I used it. (based on the main node)

##[ my_df.sh ]##
cd ~
du -sk . * .* | perl -e ' ## ADDED .files
$sum=<>; 
while (<>)
{
    ($size, $inode) =split;       
    $inode .= "/" if (-d $inode);  ## /_SHOW SIZE IN Mb + GRAPHIC: _\
    printf("%25s | %4d Mb |%6.2f%% [%+11s]\n",$inode,int($size/1024),$size/$sum*100,"<".("=" x (int($size/$sum*10)))) unless ($inode=~/\.\./);
## EVERY LINE LOOKS LIKE THIS: 
##        Documents/ |  710 Mb | 52.20% [     <=====]
}'| sort -rn -k 3 | head | xmessage -center -file -

chiano, alberto

Wherever I lay my KNOPPIX disk, a new FREE LINUX nation could be established
