If all the files of interest are in one directory, and they all match a simple file-name pattern, and you just want to print the name of the largest file, here's one easy way to do that in Perl:
#!/usr/bin/perl
use strict;
use warnings;
my $path = "/tmp/dir1";
# map each matching name to its size in bytes (-s with no argument tests $_)
my %filesize = map { $_ => -s } <$path/d1*>;
# sort the names by size, largest first
my @sorted = sort { $filesize{$b} <=> $filesize{$a} } keys %filesize;
printf "Largest file: %s (%d bytes)\n", $sorted[0], $filesize{$sorted[0]};
Of course, it would be better to take the path and file-name pattern as a command-line argument (with a sensible default, such as all files in the current working directory), because hard-coding them in the script is bothersome. So I'd rather do it like this:
#!/usr/bin/perl
use strict;
use warnings;
# glob pattern from the command line; default is everything in the current directory
my $glob_pattern = shift || './*';
my %files;
for ( glob( $glob_pattern ) ) {
    # -e skips a non-matching pattern that glob() handed back unchanged;
    # "-s _" reuses the stat buffer that -e just filled
    $files{$_} = -s _ if ( -e );
}
if ( scalar keys %files == 0 ) {
    warn "No files matched $glob_pattern\nUsage: $0 [path/name*]\n";
    exit(1);
}
my @sorted = sort { $files{$b} <=> $files{$a} } keys %files;
printf( "Largest file that matches %s is %s (%d bytes)\n",
        $glob_pattern, $sorted[0], $files{$sorted[0]} );
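Since only the single largest file is needed, sorting every name does more work than necessary. Here is a minimal sketch of a one-pass alternative that swaps the sort for List::Util's reduce (a core module); largest_file is a hypothetical helper name, and the command-line handling simply mirrors the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical helper: return (name, size) for the largest existing
# entry that matches a glob pattern, or an empty list if none matched.
sub largest_file {
    my ($pattern) = @_;

    # -e drops the literal pattern that glob() hands back when a
    # wildcard-free pattern names something that doesn't exist.
    my %size = map { $_ => -s $_ } grep { -e $_ } glob($pattern);
    return unless %size;

    # Single pass over the names: keep whichever one is larger.
    my $biggest = reduce { $size{$a} >= $size{$b} ? $a : $b } keys %size;
    return ( $biggest, $size{$biggest} );
}

my $glob_pattern = shift || './*';
my ( $name, $bytes ) = largest_file($glob_pattern);
if ( defined $name ) {
    printf "Largest file that matches %s is %s (%d bytes)\n",
        $glob_pattern, $name, $bytes;
}
else {
    warn "No files matched $glob_pattern\n";
}
```

For a handful of files the difference is negligible; the one-pass version mainly matters for directories with very many entries.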
Now, when I run the script, I have to put the command-line argument in quotes. Otherwise the shell does the glob expansion itself and passes each matching name as a separate argument, so my script, which takes only the first argument with shift, sees only the first file name that matches the glob. In other words, if the script is called "show-biggest", the command line would have to be:
show-biggest '/tmp/dir1/d1*'    # note the single quotes
# or:
show-biggest /tmp/dir1/d1\*     # note the backslash escape for "*"
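If remembering the quotes is a nuisance, one option is to accept both forms. A minimal sketch, assuming that two or more arguments mean the shell already expanded an unquoted pattern, while a single argument is a pattern to pass to glob(); candidate_files is a hypothetical helper name, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: accept either a quoted pattern or names the
# shell has already expanded. With two or more arguments, use them
# as-is; with one (or none, defaulting to './*'), expand it via glob().
# Assumes names contain no whitespace or glob metacharacters, because
# glob() splits its pattern on whitespace and treats *, ? and [ specially.
sub candidate_files {
    my @args  = @_;
    my @names = @args > 1 ? @args : glob( $args[0] // './*' );

    # Keep only names that exist, dropping a wildcard-free pattern
    # that glob() returned unchanged.
    return grep { -e $_ } @names;
}

my @files = candidate_files(@ARGV);
printf "%d matching name(s)\n", scalar @files;
```

One limitation of this design: when the shell's expansion matches exactly one name, that name is passed through glob() again, which is harmless for ordinary file names but is exactly why the whitespace and metacharacter assumption matters.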
BTW, in trying this out, I learned that there is a subtle difference between this:
my @files = <something>;
and this:
my $glob = "something";
my @files = glob( $glob );
In the first approach, if "something" doesn't match anything, @files will be empty, but in the second approach, it will have one element: the string that was passed to glob(). The difference goes away if the value of $glob contains any wildcard characters (*, ? or square brackets). This is documented behavior: glob() is implemented by the File::Glob module, whose default GLOB_NOMAGIC flag returns a pattern that contains none of those characters unchanged when it matches nothing. That's why I added a file-existence test (-e) in the second version of the script above.
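The glob() behavior described above is easy to check with a tiny script. A minimal sketch; the file names are hypothetical and assumed not to exist in the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names, assumed NOT to exist in the current directory.
my @plain    = glob('no_such_file_here');     # no wildcard characters
my @wildcard = glob('no_such_file_here*');    # contains '*'

# The wildcard-free pattern comes back as a one-element list holding
# the pattern itself; the wildcard pattern yields an empty list.
printf "plain:    %d element(s): %s\n", scalar @plain, join( ', ', @plain );
printf "wildcard: %d element(s)\n", scalar @wildcard;

# An existence test removes the echoed-back name.
my @existing = grep { -e $_ } @plain;
printf "after -e: %d element(s)\n", scalar @existing;
```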

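I also learned that glob( $glob_pattern ) does the right thing, where <$glob_pattern> doesn't: when the angle brackets contain only a simple scalar variable, Perl treats it as a filehandle to read from (here, an unopened one) rather than as a glob pattern.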


In reply to Re: find biggest file and use awk by graff
in thread find biggest file and use awk by mxtime