http://www.perlmonks.org?node_id=151232

I was never able to find a decent piece of code that was able to parse out the extension of a given file delimited by a '.' regardless of how many '.'s were in the filename. So, I wrote one.

HOW TO USE:

&parse_out_extension($filename);

Comments appreciated on code.

TIA, guys. SnafuX

Updated:
Changed code by removing all DEBUG stuff.
Changed comments in description by removing all references to DEBUG stuff and also removed future plans.
FIXED: where the sub should actually die().

sub parse_out_extension {
    die("No filename to parse.\n") if ( ! @_ );
    my ($file) = @_;
    my @pieces;
    map { push(@pieces,$_) } split(/\./,$file);
    my $end = pop(@pieces);
    $file =~ s/\.$end//;
    return($file) if $file;
}

Re: Parse out the extension of a filename - return base of filename.
by rob_au (Abbot) on Mar 12, 2002 at 23:45 UTC
    I was never able to find a decent piece of code that was able to parse out the extension of a given file delimited by a '.' regardless of how many '.'s were in the filename

    While I do admire your industrious spirit, you can already do this with File::Basename ...

    use File::Basename;
    my $fname = "/usr/local/isp/system.update.perl";
    my ($name, $path, $suffix) = fileparse($fname, '\.[^\.]*');
    print STDOUT $name, "\n";

    The key to this working is the regular expression passed to the fileparse method - The regular expression in the example above matches the suffix of a file as being the last dot and what follows it, excluding any subsequent dots ([^\.]).
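To make the point about mixed filenames concrete, here is a minimal sketch that runs the same pattern over names with different numbers of dots. The helper name split_name and the sample names are made up for illustration; only fileparse comes from File::Basename.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: returns (name, path, suffix) for one filename.
sub split_name {
    my ($fname) = @_;
    # fileparse anchors the suffix pattern at the end of the basename,
    # so only the last dot and what follows it are treated as the suffix.
    return fileparse($fname, qr/\.[^.]*/);
}

# Hypothetical sample names, chosen only to vary the number of dots.
for my $fname ('/usr/local/isp/system.update.perl', 'foo.bar', 'foo') {
    my ($name, $path, $suffix) = split_name($fname);
    print "name=$name path=$path suffix=$suffix\n";
}
```

The suffix comes back with its leading dot, and it is an empty string when the name has no dot at all, so no prior knowledge of the filename's structure should be needed.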

     

    perl -e 's&&rob@cowsnet.com.au&&&split/[@.]/&&s&.com.&_&&&print'

      The key to this working is the regular expression passed to the fileparse method

      Too bad File::Basename doesn't set a reasonable default for the fileparse() RE based on the OS like it does for the path separator. If you understand REs well enough to supply a correct one to fileparse(), you most certainly don't need File::Basename to get the file extension!

      Yes. I was aware of the availability of function from File::Basename. However, in order to use it you needed to have prior knowledge of the filename which was being passed to the module ie you needed to know how many periods are in the filename. This wouldn't work very well if you were going through a whole list of files that were not structured the same. That is one reason I wrote this lil sub because it doesn't care how many period '.' delimiters are in the filename. It will always return the full filename minus the last part of the filename which is what I have always considered to be the extension.

      _ _ _ _ _ _ _ _ _ _
      - Jim
      Insert clever comment here...

        Umm.. try running the example he gave (or reading the node more closely). It does match the last part of the filename following the final dot. So if you run:

        perl -MFile::Basename -e 'print join ":", fileparse("foo.bar.baz", "\\.[^.]+"), "\n"'

        It produces foo.bar as the filename, ./ as the directory and .baz as the suffix, i.e. the extension baz with its leading dot. A few more filenames and results follow:

        Input filename   Base name   Extension
        foo.bar.baz      foo.bar     baz
        foo.bar          foo         bar
        foo              foo         none

        -ben

Re: Parse out the extension of a filename - return base of filename. (boo)
by boo_radley (Parson) on Mar 12, 2002 at 22:04 UTC
    at first :
    my $file = "@_" || die(sub_usage($funcName));
    and then, later , in sub_usage :
    die("\n\tThis function requires that you call ".
    where do you really want to die? :-)

    Aside from this, I've always interpreted "base filename" and "extension" to be 2 particular and discrete things, so if you do have a file named foo.bar.baz.quux, you've got a base filename of "foo.bar.baz" and an extension of "quux"; do you have a particular need or use to split on every dot, or were you being (lowercase) lazy?

    in any case, it really seems like you're doing too much work here. Additionally, I would not use any module that required me to declare a DEBUG variable (or constant, in this case) in my own package, but perhaps this could be addressed if you make your File::Basename::foo package. File::Basename::foo::DEBUG would be ok with me.

    here's a AWTDI, using sexegers, for comparison.

    use Carp;

    ($f, $e) = &sex_ext ($ARGV[0]);
    print "fname : $f ; ext : $e\n";

    sub sex_ext {
        local $_ = reverse $_[0] or carp "no filename supplied to sex_ext!\n";
        #no period or ext found, return original arg
        return $_[0] unless /(.*?)\.(.+)/;
        my $fname = reverse $2;
        my $ext = reverse $1;
        return ($fname, $ext);
    }
    Update : You could also reverse and split /\./,$_,2 rather than use the regex in the unless line... so many ways... :-)
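The reverse-and-split variant mentioned in the update might look roughly like this. It is only a sketch: the name split_reversed is invented here, and returning the original name alone when there is no dot mirrors the regex version above rather than anything stated in the update.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (base, extension), or just the original
# name when it contains no dot.
sub split_reversed {
    my ($file) = @_;

    # Scalar assignment puts reverse in scalar context: the string is reversed.
    my $reversed = reverse $file;

    # Limit 2: the first dot of the reversed string is the last dot of the
    # original, so everything after it stays in one piece.
    my ($rev_ext, $rev_base) = split /\./, $reversed, 2;
    return $file unless defined $rev_base;

    my $base = reverse $rev_base;
    my $ext  = reverse $rev_ext;
    return ($base, $ext);
}

my ($f, $e) = split_reversed('foo.bar.baz.quux');
print "fname : $f ; ext : $e\n";
```

Because the limit stops the split after the first dot of the reversed string, the number of dots in the original name should not matter.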
      So far, your comments have been the most helpful for me to understand:

      where do you really want to die? :-)

      Good catch! I should have caught this. I will rectify this asap.

      Aside from this, I've always interpreted "base filename" and "extension" to be 2 particular and discrete things, so if you do have a file named foo.bar.baz.quux, you've got a base filename of "foo.bar.baz" and an extension of "quux";

      My desire was to actually remove the extension from the filename base. I agree with you on the two distinct entities. I need to go back through my comments because I am not sure why you are making this statement. I must have said something somewhere that wasn't too clear :)

      do you have a particular need or use to split on every dot, or were you being (lowercase) lazy?

      Well, I was only solving this the way I thought it in my mind would be best to solve it. I wasn't trying to win a Nobel Peace Prize or anything =P. Perhaps I should have placed this snippet of code some place else on PM?

      in any case, it really seems like you're doing too much work here

      I admit, I am doing more work than you are doing :). Nice example. It never occurred to me to use reverse(). :(

      Additionally, I would not use any module that required me to declare a DEBUG variable (or constant, in this case) in my own package, but perhaps this could be addressed if you make your File::Basename::foo package. File::Basename::foo::DEBUG would be ok with me.

      If I did make a package out of this I wouldn't include the DEBUG stuff. At least not as a requirement. This is a snippet of code I was using in a script of mine that I actually thought was a good piece of code (heh, I guess I was wrong). This snippet was written to work in my already existing script, therefore, there was superfluous stuff in it already that I did not take out. I was merely suggesting that anyone who wanted to use the snippet would either need to edit it slightly or use it as-is and simply create a global variable called DEBUG.

      If I wrote this as a module, I would get rid of the DEBUG necessity altogether and simply do as you suggested with the File::Basename::foo::DEBUG idea.

      _ _ _ _ _ _ _ _ _ _
      - Jim
      Insert clever comment here...

Re: Parse out the extension of a filename - return base of filename.
by Juerd (Abbot) on Mar 12, 2002 at 21:41 UTC

    sub without_ext {
        my ($file) = @_;
        return substr($file, 0, rindex($file, '.'));
    }

    sub ext_only {
        my ($file) = @_;
        return substr($file, rindex($file, '.') + 1);
    }
    And even that is a lousy solution.
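The post does not say why the solution is lousy. One possible reason (a guess, not something stated above) is the no-dot case: rindex returns -1 when there is no dot, so substr($file, 0, -1) silently drops the last character, and substr($file, 0) hands back the whole name as the "extension". A guarded sketch, with made-up sub names, could look like this:

```perl
use strict;
use warnings;

# Hypothetical guarded variants of without_ext / ext_only.
sub without_ext_safe {
    my ($file) = @_;
    my $dot = rindex($file, '.');
    return $file if $dot < 0;          # no dot: nothing to strip
    return substr($file, 0, $dot);
}

sub ext_only_safe {
    my ($file) = @_;
    my $dot = rindex($file, '.');
    return '' if $dot < 0;             # no dot: no extension
    return substr($file, $dot + 1);
}

print without_ext_safe('foo.bar.baz'), "\n";
print ext_only_safe('foo.bar.baz'), "\n";
```

This still looks at the whole string, so a dot in a directory name would be picked up when the last path component has none.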

    U28geW91IGNhbiBhbGwgcm90MTMgY
    W5kIHBhY2soKS4gQnV0IGRvIHlvdS
    ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
    geW91IHNlZSBpdD8gIC0tIEp1ZXJk
    

      According to everything I have read on PM, the use of rindex() and substr() is a poor way to do this task. Therefore, I wrote a solution that didn't use them. Obviously, using a pure regex solution would work, but that is kludgy and even ill-advised here on PM.
      It would seem that many have felt I have done something wrong here?

      It's just a code snippet....one that works, no less!

      _ _ _ _ _ _ _ _ _ _
      - Jim
      Insert clever comment here...

        According to everything I have read on PM, the use of rindex() and substr() is a poor way to do this task.

        What the...? Can anyone second that (with examples and vivid explanation please)?
        I kind of refuse to believe this job should not be done with substr and rindex.

        It's just a code snippet....one that works, no less!

        One that prints data to the screen, even when not debugging. Not quite useable in most circumstances.
        Besides, it doesn't work with all valid filenames:

        parse_out_extension 'foo.b(ar';
        parse_out_extension 'foo.**';

        No, using arrays and several iterations, printing useless text and not escaping is not a better solution than a pure regex one.
        Substr+rindex is the best solution for this, followed by a substitution, but "solutions" like yours are, imho, out of the question.
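For what it's worth, the two failing filenames break because the extension is interpolated into the pattern unescaped, so characters like ( and * are read as regex syntax. If one wanted to keep the original split-based approach, a sketch of the escaping fix might be the following. The sub name is hypothetical, and the end-of-string anchor and the early return for dotless names are additions not present in the original sub.

```perl
use strict;
use warnings;

# Hypothetical variant of parse_out_extension with the extension quoted.
sub parse_out_extension_quoted {
    my ($file) = @_;
    die "No filename to parse.\n" unless defined $file;

    my @pieces = split /\./, $file;
    return $file if @pieces < 2;       # no extension to remove

    my $end = pop @pieces;
    # \Q...\E escapes regex metacharacters in $end; the trailing $ makes
    # sure only the final ".$end" is removed.
    $file =~ s/\.\Q$end\E$//;
    return $file;
}

print parse_out_extension_quoted('foo.b(ar'), "\n";
print parse_out_extension_quoted('foo.**'), "\n";
```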

        No offense meant.

        U28geW91IGNhbiBhbGwgcm90MTMgY
        W5kIHBhY2soKS4gQnV0IGRvIHlvdS
        ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
        geW91IHNlZSBpdD8gIC0tIEp1ZXJk
        

        It would seem that many have felt I have done something wrong here?

        Not necessarily wrong, just another way to do it ... :-)

        There are a couple of points that I would highlight with your code:

        • The use of the map function in a void context is generally considered bad form, as the strength of this function lies in the list that it returns. Using it in a void context generally means that you are using the wrong function for the task. For example, where you have:

          map { push(@pieces,$_) } split(/\./,$file);

          It would be much better to perform this with a simple for or foreach loop. eg.

          push( @pieces, $_ ) foreach split( /\./, $file );

          This is also discussed in the node "What's wrong with using grep or map in a void context?".

        • At the point in the function where you remove the file extension and return the remaining portion of the file name, there are a couple of ways by which this could be done possibly more efficiently (not having performed code benchmarks at this point), the most immediate being the following:

          pop @pieces; return join '.', @pieces;

          In this instance, there is no need for the regular expression against the original file name or test for definition of file name as join will return an empty string if @pieces is empty.
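Putting the two suggestions together, a complete sub might be sketched as below. The name base_by_join is made up, and two details go beyond what is suggested above: the -1 limit on split, which keeps a trailing empty field for names ending in a dot, and the early return, since without it a name with no dot at all would come back as an empty string (its only piece gets popped).

```perl
use strict;
use warnings;

# Hypothetical pop/join version of the original sub.
sub base_by_join {
    my ($file) = @_;
    die "No filename to parse.\n" unless defined $file;

    # split already returns a list, so no map or loop is needed to fill @pieces.
    my @pieces = split /\./, $file, -1;
    return $file if @pieces < 2;       # no dot: leave the name alone

    pop @pieces;                       # discard the extension
    return join '.', @pieces;
}

print base_by_join('system.update.perl'), "\n";
```

Since no regex is built from the filename, metacharacters in the extension should not be a problem here.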

        One aspect of your code did seem redundant, however, given that the functionality you are seeking is available within File::Basename (see my node here for a code example that gives you exactly what you desire from the fileparse function).

         

        perl -e 's&&rob@cowsnet.com.au&&&split/[@.]/&&s&.com.&_&&&print'

Re: Parse out the extension of a filename - return base of filename.
by ariels (Curate) on Apr 17, 2002 at 14:54 UTC

    Why not regexp it?

    $file =~ s/\.[^.]*$//;

    (That said, you probably should use File::Basename: it's more likely to DTRT on platforms with different conventions for extensions...).
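Wrapped in a sub that leaves the caller's variable untouched, the substitution could be sketched like this (strip_ext is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above.
sub strip_ext {
    my ($file) = @_;
    # Copy first, then strip the last dot and everything after it.
    (my $base = $file) =~ s/\.[^.]*$//;
    return $base;
}

print strip_ext($_), "\n" for 'foo.bar.baz', 'foo.b(ar', 'foo';
```

One caveat with applying it to full paths: the pattern does not know about directory separators, so for something like /etc/conf.d/foo it would strip from the dot in the directory name onward. That may be another reason to reach for File::Basename when paths are involved.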