PerlMonks  

Regex To Remove File Extension

by Anonymous Monk
on Dec 10, 2008 at 18:56 UTC (#729477=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a filename foo.bar.txt and I want to remove the file extension so that only foo.bar remains. Note, the file extension might not always be .txt.

I've tried s/\..*// but that removes everything from the first period onward, and I'm left with only foo for my filename.

Thank you. Any help you could provide would be greatly appreciated.

Re: Regex To Remove File Extension
by dreadpiratepeter (Priest) on Dec 10, 2008 at 19:01 UTC
    That is because you told it to take everything after a dot using a "greedy" match. You want a "non-greedy" match, using the ? modifier. Try /\..*?$/. Alternatively, you could use /\.[^.]+$/ to remove a dot followed by one or more non-dot characters at the end of the string.


    -pete
    "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
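      /\..*?$/ won't work because the match still starts at the first period: the engine takes the leftmost match, and non-greediness only limits how far that match extends, which the $ anchor then forces to the end of the string anyway. I thought of that one at first, too. The second expression (/\.[^.]+$/) works fine.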
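
To see why the non-greedy form does not help, here is a minimal sketch that runs the three patterns from this exchange against the sample name. The helper name strip_with is made up for illustration; only the patterns come from the posts above.

```perl
use strict;
use warnings;

# Apply one substitution pattern to a copy of the name and return the result.
# (strip_with is a hypothetical helper, not part of any module.)
sub strip_with {
    my ($name, $pattern) = @_;
    (my $copy = $name) =~ s/$pattern//;
    return $copy;
}

my $name = 'foo.bar.txt';

my %attempts = (
    greedy     => qr/\..*$/,      # the original attempt, anchored
    non_greedy => qr/\..*?$/,     # still starts at the first dot
    negated    => qr/\.[^.]+$/,   # can only match the last dot
);

for my $label (sort keys %attempts) {
    printf "%-10s %s\n", $label, strip_with($name, $attempts{$label});
}
```

Both the greedy and the non-greedy pattern leave foo; only the negated character class leaves foo.bar, because [^.]+ cannot cross a second dot on its way to the end of the string.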
        You are of course correct. I should pay more attention when responding to posts. In my defense, I was hard at work on some code and was distracted.


        -pete
        "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
Re: Regex To Remove File Extension
by kennethk (Monsignor) on Dec 10, 2008 at 19:03 UTC
    You could try s/(.*)\..*/$1/, which will greedily match up to the last '.' and then keep that part. For common regexes, you should probably also check out Regexp::Common.
Re: Regex To Remove File Extension
by jethro (Monsignor) on Dec 10, 2008 at 19:07 UTC

    There are a lot of ways to skin this cat:

    s/\..*+$//; s/\-[^\.]*$//;

    Both of these use $ at the end to anchor the regex to the end of the string. The first uses .*+, the non-greedy version of .*, the second uses the character class of all chars except '.' to only get the last suffix

    Another possibility is the Perl module File::Basename. This is probably the best way, because you don't need to worry about getting it right; someone else has already done that.

    UPDATE: kennethk is right, the first version doesn't work. Obviously the regex engine never matches from right to left even when anchored to the right

    UPDATE2: Seems to be not my day. 3 errors in two lines is quite depressing

      Both of these fail. .*? is the non-greedy version, not .*+. Before Perl 5.10, s/\..*+$// fails to compile (from 5.10 on, *+ is a possessive quantifier), and either way it doesn't work, because the match still starts at the first period. Your second expression has a typo (- in place of .), so it should read s/\.[^\.]*$//, as per dreadpiratepeter's post.

      File::Basename says

      $basename = basename($fullname,@suffixlist);

      If @suffixes are given each element is a pattern (either a string or a qr//) matched against the end of the $filename. The matching portion is removed and becomes the $suffix.

      So File::Basename doesn't solve the original problem by itself; it requires the solution (a pattern for the suffix) before it can be used.

        Of course if you have a manageable list of extensions, you could populate the suffix list and use File::Basename very easily. That is, the original post says "extension might not always be txt," but that doesn't indicate the scope of potential extensions. Maybe it's just txt, html, htm, pl and cgi (just a random group of extensions chosen). In which case I'd lean towards the File::Basename solution rather than creating a regex unique to this script.

        Or maybe the AnonyMonk means to be able to remove any extension, in which case File::Basename isn't the best solution. Obviously the AnonyMonk will need to choose the best approach, but I wouldn't discount File::Basename for a limited number of extensions.
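
Both readings of the question can be sketched with fileparse from File::Basename, whose suffix arguments are patterns matched against the end of the file name. The sample names and the extension list are taken from this thread; the variable names are only illustrative.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Any extension: the suffix is the final dot plus whatever follows it.
# You still have to supply the pattern yourself.
my ($stem, $dir, $suffix) = fileparse('foo.bar.txt', qr/\.[^.]*/);
print "stem=$stem suffix=$suffix\n";

# Known extensions only: quote each one so the dot is matched literally.
my @known = map { quotemeta($_) } qw(.txt .html .htm .pl .cgi);

my ($stem2, undef, $suffix2) = fileparse('index.html', @known);
print "stem=$stem2 suffix=$suffix2\n";

# A name whose extension is not in the list is returned unchanged.
my ($stem3, undef, $suffix3) = fileparse('video.mpeg', @known);
print "stem=$stem3 suffix=$suffix3\n";
```

With the explicit list, anything outside it (video.mpeg here) keeps its extension, which may or may not be what the original poster wants.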

Re: Regex To Remove File Extension (split)
by toolic (Chancellor) on Dec 10, 2008 at 19:21 UTC
    I know your title asks for a regex, but maybe split will do:
    use strict;
    use warnings;

    while (<DATA>) {
        chomp;
        my @parts = split /\./;
        pop @parts;
        my $file_no_ext = join '.', @parts;
        print "file_no_ext = $file_no_ext\n";
    }
    __DATA__
    foo.bar.txt
    goo.doc
    boo.hoo.moo

    prints:

    file_no_ext = foo.bar
    file_no_ext = goo
    file_no_ext = boo.hoo
Re: Regex To Remove File Extension (rindex)
by kyle (Abbot) on Dec 10, 2008 at 19:44 UTC

    Just for the TIMTOWTDI, here's one using substr and rindex:

    my $n = 'foo.bar.txt';
    $n = substr $n, 0, rindex( $n, q{.} );

    In my own code, I'd write s{ \. [^.]+ \z }{}xms

      ...or, using substr() as an lvalue:

      substr($n,rindex $n,'.') = '';
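
One caveat with the rindex versions: when the name contains no dot, rindex returns -1, and substr $n, 0, -1 then chops the last character instead of leaving the name alone. A minimal sketch of a guarded variant follows. The helper name strip_ext is hypothetical, and treating a leading dot as part of the name (a dotfile) rather than as an extension is an assumption, not something the original poster asked for.

```perl
use strict;
use warnings;

# Strip the final extension with rindex/substr.
# (strip_ext is a hypothetical helper name.)
sub strip_ext {
    my ($name) = @_;
    my $dot = rindex $name, '.';

    # -1 means no dot at all; 0 means the only dot leads the name
    # (assumed here to be a dotfile, so nothing is stripped).
    return $name if $dot <= 0;

    return substr $name, 0, $dot;
}

print strip_ext($_), "\n" for qw(foo.bar.txt goo .bashrc);
```

This leaves goo and .bashrc untouched and reduces foo.bar.txt to foo.bar.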
Re: Regex To Remove File Extension
by n3toy (Friar) on Dec 10, 2008 at 20:04 UTC
    You have plenty of good examples to choose from. This might work for you also:
    my $string = 'foo.bar.txt';
    $string =~ s/\.\w{3}$//;
    Jamie
      You're assuming too much. Your regex will fail on:
      index.html foo.pl CGI.pm video.mpeg foo.pl~
      and you'll get a bad result with *nix dotfiles:
      .foo .bar

      Focus on the requirements: 1) a file name must contain an extension; 2) the extension is everything following the final dot.

      my @names = qw/
          index.html foo.pl CGI.pm video.mpeg foo.pl~
          .bash_history .bash_rc
      /;

      foreach my $string ( @names ) {
          print "$string -> ";
          $string =~ s/(.+)\.[^.]+$/$1/;
          print "$string\n";
      }
      grep
      One dead unjugged rabbit fish later...
        Yours is a much better solution than my simple one.

        Although my solution works for the example given, it does assume that all filenames are of the format provided in the original post and does not take into consideration the examples you provided.

        Thanks for the feedback and great sample code!
Re: Regex To Remove File Extension
by johngg (Abbot) on Dec 11, 2008 at 00:14 UTC

    I have been using this regular expression to parse *nix paths as part of a module.

    my $rxParsePath = qr {(?x)      # Use extended regular expression syntax to
                                    # allow comments and white space
        ^                           # Anchor pattern to beginning of string
        (?=.)                       # Zero-width look ahead assertion to ensure
                                    # that there must be at least one character
                                    # for the match to succeed
        (.*/)?                      # A memory grouping (1st) for path, greedy
                                    # match of any characters up to and including
                                    # the rightmost slash (the path part) with a
                                    # quantifier of '?' (0 or 1), i.e. there
                                    # may or may not be a directory part
        (                           # Open memory grouping (2nd) for file name
            (.*?)                   # A memory grouping (3rd) for file name stub
                                    # of a non-greedy match of any character
                                    # without a quantifier since, if there is a
                                    # file name part, at least some of it will
                                    # form a stub otherwise it would be a dot-file
            (                       # A memory grouping (4th) for file name
                                    # extension
                (?<=[^/])           # zero width look behind assertion such
                                    # that following pattern will only succeed
                                    # if preceded by any character other than
                                    # a slash '/'
                \.[^.]+             # a literal dot '.' followed by one or more
                                    # non-dots
            )?                      # Close memory grouping (4th) with a quantifier
                                    # of '?' (0 or 1), i.e. there may or may not
                                    # be a file name extension part
        )?                          # Close memory grouping (2nd) with a quantifier
                                    # of '?' (0 or 1), i.e. there may or may not
                                    # be a file name part
        $                           # Anchor pattern to end of string
    };

    Here is a short script using it, without extended syntax for brevity, to pull out the elements of a *nix path.

    use strict;
    use warnings;

    my $rxParsePath = qr{^(?=.)(.*/)?((.*?)((?<=[^/])\.[^.]+)?)?$};

    print <<EOT;
                    Path      Directory      File Name File Stub Extension
    EOT

    print
        map { sprintf qq{%20s%15s%15s%10s%10s\n}, @$_ }
        map { [ $_, map { defined $_ ? $_ : q{} } m{$rxParsePath} ] }
        qw {
            /etc/motd
            /var/adm/messages.1
            .alias
            a.html
            ab.cd.txt
            /bin
        };

    The results

                    Path      Directory      File Name File Stub Extension
               /etc/motd          /etc/           motd      motd
     /var/adm/messages.1      /var/adm/     messages.1  messages        .1
                  .alias                        .alias    .alias
                  a.html                        a.html         a     .html
               ab.cd.txt                     ab.cd.txt     ab.cd      .txt
                    /bin              /            bin       bin

    I hope this will be useful.

    Cheers,

    JohnGG

    Update: Corrected superfluous text from C&P error.

Reaped: Re: Regex To Remove File Extension
by NodeReaper (Curate) on Dec 05, 2010 at 07:48 UTC

      What about doing this:

      $filename =~ /(.*)\./;
      my $basename = $1;

      Since the regex above is greedy, it should capture everything up to the last period. It seems to work for me. Am I missing something?
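
The greedy capture does stop at the last period, so the idea is sound. What is easy to miss is the failure case: if $filename contains no period, the match fails and $1 still holds whatever an earlier successful match left there. A minimal sketch that checks the match result instead; the helper name without_ext is hypothetical, and returning the name unchanged when there is no dot is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Return the name without its final extension, or the name unchanged
# when it contains no dot. (without_ext is a hypothetical helper name.)
sub without_ext {
    my ($filename) = @_;

    # In list context a successful match returns its captures and a
    # failed match returns an empty list, so $base stays undefined.
    my ($base) = $filename =~ /(.*)\./s;

    return defined $base ? $base : $filename;
}

print without_ext($_), "\n" for qw(foo.bar.txt README);
```

Note that a dotfile such as .bashrc still matches, with an empty capture, so this sketch returns an empty string for it.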
