PerlMonks  

Extract string from rear of string

by Anonymous Monk
on Dec 28, 2001 at 02:26 UTC ( #134747=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've got $filename, I wanna find out the extension.

How do I regex my $filename as follows:

1. Begin search from the end of the string.
2. Search up to the n'th occurrence of 'a_string'

Re: Extract string from rear of string
by grinder (Bishop) on Dec 28, 2001 at 02:35 UTC
    You can use rindex to find the index of the rightmost occurrence of a character in a string, which you could then combine with substr, but what you really want is File::Basename; it's part of the standard distribution.

    update: Indeed, Aigherach is right about the fact that rindex takes a substring, but that is not germane to the discussion. If you're trying to identify the extension, that probably means you're looking for the rightmost dot in a string.

    substr( $s, 0, rindex( $s, '.' ));  # the part before the dot
    substr( $s, rindex( $s, '.' ) + 1); # the part after the dot

    But that has portability considerations, and as such is dealt with by fileparse from the File::Basename module.
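
One thing the substr/rindex pair above does not cover is a name with no dot at all: rindex then returns -1, so substr( $s, 0, -1 ) quietly drops the last character and substr( $s, 0 ) hands back the whole name. A minimal sketch of a guarded version follows; split_ext is a hypothetical helper name made up for this illustration, not something from the thread or from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a name at its rightmost dot.
# Returns (name, undef) when the name contains no dot.
sub split_ext {
    my ($s) = @_;
    my $pos = rindex( $s, '.' );
    return ( $s, undef ) if $pos < 0;    # rindex returns -1 when nothing is found
    return ( substr( $s, 0, $pos ), substr( $s, $pos + 1 ) );
}

my ( $base, $ext ) = split_ext('archive.tar.gz');
print "$base | $ext\n";    # prints: archive.tar | gz
```

This still treats everything as a plain string, so the portability caveat about paths applies unchanged.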

    --
    g r i n d e r
    just another bofh

    print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';
      Actually, it returns the position of the rightmost occurrence of a substring rather than a character. Though, you could use it to check for a substring that is only one character.
             rindex STR,SUBSTR,POSITION
             rindex STR,SUBSTR
                     Works just like index() except that it returns the
                     position of the LAST occurrence of SUBSTR in STR.
                     If POSITION is specified, returns the last occurrence
                     at or before that position.
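
The POSITION argument is what makes the original "n'th occurrence from the end" question workable: each call can restart the search just to the left of the previous hit. A rough sketch, assuming a hypothetical helper named rindex_nth (not a built-in, and not something from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: position of the $n-th occurrence of $sub in $str,
# counting from the end. Returns -1 if there are fewer than $n occurrences.
sub rindex_nth {
    my ( $str, $sub, $n ) = @_;
    my $pos = length $str;
    while ( $n-- > 0 ) {
        return -1 if $pos < 1;    # nothing left of the previous hit to search
        $pos = rindex( $str, $sub, $pos - 1 );
        return -1 if $pos < 0;    # ran out of occurrences
    }
    return $pos;
}

my $name = 'linux-2.4.17.tar.gz';
my $pos  = rindex_nth( $name, '.', 2 );
print substr( $name, $pos + 1 ), "\n" if $pos >= 0;    # prints: tar.gz
```

Stepping back by one character means overlapping occurrences are counted separately; whether that is wanted depends on the substring being searched for.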
      

      --
      Snazzy tagline here
      or
      $ext = (split /\./, $fname)[-1];

      But neither takes note of the fact that the file might not have an extension...

        ..or more than one dot... like linux-2.4.17.tar.gz
Re: Extract string from rear of string
by dmmiller2k (Chaplain) on Dec 28, 2001 at 03:30 UTC
      That's rather amusing... By following your link I actually get on to another post that also mentions this particular thread and so I get stuck in an infinite loop ;D.

      But, yeah, actually, it does sort of seem like someone's trying to get his homework question done by exploiting monks' sense of selflessness. What do you think? In general, I'm against students who practice this. Unless you figure this out on your own, there's no way you'd be able to learn. It's absolutely OK when you get stuck on some obnoxious bugs or whatnot (like I always do ;)... however, the question this anonymous monk has posted seems more like a homework question from some nightmarish Perl class hehe. Therefore, I would be hesitant to help... sorry.


      "There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith
        ... I would be hesitant to help...

        My point exactly!

        Update: See the discussion at 'Beware the Trolls!'.

        dmm

        You can give a man a fish and feed him for a day ...
        Or, you can
        teach him to fish and feed him for a lifetime
Re: Extract string from rear of string
by talexb (Canon) on Dec 28, 2001 at 08:36 UTC
    Something like
    $FileName = "/var/log/apache/error.log";
    $FileName =~ m/\.(\w*)$/;
    $Extension = $1;
    should do the trick. In this code fragment I'm looking for a real period, followed by any number of word characters, followed by the end of the string. I put that in brackets so that I can grab it as $1 later on.

    This of course makes the assumption that your definition of a file extension is the group of characters to the right of the last period in a file name .. so the file extension of "foo.bar.baz" would be "baz". Your code should also handle the situation where there are no periods in the filename.

    ps I highly recommend reading Programming Perl (Third Edition) by Wall, Christiansen & Orwant, published by O'Reilly.

    "Excellent. Release the hounds." -- Monty Burns.

      You really should not use \w, as many file systems allow characters other than alphanumerics. For instance, 'file.$#@%' would be a valid name but would break your code.
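
A sketch of one way to address both points raised here, accepting any character except a period after the last dot and coping with names that have no period at all. ext_of is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text after the last period,
# or undef when the name contains no period.
sub ext_of {
    my ($file_name) = @_;
    # [^.]* accepts any character except a period, so '$#@%' is fine;
    # testing the match result avoids reading a stale $1.
    return $file_name =~ m/\.([^.]*)\z/ ? $1 : undef;
}

for my $f ( '/var/log/apache/error.log', 'file.$#@%', 'README' ) {
    my $ext = ext_of($f);
    print "$f => ", ( defined $ext ? $ext : '(none)' ), "\n";
}
```

Note that a dot in a directory component (with no dot in the file name itself) would still fool this pattern, which is one more reason to consider File::Basename for full paths.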
Re: Extract string from rear of string
by ppg (Initiate) on Dec 28, 2001 at 16:05 UTC
    I think there's a File::Basename module that works similarly to the Unix basename command, which should be able to return the extension somehow, but my book's a bit hazy on how it works.
Re: Extract string from rear of string
by thunders (Priest) on Dec 29, 2001 at 02:52 UTC
    Update: Juerd is correct in his post below. I made a mistake in my post regarding the split operator. I have corrected this.

    Here is an example of File::Basename, applied to the current directory. This is not perfect, but should get you started.
    #!/usr/bin/perl -w
    use File::Basename;
    use strict;

    my @files = <*>;    # every entry in the current directory
    foreach my $file (@files) {
        # '\..*' treats everything from the first dot onward as the suffix
        my ($name,$dir,$type) = fileparse($file,'\..*');
        print sprintf("file= %30s", $name), sprintf(" ext= %10s", $type), "\n";
    }

    You can refine this by crafting a better regex as the second argument to fileparse. This module is overkill unless you are dealing with full file paths. For a single directory you could just use
    my($filename,$ext) = split(/\./,$file);
    or something similar.
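
As one possible refinement of the suffix pattern, the sketch below passes qr/\.[^.]*/ so that only the last dot and what follows it count as the suffix. The sample path is made up for illustration, and the directory part that fileparse returns depends on the platform's path rules.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# The suffix pattern is matched against the end of the file name, so
# \.[^.]* can only match the final dot plus the dot-free text after it.
my ( $name, $dir, $type ) =
    fileparse( '/tmp/linux-2.4.17.tar.gz', qr/\.[^.]*/ );

# Note that the suffix fileparse returns includes the leading dot.
print "dir=$dir name=$name ext=$type\n";
```

With the '\..*' pattern the same file name would be cut at the first dot instead, which is rarely what is wanted for versioned names.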

      print sprintf("file= %30s", $name), sprintf(" ext= %10s", $type),"\n";
      Personally, I'd use:
      printf "file = %30s ext= %10s\n", $name, $type;

      my($filename,$ext) = split('\..*',$file);
      While this is valid syntax, using a string as split's first argument might be confusing to beginners. Every string that is not a single space (\x20) is interpreted as a regex. Using slashes or another m// makes your intention clear.
      my ($filename, $ext) = split /\..*/, $file;

      Splitting on /\..*/ would return ('foo', undef) for 'foo.bar'.
      Splitting on /\./ would probably fix this, but you don't want ('foo', 'bar', 'baz') or (using a limit) ('foo', 'bar.baz').
      So using a regex without split would probably be best:

      my ($filename, $ext) = $file =~ /^(.+?)(?:\.([^.]*))?$/s;
      (The .+? is non-greedy and the extension part cannot contain a dot, so with both anchors in place the split can only land on the last dot. The /s was added just in case someone has a linefeed in his filename. I used .+ rather than .* for dotfiles (filenames beginning with a dot are hidden files in *nix). The extension part is optional ( (?:)? ) because not all files have an extension.)
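
A pattern like this is easy to get subtly wrong, so it is worth running over a few sample names before trusting it. The sketch below does that with a non-greedy variant, /^(.+?)(?:\.([^.]*))?$/s, which is an assumption made for this illustration; the sample names are likewise invented.

```perl
use strict;
use warnings;

# Assumed variant: lazy (.+?) plus a dot-free extension ([^.]*),
# anchored at both ends so the whole name must be consumed.
my $re = qr/^(.+?)(?:\.([^.]*))?$/s;

for my $file ( 'foo.bar', 'foo.bar.baz', '.bashrc', 'README' ) {
    # In list context the match returns the two captures; the second
    # is undef when the optional extension group did not take part.
    my ( $filename, $ext ) = $file =~ $re;
    printf "%-12s name=%-8s ext=%s\n", $file, $filename,
        defined $ext ? $ext : '(none)';
}
```

With these inputs, 'foo.bar.baz' splits into 'foo.bar' and 'baz', while '.bashrc' and 'README' come back whole with no extension.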

      2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$
