PerlMonks  

Perl string matching

by wizard341 (Acolyte)
on Oct 28, 2002 at 16:51 UTC (#208543=perlquestion)
wizard341 has asked for the wisdom of the Perl Monks concerning the following question:

I'm picking up a project from where another employee left off. Not knowing Perl all that well puts me at a disadvantage, and it doesn't help that he was a sloppy coder who didn't comment a single line of his code. I'm trying to finish a program that will redirect users to a page if the one they are looking for isn't available, but I have a few questions.
    if ($Path =~ m{^(.*)\/([^\/]*)}) {
        $Dir = $1;
        $File = $2;
        $Ext = $3;
    } else {
        $File = $Path;
    }
    if ($File =~ m{^(.*)\.([^\.]*)}) {
        $File = $1;
        $Dot = ".";
        $Ext = $2;
    } else {
        $Dot = "";
        $Ext = "";
    }
    if (-d "/$Dir/$File") {
        $Dir .= "/$File";
        $File = "";
    }
I can't understand exactly what the first two statements do. I've been told the first statement will match anything because it uses the .* operator, but I don't know. If anyone would be so kind as to tell me what the first two patterns are looking for, I think that would lessen my headache and my extreme hatred for this old programmer!

Re: Perl string matching
by broquaint (Abbot) on Oct 28, 2002 at 17:00 UTC
    For explanations of regexes, use japhy's ever-helpful YAPE::Regex::Explain, e.g.:
        use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new(q<^(.*)\/([^\/]*)>)->explain();

        __output__
        The regular expression:

        (?-imsx:^(.*)\/([^\/]*))

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          ^                        the beginning of the string
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            .*                       any character except \n (0 or more times
                                     (matching the most amount possible))
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
          \/                       '/'
        ----------------------------------------------------------------------
          (                        group and capture to \2:
        ----------------------------------------------------------------------
            [^\/]*                   any character except: '\/' (0 or more
                                     times (matching the most amount
                                     possible))
        ----------------------------------------------------------------------
          )                        end of \2
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
    For more on Perl syntax see perlsyn, and for regexes see perlre.
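    To watch the pattern work on real input rather than just read about it, here is a minimal sketch. The helper name split_dir_file and the sample paths are made up for illustration; the regex is the one from the question. Note that $1 runs up to the last slash, not the first, because .* is greedy and backs off only as far as it must to find a /.

```perl
use strict;
use warnings;

# Hypothetical helper: split a path with the question's first regex.
# Returns ($dir, $file); $dir is undef when the path has no slash,
# which corresponds to the "else" branch of the original code.
sub split_dir_file {
    my ($path) = @_;
    if ($path =~ m{^(.*)\/([^\/]*)}) {
        return ($1, $2);    # greedy .* backs off to the LAST slash
    }
    return (undef, $path);  # no slash at all
}

for my $path ('var/www/whee/index.html', '/index.html', 'index.html') {
    my ($dir, $file) = split_dir_file($path);
    printf "%-24s dir=%-14s file=%s\n", $path,
        defined $dir ? "'$dir'" : 'undef', "'$file'";
}
```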
    HTH

    _________
    broquaint

Re: Perl string matching
by Molt (Chaplain) on Oct 28, 2002 at 17:06 UTC

    What they're trying to do is separate the file from the end of the URL, so if the URL is 'http://www.wherever.com/whee/index.html' then $Dir ends up with 'http://www.wherever.com/whee' and $File with 'index.html' (which the second regex then splits into 'index' and 'html').

    He's trying to do this by matching 'any number of any character' (the (.*) part), followed by a literal / (the \/ part), and then any number of characters which aren't / (the.. err.. ([^\/]*) part). Because .* is greedy, it swallows the whole string and then backs off only as far as the last /, so $File gets everything after the final slash. The regexp isn't anchored at the end, but that doesn't matter here: [^\/]* can't stop early, since nothing after the last slash is a slash.
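    A quick way to check what the snippet really does with that URL is to run its first two if/else blocks on it. This is a sketch only: dissect is a hypothetical wrapper, the variables are made lexical, $Dir defaults to '' instead of being left unset, and the stray $Ext=$3 is left out because the second if/else overwrites $Ext on both branches anyway.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the question's first two if/else blocks.
sub dissect {
    my ($Path) = @_;
    my ($Dir, $File, $Dot, $Ext) = ('', '', '', '');   # '' instead of unset
    if ($Path =~ m{^(.*)\/([^\/]*)}) { $Dir = $1; $File = $2; }
    else                             { $File = $Path; }
    if ($File =~ m{^(.*)\.([^\.]*)}) { $File = $1; $Dot = '.'; $Ext = $2; }
    else                             { $Dot = ''; $Ext = ''; }
    return ($Dir, $File, $Dot, $Ext);
}

my @parts = dissect('http://www.wherever.com/whee/index.html');
# prints 'http://www.wherever.com/whee' | 'index' | '.' | 'html'
print join(' | ', map { "'$_'" } @parts), "\n";
```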

    Ugly ugly ugly! When you're rewriting this have a look at URI::URL which has methods for handling URI manipulation in a nice way.

    Update: For some reason my brain read this as handling URLs; I guess this is a good argument for comments. If it's files, use the module Roik recommends below. Sorry.

Re: Perl string matching
by fruiture (Curate) on Oct 28, 2002 at 17:09 UTC
        if ($Path =~ m{^(.*)\/([^\/]*)})
        # if whatever is in $Path matches that pattern,
        # which IMHO would be better written as
        #   m{^((?:[^/]*/)+)([^/]*)}
        {
            # then $Dir is "the whole thing before the last slash";
            # with my pattern $1 keeps the trailing slash, so you'll have to chop() it
            $Dir = $1;
            # $File is everything after the last slash
            $File = $2;
            # $Ext will always be undef (there is no third group)
            $Ext = $3;
        }
        else {
            # otherwise $File is the whole $Path
            $File = $Path;
        }

        # next part
        if ($File =~ m{^(.*)\.([^\.]*)})
        # again, "death to dot star":
        #   m{^((?:[^.]*\.)+)([^.]*)}
        # (here too $1 keeps the trailing dot, so chop() it)
        {
            # if that one matches, $File is everything before the last dot,
            # $Dot becomes '.'
            # and $Ext is everything after the last dot
            $File = $1;
            $Dot  = ".";
            $Ext  = $2;
        }
        else {
            # otherwise $Dot and $Ext are empty strings
            $Dot = "";
            $Ext = "";
        }

        # next part
        if (-d "/$Dir/$File")
        # if joining $Dir and $File with a slash, and putting another
        # slash in front, yields the path of an existing directory
        {
            # then $File, preceded by a slash, is appended to $Dir
            $Dir .= "/$File";
            # and $File becomes an empty string
            $File = "";
        }
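    The dot-star-free alternatives can be checked against the originals. The sketch below assumes the patterns m{^((?:[^/]*/)+)([^/]*)} and m{^((?:[^.]*\.)+)([^.]*)}, where the outer group collects every repetition (a capture placed inside a quantified group would only keep the last repetition). The helper names dir_and_file and name_and_ext and the sample paths are hypothetical. The loop dies if either split disagrees with the original dot-star version.

```perl
use strict;
use warnings;

# Directory/file split without dot-star: the outer group captures every
# "segment/" repetition, so $1 keeps the trailing slash and needs chop().
sub dir_and_file {
    my ($path) = @_;
    return unless $path =~ m{^((?:[^/]*/)+)([^/]*)};
    my ($dir, $file) = ($1, $2);
    chop $dir;                 # drop the trailing '/'
    return ($dir, $file);
}

# Name/extension split, same idea with an escaped dot.
sub name_and_ext {
    my ($file) = @_;
    return unless $file =~ m{^((?:[^.]*\.)+)([^.]*)};
    my ($name, $ext) = ($1, $2);
    chop $name;                # drop the trailing '.'
    return ($name, $ext);
}

# Cross-check against the original dot-star patterns.
for my $path ('whee/index.html', 'a/b/c.tar.gz', '/top.txt') {
    my ($dir, $file) = dir_and_file($path);
    $path =~ m{^(.*)\/([^\/]*)} or die "no slash in $path";
    die "dir mismatch for $path"  unless $dir  eq $1;
    die "file mismatch for $path" unless $file eq $2;
    my ($name, $ext) = name_and_ext($file);
    $file =~ m{^(.*)\.([^\.]*)} or die "no dot in $file";
    die "ext mismatch for $file" unless $name eq $1 && $ext eq $2;
    print "$path: dir='$dir' name='$name' ext='$ext'\n";
}
```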

    All in all this is rather ugly, although not yet insecure, which depends on what happens next...

    --
    http://fruiture.de
Re: Perl string matching
by roik (Scribe) on Oct 28, 2002 at 17:12 UTC
    It does look pretty messy.

    The first line looks like it is meant to match a file path, but it doesn't seem to be entirely correct.

    Each part of the regex enclosed in a pair of parentheses () is captured into one of the special variables $1, $2, etc.

    In this case ^ means the regex starts matching at the beginning of the string. The (.*) then matches any run of characters (possibly none at all!) up to the last / in the string: .* is greedy, so it grabs everything and backs off only as far as it needs to find a /. I think the \ before the / is a red herring, as it escapes a character that does not need escaping here. / is a common regex delimiter, and if / had been used in place of {}, then the \/ would have been required.
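    A small check of the escaping point, as a sketch with a made-up path: the same pattern written with the needless \/ escapes, without them, and with / delimiters (where the escapes become mandatory) yields identical captures. It also shows $1 stopping at the last slash of a multi-directory path.

```perl
use strict;
use warnings;

my $path = 'docs/faq/index.html';

# Three spellings of the same pattern. Inside m{...} the slash is an
# ordinary character, so "\/" and "/" mean the same thing; with the
# default /.../ delimiters the slash must be escaped.
my @captures = (
    [ $path =~ m{^(.*)\/([^\/]*)} ],   # original: needless escapes
    [ $path =~ m{^(.*)/([^/]*)}   ],   # same pattern, no escapes
    [ $path =~ /^(.*)\/([^\/]*)/  ],   # slash delimiters: escapes required
);

for my $c (@captures) {
    # each line prints dir='docs/faq' file='index.html'
    print "dir='$c->[0]' file='$c->[1]'\n";
}
```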

    It also tries to use three special variables ($1, $2, $3) with only two sets of parentheses to assign to them!
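    A minimal sketch of what that third variable actually receives, using a made-up $Path: after a successful match with only two groups, $3 is simply undef. In the original code this is harmless, since the second if/else assigns $Ext on both branches, but it is misleading to read.

```perl
use strict;
use warnings;

my $Path = 'whee/index.html';    # sample input, made up for illustration
my ($Dir, $File, $Ext);

if ($Path =~ m{^(.*)\/([^\/]*)}) {
    $Dir  = $1;
    $File = $2;
    $Ext  = $3;    # the pattern has only two groups, so $3 is undef
}

print defined $Ext ? "Ext='$Ext'\n" : "Ext is undef\n";    # prints "Ext is undef"
```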

    I would be tempted to rewrite the code using the File::Basename module (its fileparse() function) to separate the file and the path in this case.
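    A sketch of that rewrite using fileparse() from File::Basename, a core module (File::Path, by contrast, creates and removes directory trees). The split_path name and the sample paths are hypothetical, and the suffix pattern qr/\.[^.]*/ is an assumption chosen to strip only the final extension, as the original second regex does. Unlike the original $Dir, the directory returned by fileparse() keeps its trailing slash, and it is './' for a bare file name, so any later -d "/$Dir/$File" test would need adjusting.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: split a path into directory, base name, dot and
# extension, roughly mirroring $Dir, $File, $Dot and $Ext in the original.
sub split_path {
    my ($path) = @_;
    # fileparse returns (name, directory, suffix); the suffix pattern
    # \.[^.]* can only match at the end, so it takes the final extension.
    my ($name, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);
    my ($dot, $ext) = $suffix eq '' ? ('', '') : ('.', substr($suffix, 1));
    return ($dir, $name, $dot, $ext);
}

for my $path ('whee/index.html', 'a/b/c.tar.gz', 'README') {
    my ($dir, $name, $dot, $ext) = split_path($path);
    print "$path: dir='$dir' name='$name' dot='$dot' ext='$ext'\n";
}
```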

Approved by fglock