
Perl string matching

by wizard341 (Acolyte)
on Oct 28, 2002 at 16:51 UTC (#208543)
wizard341 has asked for the wisdom of the Perl Monks concerning the following question:

I'm picking up a project from where another employee left off, and not knowing Perl all that well, it's a disadvantage that he was a sloppy coder and didn't comment a line of his code. I am trying to finish up a program that will redirect users to a page if the one they are looking for isn't available. But I have a few questions.
if ($Path =~ m{^(.*)\/([^\/]*)}) {
    $Dir = $1;
    $File = $2;
    $Ext=$3;
} else {
    $File = $Path;
}
if ($File =~ m{^(.*)\.([^\.]*)}) {
    $File = $1;
    $Dot = ".";
    $Ext = $2;
} else {
    $Dot = "";
    $Ext = "";
}
if (-d "/$Dir/$File") {
    $Dir .= "/$File";
    $File = "";
}
I can't understand exactly what the first 2 statements do. I've been told the first statement will match anything, because it's using the .* operator, but I don't know. If anyone would be so kind as to tell me what the first two pattern matching strings are looking for, I think that would lessen my headache and my extreme hatred for this old programmer!

Re: Perl string matching
by broquaint (Abbot) on Oct 28, 2002 at 17:00 UTC
    For explanations of regexes use japhy's ever-helpful YAPE::Regex::Explain, e.g.
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new(q<^(.*)\/([^\/]*)>)->explain();
    __output__
    The regular expression:

    (?-imsx:^(.*)\/([^\/]*))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .*                       any character except \n (0 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      \/                       '/'
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        [^\/]*                   any character except: '\/' (0 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    For more info on the syntax of Perl see perlsyn, and for regexes see perlre.
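
    To see that explanation in action, here is a minimal sketch that runs the same pattern against a sample path. The path is made up for illustration and does not come from the original program.

```perl
use strict;
use warnings;

# Hypothetical sample path, not taken from the original program.
my $path = '/var/www/htdocs/index.html';

my ($dir, $file);
if ($path =~ m{^(.*)\/([^\/]*)}) {
    # .* is greedy, so the literal '/' ends up matching the LAST slash.
    ($dir, $file) = ($1, $2);
}

print "dir:  $dir\n";    # dir:  /var/www/htdocs
print "file: $file\n";   # file: index.html
```

    The greedy .* first swallows the whole string and then backs off one character at a time until a / can match, which is why the split lands on the final slash.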
    HTH

    _________
    broquaint

Re: Perl string matching
by Molt (Chaplain) on Oct 28, 2002 at 17:06 UTC

    What they're trying to do is separate the file from the end of the URL, so if the URL is 'http://www.wherever.com/whee/index.html' then $Dir should end up with 'http://www.wherever.com/whee', and $File as 'index.html'.

    He's trying to do this by matching 'any number of any character' (the (.*) part) followed by a literal / (the \/ part), and then any number of characters which aren't / (the.. err.. ([^\/]*) part). Because .* is greedy, the literal / ends up matching the last slash in the string. The regexp isn't actually anchored at the end, though, which I think could cause you some fun bugs in itself..
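
    A quick way to check where the split actually happens is to run the pattern on that example URL, next to a non-greedy variant for contrast. This is only a sketch: the non-greedy pattern is not in the original code and is shown purely for comparison.

```perl
use strict;
use warnings;

my $url = 'http://www.wherever.com/whee/index.html';

# Greedy: .* takes as much as it can, then backs off to the last '/'.
my ($greedy_dir, $greedy_file) = $url =~ m{^(.*)/([^/]*)};

# Non-greedy (comparison only): .*? takes as little as it can,
# so the '/' matches the first slash instead.
my ($lazy_dir, $lazy_file) = $url =~ m{^(.*?)/([^/]*)};

print "greedy: [$greedy_dir] [$greedy_file]\n";
# greedy: [http://www.wherever.com/whee] [index.html]
print "lazy:   [$lazy_dir] [$lazy_file]\n";
# lazy:   [http:] []
```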

    Ugly ugly ugly! When you're rewriting this have a look at URI::URL which has methods for handling URI manipulation in a nice way.

    Update: For some reason my brain read this as handling URLs.. guess this is a good argument for comments. Use the module roik recommends below if it is files. Sorry.

Re: Perl string matching
by fruiture (Curate) on Oct 28, 2002 at 17:09 UTC
    if ($Path =~ m{^(.*)\/([^\/]*)})
    # if whatever-is-in-$Path matches that pattern,
    # which should imho better be written as
    # m{^((?:[^/]*/)+)([^/]*)}
    {
        # then $Dir is "the whole thing before the last slash"
        # when you use my pattern, you'll have to chop() it
        $Dir = $1;
        # $File is everything after the last slash
        $File = $2;
        # $Ext will always be undef
        $Ext=$3;
    }
    else {
        # otherwise the $File is the whole $Path
        $File = $Path;
    }

    # next part
    if ($File =~ m{^(.*)\.([^\.]*)})
    # again "death to dot star":
    # m{^((?:[^.]*\.)+)([^.]*)}
    {
        # if that one matches, $File is everything before the
        # last dot
        # $Dot becomes '.'
        # $Ext is everything after the last period
        $File = $1;
        $Dot = ".";
        $Ext = $2;
    }
    else {
        # otherwise $Dot and $Ext are empty strings
        $Dot = "";
        $Ext = "";
    }

    # next part
    if (-d "/$Dir/$File")
    # if "whatever results when you join $Dir and $File with
    # a slash and precede it by another slash" turns out to be
    # resolved as a directory
    {
        # then $File, preceded by a slash, is appended to $Dir
        $Dir .= "/$File";
        # and $File becomes an empty string
        $File = "";
    }

    All in all this is rather ugly, although not yet insecure, which depends on what happens next...
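
    For what it's worth, here is a rough sketch of the first two steps wrapped in a sub, using "no dot star" patterns along the lines suggested in the comments above. The name split_path and the sample path are made up for illustration, and the -d directory check is left out so the example does not touch the filesystem.

```perl
use strict;
use warnings;

# split_path is a hypothetical helper, not part of the original program.
sub split_path {
    my ($path) = @_;
    my ($dir, $file, $dot, $ext) = ('', $path, '', '');

    # Everything up to and including the last slash, then the rest.
    if ($path =~ m{^((?:[^/]*/)+)([^/]*)}) {
        ($dir, $file) = ($1, $2);
        chop $dir;                      # drop the trailing slash
    }

    # Everything up to and including the last dot, then the rest.
    if ($file =~ m{^((?:[^.]*\.)+)([^.]*)}) {
        ($file, $dot, $ext) = ($1, '.', $2);
        chop $file;                     # drop the trailing dot
    }

    return ($dir, $file, $dot, $ext);
}

my @parts = split_path('/var/www/htdocs/index.html');
print join('|', @parts), "\n";   # /var/www/htdocs|index|.|html
```

    A path with no slash or no dot simply falls through the corresponding if, so split_path('README') gives an empty directory, dot and extension.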

    --
    http://fruiture.de
Re: Perl string matching
by roik (Scribe) on Oct 28, 2002 at 17:12 UTC
    It does look pretty messy.

    The first line looks like it is meant to match a file path, but it doesn't seem to be entirely correct.

    Each part in a pair of parentheses () in the regex will be held in one of the special variables $1, $2... etc.

    In this case ^ means the regex will start looking from the beginning of the string. The .* will match any characters (including none at all!) and, being greedy, runs up to the last / in the string. I think the \ before the / is a red herring as it is escaping a character that does not need to be escaped here. / is a common regex delimiter, and if / had been used in place of {} then the \ would be needed.

    It also tries to use three special variables ($1, $2, $3) with only two sets of parentheses to assign to them!
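
    A small sketch to confirm the point about $3: with only two capture groups, a successful match leaves $3 undefined. The sample path is made up.

```perl
use strict;
use warnings;

my $path = '/some/dir/page.html';   # made-up sample path

# In list context a successful match returns one value per capture group.
my @captures = $path =~ m{^(.*)/([^/]*)};

my $count = scalar @captures;
my $third = defined $3 ? 'set' : 'undef';

print "$count groups captured\n";   # 2 groups captured
print "\$3 is $third\n";            # $3 is undef
```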

    I would be tempted to rewrite the code using the File::Basename module to separate the file and the path in this case.
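
    As one possible sketch of the module-based approach, the core File::Basename module has a fileparse function that splits a path into name, directory and suffix. The sample path and the suffix pattern are assumptions for illustration; the output shown assumes Unix-style paths.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $path = '/var/www/htdocs/index.html';   # made-up sample path

# fileparse returns (name, directory, suffix). The second argument is a
# suffix pattern; qr/\.[^.]*/ picks out the last ".ext" of the file name.
my ($file, $dir, $ext) = fileparse($path, qr/\.[^.]*/);

print "dir:  $dir\n";    # dir:  /var/www/htdocs/
print "file: $file\n";   # file: index
print "ext:  $ext\n";    # ext:  .html
```

    Note that fileparse keeps the trailing slash on the directory and the leading dot on the suffix, which differs slightly from what the hand-rolled regexes produce.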
