PerlMonks  

Why can't regex parse this string?

by jayto (Acolyte)
on Jul 12, 2012 at 13:49 UTC (#981401)
jayto has asked for the wisdom of the Perl Monks concerning the following question:

Alright Monks, I wrote a subroutine that is supposed to break a link down into three pieces: the domain, the path, and the filename. It works for most URLs:

    sub createStruct {
        my %song = ();
        my $orig_url = $song{url} = shift;
        my $url = URI->new( "$orig_url" );
        my $domain = $song{domain} = $url->host;
        my @split_url = split('/', $url);
        my $filename = $song{filename} = $split_url[-1];
        $orig_url = $url;
        $orig_url =~ s/$domain//g;
        $orig_url =~ s/$filename//g;
        $orig_url =~ s/http:\/\/|https:\/\/|ftp:\/\///g;
        my $dir = $song{dir} = $orig_url;
        return \%song;
    }

But when I pass this URL into the subroutine, it cannot remove the filename from the original URL. This is the URL:

    http://freemp3files.hostoi.com/mp3_6744/Club SoundZ We Vol.15 (2010).mp3

This is the line that doesn't work:

    $orig_url =~ s/$filename//g;

Why can't the filename be removed for this particular URL?

Re: Why can't regex parse this string?
by moritz (Cardinal) on Jul 12, 2012 at 13:57 UTC

    $filename is a string, but if you use it in a regex, it is interpreted as a regex. So all sorts of characters (like parentheses, for example) have a special meaning. To prevent that, use s/\Q$filename\E//. See perlre for more information.
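A minimal sketch of the difference, using only core Perl. The filename string is an assumption: it is roughly what $split_url[-1] in the question would hold, assuming URI escaped the spaces as %20 and left the parentheses alone. The unquoted pattern fails because "(2010)" becomes a capture group matching 2010 without the parentheses; with \Q...\E the filename is matched literally.

```perl
use strict;
use warnings;

# Assumed value: the filename from the question's URL, with spaces
# escaped as %20 (roughly what $split_url[-1] would contain).
my $filename = 'Club%20SoundZ%20We%20Vol.15%20(2010).mp3';
my $path     = "/mp3_6744/$filename";

# Interpolated as a pattern: "(2010)" is a capture group that matches
# the text 2010 without the parentheses, and "." matches any character,
# so the literal filename is never found and nothing is removed.
my $unquoted = $path;
$unquoted =~ s/$filename//g;

# \Q...\E (the interpolation form of quotemeta) escapes every
# non-word character, so the filename is matched literally.
my $quoted = $path;
$quoted =~ s/\Q$filename\E//g;

print "unquoted: $unquoted\n";   # still ends with the filename
print "quoted:   $quoted\n";     # prints "quoted:   /mp3_6744/"
```

The same effect can be had with my $re = quotemeta $filename; and then s/$re//g, which is handy when the escaped pattern is reused.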

    But you are already using the URI module, so why do all those path manipulations yourself? $url->path gives you the path, which is the URL without the scheme or domain name, so you don't have to remove those from the original URL manually. Or there is $url->path_segments, which gives you the individual parts of the path, split on slashes:

        # untested:
        my @path_chunks = $url->path_segments;
        pop @path_chunks;    # remove the last one, which is the file name
        my $dir = $song{dir} = join '/', @path_chunks;

    (Updated to use $uri->path_segments, daxim++, and s/shift/pop/, johngg++)

      Thank you, that is what I suspected the problem was, but I wasn't sure how to fix it. I also used URI->path() instead of the lines of regex I had before. Thanks again!
