PerlMonks  

Why can't regex parse this string?

by jayto (Acolyte)
on Jul 12, 2012 at 13:49 UTC (#981401=perlquestion)
jayto has asked for the wisdom of the Perl Monks concerning the following question:

Alright Monks, I wrote a subroutine that is supposed to break a link down into 3 pieces: the domain, the path, and the filename. The subroutine works for most URLs:

    sub createStruct{
        my %song = ();
        my $orig_url = $song{url} = shift;
        my $url = URI->new( "$orig_url" );
        my $domain = $song{domain} = $url->host;
        my @split_url = split('/',$url);
        my $filename = $song{filename} = $split_url[-1];
        $orig_url = $url;
        $orig_url =~ s/$domain//g;
        $orig_url =~ s/$filename//g;
        $orig_url =~ s/http:\/\/|https:\/\/|ftp:\/\///g;
        my $dir = $song{dir} = $orig_url;
        return \%song;
    }

But when I pass this URL into the subroutine, it cannot remove the filename from the original URL. This is the URL:

    http://freemp3files.hostoi.com/mp3_6744/Club SoundZ We Vol.15 (2010).mp3

This is the line that doesn't work:

    $orig_url =~ s/$filename//g;

Why can't the filename be removed for this particular URL?

Re: Why can't regex parse this string?
by moritz (Cardinal) on Jul 12, 2012 at 13:57 UTC

    $filename is a string, but when you interpolate it into a regex, it is interpreted as a regex. So all sorts of characters (parentheses and dots, for example) have a special meaning. To prevent that, use s/\Q$filename\E//. See perlre for more information.
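To illustrate the difference, here is a minimal sketch using only core Perl. The hard-coded $filename is an assumption: it is what the last element of the split would look like once URI has escaped the spaces as %20. The variable names $unquoted and $quoted are made up for the demonstration.

```perl
use strict;
use warnings;

# Assumed value of $filename after URI->new has escaped the spaces.
my $filename = 'Club%20SoundZ%20We%20Vol.15%20(2010).mp3';
my $url      = "http://freemp3files.hostoi.com/mp3_6744/$filename";

# Unquoted: "(2010)" is treated as a capture group that matches "2010",
# so the literal parentheses in the URL are never matched and nothing is removed.
(my $unquoted = $url) =~ s/$filename//g;

# Quoted: \Q...\E escapes every metacharacter, so the filename matches literally.
(my $quoted = $url) =~ s/\Q$filename\E//g;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

The quotemeta function does the same escaping as \Q...\E, which can be handy if the pattern is built up in several steps.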

    But since you are already using the URI module, why do all those path manipulations yourself? $url->path gives you the path, which is the URL without scheme or domain name, so you don't have to manually remove those from the original URL. Or use path_segments, which gives you the different parts of the path delimited by slashes:

        # untested:
        my @path_chunks = $url->path_segments;
        pop @path_chunks; # remove the last one, which is the file name
        my $dir = $song{dir} = join '/', @path_chunks;

    (Updated to use $uri->path_segments, daxim++, and s/shift/pop/, johngg++)
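Putting the suggestion together, the whole subroutine might look roughly like the sketch below. This is not the original poster's final code: the name split_url is hypothetical, and it assumes the URI module from CPAN is installed. Note that path_segments returns unescaped segments, and that the resulting dir has no trailing slash, unlike what the regex version would have produced.

```perl
use strict;
use warnings;
use URI;    # CPAN module, already used by the original subroutine

# Hypothetical rewrite of createStruct that lets URI do the splitting.
sub split_url {
    my ($orig_url) = @_;
    my $url = URI->new($orig_url);

    # For an absolute path the first segment is the empty string
    # before the leading slash, so the joined dir starts with "/".
    my @segments = $url->path_segments;
    my $filename = pop @segments;    # last segment is the file name

    return {
        url      => $orig_url,
        domain   => $url->host,
        dir      => join('/', @segments),
        filename => $filename,
    };
}

my $song = split_url('http://freemp3files.hostoi.com/mp3_6744/Club SoundZ We Vol.15 (2010).mp3');
print "$_: $song->{$_}\n" for qw(domain dir filename);
```

No regex is involved here, so special characters in the file name no longer matter.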

      Thank you, that is what I suspected the problem was, but I wasn't sure how to fix it. I also used URI->path() instead of those lines of regex that I had before. Thanks again!
