
Why can't regex parse this string?

by jayto (Acolyte)
on Jul 12, 2012 at 13:49 UTC ( #981401=perlquestion )
jayto has asked for the wisdom of the Perl Monks concerning the following question:

Alright Monks, I made a subroutine that is supposed to break links down into three pieces: the domain, the path, and the filename. The subroutine works for most URLs:

    sub createStruct {
        my %song = ();
        my $orig_url = $song{url} = shift;
        my $url = URI->new( "$orig_url" );
        my $domain = $song{domain} = $url->host;
        my @split_url = split('/',$url);
        my $filename = $song{filename} = $split_url[-1];
        $orig_url = $url;
        $orig_url =~ s/$domain//g;
        $orig_url =~ s/$filename//g;
        $orig_url =~ s/http:\/\/|https:\/\/|ftp:\/\///g;
        my $dir = $song{dir} = $orig_url;
        return \%song;
    }

But when I pass this URL into the subroutine, it cannot remove the filename from the original URL. This is the URL: SoundZ We Vol.15 (2010).mp3

This is the line that doesn't work: $orig_url =~ s/$filename//g;

Why can't the filename be removed for this particular URL?

Replies are listed 'Best First'.
Re: Why can't regex parse this string?
by moritz (Cardinal) on Jul 12, 2012 at 13:57 UTC

    $filename is a string, but when you interpolate it into a regex, it is interpreted as a regex, so many characters (parentheses, for example) take on a special meaning. To prevent that, use s/\Q$filename\E//. See perlre for more information.
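
A minimal sketch of the difference, assuming a made-up path built around the filename from the question (the `/music/` prefix and the variable names are illustrative only). Unquoted, `(2010)` is a capture group that matches `2010` rather than the literal parentheses, so the substitution never fires; with `\Q...\E` every metacharacter is escaped first:

```perl
use strict;
use warnings;

# Hypothetical path; only the filename comes from the question.
my $filename = 'SoundZ We Vol.15 (2010).mp3';
my $path     = "/music/$filename";

# Unquoted: "(2010)" is read as a group matching "2010", and "." as
# "any character", so the pattern does not match the literal filename.
(my $unquoted = $path) =~ s/$filename//g;

# Quoted: \Q...\E escapes the metacharacters, so the filename is
# matched character for character.
(my $quoted = $path) =~ s/\Q$filename\E//g;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

The built-in quotemeta function does the same escaping when the pattern has to be prepared ahead of time.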

    But you are already using the URI module, so why do all those path manipulations yourself? $url->path gives you the path, which is the URL without the scheme or domain name, so you don't have to remove those from the original URL by hand. Or use path_segments, which gives you the parts of the path delimited by slashes:

        # untested:
        my @path_chunks = $url->path_segments;
        pop @path_chunks; # remove the last one, which is the file name
        my $dir = $song{dir} = join '/', @path_chunks;

    (Updated to use $uri->path_segments, daxim++, and s/shift/pop/, johngg++)
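
Putting the two suggestions together, the whole subroutine could be sketched roughly as follows. This is one possible shape, not the poster's final code: the name `create_struct` and the example.com URL are invented for illustration, and it assumes the URI module from CPAN is installed. Note that path_segments returns the segments unescaped, so `%20` comes back as a space.

```perl
use strict;
use warnings;
use URI;

# Sketch: split a URL into domain, directory and filename using
# URI's accessors instead of regex substitutions.
sub create_struct {
    my ($orig_url) = @_;
    my $url = URI->new($orig_url);

    my @segments = $url->path_segments;   # unescaped pieces of the path
    my $filename = pop @segments;         # last piece is the file name

    return {
        url      => $orig_url,
        domain   => $url->host,
        dir      => join('/', @segments),
        filename => $filename,
    };
}

# Hypothetical URL, for illustration only.
my $song = create_struct('http://example.com/music/albums/Track%20(2010).mp3');
print "$_: $song->{$_}\n" for qw(domain dir filename);
```

Because nothing here is interpolated into a pattern, parentheses and dots in the filename need no special handling.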

      Thank you, that is what I suspected the problem was, but I wasn't sure how to fix it. I also used URI->path() instead of those lines of regex that I had before. Thanks again!

Node Type: perlquestion [id://981401]
Front-paged by Corion