
Why can't regex parse this string?

by jayto (Acolyte)
on Jul 12, 2012 at 13:49 UTC (#981401)
jayto has asked for the wisdom of the Perl Monks concerning the following question:

Alright Monks, so I made a subroutine that is supposed to break links down into 3 pieces: the domain, the path, and the filename. The subroutine works for most URLs:

    sub createStruct{
        my %song = ();
        my $orig_url = $song{url} = shift;
        my $url = URI->new( "$orig_url" );
        my $domain = $song{domain} = $url->host;
        my @split_url = split('/',$url);
        my $filename = $song{filename} = $split_url[-1];
        $orig_url = $url;
        $orig_url =~ s/$domain//g;
        $orig_url =~ s/$filename//g;
        $orig_url =~ s/http:\/\/|https:\/\/|ftp:\/\///g;
        my $dir = $song{dir} = $orig_url;
        return \%song;
    }

But when I pass this URL into the subroutine, it cannot remove the filename from the original URL. This is the URL: SoundZ We Vol.15 (2010).mp3

This is the line that doesn't work:

    $orig_url =~ s/$filename//g;

Why can't the filename be removed for this particular URL?

Replies are listed 'Best First'.
Re: Why can't regex parse this string?
by moritz (Cardinal) on Jul 12, 2012 at 13:57 UTC

    $filename is a string, but if you use it in a regex, it is interpreted as a regex. So all sorts of characters (like parentheses and dots, for example) have a special meaning. To prevent that, use s/\Q$filename\E//. See perlre for more information.
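A minimal sketch of the difference, using only core Perl. The `/music/` prefix and the variable names are made up for illustration; only the filename text comes from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data: the full URL from the question is not shown,
# so only the filename is taken from the post.
my $filename = 'SoundZ We Vol.15 (2010).mp3';
my $path     = "/music/$filename";

# Unescaped interpolation: "(2010)" becomes a capture group that matches the
# bare text "2010", and each "." matches any character. The literal
# parentheses in $path are never matched, so nothing is removed.
(my $unescaped = $path) =~ s/$filename//g;

# \Q...\E (the inline form of quotemeta) escapes every metacharacter,
# so the filename is matched as literal text.
(my $escaped = $path) =~ s/\Q$filename\E//g;

print "unescaped: $unescaped\n";
print "escaped:   $escaped\n";
```

The same escaping is available as a function, `quotemeta($filename)`, if the pattern has to be built ahead of time.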

    But you are already using the URI module, so why do all those path manipulations yourself? $url->path gives you the path, which is the URL without the scheme or domain name, so you don't have to manually remove those from the original URL. Or use path_segments, which gives you the different parts of the path delimited by slashes:

        # untested:
        my @path_chunks = $url->path_segments;
        pop @path_chunks; # remove the last one, which is the file name
        my $dir = $song{dir} = join '/', @path_chunks;

    (Updated to use $uri->path_segments, daxim++, and s/shift/pop/, johngg++)
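Putting the suggestion together, the whole subroutine might look roughly like the sketch below. This is not the original poster's final code: the name `create_struct`, the hash layout, and the `example.com` URL are assumptions for illustration, and it needs the non-core URI module installed.

```perl
use strict;
use warnings;
use URI;

# Sketch: split a URL into domain, directory and filename using URI
# accessors instead of regex substitutions on the raw string.
sub create_struct {
    my ($orig_url) = @_;
    my $url = URI->new($orig_url);

    # path_segments returns the unescaped pieces of the path; for an
    # absolute path the first piece is an empty string.
    my @segments = $url->path_segments;
    my $filename = pop @segments;    # last piece is the file name

    return {
        url      => $orig_url,
        domain   => $url->host,
        dir      => join('/', @segments),
        filename => $filename,
    };
}

# Placeholder URL, not the one from the question.
my $song = create_struct('http://example.com/music/albums/Vol.15%20(2010).mp3');
print "$_: $song->{$_}\n" for qw(domain dir filename);
```

Since no regex is built from the filename, special characters in it no longer matter. Note that `dir` here has no trailing slash, unlike what the substitution-based version would leave behind.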

      Thank you, that is what I suspected the problem was, but I wasn't sure how to fix it. I also used URI->path() instead of those lines of regex that I had before. Thanks again!
