Re: Regex to match non image urls

by tobyink (Canon)
on Jan 18, 2013 at 09:20 UTC


in reply to Regex to match non image urls

In these cases it is generally easier to write a regexp which matches only image URLs, and then use the negative match operator !~ instead of the match operator =~.

if ($url !~ m{\.(?:jpe?g|gif|xbm|png|bmp)$}i) { say "$url is not an image"; }
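
A minimal sketch of the same idea applied to a list of URLs, in case it helps. The sample URLs and the variable names are made up for illustration. The pattern here requires a literal dot before the extension and ignores case.

```perl
use strict;
use warnings;

# Pattern that matches only URLs ending in a common image extension.
my $image_re = qr{\.(?:jpe?g|gif|xbm|png|bmp)$}i;

# Made-up sample data.
my @urls = qw(
    http://example.com/logo.jpeg
    http://example.com/index.html
    http://example.com/photo.PNG
);

# The negative match operator keeps everything the image pattern rejects.
my @non_images = grep { $_ !~ $image_re } @urls;

print "$_ is not an image\n" for @non_images;
```

Writing the positive pattern once and negating it with !~ keeps the regexp readable, compared with trying to express "does not end in an image extension" inside the pattern itself.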
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Replies are listed 'Best First'.
Re^2: Regex to match non image urls
by userdefinable (Initiate) on Jan 18, 2013 at 09:23 UTC
    Thanks but I'm trying to get the instances of this out of a chunk of text though. Like this:
    s~m(\[img\]\s*.*?\.(?:jpe?g|png|svg|gif|bmp)\s*\[/img\])x~Invalid~isg;
    but it doesn't work, so I'm clearly not using it correctly.

      Ah, OK. You never said it was part of a s/// operator.

      Maybe something along these lines?

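      use 5.010;
      use strict;
      use warnings;

      # Slurp the whole DATA section into one string.
      my $text = do { local $/; <DATA> };

      # Escape the literal tags so they can be used inside a regexp.
      my ($start, $end) = map quotemeta, qw([img] [/img]);

      # Keep links whose URI ends in an image extension; replace the rest.
      $text =~ s{($start(.+?)$end)}{
          my $link = $1;
          my $uri  = $2;
          $uri =~ /\.(?:jpe?g|png|svg|gif|bmp)$/ ? $link : 'Invalid'
      }eg;

      print $text;

      __DATA__
      Foo [img]http://example.com/[/img]
      Bar [img]http://example.com/logo.jpeg[/img]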

      That said, what you are doing is conceptually broken. It is perfectly valid for an image on the web to have a URL ending with ".cgi", ".php", or even ".html". Whether something is an image or not is decided by its HTTP Content-Type header, not by the last few characters of the URL.
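
      A rough sketch of checking the Content-Type header instead of the URL suffix, assuming HTTP::Tiny is available (it ships with Perl 5.14 and later). The helper names is_image_type and url_is_image are made up for illustration, and some servers answer HEAD requests differently from GET, so treat the result as a hint rather than a guarantee.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: true when a Content-Type value names an image type.
sub is_image_type {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    return $content_type =~ m{^\s*image/}i ? 1 : 0;
}

# Hypothetical helper: send a HEAD request and inspect the response header.
sub url_is_image {
    my ($url) = @_;
    my $response = HTTP::Tiny->new(timeout => 10)->head($url);
    return 0 unless $response->{success};
    return is_image_type($response->{headers}{'content-type'});
}

# Check any URLs given on the command line; with no arguments nothing is fetched.
for my $url (@ARGV) {
    printf "%s: %s\n", $url, url_is_image($url) ? 'image' : 'not an image';
}
```

      This costs one network round trip per URL, which is why the suffix check is still tempting for a quick text filter.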

        That said, what you are doing is conceptually broken. It is perfectly valid for an image on the web to have a URL ending with ".cgi", ".php", or even ".html". Whether something is an image or not is decided by its HTTP Content-Type header, not by the last few characters of the URL.

        I couldn't agree more, sir.

        That being said, I like the elegance of your code snippet.
