PerlMonks  

Regex to match non image urls

by userdefinable (Initiate)
on Jan 18, 2013 at 09:00 UTC (#1013989=perlquestion)
userdefinable has asked for the wisdom of the Perl Monks concerning the following question:

I need a regex to match everything except
[img]anything[/img]
where "anything" is a string ending in one of the following: .jpg, .jpeg, .png, .gif, .bmp. I have this:
\[img\].*[^jpg]\[\/img\]
Which matches anything except string ending in jpg within the tags, but when I try to add the "or" bits it goes horribly wrong. Please help, thanks

Re: Regex to match non image urls
by muba (Priest) on Jan 18, 2013 at 09:12 UTC

    [^jpg] is a character class - meaning you match either j, p, or g.

    Update: and the ^ negates the character class, so [^jpg] matches a single character that is not j, p, or g.
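To make the point about the negated character class concrete, here is a small sketch that runs the pattern from the question against four sample tags. The helper name `matches_question_pattern` and the sample tags are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the question. [^jpg] consumes exactly ONE character that
# is not "j", "p", or "g"; it does not mean "not the string jpg".
my $re = qr/\[img\].*[^jpg]\[\/img\]/;

# Hypothetical helper: 1 if the question's pattern matches, 0 otherwise.
sub matches_question_pattern {
    my ($tag) = @_;
    return $tag =~ $re ? 1 : 0;
}

for my $tag ('[img]a.jpg[/img]', '[img]a.png[/img]',
             '[img]a.gif[/img]', '[img]a.bmp[/img]') {
    printf "%-20s %s\n", $tag,
        matches_question_pattern($tag) ? 'matches' : 'no match';
}
```

Only the .gif tag matches, because "f" is the only final character here that is not j, p, or g. The .png and .bmp tags are rejected purely by accident of their last letter.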

    Other than that, I feel your list is incomplete. Why won't you allow .svg images?

    And are you aware that it's entirely possible to write a script in some CGI-capable language (Perl, say) called my-image-script.(txt|html|bananajuice) that would still generate a valid image in whatever format? Your regexp would disallow it. Is that intentional?

    Anyway, this should help you along...

    m(
        \[img\]     # Match opening tag
        \s*         # Maybe there are spaces inside
        .*?         # Non-greedy match
        \.          # A dot
        (?:         # Non-capturing group
            jpe?g | png | svg | gif | bmp
        )           # End of group
        \s*         # Maybe there are more spaces still
        \[/img\]    # Match closing tag
    )x
      Thanks, not quite sure how to use this in the context of replacing occurrences in a block of text.
        Basically this doesn't work:
        $text =~ s~(\[img\]\s*.*?\.(?:jpe?g|png|svg|gif|bmp)\s*\[/img\])~Invalid~isg;
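The substitution above replaces the tags that do end in an image extension, which is the reverse of what the question asks for. One possible way to invert it in a single regex is a negative lookahead placed right after the opening tag. This is only a sketch: the helper name `replace_non_images` is invented here, and the extension list is copied from the attempt above.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every [img]...[/img] tag whose content does
# NOT end in a listed image extension with the literal text "Invalid".
sub replace_non_images {
    my ($text) = @_;
    $text =~ s{
        \[img\]                       # opening tag
        (?!                           # give up here if a valid image tag follows
            (?:(?!\[/img\]).)*?       # tag content, never crossing a closing tag
            \.(?:jpe?g|png|svg|gif|bmp)
            \s*\[/img\]
        )
        .*?                           # otherwise consume the content
        \[/img\]                      # closing tag
    }{Invalid}gisx;
    return $text;
}

print replace_non_images(
    "Foo [img]http://example.com/[/img] Bar [img]http://example.com/logo.jpeg[/img]\n"
);
```

With the sample text, the first tag becomes "Invalid" and the .jpeg tag is left untouched. The inner `(?!\[/img\]).` guard is there so the lookahead cannot reach into a later tag and find an extension that belongs to a different URL.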
Re: Regex to match non image urls
by Anonymous Monk on Jan 18, 2013 at 09:13 UTC

    Which matches anything except string ending in jpg within the tags, but when I try to add the "or" bits it goes horribly wrong. Please help, thanks

    Sorry, no it doesn't; it forbids a single character, one of g, j, or p.

    Read perlintro, perlrequick, perlfaq6, Parse::BBCode

Re: Regex to match non image urls
by tobyink (Abbot) on Jan 18, 2013 at 09:20 UTC

    In these cases it is generally easier to write a regexp which matches only image URLs, and then use the negative match operator !~ instead of the match operator =~.

    if ($url !~ m{^.*(jpe?g|gif|xbm|png|bmp)$}) { say "$url is not an image"; }
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Thanks but I'm trying to get the instances of this out of a chunk of text though. Like this:
      s~m(\[img\]\s*.*?\.(?:jpe?g|png|svg|gif|bmp)\s*\[/img\])x~Invalid~isg;
      but it doesn't work, so I'm clearly not using it correctly.

        Ah, OK. You never said it was part of a s/// operator.

        Maybe something along these lines?

        use 5.010;
        use strict;
        use warnings;

        # Slurp the whole DATA section into one string
        my $text = do { local $/; <DATA> };
        my ($start, $end) = map quotemeta, qw([img] [/img]);

        # Keep a tag if its URI ends in an image extension, else replace it
        $text =~ s{($start(.+?)$end)}{
            my $link = $1;
            my $uri  = $2;
            $uri =~ /\.(?:jpe?g|png|svg|gif|bmp)$/ ? $link : 'Invalid'
        }eg;

        print $text;

        __DATA__
        Foo [img]http://example.com/[/img]
        Bar [img]http://example.com/logo.jpeg[/img]

        That said, what you are doing is conceptually broken. It is perfectly valid for an image on the web to have a URL ending with ".cgi", ".php", or even ".html". Whether something is an image or not is decided by its HTTP Content-Type header, not by the last few characters of the URL.
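As a rough illustration of the Content-Type point, the following sketch asks the server with a HEAD request instead of inspecting the URL. It assumes HTTP::Tiny is available (it ships with Perl 5.14 and later). The helper names `is_image_type` and `url_is_image` are invented for this example, and a real checker would also need to think about redirects, timeouts, and servers that mishandle HEAD.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: 1 if a Content-Type header value names an image type.
sub is_image_type {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    # HTTP::Tiny returns an array ref when a header appears more than once
    $content_type = $content_type->[0] if ref $content_type eq 'ARRAY';
    return 0 unless defined $content_type;
    return $content_type =~ m{^\s*image/}i ? 1 : 0;
}

# Hypothetical helper: issue a HEAD request and inspect the Content-Type header.
sub url_is_image {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 5)->head($url);
    return 0 unless $res->{success};
    return is_image_type($res->{headers}{'content-type'});
}
```

Calling `url_is_image($uri)` in place of the extension test inside a `s{...}{...}eg` replacement would cost one network round trip per tag, so caching the results per URL is probably worthwhile.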

