PerlMonks  

Regex to match non image urls

by userdefinable (Initiate)
on Jan 18, 2013 at 09:00 UTC ( #1013989=perlquestion )
userdefinable has asked for the wisdom of the Perl Monks concerning the following question:

I need a regex to match everything except
[img]anything[/img]
where "anything" is a string ending in one of the following: .jpg .jpeg .png .gif .bmp
I have this:
\[img\].*[^jpg]\[\/img\]
Which matches anything except a string ending in jpg within the tags, but when I try to add the "or" bits it goes horribly wrong. Please help, thanks.

Re: Regex to match non image urls
by muba (Priest) on Jan 18, 2013 at 09:12 UTC

    [^jpg] is a character class - meaning you match either j, p, or g.

    Update: and the ^ negates the character class, so [^jpg] matches a single character that is not j, p, or g.
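To see the effect in practice, here is a minimal sketch that runs the original pattern against a few sample tags. The helper name op_pattern_matches and the sample file names are made up for illustration. Because [^jpg] only inspects the single character right before [/img], names ending in .png and .bmp are rejected along with .jpg, while .gif is accepted.

```perl
use strict;
use warnings;

# Hypothetical helper: does the pattern from the question match this tag?
sub op_pattern_matches {
    my ($tag) = @_;
    return $tag =~ /\[img\].*[^jpg]\[\/img\]/ ? 1 : 0;
}

# Sample tags (illustrative only); only the last character before [/img] matters.
for my $tag ('[img]a.jpg[/img]', '[img]a.png[/img]', '[img]a.bmp[/img]',
             '[img]a.gif[/img]', '[img]a.txt[/img]') {
    printf "%-20s %s\n", $tag, op_pattern_matches($tag) ? 'matches' : 'no match';
}
```

Only the .gif and .txt tags match here, since "f" and "t" are the only final characters outside the set j, p, g.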

    Other than that, I feel your list is incomplete. Why won't you allow .svg images?

    And are you aware that it's entirely possible to write a script in some CGI-capable language (Perl, say) named my-image-script.(txt|html|bananajuice) that still generates a valid image in whatever format? Your regexp would disallow it. Is that intentional?

    Anyway, this should help you along...

    m(
        \[img\]     # Match opening tag
        \s*         # Maybe there are spaces inside
        .*?         # Non-greedy match
        \.          # A dot
        (?:         # Non-capturing group
            jpe?g | png | svg | gif | bmp
        )           # End of group
        \s*         # Maybe there are more spaces still
        \[/img\]    # Match closing tag
    )x
      Thanks, not quite sure how to use this in the context of replacing occurrences in a block of text.
        Basically this doesn't work:
        $text =~ s~(\[img\]\s*.*?\.(?:jpe?g|png|svg|gif|bmp)\s*\[/img\])~Invalid~isg;
Re: Regex to match non image urls
by Anonymous Monk on Jan 18, 2013 at 09:13 UTC

    Which matches anything except a string ending in jpg within the tags, but when I try to add the "or" bits it goes horribly wrong. Please help, thanks.

    Sorry, no it doesn't; it forbids a single character, one of g or j or p.

    Read perlintro, perlrequick, perlfaq6, Parse::BBCode

Re: Regex to match non image urls
by tobyink (Abbot) on Jan 18, 2013 at 09:20 UTC

    In these cases it is generally easier to write a regexp which matches only image URLs, and then use the negative match operator !~ instead of the match operator =~.

    if ($url !~ m{^.*\.(jpe?g|gif|xbm|png|bmp)$}) {
        say "$url is not an image";
    }
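As a small runnable sketch of the negative-match idea, the snippet below filters a list of URLs with !~. The helper name is_not_image_url and the sample URLs are assumptions for illustration; the extension list is the one from the original question, and the /i flag (also an assumption) treats .PNG the same as .png.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the URL does NOT end in a known image extension.
sub is_not_image_url {
    my ($url) = @_;
    return $url !~ m{\.(?:jpe?g|gif|png|bmp)\z}i ? 1 : 0;
}

# Illustrative sample URLs.
my @urls = qw(
    http://example.com/logo.jpeg
    http://example.com/photo.PNG
    http://example.com/
    http://example.com/notes.txt
);

print "$_ is not an image\n" for grep { is_not_image_url($_) } @urls;
```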
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Thanks but I'm trying to get the instances of this out of a chunk of text though. Like this:
      s~m(\[img\]\s*.*?\.(?:jpe?g|png|svg|gif|bmp)\s*\[/img\])x~Invalid~isg;
      but it doesn't work, so I'm clearly not using it correctly.

        Ah, OK. You never said it was part of a s/// operator.

        Maybe something along these lines?

        use 5.010;
        use strict;
        use warnings;

        my $text = do { local $/ = <DATA> };
        my ($start, $end) = map quotemeta, qw([img] [/img]);

        $text =~ s{($start(.+?)$end)}{
            my $link = $1;
            my $uri  = $2;
            $uri =~ /\.(?:jpe?g|png|svg|gif|bmp)$/ ? $link : 'Invalid'
        }eg;

        print $text;

        __DATA__
        Foo [img]http://example.com/[/img]
        Bar [img]http://example.com/logo.jpeg[/img]
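For comparison, the same replacement can probably be done in one pattern with a negative lookahead, which is closer to what the original question asked for. This is only a sketch, not the approach from the reply above: the sub name replace_non_images is made up, the extension list is the one from the question, and it assumes the text between the tags never contains square brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every [img]...[/img] tag whose content does not
# end in an image extension with the word "Invalid".
# Assumption: the content between the tags contains no "[" or "]".
sub replace_non_images {
    my ($text) = @_;
    (my $out = $text) =~ s{
        \[img\]                             # opening tag
        (?!                                 # fail if what follows is ...
            [^\[\]]*                        #   tag content
            \.(?:jpe?g|png|gif|bmp)         #   ending in an image extension
            \s*\[/img\]                     #   right before the closing tag
        )
        [^\[\]]*                            # otherwise consume the content
        \[/img\]                            # and the closing tag
    }{Invalid}gix;
    return $out;
}

print replace_non_images(
    "Foo [img]http://example.com/[/img] Bar [img]http://example.com/logo.jpeg[/img]"
), "\n";
```

The s///e version above is arguably easier to read and to extend; the lookahead form is mainly useful when the pattern has to be a single regex.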

        That said, what you are doing is conceptually broken. It is perfectly valid for an image on the web to have a URL ending with ".cgi", ".php", or even ".html". Whether something is an image or not is decided by its HTTP Content-Type header, not by the last few characters of the URL.
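A rough sketch of checking the Content-Type header instead of the URL suffix follows. It assumes HTTP::Tiny is available (it ships with Perl from 5.14 on), that a HEAD request is acceptable to the server, and that network access exists; the helper names is_image_content_type and url_serves_image are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: true when a Content-Type header value names an image type.
sub is_image_content_type {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    return $content_type =~ m{\A\s*image/}i ? 1 : 0;
}

# Hypothetical helper: send a HEAD request and inspect the response header.
# Needs network access; a failed request is treated as "not an image".
sub url_serves_image {
    my ($url) = @_;
    my $response = HTTP::Tiny->new(timeout => 5)->head($url);
    return 0 unless $response->{success};
    my $type = $response->{headers}{'content-type'};
    $type = $type->[0] if ref $type eq 'ARRAY';    # repeated headers arrive as an array ref
    return is_image_content_type($type);
}
```

Something like url_serves_image('http://example.com/logo.jpeg') could then replace the suffix test, at the cost of one HTTP request per link.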

