
Eugh, regex :(

by ultranerds (Friar)
on Mar 25, 2009 at 09:21 UTC (node #753064)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:


I'm trying to get this regex working :/

Sample value of $post_message is:


(plus other images, and more content too)

The regex I have is:

$post_message =~ s%\Q[URL=\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q][img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img][/URL]\E%gix;

..but I keep getting this error :

Backslash found where operator expected at /var/home/domain/.../www/admin/Plugins/GForum/ line 283, near "while ($post_message =~ m%\"
    (Might be a runaway multi-line %% string starting on line 278)
    (Do you need to predeclare while?)
Backslash found where operator expected at /var/home/domain/.../www/admin/Plugins/GForum/ line 283, near "img\"

Anyone got any suggestions? I'm all out of ideas :(



Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 10:06 UTC
    Your regex was far too sophisticated, IMHO.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use re "debug"; # will help you understand your regexes

    my $post_message = '[URL=...3598vt7.jpg]' . '[IMG]...598vt7.jpg[/IMG][/URL]';
    my @fields = $post_message =~ m#\Q[URL=\E([^]]+)\Q][img]\E([^]]+)\Q[/img][/URL]\E#gix;
    ## ([^]]+) = one or more of not ']'
    ## should be enough here.
    print join("\n", @fields, "\n");


      Thanks for the reply :) Although your example works (all tested out fine), I still can't get it going in my script :(

      print STDERR qq|\n\n---------------------------OLD post was: $post_message\n\n---------------------------|;
      $post_message =~ s|\Q[URL=\E([^]]+)\Q][IMG]\E([^]]+)\Q[/IMG][/URL]\E|$2|sig;
      print STDERR qq|\n\n---------------------------new post was: $post_message\n\n---------------------------|;

      All that does is print out:

      ---------------------------OLD post was: sdfsdfsdfsdf [img]...puter-keyboard_web.jpg[/img] [URL=][IMG=http://][/IMG][/URL] [signature]
      ---------------------------
      ---------------------------new post was: sdfsdfsdfsdf [img]...puter-keyboard_web.jpg[/img] [URL=][IMG=http://][/IMG][/URL] [signature]
      ---------------------------

      (as you can see, it hasn't been edited at all :()

      Any more ideas? Otherwise, just gonna call it quits with this til after I get back from vacation - maybe looking at it with fresh eyes will reveal something :/
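Judging from the printed output above, the post seems to contain the image as a tag attribute, [IMG=...][/IMG], rather than [IMG]...[/IMG], so the literal ][IMG] in the pattern can never match. Assuming that reading of the output is right, here is a minimal sketch that unwraps [URL=...]...[/URL] around either image form and keeps the image tag itself. The sample post is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample post: one image written as [IMG=...][/IMG],
# one written as [img]...[/img], both wrapped in [URL=...]...[/URL].
my $post_message = 'text [URL=http://example.com/big1.jpg][IMG=http://example.com/small1.jpg][/IMG][/URL] '
                 . 'more [URL=http://example.com/big2.jpg][img]http://example.com/small2.jpg[/img][/URL]';

# \[URL=[^\]]*\]               opening [URL=...] tag (the link may even be empty)
# (\[IMG[^\]]*\].*?\[/IMG\])   the whole image tag, in either form, captured as $1
# \[/URL\]                     closing tag
# /i makes [img] and [IMG] both match; the image tag is put back unchanged.
$post_message =~ s{\[URL=[^\]]*\](\[IMG[^\]]*\].*?\[/IMG\])\[/URL\]}{$1}gi;

print "$post_message\n";
```

A [URL=...] that wraps plain text rather than an image is left alone, because the pattern requires an [IMG tag right after the opening URL tag.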


Re: Eugh, regex :(
by haoess (Curate) on Mar 25, 2009 at 09:31 UTC

    Please post your real code. The error message refers to a match (... =~ m%...), but your code is a substitution, and it does not even compile:

    Substitution replacement not terminated at 753064 line 1.

    -- Frank


      That's all the code you should need.

      $post_message is just a text string, with the example I gave - and then it runs this:

      $post_message =~ s%\Q[URL=\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q][img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img][/URL]\E%gix;

      (which is the line that's giving the error)

      Maybe it's something to do with the regex?


        % cat 753064
        $post_message =~ s%\Q[URL=\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q][img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img][/URL]\E%gix;
        % perl -Mdiagnostics 753064
        Substitution replacement not terminated at 753064 line 1 (#1)
            (F) The lexer couldn't find the final delimiter of an s/// or s{}{}
            construct.  Remember that bracketing delimiters count nesting level.
            Missing the leading $ from variable $s may cause this error.

        Uncaught exception from user code:
                Substitution replacement not terminated at 753064 line 1.
         at 753064 line 1

        The error message says it all: if you want to substitute, you have to tell perl what your substitution is.

        -- Frank
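For illustration, here is the question's pattern, unchanged, with a replacement part added between the second and third %. It assumes the goal stated elsewhere in the thread (keep the image, drop the [URL=...] wrapper); the sample post is made up. The brackets in the replacement are backslash-escaped because the replacement is interpolated like a double-quoted string, where an unescaped $2[ would be parsed as the start of an array subscript.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample post; the real one from the question is not shown.
my $post_message = 'Look: [URL=http://example.com/big.jpg][img]http://example.com/small.jpg[/img][/URL] done';

# Pattern as in the question; the replacement \[img\]$2\[/img\] now sits
# between the second and third '%', followed by the gix flags.
$post_message =~ s%\Q[URL=\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q][img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img][/URL]\E%\[img\]$2\[/img\]%gix;

print "$post_message\n";   # Look: [img]http://example.com/small.jpg[/img] done
```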

Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 09:43 UTC

    Your title could be a bit more descriptive, especially as you do not explain what you are aiming at. What are you trying to cut out of that HTML code? Also, your error message says =~ m%..., but your code reads =~ s%..., which gives me a different error message. In any case, it is not clear to me what you want to do.

    Another point: parsing HTML source code with regexes is error-prone, and I would strongly suggest one of the parsers like HTML::Parser and its derivatives, but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that...

      but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that

      May I recommend Parse::BBCode by tinita?

      (Disclaimer: I'm a bit biased because I wrote a few tests for that module, and discussed some design questions with the author).

        >>May I recommend Parse::BBCode by tinita? <<

        This is for some forum software already - so when I get the data, it comes as BBCode, so no point converting back/forth :) Just need to get rid of the damn URL stuff =)


      this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that
      Of course, there is at least one: Parse::RecDescent (it just needs to be whipped into shape).

      All this is meant to do is change stuff like:



      (i.e. remove the URL stuff)


        Quick and dirty one-line example. You need to escape the [] in the replacement part.
        perl -e '$post_message="[URL=...dsc03598vt7.jpg][IMG]...vt7.jpg[/IMG][/URL]"; $post_message =~ s|^.*?\Q[img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img]\E.*$|\[img\]$1\[/img\]|sig; print "$post_message\n";'
        My regex was a bit different: find any [URL...] or [/URL...] tag and delete it with a substitution.
        #!/usr/bin/perl -w
        use strict;
        my $example = '[URL=...t7.jpg][IMG][/IMG][/URL]';
        print "$example \n";
        $example =~ s/\[\/*URL.*?\]//g;
        print $example;
        # prints [URL=][IMG]http://[/IMG][/URL]
        #        [IMG][/IMG]
        Update: I recommend Jeffrey Friedl's "Mastering Regular Expressions". It's a classic. But I think of the heavy regex machinery in it like nuclear weapons! The vast majority of regex problems can be solved by hitting the problem once, twice, or maybe three times with simple regexes in sequence. I've also found that the performance can be just as fast as a single complex regex (and sometimes faster)!
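The speed claim is easy to check on your own data with the core Benchmark module. A minimal sketch on a made-up post, comparing one combined substitution with two simple ones in sequence. The two are only equivalent on input where every [URL=...] wraps an image, and the timings will depend on the data and the perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test post: 100 copies of a linked image plus some text.
my $text = join ' ',
    ('text [URL=http://example.com/a.jpg][img]http://example.com/b.jpg[/img][/URL]') x 100;

# One more complex regex: unwrap [URL=...] only around an [img] tag.
sub single_pass {
    my ($t) = @_;
    $t =~ s{\[URL=[^\]]*\](\[img\][^\[]*\[/img\])\[/URL\]}{$1}gi;
    return $t;
}

# Two simple regexes in sequence: drop every opening, then every closing URL tag.
# Unlike single_pass, this also strips URL tags around plain text.
sub two_pass {
    my ($t) = @_;
    $t =~ s{\[URL=[^\]]*\]}{}gi;
    $t =~ s{\[/URL\]}{}gi;
    return $t;
}

# Sanity check: both give the same result on this input.
die "results differ\n" unless single_pass($text) eq two_pass($text);

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    single_pass => sub { single_pass($text) },
    two_pass    => sub { two_pass($text) },
});
```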
Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 11:43 UTC
Re: Eugh, regex :(
by SFLEX (Chaplain) on Mar 25, 2009 at 12:01 UTC
    My AUBBC module should be able to handle that.
    Game on, bitches.

Node Type: perlquestion [id://753064]
Approved by svenXY