PerlMonks  

Eugh, regex :(

by ultranerds (Friar)
on Mar 25, 2009 at 09:21 UTC (node #753064)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to get this regex working :/

Sample value of $post_message is:

[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]

(plus other images, and more content too)

The regex I have is:

$post_message =~ s%\Q[URL=\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q][img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img][/URL]\E%gix;

..but I keep getting this error :

Backslash found where operator expected at /var/home/domain/domain.com/www/admin/Plugins/GForum/Post_post.pm line 283, near "while ($post_message =~ m%\"
        (Might be a runaway multi-line %% string starting on line 278)
        (Do you need to predeclare while?)
Backslash found where operator expected at /var/home/domain/domain.com/www/admin/Plugins/GForum/Post_post.pm line 283, near "img\"


Anyone got any suggestions? I'm all out of ideas :(

TIA!

Andy

Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 10:06 UTC
    Hi,
    Your regex was far too sophisticated IMHO.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use re "debug"; # will help you understand your regexes

    my $post_message = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
                     . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';

    my @fields = $post_message =~ m#\Q[URL=\E([^]]+)\Q][img]\E([^]]+)\Q[/img][/URL]\E#gix;
    ##                                         ^^^ one or more of not ']'
    ##                                             should be enough here.
    print join("\n", @fields, "\n");

    Regards,
    svenXY
      Hi,

      Thanks for the reply :) Although your example works (all tested out fine), I still can't get it going in my script :(

      print STDERR qq|\n\n---------------------------OLD post was: $post_message\n\n---------------------------|;
      $post_message =~ s|\Q[URL=\E([^]]+)\Q][IMG]\E([^]]+)\Q[/IMG][/URL]\E|$2|sig;
      print STDERR qq|\n\n---------------------------new post was: $post_message\n\n---------------------------|;

      All that does is print out:

      ---------------------------OLD post was: sdfsdfsdfsdf [img]http://images.voyageforum.org/photos/thumbs/0/73780-04_34_2---computer-keyboard_web.jpg[/img] [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL] [signature]

      ---------------------------

      ---------------------------new post was: sdfsdfsdfsdf [img]http://images.voyageforum.org/photos/thumbs/0/73780-04_34_2---computer-keyboard_web.jpg[/img] [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL] [signature]

      ---------------------------


      (as you can see, it hasn't been edited at all :()

      Any more ideas? Otherwise, I'm just gonna call it quits with this till I get back from vacation; maybe looking at it with fresh eyes will reveal something :/

      TIA

      Andy
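      Judging from the "OLD post was:" output above, one likely reason nothing changed is that the post actually contains [IMG=http://...][/IMG], with the address inside the opening tag, rather than [IMG]http://...[/IMG]. A pattern that requires "][IMG]" right after the link can never match that. The sketch below compares the pattern from the script with a variant that accepts both forms; the variable names are illustrative, and the assumption that only these two forms occur is mine, not something confirmed in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The linked image exactly as it appears in the "OLD post was:" output.
my $text = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
         . '[IMG=http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg][/IMG][/URL]';

# Pattern from the script: it needs "][IMG]" immediately after the link URL.
my $strict = qr{\Q[URL=\E([^]]+)\Q][IMG]\E([^]]+)\Q[/IMG][/URL]\E}i;
print 'pattern from the script: ', ($text =~ $strict ? "match\n" : "no match\n");

# Assumed variant: accept "[IMG]url[/IMG]" as well as "[IMG=url][/IMG]",
# capture the whole image tag as $1 and drop only the [URL=...]...[/URL] wrapper.
(my $fixed = $text) =~ s{\[URL=[^\]]*\](\[IMG(?:=[^\]]*)?\][^\[]*\[/IMG\])\[/URL\]}{$1}gi;
print "$fixed\n";
```

      The variant keeps the image tag exactly as written, so [IMG=...] stays [IMG=...]; if the forum only renders one form, a separate substitution would be needed to normalise it.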
Re: Eugh, regex :(
by haoess (Curate) on Mar 25, 2009 at 09:31 UTC

    Please post your real code. The error message refers to a match (... =~ m%...), but your code is a substitution, and it does not compile:

    Substitution replacement not terminated at 753064 line 1.

    -- Frank

      Hi,

      That's all the code you should need.

      $post_message is just a text string, with the example I gave - and then it runs this:

      $post_message =~ s%\Q[URL=\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q][img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img][/URL]\E%gix;

      (which is the line that's giving the error)

      Maybe it's something to do with the regex?

      Cheers

      Andy
        % cat 753064
        $post_message =~ s%\Q[URL=\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q][img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img][/URL]\E%gix;
        % perl -Mdiagnostics 753064
        Substitution replacement not terminated at 753064 line 1 (#1)
            (F) The lexer couldn't find the final delimiter of an s/// or s{}{}
            construct.  Remember that bracketing delimiters count nesting level.
            Missing the leading $ from variable $s may cause this error.

        Uncaught exception from user code:
                Substitution replacement not terminated at 753064 line 1.
         at 753064 line 1

        The error message says it all: if you want to substitute, you have to tell perl what your substitution is.

        -- Frank
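        A minimal sketch of the substitution with its missing replacement half added, assuming the goal is to keep only the [IMG]...[/IMG] part. Without a replacement, perl keeps reading the "replacement" up to the next % it finds, which would explain why the error in the question points at a later m% line (an inference, since the full script was not posted). The sketch switches to s{}{} delimiters so % needs no escaping, and it uses [^]]+ for the captures because the original character class has no '=' and so could not match the "?image=..." part of the first URL. The surrounding "before"/"after" text is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $post_message = 'before '
    . '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
    . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]'
    . ' after';

# s{PATTERN}{REPLACEMENT}gi: the {REPLACEMENT} half is what the original lacked.
# \Q...\E quotes the literal tag text; [^]]+ means "one or more characters
# other than ']'", which also admits the '=' in "?image=...".
# The backslash in \[/IMG] stops perl from reading "$2[" as an array subscript.
$post_message =~ s{\Q[URL=\E([^]]+)\Q][IMG]\E([^]]+)\Q[/IMG][/URL]\E}{[IMG]$2\[/IMG]}gi;

print "$post_message\n";
```

        Because of /i the pattern also matches lowercase [img] tags, but the replacement always writes uppercase [IMG].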

Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 09:43 UTC
    Hi,

    Your title could be a bit more descriptive, especially as you do not explain what you are aiming at. What are you trying to cut out of that HTML code? Also, your error message says =~ m%..., but your code reads =~ s%..., which for me spits out a different error message. In any case, it is not clear to me what you want to do.

    Another point: parsing HTML source code with regexes is error-prone, and I would strongly suggest one of the parsers like HTML::Parser and its derivatives, but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that...

    Regards,
    svenXY
      but this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that

      May I recommend Parse::BBCode by tinita?

      (Disclaimer: I'm a bit biased because I wrote a few tests for that module, and discussed some design questions with the author).

        >>May I recommend Parse::BBCode by tinita? <<

        This is for some forum software already, so when I get the data it comes as BBCode; no point converting back and forth :) I just need to get rid of the damn URL stuff =)

        Cheers

        Andy
      this here seems to be some pseudo-code for forums etc - I have no idea if there are modules to parse that
      Of course, there is at least one: Parse::RecDescent (it just needs to be whipped into shape).
      Hi,

      All this is aiming to do is change stuff like:

      [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]

      ..to:

      [IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]

      (i.e. remove the URL stuff)

      Cheers

      Andy
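      Since the real post also contains other images and ordinary text, here is a hedged sketch that unwraps only URL-wrapped [IMG]...[/IMG] blocks and leaves everything else alone. Instead of capturing the image address, it captures the whole [IMG]...[/IMG] block, so the replacement is just $1. The post body is invented for illustration (example.com addresses are placeholders).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical post body: one linked image, one bare image, one ordinary link.
my $post = "Look at this:\n"
    . "[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]"
    . "[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]\n"
    . "[img]http://example.com/plain.jpg[/img]\n"
    . "[URL=http://example.com/]a normal link[/URL]\n";

# Capture the whole [IMG]...[/IMG] block and replace the wrapper with it.
# [^\[]* stops the image address at the next '[', so one match cannot run on
# into a later tag; /g handles every linked image, /i ignores tag case.
$post =~ s{\[URL=[^\]]*\](\[IMG\][^\[]*\[/IMG\])\[/URL\]}{$1}gi;

print $post;
```

      The ordinary [URL=...]a normal link[/URL] survives because the pattern requires [IMG] to follow the opening URL tag immediately.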
        Quick-and-dirty one-line example. You need to escape the [] in the substitution.
        perl -e '$post_message="[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]"; $post_message =~ s|^.*?\Q[img]\E([\?\%\:\/a-zA-Z0-9_\-\.]+)\Q[/img]\E.*$|\[img\]$1\[/img\]|sig; print "$post_message\n";'
        My regex was a bit different: find [URL...blah] or [/URL...blah] and delete them with a substitution.
        #!/usr/bin/perl -w
        use strict;

        my $example = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';
        print "$example \n";
        $example =~ s/\[\/*URL.*?\]//g;
        print $example;

        #prints
        [URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg][IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]
        [IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG]
        Update: I recommend Jeffrey Friedl's "Mastering Regular Expressions". It is a "classic". But I think of it like nuclear weapons! The vast majority of regex problems can be solved by shooting the problem once, twice, or maybe even three times with simple regexes in sequence. I've also found that the performance can be just as fast as with a single complex regex (and sometimes faster)!
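        To illustrate the "several simple regexes in sequence" idea, here is a small sketch that splits the job into two passes, one for opening [URL=...] tags and one for closing [/URL] tags. It is only an illustration of the style; no performance claim is implied.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $example = '[URL=http://img207.imageshack.us/my.php?image=dsc03598vt7.jpg]'
            . '[IMG]http://img207.imageshack.us/img207/2964/dsc03598vt7.jpg[/IMG][/URL]';

# Pass 1: remove each opening [URL=...] tag (everything up to its first ']').
$example =~ s/\[URL=[^\]]*\]//gi;

# Pass 2: remove each closing [/URL] tag.
$example =~ s{\[/URL\]}{}gi;

print "$example\n";
```

        Each pass is easy to read and test on its own, but like the one-pass version above it strips every URL tag in the post, including links that do not wrap an image.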
Re: Eugh, regex :(
by svenXY (Deacon) on Mar 25, 2009 at 11:43 UTC
Re: Eugh, regex :(
by SFLEX (Chaplain) on Mar 25, 2009 at 12:01 UTC
    My AUBBC module should be able to handle that.

Node Type: perlquestion [id://753064]
Approved by svenXY