PerlMonks  

Possible with a regex?

by ultranerds (Pilgrim)
on May 23, 2011 at 15:32 UTC (#906310=perlquestion)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to find a solution to this. For some annoying reason, people have been using invalid BBCode, and I'm trying to fix this up. For example:
my $test = qq| sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] [img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd [img]test.gif[/img] |;

while ($test =~ /\[img\](.+?)\[\/img\]/g) {
    print "$1 \n";
}
..gives me:
C:\Users\Andy>perl test.pl
sdj fpsdofj spojf sfsf [b] [img]http://www.test.com/image.gi
test.gif
What I need really, is:
C:\Users\Andy>perl test.pl
http://www.test.com/image.gif
test.gif
Is this possible? I could do it inside the while () loop, by checking the value - but if possible, I would prefer to just do it in a regex :)

TIA

Andy

Re: Possible with a regex?
by BrowserUk (Pope) on May 23, 2011 at 15:43 UTC

    If you can assume that urls will not contain (unencoded) spaces, then this might work for you:

    my $test = qq| sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] [img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd [img]test.gif[/img] |;

    while ($test =~ /\[img\](\S+?)\[\/img\]/g) {
        print "$1 \n";
    }
    __END__
    c:\test>junk
    http://www.test.com/image.gif
    test.gif

    Alternatively, maybe this is better:

    while ($test =~ /\[img\]([^\[\]]+?)\[\/img\]/g) { print "$1 \n"; }
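The original `.+?` pattern misbehaves because a non-greedy quantifier only shortens the right-hand end of a match; the engine still starts at the leftmost `[img]` it can find. A minimal sketch comparing the three capture patterns on the sample string (the `extract` helper and the `%patterns` hash are names made up for this illustration, not part of the thread):

```perl
use strict;
use warnings;

my $test = 'sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] '
         . '[img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd '
         . '[img]test.gif[/img]';

# Hypothetical helper: return every first-group capture of $re in $text.
sub extract {
    my ($re, $text) = @_;
    my @found;
    while ($text =~ /$re/g) {
        push @found, $1;
    }
    return @found;
}

# The three capture styles discussed in the thread.
my %patterns = (
    lazy_dot   => qr{\[img\](.+?)\[/img\]},         # original attempt
    non_space  => qr{\[img\](\S+?)\[/img\]},        # assumes no spaces in URLs
    no_bracket => qr{\[img\]([^\[\]]+?)\[/img\]},   # assumes no brackets in URLs
);

for my $name (sort keys %patterns) {
    print "$name:\n";
    print "  $_\n" for extract($patterns{$name}, $test);
}
```

Only `lazy_dot` should pick up the text after the unclosed `[img]`; the other two reject that starting point because the capture would have to cross a space or a bracket.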

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi,

      Thanks - your second option works perfectly :)

      Cheers

      Andy
Re: Possible with a regex?
by Corion (Pope) on May 23, 2011 at 15:49 UTC

    Depending on how much you can limit your input, one easy way would be to disallow [ in your URLs:

    while ($test =~ m!\[img\]([^\[]+)\[\/img\]!g) { print "$1 \n"; }

    If you really want to allow [ and ] in your tags as well, you can, I think, create some more complex regex that stops [img] from appearing inside the match, by allowing [ if it's not followed by i, allowing [i if it's not followed by m, allowing [im if it's not followed by g, and so on:

    while ($test =~ m!\[img\]((?:[^\[]+|\[[^i]|\[i[^m]|\[im[^g]|...)+)\[\/img\]!g) { print "$1 \n"; }

    Personally, I would restrict the input to disallow [ in URLs or just split on /\[img\]/ and then discard all strings that don't contain [/img].
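The split-and-discard idea can be sketched roughly as follows. The helper name `img_urls_by_split` is invented here, and the sketch assumes that whatever sits between an `[img]` opener and the next `[/img]` is the URL:

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the split-and-discard approach.
sub img_urls_by_split {
    my ($text) = @_;
    my @urls;

    # The first field is whatever precedes the first [img]; skip it.
    # Every later field is the text that follows one [img] opener.
    my (undef, @chunks) = split /\[img\]/, $text;

    for my $chunk (@chunks) {
        # Discard chunks with no closing tag; otherwise keep the text
        # up to the first [/img], provided it is not empty.
        if ($chunk =~ m{^(.*?)\[/img\]}s) {
            push @urls, $1 if length $1;
        }
    }
    return @urls;
}

my $test = 'sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] '
         . '[img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd '
         . '[img]test.gif[/img]';

print "$_\n" for img_urls_by_split($test);
```

An unclosed `[img]` produces a chunk with no `[/img]` in it, so it is dropped without any special-casing in the regex.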

      Thanks. Images can only use a valid image URL (as well as "dynamic URLs", e.g. image.cgi?sess=234wfdsfsf), so the more basic version works for me :)
Re: Possible with a regex?
by wind (Priest) on May 23, 2011 at 17:10 UTC

    Note: try using m{...} instead of m/.../ anytime you want to have a regex that contains forward slashes. That way you don't have to escape them:

    my $test = qq| sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] [img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd [img]test.gif[/img] |;

    while ($test =~ m{\[img\]([^\[\]]*)\[/img\]}g) {
        print "$1\n";
    }
Re: Possible with a regex?
by John M. Dlugosz (Monsignor) on May 23, 2011 at 17:40 UTC
    There should already be modules on CPAN that parse or convert BBCode.
Re: Possible with a regex?
by JavaFan (Canon) on May 23, 2011 at 19:11 UTC
    Some basic loop-unrolling: (untested)
    m{\[img]([^[]*(?:\[(?!img])[^[]*)*)\[/img]}g
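Since the pattern above is marked untested, here is a sketch that spreads it out with /x and runs it on the sample string. The `$tempered` variant, whose look-ahead also refuses a closing tag, is an assumption added for comparison and is not from the post; `captures` is likewise a made-up helper name.

```perl
use strict;
use warnings;

# The posted pattern, laid out with /x for readability.
my $unrolled = qr{
    \[img\]
    (                       # capture group 1: the tag body
        [^\[]*              # run of non-bracket characters
        (?: \[ (?!img\])    # a bracket that does not open another img tag
            [^\[]*          # then more non-bracket characters
        )*
    )
    \[/img\]
}x;

# Assumed variant: the look-ahead also rejects the closing tag, so the
# inner loop stops at the first closing tag instead of backtracking to it.
my $tempered = qr{
    \[img\]
    (
        [^\[]*
        (?: \[ (?!/?img\]) [^\[]* )*
    )
    \[/img\]
}x;

# Hypothetical helper: collect every group-1 capture of $re in $text.
sub captures {
    my ($re, $text) = @_;
    my @found;
    while ($text =~ /$re/g) {
        push @found, $1;
    }
    return @found;
}

my $test = 'sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] '
         . '[img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd '
         . '[img]test.gif[/img]';

print "unrolled: $_\n" for captures($unrolled, $test);
print "tempered: $_\n" for captures($tempered, $test);
```

On the sample string both variants should yield the same two URLs. They differ when a stray `[/img]` follows a valid pair: the posted pattern is greedy enough to run on to the later closing tag, while the tempered one stops at the first.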
