PerlMonks  

Possible with a regex?

by ultranerds (Friar)
on May 23, 2011 at 15:32 UTC (#906310)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to find a solution to this. For some annoying reason, people have been using invalid BBCode, and I'm trying to fix this up. For example:
my $test = qq| sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] [img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd [img]test.gif[/img] |;
while ($test =~ /\[img\](.+?)\[\/img\]/g) {
    print "$1 \n";
}
...gives me:
C:\Users\Andy>perl test.pl
sdj fpsdofj spojf sfsf [b] [img]http://www.test.com/image.gif
test.gif
What I really need is:
C:\Users\Andy>perl test.pl
http://www.test.com/image.gif
test.gif
Is this possible? I could do it inside the while () loop, by checking the value - but if possible, I would prefer to just do it in a regex :)

TIA

Andy

Re: Possible with a regex?
by BrowserUk (Pope) on May 23, 2011 at 15:43 UTC

    If you can assume that urls will not contain (unencoded) spaces, then this might work for you:

    my $test = qq| sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] [img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd [img]test.gif[/img] |;
    while ($test =~ /\[img\](\S+?)\[\/img\]/g) {
        print "$1 \n";
    }
    __END__
    c:\test>junk
    http://www.test.com/image.gif
    test.gif

    Alternatively, maybe this is better:

    while ($test =~ /\[img\]([^\[\]]+?)\[\/img\]/g) { print "$1 \n"; }
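To see how the two suggestions differ from the original pattern, here is a minimal sketch that runs all three against the sample string from the question. The `captures` helper and the variable names are made up for this illustration and are not from the thread.

```perl
use strict;
use warnings;

# Sample input from the question; the first "[img]" is never closed.
my $test = 'sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] '
         . '[img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd '
         . '[img]test.gif[/img]';

# Hypothetical helper: list everything captured by $re in $text.
sub captures {
    my ($re, $text) = @_;
    my @found = $text =~ /$re/g;
    return @found;
}

my $lazy_dot    = qr/\[img\](.+?)\[\/img\]/;        # pattern from the question
my $no_spaces   = qr/\[img\](\S+?)\[\/img\]/;       # first suggestion
my $no_brackets = qr/\[img\]([^\[\]]+?)\[\/img\]/;  # second suggestion

for my $case (['lazy dot',    $lazy_dot],
              ['no spaces',   $no_spaces],
              ['no brackets', $no_brackets]) {
    my ($name, $re) = @$case;
    print "$name: <$_>\n" for captures($re, $test);
}
```

The lazy dot still starts at the unclosed `[img]`, so its first capture drags in the stray text. The other two patterns skip it, but for different reasons: `\S` rejects a body containing whitespace (so `[img]my pic.gif[/img]` would not match), while the bracket-free class accepts spaces and rejects any body containing `[` or `]`.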

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi,

      Thanks - your second option works perfectly :)

      Cheers

      Andy
Re: Possible with a regex?
by Corion (Pope) on May 23, 2011 at 15:49 UTC

    Depending on how much you can limit your input, one easy way would be to disallow [ in your URLs:

    while ($test =~ m!\[img\]([^\[]+)\[\/img\]!g) { print "$1 \n"; }

    If you really want to allow [ and ] inside the tag body as well, you can, I think, build a more complex look-ahead-style regex that keeps [img] from appearing inside the match: allow [ if it's not followed by i, allow [i if it's not followed by m, allow [im if it's not followed by g, and so on:

    while ($test =~ m!\[img\]((?:[^\[]+|\[[^i]|\[i[^m]|\[im[^g]|...)+)\[\/img\]!g) { print "$1 \n"; }

    Personally, I would restrict the input to disallow [ in URLs or just split on /\[img\]/ and then discard all strings that don't contain [/img].
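The split-and-discard idea could look roughly like this. It is only a sketch: the sub name `img_urls_by_split` is invented for the example, and it assumes that everything between an `[img]` and the next `[/img]` counts as the URL.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the split-and-discard approach.
sub img_urls_by_split {
    my ($text) = @_;
    my @urls;

    # The field before the first "[img]" is never inside a tag, so drop it.
    my (undef, @chunks) = split /\[img\]/, $text;

    for my $chunk (@chunks) {
        # Discard chunks with no closing tag; otherwise keep what precedes it.
        next unless $chunk =~ m{^(.*?)\[/img\]}s;
        push @urls, $1;
    }
    return @urls;
}

my $test = 'sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] '
         . '[img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd '
         . '[img]test.gif[/img]';

print "$_\n" for img_urls_by_split($test);
```

Because each chunk starts right after an `[img]`, an unclosed tag simply yields a chunk without `[/img]` and gets skipped, so no look-ahead is needed.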

      Thanks. Images can only use a valid image URL (as well as "dynamic URLs", e.g. image.cgi?sess=234wfdsfsf), so the more basic version works for me :)
Re: Possible with a regex?
by wind (Priest) on May 23, 2011 at 17:10 UTC

    Note: try using m{...} instead of m/.../ anytime you want to have a regex that contains forward slashes. That way you don't have to escape them:

    my $test = qq| sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] [img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd [img]test.gif[/img] |;
    while ($test =~ m{\[img\]([^\[\]]*)\[/img\]}g) {
        print "$1\n";
    }
Re: Possible with a regex?
by John M. Dlugosz (Monsignor) on May 23, 2011 at 17:40 UTC
    There should already be modules on CPAN that parse or convert BBCode.
Re: Possible with a regex?
by JavaFan (Canon) on May 23, 2011 at 19:11 UTC
    Some basic loop-unrolling: (untested)
    m{\[img]([^[]*(?:\[(?!img])[^[]*)*)\[/img]}g
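Since the pattern above is marked untested, here is a small sketch that tries it on the sample string from the question. The pattern is the same one, only spread out with /x and with `]` escaped for readability. The second pattern, `$strict_re`, is an assumption of this example and not something proposed in the thread: it also refuses `[/img]` inside the body.

```perl
use strict;
use warnings;

my $test = 'sdfiojs pfojsdfs fs [img]sdj fpsdofj spojf sfsf [b] '
         . '[img]http://www.test.com/image.gif[/img] dfs fs s fsf sfd '
         . '[img]test.gif[/img]';

# The unrolled pattern from the post, laid out with /x.
my $img_re = qr{
    \[img\]
    (                        # capture group 1: the tag body
        [^\[]*               # a run of characters other than "["
        (?:
            \[ (?!img\])     # a "[" that does not open another img tag
            [^\[]*           # followed by more non-"[" characters
        )*
    )
    \[/img\]
}x;

# Hypothetical variant: the body may not contain a closing img tag either.
my $strict_re = qr{
    \[img\]
    ( [^\[]* (?: \[ (?!/?img\]) [^\[]* )* )
    \[/img\]
}x;

my @urls = $test =~ /$img_re/g;
print "$_\n" for @urls;
```

On the sample input both patterns capture the two URLs, and unlike the character-class versions they accept bracketed names such as `[img]a[1].gif[/img]`. One difference to be aware of: with a stray closing tag, as in `[img]a.gif[/img] x [/img]`, the original pattern is greedy enough to capture `a.gif[/img] x `, whereas the variant stops at the first `[/img]`.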
