PerlMonks  

Help with Regex

by rsiedl (Friar)
on Feb 09, 2007 at 01:07 UTC

rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks.

Can anybody help me with this regex?
<code> #!/usr/bin/perl use strict

Replies are listed 'Best First'.
Re: Help with Regex
by davido (Cardinal) on Feb 09, 2007 at 01:20 UTC

    Several issues here. First, you need a non-greedy quantifier. And while we're at it, I'm sure you want something captured, so let's use the + quantifier instead of * (which also permits an empty match).

    Next, you must also realize that your text has "newline" characters embedded within it. The . (dot) metacharacter excludes newlines by default, unless you use the /s modifier on your regexp. Here's a "repaired" version:

    my $source = << 'END';
    adf
    <!-- InstanceBeginEditable name="guts" -->
    this is the part we want
    <!-- InstanceEndEditable -->
    adf
    <!-- InstanceBeginEditable name="crap" -->
    adf
    <!-- InstanceEndEditable -->
    adf
    END

    # Pull out just what we need
    my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.+?)<!-- InstanceEndEditable -->/s;

    You're also not checking whether your regexp successfully matched or not, so when it silently fails, you simply get no text captured (in this example). You really ought to test whether a match took place or not.
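One way to test for a successful match is to put the regexp in an if condition and only use $1 when it succeeds. This is a minimal sketch, not code from the original post: the sample text and the warning message are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, modelled on the text used in this thread.
my $source = <<'END';
adf
<!-- InstanceBeginEditable name="guts" -->
this is the part we want
<!-- InstanceEndEditable -->
adf
END

my $new_source;
if ( $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.+?)<!-- InstanceEndEditable -->/s ) {
    # Only trust $1 inside the branch where the match succeeded.
    $new_source = $1;
    print "Captured: $new_source";
}
else {
    # Illustrative message; handle the failure however suits your program.
    warn "No editable region named 'guts' found\n";
}
```

With the list-assignment form used earlier in the thread, a failed match leaves $new_source undefined, so testing it with defined is another option.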

    One other word of warning: parsing HTML with a regular expression is fragile. You know, it's possible that a newline gets embedded in one of your HTML comments too, and that would break the regexp test. It's always advisable to use an HTML parser rather than rolling your own regexp approach.
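To make the fragility concrete, here is a small sketch with a hypothetical input in which the opening comment is wrapped across two lines. The pattern from this thread fails on it, while a variant that uses \s+ between tokens still matches. The \s+ variant is my own illustration and is still a regexp, not a parser, so the advice to use a real HTML parser (for example a CPAN module such as HTML::Parser) stands.

```perl
use strict;
use warnings;

# Hypothetical input: a newline sits inside the opening comment.
my $wrapped = <<'END';
<!-- InstanceBeginEditable
name="guts" -->
this is the part we want
<!-- InstanceEndEditable -->
END

# The pattern from this thread expects a literal space before name=, so it fails here.
my ($strict) = $wrapped =~ /<!-- InstanceBeginEditable name="guts" +-->(.+?)<!-- InstanceEndEditable -->/s;

# \s+ matches any run of whitespace, newlines included, so this variant is more tolerant.
my ($loose) = $wrapped =~ /<!--\s+InstanceBeginEditable\s+name="guts"\s+-->(.+?)<!--\s+InstanceEndEditable\s+-->/s;

print defined $strict ? "strict: matched\n" : "strict: no match\n";
print defined $loose  ? "loose: matched\n"  : "loose: no match\n";
```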


    Dave

Re: Help with Regex
by siva kumar (Pilgrim) on Feb 09, 2007 at 02:38 UTC
    Try this:
    1. Use a non-greedy quantifier: (.*?)
    2. Treat the whole string as one line by adding the /s modifier.
    my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*?)<!-- InstanceEndEditable -->/s;
    instead of
    my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*)<!-- InstanceEndEditable -->/;
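The difference between the variants can be seen by running them side by side. The following is a sketch under the assumption that the input looks like the sample text from the earlier reply; the variable names $open, $close, $no_s, $greedy and $lazy are made up for this illustration.

```perl
use strict;
use warnings;

# Sample text assumed to resemble the input discussed in this thread.
my $source = <<'END';
adf
<!-- InstanceBeginEditable name="guts" -->
this is the part we want
<!-- InstanceEndEditable -->
adf
<!-- InstanceBeginEditable name="crap" -->
adf
<!-- InstanceEndEditable -->
adf
END

my $open  = qr/<!-- InstanceBeginEditable name="guts" +-->/;
my $close = qr/<!-- InstanceEndEditable -->/;

# Without /s the dot cannot cross a newline, so nothing matches.
my ($no_s) = $source =~ /$open(.*)$close/;

# Greedy with /s runs on to the LAST closing comment in the text.
my ($greedy) = $source =~ /$open(.*)$close/s;

# Non-greedy with /s stops at the FIRST closing comment.
my ($lazy) = $source =~ /$open(.*?)$close/s;

print "no /s:  ", ( defined $no_s ? "matched" : "no match" ), "\n";
print "greedy: ", length($greedy), " characters captured\n";
print "lazy:   ", length($lazy),   " characters captured\n";
```

The greedy capture swallows the "crap" region as well, which is why the non-greedy form is the one you want here.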
