
PERL Regular expression

by Anonymous Monk
on Sep 03, 2010 at 18:21 UTC (#858775=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


This might be a simple question, but it has left me perplexed. I wrote the following regular expression to validate a self-closing tag in XML. If there is more than one tag on the current line, I would handle that in the elsif that follows.

#!/usr/bin/perl
$line = '<log dest="calllog"> directory <value expr="callRecDirPath"/>';
if ( $line =~ /<(\w(?:[^ \/>]+))(?:(?:(?:\s+)([^=>]+)\s*=\s*("|')([^\3]*)\3))\s*\/>/) {
    print ("1=" . $1. "\n2=" . $2 . "\n3=" . $3 . "\n4=" . $4 . "\n5=" . $5 . "\n");
}

But I got the following result:

1=log
2=dest
3="
4=calllog"> directory <value expr="callRecDirPath

From the above result, the value of $4 is the problem. $3 is ", so ([^\3]*) should not match a double quote. Yet $4 contains two double quotes. How does this happen? Can you please help me with this?

Thanks in advance.

G. Indragoby.

Re: PERL Regular expression
by kennethk (Abbot) on Sep 03, 2010 at 18:48 UTC
    Please read Writeup Formatting Tips. In particular, code should be wrapped in <code> tags so that formatting is preserved. Note how your character classes got converted to links - this is a direct result of omitting <code> tags.

    I looked through the documentation (perlre, perlretut) to find you a link on this, but was unable to track one down. Essentially, the problem is that backreferences do not work inside a character class the way you seem to expect. I suspect this is because the backreference is not defined when the pattern is compiled. One workaround is the technique described in the perlretut section "A bit of magic: executing Perl code in a regular expression":

    if ( $line =~ /<(\w(?:[^ \/>]+))(?:(?:(?:\s+)([^=>]+)\s*=\s*("|')((??{"[^$3]*"}))\3))\s*\/>/) {
        print ("1=" . $1. "\n2=" . $2 . "\n3=" . $3 . "\n4=" . $4 . "\n5=" . $5 . "\n");
    }

    In general, however, when you need to resort to using (?{...}) or (??{...}), you are trying too hard. Your code will likely be fragile and unmaintainable. See point 2 below.
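    A sketch of an alternative that avoids (??{...}) altogether (a different technique from the one above, not something proposed in the original thread): a backreference is allowed inside a negative lookahead, so replacing ([^\3]*) with ((?:(?!\3).)*) matches a run of characters that never contains the quote captured in group 3. The rest of the pattern is copied from the question; the variable names are illustrative, and this is still no substitute for a real XML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = '<log dest="calllog"> directory <value expr="callRecDirPath"/>';

# Same pattern as in the question, except that ([^\3]*) is replaced by
# ((?:(?!\3).)*): consume one character at a time, but only at positions
# where the quote captured in group 3 does not start. Unlike inside a
# [...] character class, \3 here really is a backreference.
my ($tag, $attr, $quote, $value) =
    $line =~ /<(\w(?:[^ \/>]+))(?:(?:(?:\s+)([^=>]+)\s*=\s*("|')((?:(?!\3).)*)\3))\s*\/>/;

if (defined $tag) {
    print "1=$tag\n2=$attr\n3=$quote\n4=$value\n";
}
```

    With the sample line, the unclosed <log dest="calllog"> can no longer match, so the engine moves on to the self-closing tag and this should print 1=value, 2=expr, 3=" and 4=callRecDirPath.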

    Side notes:

    1. Since double-quoted strings interpolate variables (see "Quote and Quote-like Operators" in perlop), you could write your print statement more simply as:

      print ("1=$1\n2=$2\n3=$3\n4=$4\n5=$5\n");

    2. You probably shouldn't be parsing XML yourself as there are plenty of modules to do it for you (XML::Twig, XML::Simple, XML::Parser,...).
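    Extending side note 1 slightly: the pattern in the question has only four capturing groups, so $5 is never set, and under use warnings printing it triggers a "Use of uninitialized value" warning. A minimal sketch (the @groups name is illustrative) that takes the captures in list context and prints only the groups that exist, using interpolation instead of the . chain:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = '<log dest="calllog"> directory <value expr="callRecDirPath"/>';

# In list context a successful match returns one element per capturing
# group, so @groups has exactly as many entries as the pattern has groups.
my @groups = $line =~ /<(\w(?:[^ \/>]+))(?:(?:(?:\s+)([^=>]+)\s*=\s*("|')([^\3]*)\3))\s*\/>/;

for my $i (0 .. $#groups) {
    my $n = $i + 1;
    # Double-quoted strings interpolate, so no "." chain is needed.
    print "$n=$groups[$i]\n";
}
```

    With the pattern left unchanged, this should reproduce the four values shown in the question, including the over-long 4=calllog"> directory <value expr="callRecDirPath, without a fifth line.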

Re: PERL Regular expression
by Marshall (Abbot) on Sep 03, 2010 at 19:02 UTC
    This is a good reason not to write your own regex for complex, well-known problems: using a standard module is almost always better. Your code formatting was botched, but I took a couple of guesses.

    I used YAPE::Regex::Explain to analyze what you did; see below. As for your question, in the set (character class) context that you have, \3 is not a backreference: it does not stand for the value that $3 captured.

    #!/usr/bin/perl -w
    use strict;
    use YAPE::Regex::Explain;

    my $line = '<log dest="calllog"> directory <value expr="callRecDirPath"/>';
    if ( $line =~ /<(\w(?:[^ \/>]+))(?:(?:(?:\s+)([^=>]+)\s*=\s*("|')([^\3]*)\3))\s*\/>/) {
        print "1=" . $1. "\n2=" . $2 . "\n3=" . $3 . "\n4=" . $4 . "\n5=" . $5 . "\n";
    }

    # explain() needs the pattern itself, not the whole "$line =~ /.../" expression
    my $REx = q{<(\w(?:[^ \/>]+))(?:(?:(?:\s+)([^=>]+)\s*=\s*("|')([^\3]*)\3))\s*\/>};
    my $exp = YAPE::Regex::Explain->new($REx)->explain;
    print $exp;
    Output of Analysis:
      Thank you very much for your explanation & timely help.
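    To make the point about \3 in the set context concrete, here is a small check (the sample characters are arbitrary): inside a bracketed character class, Perl reads \3 as an octal escape for chr(3) rather than as a backreference, so [^\3] accepts a double quote like any other character. That is why ([^\3]*) in the question ran on past the first closing quote until backtracking found the last " before />.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inside [...], \3 is an octal escape for chr(3), so [^\3] means
# "any single character except chr(3)".
my %verdict;
for my $char ('"', "'", '\\', '3', chr(3)) {
    my $code = ord $char;
    $verdict{$code} = $char =~ /\A[^\3]\z/ ? 'matches' : 'does not match';
    printf "chr(%d) %s [^\\3]\n", $code, $verdict{$code};
}
```

    On a standard perl this should report that the double quote, the single quote, the backslash and the digit 3 all match, and only chr(3) is excluded.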
Re: PERL Regular expression
by dasgar (Priest) on Sep 03, 2010 at 18:45 UTC

    Are you sure that this ran on your system and that you copied it correctly? I tried running your code, but the if statement evaluates to false, so nothing is printed.

    Here are a few observations.

    • In your regex, you have ("|'). However, there are no closing single or double quotes. Also, if you don't have a | in $line (as you currently don't), this part of the regex will cause it to evaluate to false.
    • In your regex, you have \3 twice. If you want to match the digit 3, there is no need to escape it. If you want to match a backslash, you'll have to escape it (\\).

    At the moment, I think some parts of your regex are doing something other than what you want. Of course, not being a regex expert myself, I could be very wrong about that. Perhaps someone else with more knowledge or experience can shed more light on the situation.
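    On the two observations above, a quick sketch that may help (the sample strings are made up): in a Perl regex, ("|') is a capturing group holding an alternation, so it matches a single " or a single ' and needs no literal |; outside a character class, \N is a backreference that must match the same text that group N captured. The pattern below uses \1 because the quote is its first group.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ("|') matches one double quote OR one single quote (| is alternation);
# \1, outside any [...], must then match that same quote again.
my $pattern = qr/\A("|')x\1\z/;

my %result;
for my $sample (q{"x"}, q{'x'}, q{"x'}, q{|x|}) {
    $result{$sample} = $sample =~ $pattern ? 1 : 0;
    printf "%-4s %s\n", $sample, $result{$sample} ? 'matches' : 'no match';
}
```

    Only "x" and 'x' should match; "x' fails because the closing quote differs from the opening one, and |x| fails because | is not a literal character in that group.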

Node Type: perlquestion [id://858775]
Approved by kennethk