
XML:Simple read tag value with regex

by filipebean (Novice)
on May 07, 2013 at 12:54 UTC ( #1032474=perlquestion )
filipebean has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm using XML::Simple to read a configuration from an XML file. The code works, but I'm now facing a problem when it reads a regex that includes an XML tag. The XML file is below:

Config file:

<sourcetype>
  <name>G1</name>
  <desc>group 1 to decode</desc>
  <rules>
    <rule>['^\d(.)','14']</rule>
    <rule>['^(<xyz>)']</rule>
    <rule>['^(</xyz>)']</rule>
  </rules>
</sourcetype>


my $xml  = XML::Simple->new( KeyAttr => [] );
my $data = $xml->XMLin( $config_file );

# Print the name and description of each configured source type
foreach my $sourcetype ( @{ $data->{sourcetypes}{sourcetype} } ) {
    print " " . $sourcetype->{name} . "\t\t" . $sourcetype->{desc} . "\n";
}

When I run the script it complains:

Opening and ending tag mismatch: xyz line 10 and rule
Opening and ending tag mismatch: rule line 11 and rule at /XML/LibXML/ line 64
at /5.8.4/XML/ line 362

Is it possible to have a regex such as '^(<xyz>)' as a tag value and read it as a string?

Thank you in advance, Best regards.

Replies are listed 'Best First'.
Re: XML:Simple read tag value with regex
by toolic (Bishop) on May 07, 2013 at 13:04 UTC
    Your Perl code is probably fine, but that is not valid XML syntax. Use XML entity references, such as &lt; in place of < and &gt; in place of >.

Re: XML:Simple read tag value with regex
by mirod (Canon) on May 07, 2013 at 13:19 UTC

    As said before, you can escape the < by using &lt; or you can use CDATA sections:

    <sourcetype> <name>G1</name> <desc>group 1 to decode</desc> <rules> <rule><![CDATA[['^\d(.)','14']]]></rule> <rule><![CDATA[['^(<xyz>)']]]></rule> <rule><![CDATA[['^(</xyz>)']]]></rule> </rules> </sourcetype>

    That's still ugly, but a little easier to read than the version with &lt;, and at least you can cut and paste code in the rule elements.

    The <![CDATA[...]]> construct prevents everything in the section from being parsed as XML. The content must still be valid text (Unicode by default) and must not include ]]>, but anything else is fine.
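
As a rough illustration of the two options discussed here, the following sketch generates a rule element both ways. The helper names xml_escape and cdata_wrap are hypothetical (they are not part of XML::Simple), and the ]]> handling is just one common workaround: split the sequence across two CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical helper: escape text for use as XML element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: wrap text in a CDATA section. Any "]]>" in the text
# is split across two sections so it never appears inside a single one.
sub cdata_wrap {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$text]]>";
}

my $regex = q{['^(<xyz>)']};
print "<rule>", xml_escape($regex), "</rule>\n";
print "<rule>", cdata_wrap($regex), "</rule>\n";
```

Either output line can be pasted into the config file in place of the failing rule element.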

Re: XML:Simple read tag value with regex
by kcott (Chancellor) on May 07, 2013 at 13:34 UTC

    G'day filipebean,

    There are a number of special characters that may need to be escaped in XML content: "<" to "&lt;"; ">" to "&gt;"; and "&" to "&amp;" (see XML Predefined Entities). So,

        <rule>['^(<xyz>)']</rule>

    would become

        <rule>['^(&lt;xyz&gt;)']</rule>

    Given the readability issues, it might be better to use CDATA blocks (see XML CDATA Sections) to escape the entire regex, e.g.

        <rule><![CDATA[['^(<xyz>)']]]></rule>

    [Note: I haven't investigated how XML::Simple interacts with these constructs.]
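
One way to check the note above is a small round trip. This is only a sketch: it assumes XML::Simple is installed, it parses an inline string rather than the poster's config file, and the ForceArray setting is an addition for the example, not something taken from the original script.

```perl
use strict;
use warnings;
use XML::Simple;

# Inline sample (an assumption for this sketch): one rule uses entity
# references, the other uses a CDATA section.
my $xml_text = <<'XML';
<sourcetype>
  <name>G1</name>
  <rules>
    <rule>['^(&lt;xyz&gt;)']</rule>
    <rule><![CDATA[['^(</xyz>)']]]></rule>
  </rules>
</sourcetype>
XML

# ForceArray keeps 'rule' an array even when only one rule is present.
my $xs    = XML::Simple->new( KeyAttr => [], ForceArray => ['rule'] );
my $data  = $xs->XMLin($xml_text);
my @rules = @{ $data->{rules}{rule} };

print "$_\n" for @rules;
```

Both forms should come back as plain strings with the literal < and > restored, which is consistent with the poster's report further down that CDATA worked.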

    -- Ken

      Hi all,

      Thanks for your answers. I tried CDATA as you suggested and it works perfectly :)

      best regards.

Node Type: perlquestion [id://1032474]
Approved by toolic