PerlMonks  

XML:Simple read tag value with regex

by filipebean (Novice)
on May 07, 2013 at 12:54 UTC ( [id://1032474]=perlquestion )

filipebean has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm using XML::Simple to read a configuration from an XML file. The code works, but I now run into a problem when it reads a regex that includes an XML tag. The XML file is below:

Config file:

<sourcetype>
  <name>G1</name>
  <desc>group 1 to decode</desc>
  <rules>
    <rule>['^\d(.)','14']</rule>
    <rule>['^(<xyz>)']</rule>
    <rule>['^(</xyz>)']</rule>
  </rules>
</sourcetype>

Script:

use XML::Simple;

# KeyAttr => [] disables folding of element lists into hashes
my $xml  = XML::Simple->new( KeyAttr => [] );
my $data = $xml->XMLin( $config_file );

foreach my $sourcetype ( @{ $data->{sourcetypes}{sourcetype} } ) {
    print " " . $sourcetype->{name} . "\t\t" . $sourcetype->{desc} . "\n";
}

When I run the script it complains:

Opening and ending tag mismatch: xyz line 10 and rule
Opening and ending tag mismatch: rule line 11 and rule at …/XML/LibXML/SAX.pm line 64 at …/5.8.4/XML/Simple.pm line 362

Is it possible to have a regex such as “'^(<xyz>)'” as a tag value and read it as a plain string?

Thank you in advance, Best regards.

Replies are listed 'Best First'.
Re: XML:Simple read tag value with regex
by toolic (Bishop) on May 07, 2013 at 13:04 UTC
    Your Perl code is probably fine, but that is not valid XML syntax. Use XML entity references (something like this):
    <rule>['^(&lt;xyz&gt;)']</rule>
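
    If the rules are generated rather than typed by hand, the escaping can be done in code. The following is only a minimal sketch: `xml_escape_text` is a hypothetical helper written for this example, not something XML::Simple provides.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Simple): escape the characters
# that are special in XML element content.
sub xml_escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # first, so the entities added below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# The two rules from the question that broke the parser
for my $rule ( q{['^(<xyz>)']}, q{['^(</xyz>)']} ) {
    print '<rule>', xml_escape_text($rule), "</rule>\n";
}
```

    The ampersand is replaced first; doing it last would turn the freshly written `&lt;` into `&amp;lt;`.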
Re: XML:Simple read tag value with regex
by mirod (Canon) on May 07, 2013 at 13:19 UTC

    As said before, you can escape the < by using &lt; or you can use CDATA sections:

    <sourcetype>
      <name>G1</name>
      <desc>group 1 to decode</desc>
      <rules>
        <rule><![CDATA[['^\d(.)','14']]]></rule>
        <rule><![CDATA[['^(<xyz>)']]]></rule>
        <rule><![CDATA[['^(</xyz>)']]]></rule>
      </rules>
    </sourcetype>

    That's still ugly, but a little easier to read than the version with &lt;, and at least you can cut and paste code in the rule elements.

    The <![CDATA[...]]> construct prevents everything in the section from being parsed as XML. You still need it to be valid text (unicode by default) and not to include ]]>, but anything else is fine.
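
    The one restriction above, no literal ]]> inside the section, can be worked around by splitting the text across two adjacent CDATA sections. A minimal sketch, assuming the rules are written out by a script; `cdata_wrap` is a hypothetical helper name, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap arbitrary text in a CDATA section.
# A literal "]]>" would end the section early, so it is split:
# "]]" closes the first section and ">" opens the next one.
sub cdata_wrap {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

print '<rule>', cdata_wrap(q{['^(<xyz>)']}), "</rule>\n";
print '<rule>', cdata_wrap(q{a]]>b}),        "</rule>\n";
```

    A parser reads adjacent CDATA sections as one run of character data, so the second rule still comes back as a]]>b.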

Re: XML:Simple read tag value with regex
by kcott (Archbishop) on May 07, 2013 at 13:34 UTC

    G'day filipebean,

    There are a number of special characters that may need to be escaped in XML content: "<" to "&lt;"; ">" to "&gt;"; and "&" to "&amp;" (see XML Predefined Entities). So,

    <rule>['^(<xyz>)']</rule>

    would become

    <rule>['^(&lt;xyz&gt;)']</rule>

    Given the readability issues, it might be better to use CDATA blocks (see XML CDATA Sections) to escape the entire regex, e.g.

    <rule><![CDATA[['^(<xyz>)']]]></rule>

    [Note: I haven't investigated how XML::Simple interacts with these constructs.]
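
    For anyone who wants to check that interaction, here is a small sketch. It assumes XML::Simple and a parser backend are installed; the inline document and the ForceArray setting are choices made for this example, not taken from the original configuration. Both forms should come back as the same plain string.

```perl
use strict;
use warnings;
use XML::Simple;

# Illustrative document: the same regex, once with entity references
# and once inside a CDATA section.
my $xml_text = <<'XML';
<rules>
  <rule>['^(&lt;xyz&gt;)']</rule>
  <rule><![CDATA[['^(<xyz>)']]]></rule>
</rules>
XML

# ForceArray keeps <rule> an array ref even when only one rule is present.
my $xs   = XML::Simple->new( KeyAttr => [], ForceArray => ['rule'] );
my $data = $xs->XMLin($xml_text);

print "$_\n" for @{ $data->{rule} };
```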

    -- Ken

      Hi all,

      Thanks for your answers. I tried CDATA as you suggested and it works perfectly :)

      best regards.
