http://www.perlmonks.org?node_id=417802

esharris has asked for the wisdom of the Perl Monks concerning the following question:

I need a subroutine that takes an arbitrary string and returns that string in XML format. For example, it should change "IBM & Microsoft" into "IBM &amp; Microsoft". The hypothetical subroutine would be similar to CGI::escape, which returns a given string in HTML format. Does such a subroutine exist on CPAN? Though I looked around CPAN and used Google, I am having difficulty finding it. I figure there is no sense in reinventing the wheel.

Edited by Chady -- fixed &

Re: creating an XML string
by Joost (Canon) on Dec 28, 2004 at 18:03 UTC
Re: creating an XML string
by holli (Abbot) on Dec 28, 2004 at 18:18 UTC
    change "IBM & Microsoft" into "IBM & Microsoft"?
    I assume you mean change "IBM & Microsoft" into "IBM &amp; Microsoft", or even &#38;.

    if you know the codes of the chars you want to change, you can do something like the following:
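    my %ent = ( "&" => 38, "<" => 60, ">" => 62 );  # ...
    my $chars = join '', map { quotemeta } keys %ent;
    my $string = "IBM & Microsoft > Sun\n";
    print $string;
    # One pass, so the "&" of an inserted "&#NN;" is never escaped again
    $string =~ s/([$chars])/&#$ent{$1};/g;
    print $string;  # prints IBM &#38; Microsoft &#62; Sun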
    Another option is to put the string in question into a <![CDATA[...]]>-section.
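A minimal sketch of the CDATA route, assuming the only thing that needs care is a literal "]]>" inside the text (it would end the section early, so it gets split across two adjacent sections). The name cdata_wrap is made up for illustration and is not a CPAN function:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap arbitrary text in a CDATA section.
sub cdata_wrap {
    my ($text) = @_;
    # Split any "]]>" so that it never appears inside a single section
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

print cdata_wrap("IBM & Microsoft > Sun"), "\n";
```

Inside the section "&", "<" and ">" need no escaping at all, which is the attraction of this approach.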
      damn. too slow ;-)
Re: creating an XML string
by steves (Curate) on Dec 28, 2004 at 18:15 UTC

    HTML::Entities will do it for HTML character entities. XML entities could differ. If they're a subset, HTML::Entities lets you specify which should be converted.
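A rough illustration of restricting the conversion, assuming HTML::Entities is installed from CPAN. The optional second argument to encode_entities lists the characters to encode; the sample string and the choice of characters here are only an example:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Encode only the characters given in the second argument,
# leaving everything else in the string untouched.
my $escaped = encode_entities("IBM & Microsoft <Sun>", '<>&"');
print "$escaped\n";   # IBM &amp; Microsoft &lt;Sun&gt;
```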

Re: creating an XML string
by PreferredUserName (Pilgrim) on Dec 28, 2004 at 18:05 UTC
    Quick & dirty:
    sub encode { join '', map { /[\w\d ]/ ? $_ : '&#' . ord() . ';' } split //, shift; }
        Well, they don't call it "quick and dirty" because it's elegant and complete.
Re: creating an XML string
by steves (Curate) on Dec 29, 2004 at 16:12 UTC

    Regarding XML and entities, someone may benefit from some knowledge I picked up:

    • There is no need to declare numeric entities (such as &#38;) in order for them to parse okay with most (or maybe all) XML parsers. That is, only named entities need to be declared. XML, of course, has five named entities predefined since they're so core: &amp;, &lt;, &gt;, &apos;, and &quot;.
    • You can pull in all standard HTML entities by using this sort of DOCTYPE declaration:
      <!DOCTYPE doc [
        <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
        %HTMLlat1;
        <!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
        %HTMLspecial;
        <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
        %HTMLsymbol;
      ]>

    I've found the above knowledge useful since so much of the XML I create is for the purpose of transmitting web based content, which includes HTML entities.
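Putting the points above together, here is a minimal sketch rather than a complete solution: markup characters are escaped with XML's predefined named entities, and anything outside ASCII is written as a numeric reference, which needs no declaration. The name xml_escape is hypothetical, and the code assumes the input is a decoded character string, not raw UTF-8 bytes:

```perl
use strict;
use warnings;

# Hypothetical helper, not a CPAN API.
my %named = ('&' => 'amp', '<' => 'lt', '>' => 'gt', '"' => 'quot', "'" => 'apos');

sub xml_escape {
    my ($text) = @_;
    # Markup characters become predefined named entities, in a single pass
    $text =~ s/([&<>"'])/&$named{$1};/g;
    # Non-ASCII characters become numeric references.
    # ASCII control characters are left as they are (not handled here).
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print xml_escape("IBM & Microsoft"), "\n";   # IBM &amp; Microsoft
```

Doing the named entities first in one substitution avoids escaping the "&" of an entity that was just inserted.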