Applying XSL stylesheet specified in XML file to the XML

by blm (Hermit)
on Mar 24, 2009
blm has asked for the wisdom of the Perl Monks concerning the following question:


I have an XML file that I fetch using WWW::Mechanize. It starts with:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/foo/test.xsl"?>

I want to parse the XML with XML::LibXML, find the XSL file URL, retrieve it and apply it. At least that is what I think I have to do. I don't know how to get XML::LibXML to give me the declarations at the top (i.e. the <? ?> things).

All I really want is the HTML that results from applying the XSL to the XML. (I know, all you really want is a pony but this is about my question at the moment ;-) )

Thanks for any and all help. I am not fixed on using XML::LibXML so I would be interested in any other useful modules.

Re: Applying XSL stylesheet specified in XML file to the XML
by dHarry on Mar 24, 2009

    You need libxslt for that (in a libxml context). There are alternatives of course, see CPAN. I'm not too familiar with those alternatives. My personal favorite is Xalan.


      Hi, Thanks for the reply. I realized that I need libxslt. But unless I am missing something I don't see how to pull the xsl uri out of the xml and feed it to libxslt (XML::LibXSLT). (Maybe I just need to grep for it.) That is my problem. Can you show me some code?

      Here is my code:
          use lib qw|/home/blm/perl/lib|;
          use strict;
          use WWW::Mechanize;
          use XML::LibXML;
          use XML::LibXSLT;

          my $mech = WWW::Mechanize->new(
              agent => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/2008070206 Firefox/3.0.1'
          );
          my $url = '';
          $mech->delete_header('accept-encoding');
          $mech->get($url);
          $mech->update_html($mech->content());
          print $mech->content;

          my $parser       = XML::LibXML->new();
          my $style_parser = XML::LibXML->new();
          my $xslt         = XML::LibXSLT->new();

          my $doc = $parser->parse_string($mech->content());
          print $doc->toString();

          my $stylesheet_location = ***Here is my problem***
          $mech->get($stylesheet_location);
          my $stylesheet_string = $mech->content();

          my $styledoc   = $style_parser->parse_string($stylesheet_string);
          my $stylesheet = $xslt->parse_stylesheet($styledoc);

          # transform() is a method of the compiled stylesheet, not of XML::LibXSLT
          my $results = $stylesheet->transform($doc);
          print $stylesheet->output_string($results);

        Ah, I see. And I have to disappoint you: I use XML::Twig for XML processing in Perl and my own tools for XSLT stuff. I would not grep for it; there must be a more XML-ish way of doing things. After all, it's just a node of a specific type, i.e. node type 'processing-instruction'. So I imagine parsing the file and retrieving the information should do the trick, i.e. walk the DOM tree. (Suddenly the grepping doesn't sound so bad anymore ;) Another option is to use SAX; I see a processingInstructionSAXFunc in the libxml2 API. However, there is another Perl module that might come in handy: XML::LibXML::PI. Can't you call getData on it?

        Mind you, in "my" environment it's as simple as one method call: getAssociatedStylesheet()!
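The missing step in the script above (going from the parsed document to a stylesheet URL and then to HTML) could be sketched roughly as follows. This is a minimal sketch, not a tested drop-in: the base URL `http://example.com/data/feed.xml`, the inline XML and XSL strings, and the helper names `stylesheet_href` and `apply_stylesheet` are all made up for illustration, and it assumes the URI module is installed and that the href should be resolved against the URL the XML was fetched from. In the real script the stylesheet text would come from `$mech->get($xsl_url)` rather than from a string.

```perl
use strict;
use warnings;
use URI;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical helper: return the href of the first top-level
# xml-stylesheet processing instruction, or nothing if there is none.
sub stylesheet_href {
    my ($doc) = @_;
    for my $pi ($doc->findnodes('/processing-instruction("xml-stylesheet")')) {
        # The PI data is plain text such as: type="text/xsl" href="/foo/test.xsl"
        return $2 if $pi->getData =~ /\bhref\s*=\s*(["'])(.*?)\1/;
    }
    return;
}

# Hypothetical helper: compile the stylesheet text and apply it to $doc.
sub apply_stylesheet {
    my ($doc, $xsl_string) = @_;
    my $style_doc  = XML::LibXML->new->parse_string($xsl_string);
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
    my $result     = $stylesheet->transform($doc);
    return $stylesheet->output_string($result);
}

# Stand-ins for what WWW::Mechanize would return (assumed content).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/foo/test.xsl"?>
<greeting>hello</greeting>
XML

my $xsl = <<'XSL';
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html><body><p><xsl:value-of select="greeting"/></p></body></html>
  </xsl:template>
</xsl:stylesheet>
XSL

my $doc  = XML::LibXML->new->parse_string($xml);
my $href = stylesheet_href($doc);

# Resolve the (possibly relative) href against the URL of the XML document.
my $xsl_url = URI->new_abs($href, 'http://example.com/data/feed.xml')->as_string;

# In the real script: $mech->get($xsl_url); my $xsl = $mech->content;
my $html = apply_stylesheet($doc, $xsl);
print "$xsl_url\n$html";
```

The simple regex here assumes the href value does not contain the quote character that delimits it; a PI with unusual quoting would need a more careful parse.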

Re: Applying XSL stylesheet specified in XML file to the XML
by ForgotPasswordAgain on Mar 25, 2009

    I don't know how to tell what href is relative to, but even just to get the value of href is a bit gimpy for "processing instructions", which is what <? ... ?> are.

    #!/usr/bin/perl -w
    use strict;
    use XML::LibXML;

    my $parser = XML::LibXML->new;
    my $doc = $parser->parse_string(<<'EOX');
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href ='abc"efg'?>
    <_/>
    EOX

    foreach my $node ($doc->findnodes('//processing-instruction()')) {
        my $name = $node->nodeName;
        if ($name eq 'xml-stylesheet') {
            # getData is a string like q{type="text/xsl" href="/test.xsl"}
            # which is what makes it annoying
            my $attr_str = $node->getData;

            # manually parse the string like href='abc"efg';
            # there might be a better way of doing this
            # (a backreference cannot be used inside a character class,
            # so match non-greedily up to the closing quote instead)
            my $href = $attr_str =~ m{href\s*=\s*(['"])(.*?)\1} ? $2 : '';
            print "$name href: >>>$href<<<\n";
        }
    }
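As one possible alternative to the single href regex above, the whole PI data string can be split into its pseudo-attributes and looked up by name. This is only a sketch in plain Perl: `parse_pseudo_attrs` is a made-up helper name, and it does not decode character references such as `&amp;` inside the values.

```perl
use strict;
use warnings;

# Hypothetical helper: turn PI data such as
#   type="text/xsl" href ='abc"efg'
# into a hash of pseudo-attribute names and values.
sub parse_pseudo_attrs {
    my ($data) = @_;
    my %attr;
    # Each pseudo-attribute is name = "value" or name = 'value';
    # the two quote styles are handled by separate alternatives.
    while ($data =~ /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g) {
        $attr{$1} = defined $2 ? $2 : $3;
    }
    return %attr;
}

my %attr = parse_pseudo_attrs(q{type="text/xsl" href ='abc"efg'});
print "href: $attr{href}\n";    # prints: href: abc"efg
```

Having the type available as well makes it easy to skip stylesheets that are not `text/xsl` (for example CSS) before trying to fetch and apply them.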

