Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

I've been able to mostly avoid XML until today. We need to update hundreds of MS vs2010 project (XML) files automatically. Tedious and error-prone to do by hand, so I'd like to write a script to do it. I've prepared an illustrative cut-down example of such a script, which changes the directory "ReleaseDLL" to "ReleaseDLL32" in various places in the XML.

Since this is my first attempt to parse XML using Perl, I welcome any advice you may have to offer. In particular:

  • After some random googling, I chose to use XML::LibXML. Is that a wise choice?
  • Given that I want to make minor updates to many XML files, is the overall approach below ok? Is there a better approach?
  • I had a hell of a time getting XPath to work (see code below). And I don't really understand what I did with namespaces, though it does appear to work. Suggestions welcome.
  • The XPath query "PropertyGroup[contains(\@Condition,'$proj')]" is inelegant in that it selects the required PropertyGroup, then manually iterates through each element in the group. It seems better to select the required nodes directly as part of a more complicated XPath expression and avoid the iteration, but I have no clue how to write an XPath query to do that.

Here is an example (cut-down) project XML file to be updated, fred.vcxproj:

<?xml version="1.0" encoding="utf-8"?> <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schem +as.microsoft.com/developer/msbuild/2003"> <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug Tan +dem|x64'"> <OutDir>.\DebugTandem\</OutDir> <IntDir>.\DebugTandem\</IntDir> <TargetName>fred$(ProjectName)</TargetName> </PropertyGroup> <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release D +LL|Win32'"> <OutDir>.\../../products/bin/ReleaseDLL\</OutDir> <IntDir>.\ReleaseDLL\</IntDir> <LinkIncremental>false</LinkIncremental> <TargetName>fred$(ProjectName)</TargetName> </PropertyGroup> <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release D +LL|x64'"> <OutDir>.\../../products/bin/ReleaseDLL\</OutDir> <IntDir>.\ReleaseDLL\</IntDir> <LinkIncremental>false</LinkIncremental> <TargetName>fred$(ProjectName)</TargetName> </PropertyGroup> </Project>

Here is my cut-down test program, txml1.pl:

use strict; use warnings; use XML::LibXML; use XML::LibXML::XPathContext; sub read_file_contents { my $fname = shift; open( my $fh, '<', $fname ) or die "error: open '$fname': $!\n"; binmode $fh; local $/ = undef; # slurp mode my $s = <$fh>; close($fh); return $s; } sub write_file_contents { my ( $fname, $data ) = @_; my $overw = -e $fname ? " (overwriting)" : ""; print "creating '$fname'$overw..."; open( my $fh, '>', $fname ) or die "error: open '$fname': $!"; binmode($fh); print {$fh} $data or die "error: write '$fname': $!"; close($fh); print "done.\n"; } my $fname = shift or die "usage: $0 fname\n"; print "xml file : '$fname'\n"; my $xmlstring = read_file_contents($fname); # XXX: Hack for utf8 BOM. # my $UTF8_BOM = chr(0xef) . chr(0xbb) . chr(0xbf); my $UTF8_BOM = ""; # XXX: Without this damned billygates namespace I could not get XPath +to work. my $xpath_ns = 'billygates'; my $vs2010_ns = 'http://schemas.microsoft.com/developer/msbuild/2003'; my $outfile = 'fred.tmp'; my $proj = 'Release DLL|Win32'; my $targ = 'ReleaseDLL'; my $repl = 'ReleaseDLL32'; my $query = "PropertyGroup[contains(\@Condition,'$proj')]"; my $ns_query = "//$xpath_ns:$query"; my $parser = XML::LibXML->new(); my $doc = $parser->parse_string($xmlstring); my $xc = XML::LibXML::XPathContext->new( $doc->documentElement( +) ); $xc->registerNs( $xpath_ns => $vs2010_ns ); print "query : $ns_query:\n"; for my $q ( $xc->findnodes($ns_query) ) { print $q->nodeName(), ":\n"; for my $c ( $q->childNodes() ) { my $name = $c->nodeName(); my $val = $c->textContent(); print " ", ref($c), ":", $name, ":\n"; if ( defined($val) && $val =~ m{[/\\](?:$targ)[/\\]} ) { print " $name: val=$val: matches '$targ'\n"; for my $t ( $c->childNodes() ) { my $v = $t->data; print " ", ref($t), ":", $t->nodeName(), ":", $v, ":\n" +; print " old:", $v, ":\n"; $v =~ s{([/\\])$targ([/\\])}{$1$repl$2} or die "oops"; $t->setData($v); print " new:", $v, ":\n"; } } } } write_file_contents( $outfile, $UTF8_BOM . $doc->toString(0) );

An example run of this program seems to more-or-less work, as shown below:

$ perl txml1.pl fred.vcxproj xml file : 'fred.vcxproj' query : //billygates:PropertyGroup[contains(@Condition,'Release DL +L|Win32')]: PropertyGroup: XML::LibXML::Text:#text: XML::LibXML::Element:OutDir: OutDir: val=.\../../products/bin/ReleaseDLL\: matches 'ReleaseDLL' XML::LibXML::Text:#text:.\../../products/bin/ReleaseDLL\: old:.\../../products/bin/ReleaseDLL\: new:.\../../products/bin/ReleaseDLL32\: XML::LibXML::Text:#text: XML::LibXML::Element:IntDir: IntDir: val=.\ReleaseDLL\: matches 'ReleaseDLL' XML::LibXML::Text:#text:.\ReleaseDLL\: old:.\ReleaseDLL\: new:.\ReleaseDLL32\: XML::LibXML::Text:#text: XML::LibXML::Element:LinkIncremental: XML::LibXML::Text:#text: XML::LibXML::Element:TargetName: XML::LibXML::Text:#text: creating 'fred.tmp' (overwriting)...done. $ diff fred.vcxproj fred.tmp 2c2 < <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://sch +emas.microsoft.com/developer/msbuild/2003"> --- > <Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" + DefaultTargets="Build" ToolsVersion="4.0"> 9,10c9,10 < <OutDir>.\../../products/bin/ReleaseDLL\</OutDir> < <IntDir>.\ReleaseDLL\</IntDir> --- > <OutDir>.\../../products/bin/ReleaseDLL32\</OutDir> > <IntDir>.\ReleaseDLL32\</IntDir>


In reply to Some questions from beginning user of XML::LibXML and XPath by eyepopslikeamosquito

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others making s'mores by the fire in the courtyard of the Monastery: (9)
    As of 2015-07-07 23:45 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









      Results (93 votes), past polls