PerlMonks  


Until today I had mostly managed to avoid XML. Now we need to update hundreds of Microsoft Visual Studio 2010 (VS2010) project files (.vcxproj files, which are XML) automatically. Doing that by hand would be tedious and error-prone, so I'd like to write a script to do it. I've prepared a cut-down, illustrative example of such a script, which changes the directory "ReleaseDLL" to "ReleaseDLL32" in various places in the XML.
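
At the string level, the change is a substitution that should only touch `ReleaseDLL` when it is a whole path component, i.e. bounded by `/` or `\` on both sides, so that a (hypothetical) `ReleaseDLLOld` directory is left alone. A minimal sketch in core Perl; the helper name `rename_dir_component` is made up for illustration, and `\Q...\E` is added so that a directory name containing regex metacharacters would be matched literally:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every path component equal to $old with $new.
# A component must be preceded and followed by '/' or '\'; the separators
# are kept as they were. Returns the new path and the number of replacements.
sub rename_dir_component {
    my ( $path, $old, $new ) = @_;
    # The lookahead leaves the trailing separator unconsumed, so adjacent
    # components such as /ReleaseDLL/ReleaseDLL/ are both renamed.
    my $count = ( $path =~ s{([/\\])\Q$old\E(?=[/\\])}{$1$new}g ) || 0;
    return ( $path, $count );
}

my @samples = (
    '.\ReleaseDLL\\',                       # i.e. .\ReleaseDLL\
    '.\../../products/bin/ReleaseDLL\\',    # mixed separators
    '.\ReleaseDLLOld\\',                    # not a whole component: unchanged
);
for my $p (@samples) {
    my ( $fixed, $n ) = rename_dir_component( $p, 'ReleaseDLL', 'ReleaseDLL32' );
    print "$p -> $fixed ($n change(s))\n";
}
```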

Since this is my first attempt to parse XML using Perl, I welcome any advice you may have to offer. In particular:

  • After some random googling, I chose to use XML::LibXML. Is that a wise choice?
  • Given that I want to make minor updates to many XML files, is the overall approach below OK? Is there a better approach?
  • I had a hell of a time getting XPath to work (see code below). And I don't really understand what I did with namespaces, though it does appear to work. Suggestions welcome.
  • The XPath query "PropertyGroup[contains(\@Condition,'$proj')]" is inelegant: it selects the required PropertyGroup, and then I iterate manually through each child of the group. It seems better to select the required nodes directly with a single, more specific XPath expression and avoid the iteration, but I have no idea how to write such a query.
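
On the last question: XPath can do both the child selection and the text test, so the only Perl loop left is over the text nodes that actually need changing. A sketch under a few assumptions: the prefix `msb` and the helper `fix_paths` are names made up here; `translate(., '/', '\')` folds both kinds of separator into `\`, so a single `contains()` pre-filters for `\ReleaseDLL\` and `/ReleaseDLL/` alike, while the regex still does the exact replacement; and `$proj` and `$targ` must not contain a single quote, because they are pasted into XPath string literals.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace of every element in a VS2010 .vcxproj (the default xmlns on <Project>).
my $MSBUILD_NS = 'http://schemas.microsoft.com/developer/msbuild/2003';

# Hypothetical helper: rewrite the directory component in every text node
# selected by one XPath query. Returns the number of text nodes changed.
sub fix_paths {
    my ( $doc, $proj, $targ, $repl ) = @_;
    my $xc = XML::LibXML::XPathContext->new($doc);
    $xc->registerNs( msb => $MSBUILD_NS );

    # Text of any child element of the wanted PropertyGroup whose value,
    # after mapping '/' to '\', contains \$targ\ .
    my $query = "//msb:PropertyGroup[contains(\@Condition,'$proj')]"
              . "/*/text()[contains(translate(., '/', '\\'), '\\$targ\\')]";

    my $changed = 0;
    for my $text ( $xc->findnodes($query) ) {
        my $v = $text->data;
        $v =~ s{([/\\])\Q$targ\E(?=[/\\])}{$1$repl}g;
        $text->setData($v);
        $changed++;
    }
    return $changed;
}

my $doc = XML::LibXML->load_xml( string => <<'XML' );
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release DLL|Win32'">
    <OutDir>.\../../products/bin/ReleaseDLL\</OutDir>
    <IntDir>.\ReleaseDLL\</IntDir>
    <LinkIncremental>false</LinkIncremental>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release DLL|x64'">
    <IntDir>.\ReleaseDLL\</IntDir>
  </PropertyGroup>
</Project>
XML

my $n = fix_paths( $doc, 'Release DLL|Win32', 'ReleaseDLL', 'ReleaseDLL32' );
print "changed $n text node(s)\n";
print $doc->toString;
```

Note that `contains(@Condition, ...)` is a substring test, as in the original query; if another configuration name could contain `Release DLL|Win32`, matching on `=='Release DLL|Win32'` instead would be stricter.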

Here is an example (cut-down) project XML file to be updated, fred.vcxproj:

```xml
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug Tandem|x64'">
    <OutDir>.\DebugTandem\</OutDir>
    <IntDir>.\DebugTandem\</IntDir>
    <TargetName>fred$(ProjectName)</TargetName>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release DLL|Win32'">
    <OutDir>.\../../products/bin/ReleaseDLL\</OutDir>
    <IntDir>.\ReleaseDLL\</IntDir>
    <LinkIncremental>false</LinkIncremental>
    <TargetName>fred$(ProjectName)</TargetName>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release DLL|x64'">
    <OutDir>.\../../products/bin/ReleaseDLL\</OutDir>
    <IntDir>.\ReleaseDLL\</IntDir>
    <LinkIncremental>false</LinkIncremental>
    <TargetName>fred$(ProjectName)</TargetName>
  </PropertyGroup>
</Project>
```
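
On the namespace question: the `xmlns="..."` on `<Project>` is a default namespace declaration, so `Project`, `PropertyGroup`, `OutDir` and every other element in the file belong to the MSBuild namespace. XPath 1.0 has no default namespace, so an unprefixed name in a query matches only elements in no namespace at all. Binding some prefix to the namespace URI with `registerNs` (the script's `billygates` works just as well as the `msb` used here) and using it in the query is the usual fix; `local-name()` is the blunter alternative. A small demonstration on a cut-down copy of the file:

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release DLL|Win32'">
    <IntDir>.\ReleaseDLL\</IntDir>
  </PropertyGroup>
</Project>
XML

my $doc = XML::LibXML->load_xml( string => $xml );
my $xc  = XML::LibXML::XPathContext->new($doc);

# Unprefixed name: means PropertyGroup in *no* namespace, so nothing matches.
my $plain = $xc->findnodes('//PropertyGroup')->size;

# Prefixed name: 'msb' is an arbitrary prefix bound to the MSBuild namespace URI.
$xc->registerNs( msb => 'http://schemas.microsoft.com/developer/msbuild/2003' );
my $prefixed = $xc->findnodes('//msb:PropertyGroup')->size;

# local-name() ignores namespaces entirely, at some cost in readability.
my $by_local = $xc->findnodes('//*[local-name()="PropertyGroup"]')->size;

print "unprefixed: $plain, prefixed: $prefixed, local-name(): $by_local\n";
```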

Here is my cut-down test program, txml1.pl:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

sub read_file_contents {
    my $fname = shift;
    open( my $fh, '<', $fname ) or die "error: open '$fname': $!\n";
    binmode $fh;
    local $/ = undef;    # slurp mode
    my $s = <$fh>;
    close($fh);
    return $s;
}

sub write_file_contents {
    my ( $fname, $data ) = @_;
    my $overw = -e $fname ? " (overwriting)" : "";
    print "creating '$fname'$overw...";
    open( my $fh, '>', $fname ) or die "error: open '$fname': $!";
    binmode($fh);
    print {$fh} $data or die "error: write '$fname': $!";
    close($fh);
    print "done.\n";
}

my $fname = shift or die "usage: $0 fname\n";
print "xml file : '$fname'\n";
my $xmlstring = read_file_contents($fname);

# XXX: Hack for utf8 BOM.
# my $UTF8_BOM = chr(0xef) . chr(0xbb) . chr(0xbf);
my $UTF8_BOM = "";

# XXX: Without this damned billygates namespace I could not get XPath to work.
my $xpath_ns  = 'billygates';
my $vs2010_ns = 'http://schemas.microsoft.com/developer/msbuild/2003';
my $outfile   = 'fred.tmp';
my $proj      = 'Release DLL|Win32';
my $targ      = 'ReleaseDLL';
my $repl      = 'ReleaseDLL32';
my $query     = "PropertyGroup[contains(\@Condition,'$proj')]";
my $ns_query  = "//$xpath_ns:$query";

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_string($xmlstring);
my $xc     = XML::LibXML::XPathContext->new( $doc->documentElement() );
$xc->registerNs( $xpath_ns => $vs2010_ns );
print "query : $ns_query:\n";
for my $q ( $xc->findnodes($ns_query) ) {
    print $q->nodeName(), ":\n";
    for my $c ( $q->childNodes() ) {
        my $name = $c->nodeName();
        my $val  = $c->textContent();
        print " ", ref($c), ":", $name, ":\n";
        if ( defined($val) && $val =~ m{[/\\](?:$targ)[/\\]} ) {
            print " $name: val=$val: matches '$targ'\n";
            for my $t ( $c->childNodes() ) {
                my $v = $t->data;
                print " ", ref($t), ":", $t->nodeName(), ":", $v, ":\n";
                print " old:", $v, ":\n";
                $v =~ s{([/\\])$targ([/\\])}{$1$repl$2} or die "oops";
                $t->setData($v);
                print " new:", $v, ":\n";
            }
        }
    }
}
write_file_contents( $outfile, $UTF8_BOM . $doc->toString(0) );
```
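
Since the real job covers hundreds of files, it may help to separate the per-file plumbing from the XML edit itself. A minimal sketch under assumptions of my own: files are edited in place after being copied to a `.bak` file, a UTF-8 BOM is written back only if the input had one (in place of the commented-out `$UTF8_BOM` hack), and a file is rewritten only when the edit callback reports a change, so untouched projects keep their exact bytes. `update_file`, `starts_with_bom` and the callback convention are invented for illustration, and the demo edit deliberately ignores the `Condition` filter to stay short:

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use XML::LibXML;

my $UTF8_BOM = "\xEF\xBB\xBF";

# Return true if the file starts with a UTF-8 byte order mark.
sub starts_with_bom {
    my $fname = shift;
    open( my $fh, '<:raw', $fname ) or die "error: open '$fname': $!\n";
    my $got = read( $fh, my $head, 3 );
    close($fh);
    return defined $got && $got == 3 && $head eq $UTF8_BOM;
}

# Hypothetical driver: $edit->($doc) modifies the DOM and returns the number
# of changes made. The file is backed up and rewritten only if that is non-zero.
sub update_file {
    my ( $fname, $edit ) = @_;
    my $had_bom = starts_with_bom($fname);
    my $doc     = XML::LibXML->load_xml( location => $fname );
    my $changes = $edit->($doc) or return 0;

    my $out = $doc->toString;          # bytes, in the document's encoding
    $out =~ s/\A\Q$UTF8_BOM\E//;       # never write a BOM twice
    $out = $UTF8_BOM . $out if $had_bom;

    copy( $fname, "$fname.bak" ) or die "error: backup '$fname': $!\n";
    open( my $fh, '>:raw', $fname ) or die "error: open '$fname': $!\n";
    print {$fh} $out or die "error: write '$fname': $!\n";
    close($fh) or die "error: close '$fname': $!\n";
    return $changes;
}

# Illustrative edit: rename a \ReleaseDLL\ component in every <IntDir>.
my $edit = sub {
    my $doc = shift;
    my $xc  = XML::LibXML::XPathContext->new($doc);
    $xc->registerNs( msb => 'http://schemas.microsoft.com/developer/msbuild/2003' );
    my $count = 0;
    for my $t ( $xc->findnodes('//msb:IntDir/text()') ) {
        my $v = $t->data;
        next unless $v =~ s{([/\\])ReleaseDLL(?=[/\\])}{$1ReleaseDLL32}g;
        $t->setData($v);
        $count++;
    }
    return $count;
};

for my $fname (@ARGV) {
    my $count = update_file( $fname, $edit );
    print "$fname: $count change(s)\n";
}
```

Usage would be along the lines of `perl fixproj.pl *.vcxproj` (the script name is hypothetical).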

An example run of this program, shown below, seems to more or less work:

```
$ perl txml1.pl fred.vcxproj
xml file : 'fred.vcxproj'
query : //billygates:PropertyGroup[contains(@Condition,'Release DLL|Win32')]:
PropertyGroup:
 XML::LibXML::Text:#text:
 XML::LibXML::Element:OutDir:
 OutDir: val=.\../../products/bin/ReleaseDLL\: matches 'ReleaseDLL'
 XML::LibXML::Text:#text:.\../../products/bin/ReleaseDLL\:
 old:.\../../products/bin/ReleaseDLL\:
 new:.\../../products/bin/ReleaseDLL32\:
 XML::LibXML::Text:#text:
 XML::LibXML::Element:IntDir:
 IntDir: val=.\ReleaseDLL\: matches 'ReleaseDLL'
 XML::LibXML::Text:#text:.\ReleaseDLL\:
 old:.\ReleaseDLL\:
 new:.\ReleaseDLL32\:
 XML::LibXML::Text:#text:
 XML::LibXML::Element:LinkIncremental:
 XML::LibXML::Text:#text:
 XML::LibXML::Element:TargetName:
 XML::LibXML::Text:#text:
creating 'fred.tmp' (overwriting)...done.

$ diff fred.vcxproj fred.tmp
2c2
< <Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
---
> <Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Build" ToolsVersion="4.0">
9,10c9,10
<     <OutDir>.\../../products/bin/ReleaseDLL\</OutDir>
<     <IntDir>.\ReleaseDLL\</IntDir>
---
>     <OutDir>.\../../products/bin/ReleaseDLL32\</OutDir>
>     <IntDir>.\ReleaseDLL32\</IntDir>
```
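
The first hunk of the diff is a serialization artifact rather than an edit: libxml2 keeps namespace declarations separate from ordinary attributes and writes them first, so `xmlns` moves to the front of `<Project>`. Attribute order carries no meaning in XML, so this should be harmless to Visual Studio, though it does add noise to version-control diffs. A minimal check of the behaviour on a made-up element (`urn:example` is just a placeholder namespace URI):

```perl
use strict;
use warnings;
use XML::LibXML;

# The input lists the ordinary attribute first and the namespace declaration last.
my $doc = XML::LibXML->load_xml(
    string => '<Project DefaultTargets="Build" xmlns="urn:example"/>' );

# On output, the namespace declaration is written before the ordinary attributes.
my $out = $doc->documentElement->toString;
print "$out\n";
```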


In reply to Some questions from beginning user of XML::LibXML and XPath by eyepopslikeamosquito
