PerlMonks  

Re: Some questions from beginning user of XML::LibXML and XPath

by Jim (Curate)
on Oct 16, 2012 at 16:27 UTC (#999367)


in reply to Some questions from beginning user of XML::LibXML and XPath

I would have used regular expression pattern matching for this seemingly trivial text substitution (insertion) problem. The formatting of the XML is quite regular and straightforward. Both the string you're matching and the string you're replacing (enhancing) it with are distinct and uncomplicated. You say you "had a hell of a time getting XPath to work." I wouldn't have had the patience to try.

You're explicitly handling both the input text and the output text as binary data rather than as Unicode text. Why?

Here's the operation reduced to a Unicode-conformant one-liner:

C:\Temp>perl -CiO -i.bak -pe "s{(?<=[/\\]ReleaseDLL)(?=[/\\])}{32} if m{^\s*<(?:Out|Int)Dir>}" fred.vcxproj

C:\Temp>diff fred.vcxproj.bak fred.vcxproj
9,10c9,10
< <OutDir>.\../../products/bin/ReleaseDLL\</OutDir>
< <IntDir>.\ReleaseDLL\</IntDir>
---
> <OutDir>.\../../products/bin/ReleaseDLL32\</OutDir>
> <IntDir>.\ReleaseDLL32\</IntDir>
15,16c15,16
< <OutDir>.\../../products/bin/ReleaseDLL\</OutDir>
< <IntDir>.\ReleaseDLL\</IntDir>
---
> <OutDir>.\../../products/bin/ReleaseDLL32\</OutDir>
> <IntDir>.\ReleaseDLL32\</IntDir>

C:\Temp>od -h -N 3 fred.vcxproj
0000000000 EF BB BF
0000000003

C:\Temp>

Modify the anchoring regular expression patterns to taste.
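Written out longhand as a script, roughly the same edit might look like the sketch below. It is an illustration, not a drop-in replacement: the sub names `add_suffix` and `edit_file` are made up, and instead of relying on `-C` it opens the backup and the rewritten file with explicit `:encoding(UTF-8)` layers, so both the reading and the writing are done as text. The `.bak` suffix matches the `-i.bak` switch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);

# Insert "32" after a "ReleaseDLL" path component, but only on lines
# that open an <OutDir> or <IntDir> element (same patterns as the one-liner).
sub add_suffix {
    my ($line) = @_;
    $line =~ s{(?<=[/\\]ReleaseDLL)(?=[/\\])}{32}
        if $line =~ m{^\s*<(?:Out|Int)Dir>};
    return $line;
}

# Edit one file in place, keeping a .bak copy. Both handles decode/encode
# UTF-8 explicitly, so the text is processed as characters, not bytes,
# and a leading BOM is read as U+FEFF and written back unchanged.
sub edit_file {
    my ($file) = @_;
    my $backup = "$file.bak";
    copy($file, $backup) or die "Cannot back up $file: $!";

    open my $in,  '<:encoding(UTF-8)', $backup or die "Cannot read $backup: $!";
    open my $out, '>:encoding(UTF-8)', $file   or die "Cannot write $file: $!";
    while (my $line = <$in>) {
        print {$out} add_suffix($line);
    }
    close $in;
    close $out or die "Cannot close $file: $!";
}

# Unlike the INIT trick, no wildcard expansion here: pass real file names.
edit_file($_) for @ARGV;
```

Run it as, for example, `perl add_suffix.pl fred.vcxproj` (the script name is hypothetical).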

Doing it this way avoids the needless and undesirable reordering of the attributes of the <Project> element—and a lot of other XML folderol besides. It also handles the input and output properly as Unicode text rather than as binary data and leaves the existing UTF-8 byte order mark intact.
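A quick way to convince yourself that decoding and re-encoding UTF-8 text leaves the byte order mark alone is to round-trip a few bytes through the core Encode module. The sample bytes are invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: a UTF-8 BOM followed by the start of a project file.
my $bytes = "\xEF\xBB\xBF<Project>\n";

# Decoding turns the three BOM bytes into the single character U+FEFF...
my $text = decode('UTF-8', $bytes);
printf "first character: U+%04X\n", ord substr($text, 0, 1);

# ...and encoding the text again reproduces the original bytes exactly.
my $round_trip = encode('UTF-8', $text);
print $round_trip eq $bytes ? "BOM preserved\n" : "bytes changed\n";
```

This prints `first character: U+FEFF` followed by `BOM preserved`.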

Modifying this one-liner to support file and folder name globs (wildcards) is left as an exercise for the reader.
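For anyone taking up that exercise, one possible approach, assuming the shell (cmd.exe, say) passes wildcards through unexpanded, is to expand them inside Perl with `bsd_glob` from the core File::Glob module. The sub name `expand_args` is hypothetical. Unlike the built-in `glob` (and the `<...>` operator), which splits its argument on whitespace, `bsd_glob` treats each argument as a single pattern, so something like `"My Projects/*.vcxproj"` is not broken apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Expand each command-line argument as one glob pattern
# (no splitting on whitespace, unlike the built-in glob).
sub expand_args {
    return map { bsd_glob($_) } @_;
}

# Print the expanded list; a real script would edit each file instead.
print "$_\n" for expand_args(@ARGV);
```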

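UPDATE: With Perl 5.10 and later, you can use the special escape \K ("keep"), which acts like a variable-length look-behind assertion, to obviate the separate pattern match used to anchor the substitution (insertion) to just those lines that have <OutDir> and <IntDir> elements on them. The INIT block expands the file name glob, because cmd.exe passes wildcards to the program unexpanded.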

C:\>perl -CiO -i.bak -pe "INIT { @ARGV = <@ARGV> } s{^\s*<(?:Out|Int)Dir>.+?[/\\]ReleaseDLL\K}{32}" */*.vcxproj

C:\>diff Temp\fred.vcxproj.bak Temp\fred.vcxproj
9,10c9,10
< <OutDir>.\../../products/bin/ReleaseDLL\</OutDir>
< <IntDir>.\ReleaseDLL\</IntDir>
---
> <OutDir>.\../../products/bin/ReleaseDLL32\</OutDir>
> <IntDir>.\ReleaseDLL32\</IntDir>
15,16c15,16
< <OutDir>.\../../products/bin/ReleaseDLL\</OutDir>
< <IntDir>.\ReleaseDLL\</IntDir>
---
> <OutDir>.\../../products/bin/ReleaseDLL32\</OutDir>
> <IntDir>.\ReleaseDLL32\</IntDir>

C:\>
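If \K is new to you, it may help to see it next to the older idiom it replaces: capturing the prefix and putting it back in the replacement. The two subs below (their names are made up for the comparison) should give the same result on every line. A plain look-behind could not express this prefix, because Perl does not allow an unbounded quantifier such as `.+?` inside a look-behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \K version (as in the update): everything matched before \K is kept,
# so "32" is inserted right after "ReleaseDLL".
sub with_keep {
    my ($line) = @_;
    $line =~ s{^\s*<(?:Out|Int)Dir>.+?[/\\]ReleaseDLL\K}{32};
    return $line;
}

# Pre-5.10 equivalent: capture the prefix and put it back.
# ${1}32 is needed because $132 would mean capture group 132.
sub with_capture {
    my ($line) = @_;
    $line =~ s{^(\s*<(?:Out|Int)Dir>.+?[/\\]ReleaseDLL)}{${1}32};
    return $line;
}

# Sample lines modeled on the diff output; the <Other> line must not change.
for my $line ('<OutDir>.\../../products/bin/ReleaseDLL\</OutDir>',
              '<IntDir>.\ReleaseDLL\</IntDir>',
              '<Other>.\ReleaseDLL\</Other>') {
    print with_keep($line), "\n";
}
```

Running it prints the first two lines with ReleaseDLL32 and leaves the <Other> line unchanged.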
