Re^3: Twig Mixed Content Child Text Replace Issues

by mirod (Canon)
on Feb 08, 2011 at 16:18 UTC (#887006)

in reply to Re^2: Twig Mixed Content Child Text Replace Issues
in thread Twig Mixed Content Child Text Replace Issues

The fact that you want the user input for each substitution makes the problem a _lot_ more tricky. Otherwise you could simply use subs_text as in my previous answer.

In this case you should do the substitution on the text of the '#TEXT' (or '#PCDATA') children of the paragraph. But then what happens if the text contains the string you want to replace twice, and you only want to replace the second one? The logic becomes quite a bit more complex. Off the top of my head I would modify the regexp to let it skip the appropriate number of occurrences of the string to replace.

A non-interactive version that you could use as a basis would be:
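A minimal sketch of the "skip the appropriate number of occurrences" idea, on a plain string rather than a twig element. The helper name replace_after_skipping and its arguments are hypothetical, made up for illustration; they are not part of XML::Twig. It assumes the keyword should match as a whole word and that $skip is a non-negative integer:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not an XML::Twig method): skip the first $skip
# whole-word occurrences of $word in $text, then replace the next one.
# If there are not enough occurrences, the text is returned unchanged.
sub replace_after_skipping {
    my( $text, $word, $replacement, $skip)= @_;
    my $kw= quotemeta $word;
    # the captured prefix holds exactly $skip occurrences of the keyword,
    # so the keyword matched after it is occurrence number $skip + 1
    $text=~ s{\A((?:.*?\b$kw\b){$skip}.*?)\b$kw\b}{$1$replacement}s;
    return $text;
}

# replace only the second "number"; "numbers" is not a whole-word match
print replace_after_skipping( 'a number of numbers and a number of tags',
                              'number', 'quantity', 1), "\n";
```

In an interactive version, the skip count would presumably go up by one each time the user declines a replacement, and the resulting string would be written back with set_text.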

#!/usr/bin/perl

use strict;
use warnings;

use Test::More tests => 1;

use XML::Twig;

my $doc = '<paragraph> Some <bold>text</bold> here which may be any <bold>length</bold> and <bold>contain</bold> a number of child tags.</paragraph>';
my $exp = '<paragraph> Some <bold>text</bold> here which may be any <bold>length</bold> and <bold>contain</bold> a quantity of child tags.</paragraph>';

# I got a little fancy here to allow several keywords to replace
# the keywords are grouped in a regexp, sorted by inverse length so the alternation works properly
my $replace = { number => 'quantity' };
my $keywords= join( '|', map { "\Q$_\E" } sort { length $b <=> length $a } keys %$replace);

my $t=XML::Twig->new( twig_roots => { paragraph => \&subs_word })->parse( $doc);

is( $t->sprint, $exp, 'one change') ;

exit;

sub subs_word
  { my( $t, $para)= @_;
    # only the direct text children of the paragraph, not the text inside <bold>
    foreach my $text_elt ($para->children( '#TEXT'))
      { my $text= $text_elt->text;
        if( $text=~ m{\b($keywords)\b})
          { $text=~ s{\b($keywords)\b}{$replace->{$1}}g;
            $text_elt->set_text( $text);
          }
      }
  }

Re^4: Twig Mixed Content Child Text Replace Issues
by unknown_varmit (Initiate) on Feb 08, 2011 at 22:18 UTC
    Thanks mirod, your advice is spot on.

    I have modified my code (using much less elegant methods) to prompt the user for each '#TEXT' match. It's working beautifully, but it will take me a bit more work to get to a regex solution as clean as yours.

    Twig is turning out to be a very useful tool.
