Thanks. Working with parent elements containing the target element was the way to go, rather than aiming directly at the target element itself. See the first handler below.
There are a lot of handlers in this script, but here is the pair relevant to my original question:
my $xml = XML::Twig->new(
    pretty_print    => 'nsgmls',    # nsgmls for parsability
    output_encoding => 'UTF-8',
    twig_roots      => { 'office:body' => 1 },
    twig_handlers   =>
    {
        # link anchors (text:bookmark) must be handled before
        # processing the internal links
        '*[text:bookmark]' => \&handler_bookmark,
        . . .
$xml = XML::Twig->new(
    pretty_print    => 'nsgmls',
    empty_tags      => 'html',
    output_encoding => 'UTF-8',
    twig_roots      => { 'office:body' => 1 },
    twig_handlers   =>
    {
        # links (text:a) must be handled separately from link targets
        'text:a' => \&handler_links,
        . . .
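# First pass: called for each element that has a text:bookmark child.
# %bookmarks is shared with handler_links (declared outside the code shown).
sub handler_bookmark {
    my ($twig, $bookmark) = @_;
    my @bmk = $bookmark->children('text:bookmark');
    foreach my $bk (@bmk) {
        my $l = $bk->trimmed_text;          # label
        my $t = $l;
        $t =~ s/\s/_/g;                     # target: label with whitespace as underscores
        my $anchor = $bk->att('text:name');
        $bookmarks{$anchor}{'label'}  = $l;
        $bookmarks{$anchor}{'target'} = $t;
        $bk->set_text("\n { ".$anchor.' }');
        $bk->parent->merge($bk);
    }
}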
# Second pass: called for each text:a link.
sub handler_links {
    my ($twig, $link) = @_;
    my $href = $link->att('xlink:href');
    $href =~ s/^\#//;                       # internal links start with '#'
    my $l = $bookmarks{$href}{'label'};
    my $t = $bookmarks{$href}{'target'};
    if (! $l) {
        # no label recorded for this href: fall back to the link's own text
        $l = $link->trimmed_text;
        $link->set_text("[$href $l]\n");
    } else {
        $link->set_text("[$t $l]\n");
    }
    $link->parent->merge($link);
}
. . .
These two handler subroutines are each used in a separate parsing pass, for a total of two passes. Strangely, two passes seem to be faster than a single pass with all the handlers in one object. The first pass collects a hash of link targets and their labels. The second pass applies those to the links pointing at those targets.
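For anyone who wants to try the two-pass idea in isolation, here is a stripped-down sketch. It is not the script above: the sample XML, the rewrite_links name, and the lexical %bookmarks hash are all made up for illustration, the bookmark is assumed to carry its label as text, and the first pass triggers on text:bookmark directly instead of on its parent to keep things short. Note that the link comes before its bookmark in the sample, which is the case a single pass cannot resolve.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input, not taken from the original document.
my $doc = <<'XML';
<office:body>
<text:p>See <text:a xlink:href="#intro_1">the intro</text:a>.</text:p>
<text:h><text:bookmark text:name="intro_1">Getting Started</text:bookmark></text:h>
</office:body>
XML

# rewrite_links is an illustrative name, not part of XML::Twig.
sub rewrite_links {
    my ($xml) = @_;
    my %bookmarks;    # anchor name => { label => ..., target => ... }

    # Pass 1: only collect the link targets and their labels.
    my $pass1 = XML::Twig->new(
        twig_handlers => {
            'text:bookmark' => sub {
                my ($twig, $bk) = @_;
                my $label  = $bk->trimmed_text;
                my $target = $label;
                $target =~ s/\s/_/g;
                $bookmarks{ $bk->att('text:name') } =
                    { label => $label, target => $target };
            },
        },
    );
    $pass1->parse($xml);

    # Pass 2: rewrite each link using what pass 1 collected.
    my $pass2 = XML::Twig->new(
        twig_handlers => {
            'text:a' => sub {
                my ($twig, $link) = @_;
                my $href = $link->att('xlink:href') // '';
                $href =~ s/^#//;
                my $entry = $bookmarks{$href};
                if ($entry) {
                    $link->set_text("[$entry->{target} $entry->{label}]");
                } else {
                    # unknown target: keep the href and the link's own text
                    $link->set_text("[$href " . $link->trimmed_text . "]");
                }
            },
        },
    );
    $pass2->parse($xml);
    return $pass2->sprint;
}

print rewrite_links($doc), "\n";
```

Because the first pass has seen the whole document before the second one starts, the order of links and bookmarks in the source does not matter.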