PerlMonks  

get text from node - XML::LibXML

by corfuitl (Sexton)
on Jul 20, 2018 at 09:36 UTC (#1218873=perlquestion)

corfuitl has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I have this XML file:

<seg><foo mid="0" mtype="seg"><g id="1">Need to export this text</g></foo></seg>

and I would like to get the text together with its XML tags (i.e. <g id="1">Need to export this text</g>).

for my $foo ($seg->findnodes('foo')) {
    my $mid = ($foo->findvalue('@mid'));
    my $mrktext = ($foo->findnodes('text()'));
    print "$mid $mrktext\n";
}

I use this, but it doesn't output any text.

Replies are listed 'Best First'.
Re: get text from node - XML::LibXML
by hippo (Chancellor) on Jul 20, 2018 at 09:49 UTC

    Here is an SSCCE:

    use strict;
    use warnings;
    use XML::LibXML;
    use Test::More tests => 1;

    my $in   = '<seg><foo mid="0" mtype="seg"><g id="1">Need to export this text</g></foo></seg>';
    my $want = '<g id="1">Need to export this text</g>';
    my $xml  = XML::LibXML->load_xml (string => $in);
    my $have = $xml->getElementsByTagName ('g')->shift->toString;
    is ($have, $want);

      Hi,

      Thank you for your help. I gave it a try and it works when the text has tags. However, there are cases where the extracted text doesn't contain tags (e.g. <seg><foo mid="0" mtype="seg">Need to export this text</foo></seg>).

      I used this one:

      my $mrktext = $mrk->getElementsByTagName ('*')->shift->toString;

        What is your full code? What output do you get? What do you expect?

        Unless you tell us that, we can't figure out where you're going wrong.

        Here's the code for an example I created:

        use strict;
        use XML::LibXML;   # needed for XML::LibXML->load_xml below

        my $str = q{<seg><foo mid="0" mtype="seg"><g id="1">Need to export this text</g></foo></seg>};
        my $xml = XML::LibXML->load_xml(string => $str);
        print $xml->getElementsByTagName("g")->shift->toString, "\n";

        The output is:

        <g id="1">Need to export this text</g>

        Which is exactly what I'd expect it to be.


Re: get text from node - XML::LibXML
by marto (Archbishop) on Jul 20, 2018 at 09:39 UTC

    As a side note, you keep posting things in the wrong section of the forum. Please read and understand "Where should I post X?", which is displayed each time you post.

Re: get text from node - XML::LibXML
by tangent (Vicar) on Jul 21, 2018 at 00:17 UTC
    If I understand correctly, you want to extract the text and its enclosing tag if it has one, and just the text if it doesn't. This might help:
    use XML::LibXML;

    my $xml = q|
    <seg>
      <foo mid="0" mtype="seg">
        <g id="1">Need to export this text</g>
      </foo>
      <foo mid="1" mtype="seg">
        Need to export this text also
      </foo>
    </seg>|;

    my $doc = XML::LibXML->load_xml(string => $xml);
    my @foos = $doc->findnodes('//foo');
    for my $foo (@foos) {
        my $mid = $foo->getAttribute('mid');
        print "mid: $mid ";
        my @childnodes = $foo->childNodes();
        if (@childnodes) {
            for my $node (@childnodes) {
                print $node->toString, "\n";
            }
        }
        else {
            print $foo->textContent, "\n";
        }
    }
    Output:
    mid: 0 <g id="1">Need to export this text</g>
    mid: 1 Need to export this text also
    You may need to trim leading and trailing whitespace if your XML contains it.
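
One way the whitespace trimming mentioned above might look is sketched here. The helper name trimmed_inner_xml is hypothetical (it is not part of XML::LibXML), and the sketch assumes that stripping whitespace from both ends of the serialized children is acceptable for your data.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = q|<seg>
  <foo mid="0" mtype="seg">
    <g id="1">Need to export this text</g>
  </foo>
  <foo mid="1" mtype="seg">
    Need to export this text also
  </foo>
</seg>|;

my $doc = XML::LibXML->load_xml(string => $xml);

# Hypothetical helper: serialize all children of a node, then strip
# leading and trailing whitespace from the combined string.
sub trimmed_inner_xml {
    my ($node) = @_;
    my $inner = join '', map { $_->toString } $node->childNodes;
    $inner =~ s/^\s+|\s+$//g;
    return $inner;
}

# Collect [mid, trimmed content] pairs in document order.
my @results;
for my $foo ($doc->findnodes('//foo')) {
    push @results, [ $foo->getAttribute('mid'), trimmed_inner_xml($foo) ];
}

printf "mid: %s %s\n", @$_ for @results;
```

Whitespace inside the content (between words or between child elements) is left untouched; only the two ends are trimmed.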
Re: get text from node - XML::LibXML
by ikegami (Pope) on Jul 21, 2018 at 16:26 UTC

    Problem #1: You ask for the foo nodes that are children of the document. There are no such nodes.

    $doc->findnodes('foo')
    should be
    $doc->findnodes('seg/foo')
    or
    $doc->findnodes('/seg/foo')
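
To illustrate Problem #1, here is a small sketch that counts what each expression matches. The helper count_matches is made up for this example. It also shows that the relative path 'foo' does match when the context node is the seg element rather than the document, which may be relevant if $seg in the original code was already that element.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<seg><foo mid="0" mtype="seg"><g id="1">Need to export this text</g></foo></seg>',
);

# Hypothetical helper: number of nodes an XPath expression selects,
# evaluated relative to the given context node.
sub count_matches {
    my ($context, $xpath) = @_;
    my @nodes = $context->findnodes($xpath);
    return scalar @nodes;
}

# Relative to the document node: its only child element is seg, not foo.
my $from_doc_relative = count_matches($doc, 'foo');

# Absolute path from the root.
my $from_doc_absolute = count_matches($doc, '/seg/foo');

# Relative to the root element (seg): foo is a direct child.
my $from_root_element = count_matches($doc->documentElement, 'foo');

print "doc, 'foo':      $from_doc_relative\n";
print "doc, '/seg/foo': $from_doc_absolute\n";
print "seg, 'foo':      $from_root_element\n";
```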

    Problem #2: You ask for the text children of the foo element, but it doesn't have any. It's not even text you want!

    $foo->findnodes('text()')
    should be
    join('', map { $_->toString() } $foo_node->childNodes)
    which can be simplified to
    join('', $foo_node->childNodes)
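
A short sketch of Problem #2, covering both the tagged and the untagged case from the follow-up question. The helper name inner_xml is hypothetical. The simplified join form relies, as far as I know, on XML::LibXML nodes overloading stringification to call toString; the sketch computes both forms so they can be compared.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<seg>'
            . '<foo mid="0"><g id="1">Need to export this text</g></foo>'
            . '<foo mid="1">Need to export this text</foo>'
            . '</seg>',
);

# Hypothetical helper: serialize every child node (elements and text alike).
sub inner_xml {
    my ($node) = @_;
    return join '', map { $_->toString } $node->childNodes;
}

my ($tagged, $plain) = $doc->findnodes('/seg/foo');

# The first foo has no text node as a direct child: its text lives inside <g>.
my @text_children = $tagged->findnodes('text()');

my $tagged_inner = inner_xml($tagged);
my $plain_inner  = inner_xml($plain);

# Simplified form: assumes nodes stringify via toString when joined.
my $tagged_simplified = join '', $tagged->childNodes;

print scalar(@text_children), " direct text children\n";
print "$tagged_inner\n";
print "$plain_inner\n";
```

Because text nodes are serialized too, the same call handles a foo element with or without an inner tag.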

    Also,

    $foo->findvalue('@mid')

    is better written as

    $foo->getAttribute('mid')
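
One practical difference between the two calls, offered as an observation rather than as the stated reason for the advice above: for an attribute that does not exist, findvalue yields an empty string, while getAttribute returns undef, so a missing attribute can be told apart from an empty one. A minimal sketch (the attribute name nope is made up):

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => '<foo mid="0" mtype="seg"/>');
my $foo = $doc->documentElement;

# Both calls agree when the attribute exists.
my $mid_xpath = $foo->findvalue('@mid');
my $mid_dom   = $foo->getAttribute('mid');

# They differ when it does not: empty string versus undef.
my $missing_xpath = $foo->findvalue('@nope');
my $missing_dom   = $foo->getAttribute('nope');

print "findvalue:    '$mid_xpath', missing is ",
      (defined $missing_xpath ? "'$missing_xpath'" : 'undef'), "\n";
print "getAttribute: '$mid_dom', missing is ",
      (defined $missing_dom ? "'$missing_dom'" : 'undef'), "\n";
```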

    So, we get

    use strict;
    use warnings qw( all );
    use feature qw( say );

    use XML::LibXML qw( );

    my $xml = <<'__EOS__';
    <seg><foo mid="0" mtype="seg"><g id="1">Need to export this text</g></foo></seg>
    __EOS__

    my $doc = XML::LibXML->new->parse_string($xml);

    for my $foo_node ($doc->findnodes('/seg/foo')) {
        my $mid = $foo_node->getAttribute('mid');
        my $inner_xml = join('', $foo_node->childNodes);
        say "$mid $inner_xml";
    }