dirtdog has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm trying to strip out the value that is between <CorpActnEvtId>12345678</CorpActnEvtId>. The problem is that there is a string of tags on each line, and I don't know what to split on to get the 12345678 value in this example.

Here is a sample of one record:

<NtfctnTp>REPL</NtfctnTp><PrcgSts><Cd><EvtCmpltnsSts>COMP</EvtCmpltnsSts><EvtConfSts>HELLO</EvtConfSts></Cd></PrcgSts></NtfctnGnlInf><PrvsNtfctnId><Id>92190602</Id></PrvsNtfctnId> <CorpActnGnlInf><CorpActnEvtId>12345678</CorpActnEvtId><OffclCorpActnEvtId>USJJT</OffclCorpActnEvtId><EvtPrcgTp><Cd>DISN</Cd></EvtPrcgTp><EvtTp><Cd>DVCA</Cd></EvtTp><MndtryVlntryEvtTp><Cd>MAND</Cd></MndtryVlntryEvtTp><UndrlygScty><

Just a tip on how to get the value between the CorpActnEvtId tags would be a great help. Thanks, DD

Replies are listed 'Best First'.
Re: Strip out value between xml tags
by toolic (Bishop) on Mar 18, 2015 at 18:12 UTC
    use warnings;
    use strict;
    use XML::Twig;

    my $xml = <<XML;
    <top>
    <PrvsNtfctnId><Id>92190602</Id></PrvsNtfctnId>
    <CorpActnGnlInf>
    <CorpActnEvtId>12345678</CorpActnEvtId>
    <OffclCorpActnEvtId>USJJT</OffclCorpActnEvtId>
    </CorpActnGnlInf>
    </top>
    XML

    my $twig = XML::Twig->new(
        twig_handlers => { CorpActnEvtId => sub { print $_->text(), "\n" } }
    );
    $twig->parse($xml);

    __END__

    12345678
Re: Strip out value between xml tags
by choroba (Archbishop) on Mar 18, 2015 at 18:37 UTC
    Using XML::XSH2, a wrapper around XML::LibXML:
    open file.xml ; echo (//CorpActnEvtId) ;
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Strip out value between xml tags
by sauoq (Abbot) on Mar 18, 2015 at 18:42 UTC
    I'm trying to strip out the value that is between <CorpActnEvtId>12345678</CorpActnEvtId>.

    By which, I think you mean you want to retrieve the value between the CorpActnEvtId opening and closing tags... ("Strip out", to me, usually means you want to delete it and leave the rest.)

    Parsing it with one of the XML modules is the right answer.

    A quick answer, if this is a one-off thing, is to use something like

    my ($id) = $junk =~ /<CorpActnEvtId>(\d+)<\/CorpActnEvtId>/;

    I might do it that way if I had a pile of data I had to munge just once, and if I didn't need much more from that snippet of XML. But even so, it makes some assumptions (for example, that the ID is always just digits and that there is no whitespace within the tags). This is terribly brittle and, if you have to do it once, you'll probably have to do it again. So I'm not recommending you do it this way, particularly as you weren't able to work it out on your own.

    That said, here's some rope; it's up to you not to hang yourself with it.

    "My two cents aren't worth a dime.";

      Thanks! I ended up using the following sed command:

      grep CorpActnEvtId | sed -e 's/.*<CorpActnEvtId>//' | sed -e 's/<\/CorpActnEvtId.*//' > file.out

      I apologize for the late response. Something urgent came up that day and I'm only now able to reply.
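
For readers who prefer the sed route, the grep-plus-two-sed pipeline above can probably be collapsed into one sed call. This is a minimal sketch, not a replacement for a real XML parser: the function name extract_id and the two sample lines are hypothetical, and it assumes at most one CorpActnEvtId element per line and no whitespace or attributes inside the tags.

```shell
# extract_id is a hypothetical helper name, not something from the thread.
# -n suppresses sed's default output; the trailing p prints a line only when
# the substitution matched, so no separate grep is needed.
# Assumes at most one <CorpActnEvtId> element per input line.
extract_id() {
  sed -n 's/.*<CorpActnEvtId>\([^<]*\)<\/CorpActnEvtId>.*/\1/p'
}

# Made-up sample input: one line with the tag, one without.
printf '%s\n' \
  '<CorpActnGnlInf><CorpActnEvtId>12345678</CorpActnEvtId><OffclCorpActnEvtId>USJJT</OffclCorpActnEvtId>' \
  '<NtfctnTp>REPL</NtfctnTp>' | extract_id
# prints: 12345678
```

Because the pattern requires the literal opening tag <CorpActnEvtId>, lines that only contain OffclCorpActnEvtId are skipped rather than passed through unchanged. The same brittleness caveats raised earlier in the thread still apply.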