PerlMonks  

Extracting tagged data from a XML file

by theroninwins (Friar)
on Aug 31, 2004 at 08:50 UTC ( [id://387134]=perlquestion )

theroninwins has asked for the wisdom of the Perl Monks concerning the following question:

I have an XML file and I need to get the IP addresses that are tagged "<IP-ADDRESS>" or "<IP-NEIGHBOUR>" and write them to a txt file. How can I get Perl to extract that info?

20040908 Edit by castaway: Changed title from 'XML2Perl'

Replies are listed 'Best First'.
Re: Extracting tagged data from a XML file
by davorg (Chancellor) on Aug 31, 2004 at 10:19 UTC

    Using the proper tools (an XML parser, not regular expressions), it becomes pretty simple:

    use XML::XPath;

    my $xp = XML::XPath->new(filename => '/path/to/file');

    foreach ('IP-ADDRESS', 'IP-NEIGHBOUR') {
        foreach my $ip ($xp->findnodes("//$_")) {
            print $ip->findvalue('.'), "\n";
        }
    }
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      I searched for XPath since I don't have it on my computer, and I found a long read, but how can I get it to work? (Same with XML::Simple.pm: I got that .pm file, but it is not really working.)
      The problem is still the same: it reads both the ones tagged "<IPAddress>" and the ones tagged "<NEIGHBOURIPAddress>".
      148.192.116.253
      148.192.116.253
      148.192.116.253
      148.192.116.253
      148.192.116.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.137.253
      148.192.63.249
      148.192.63.249
      148.192.63.249
      148.192.63.249
      148.192.63.249
      148.192.63.249
      148.192.63.249
      148.192.63.249
      148.192.63.249
      148.192.63.249
      This is what some of them look like! Many are the same. :-(

        It looks like that's what you asked for. I'm not sure what the problem is.

        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

        OK, here is another idea I had:

        #
        use XML::Dumper;

        open (XMLINPUT, "e:/topo.xml")
            or (print "The XML file could not be opened!!\n".
                      "Please give the path and file name with slashes!!".
                      "\ne.g. c:/data1.xml\n"
                and open (MACERROR, ">XMLError.err")
                and print MACERROR "The XML file could not be opened!!\n".
                      "Please give the path and file name with slashes!!".
                      "\ne.g. c:/data1.xml\n"
                and close MACERROR
                and exit);

        # create references
        $zeiger_xml = new XML::Dumper->xml2pl(join("",<XMLINPUT>));
        $zeiger_data = \@{$zeiger_xml->{'Data'}};
        close XMLINPUT;

        open (CACHENEU, ">ips2.txt");
        close CACHENEU;

        foreach $z_device (@{$zeiger_xml->{'Device'}})
        {
            #if ( $z_device->{'SystemDescription'} =~ m/Version 12.0\(5\)/
            #and $z_device->{'SystemDescription'} =~ m/C2900/
            #and $z_device->{'SystemDescription'} !~ m/WS/)
            #{
            open (CACHENEU, ">>ips2.txt");
            foreach $z_add (@{$z_device->{'DeviceName'}})
            {
                %PortHash = ( $z_device->{'DeviceName'} => $z_ip->{'IPAddress'} );
                printf CACHENEU ("%s\t\t%s\n",%PortHash);
            }
            close CACHENEU;
            #}
        }
        print "\nFinished creating the port list!!\n" ;


        And this is the file I have to read from:

        <?xml version="1.0" encoding="UTF-8" ?>
        <CMData>
          <CMServer>S03</CMServer>
          <CreatedAt>Tue Aug 24 09:09:41 GMT+02:00 2004</CreatedAt>
          <SchemaVersion>1.0</SchemaVersion>
          <Heading>Topology Data</Heading>
          <Layer2Details>
            <Device>
              <DeviceName>LWL-H91-CW-4-5-4</DeviceName>
              <IPAddress>148.192.59.254</IPAddress>
              <DeviceState>Reachable</DeviceState>
              <DeviceType>C2950G-24</DeviceType>
              <Neighbors>
                <Neighbor>
                  <NeighborIPAddress>148.192.22.22</NeighborIPAddress>
                  <NeighborDeviceType>C6506</NeighborDeviceType>
                  <Link>Point to Point link</Link>
                  <LocalPort>Gi0/2</LocalPort>
                  <RemotePort>3/3</RemotePort>
                </Neighbor>


        Any idea where I am going wrong?
Re: Extracting tagged data from a XML file
by reneeb (Chaplain) on Aug 31, 2004 at 10:14 UTC

      Doesn't that only catch top-level nodes in the file?

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

        XML::Simple parses the whole file. With Data::Dumper you can display the data structure!

        You can get the values for all nodes in the xml-file.
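A minimal sketch of the Data::Dumper step, assuming a parsed structure shaped like the topology data in this thread. The `$ref` below is hand-written, hypothetical data rather than real parser output, so the snippet runs with core Perl only:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical structure, shaped like the topology data in this thread.
my $ref = {
    Layer2Details => {
        Device => [
            { DeviceName => 'LWL-H91-CW-4-5-4', IPAddress => '148.192.59.254' },
        ],
    },
};

$Data::Dumper::Sortkeys = 1;   # stable key order makes dumps easier to compare
$Data::Dumper::Indent   = 1;   # compact, fixed-width indentation

my $dump = Dumper($ref);
print $dump;
```

Reading the dump from the outside in shows which keys hold hashes and which hold arrays, which is what you need to know before writing the loop that walks the structure.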
Re: Extracting tagged data from a XML file
by mirod (Canon) on Aug 31, 2004 at 15:06 UTC

    If you have installed XML::Twig you can use the xml_grep utility that comes with it:

    xml_grep -t -cond 'IP-ADDRESS' -cond 'IP-NEIGHBOUR'  test_xml_grep.xml | sort -u

    -t is the option that will give you only the text of the results, not the tags, and of course sort -u will then give you the list of unique IPs.
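If the de-duplication has to happen inside a Perl script rather than in the shell, a common idiom is a `%seen` hash. This is only a sketch: `unique_in_order` is a hypothetical helper name, and unlike `sort -u` it keeps the first-seen order instead of sorting.

```perl
use strict;
use warnings;

# Return the input values with duplicates removed, keeping first-seen order.
sub unique_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Sample values taken from the repeated output earlier in the thread.
my @ips = qw(148.192.116.253 148.192.116.253 148.192.137.253 148.192.63.249 148.192.137.253);
print "$_\n" for unique_in_order(@ips);
```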

Re: Extracting tagged data from a XML file
by theroninwins (Friar) on Sep 01, 2004 at 14:31 UTC
    OK, I decided to try my luck with XML::Dumper and XML::Simple. They are easily installed and I guess they will just have to do. I got the code down to:
    #!/usr/bin/perl -w
    #
    use strict;
    use XML::Simple;
    use Data::Dumper;

    my $xmlfile = "e:/topo.xml";
    my $ref = eval { XMLin($xmlfile) };

    if ($@){
        print "XML Read ERROR";
    } else {
        foreach my $item (@{$ref->{Layer2Details}}){
            print $item->{IPAddresse}, "\n";
        }
    }

    and the Dumper gives this:
    $VAR1 = {
      'SchemaVersion' => '1.0',
      'Layer2Details' => {
        'Device' => [
          {
            'DeviceName' => 'LWL-H91-CW-4-5-4.bc.de.bic',
            'DeviceType' => 'C2950G-24',
            'IPAddress' => '148.192.59.254',
            'Neighbors' => {
              'Neighbor' => [

    Now I just need to know how to get past Layer2Details into the Device section. The rest I know. Any help on that? It would really be great, and thanks for taking so much time to help me already.
      foreach my $item ( @{$ref->{Layer2Details}{Device}} ) {
          print $item->{IPAddress}, "\n";
      }
      --
      edan
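The traversal above can be sketched as a self-contained script. Two caveats, offered as assumptions rather than as part of the original answer: the `$ref` here is hand-written to mirror the Dumper output shown earlier, and the hypothetical helper `as_list` exists because XML::Simple, unless its `ForceArray` option is set, folds a lone `<Device>` element into a plain hash instead of a one-element array. Note also that `$ref->{Layer2Details}{Device}` and `$ref->{Layer2Details}->{Device}` mean the same thing; the arrow between adjacent subscripts is optional.

```perl
use strict;
use warnings;

# Accept either a single hash ref or an array ref of hash refs, return a list.
sub as_list {
    my ($node) = @_;
    return () unless defined $node;
    return ref $node eq 'ARRAY' ? @$node : ($node);
}

# Collect the device-level IPAddress values, ignoring neighbour addresses.
sub device_ips {
    my ($ref) = @_;
    return map  { $_->{IPAddress} }
           grep { defined $_->{IPAddress} }
           as_list( $ref->{Layer2Details}{Device} );
}

my $ref = {   # hypothetical data, mirroring the Dumper output above
    Layer2Details => {
        Device => [
            {
                DeviceName => 'LWL-H91-CW-4-5-4',
                IPAddress  => '148.192.59.254',
                Neighbors  => { Neighbor => [ { NeighborIPAddress => '148.192.22.22' } ] },
            },
        ],
    },
};

print "$_\n" for device_ips($ref);
```

Because only the `IPAddress` key of each `Device` hash is read, the `NeighborIPAddress` values nested under `Neighbors` never reach the output.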

        OK, thanks for all of your help, everyone. Here is what I have been working on all this time:
        #!/usr/bin/perl -w
        use Net::SNMP;
        use strict;
        use warnings;
        use diagnostics;
        use XML::Simple;
        use Data::Dumper;

        #open xml file
        my $xmlfile = "e:/topo.xml";
        my $ref = eval { XMLin($xmlfile) };

        #erase or create the ipfile
        open ERASER, ">ipfile.txt";
        close ERASER;

        #see if open worked
        if ($@)
        {
            print "XML Read ERROR";
        }
        else
        {
            #go to IPAddress tag and read infos into file
            foreach my $item (@{$ref->{Layer2Details}->{Device}}){
                my @ipliste = $item->{IPAddress};
                my @sorted = @ipliste;
                open OUTIP, ">>ipfile.txt";
                print OUTIP @sorted, "\n";
                close OUTIP;
            }
        }

        #open ipfile
        open IPFILE, "ipfile.txt" or die "Can't get IPs - $!\n";

        #set nodes and community
        my $community = 'public';
        my $ifIndex = '1.3.6.1.2.1.47.1.1.1.1.6';
        my $ifDescr = '1.3.6.1.2.1.47.1.1.1.1.11';
        my $ifDescr2 = '1.3.6.1.2.1.47.1.1.1.1.2';

        #erase or create files
        open ERASER, ">serlist.txt";
        close ERASER;
        open ERASER, ">desclist.txt";
        close ERASER;
        open ERASER, ">finallist.txt";
        close ERASER;

        while ( my $ip = <IPFILE> ) {
            chomp $ip;
            print "Got: $ip\n";

            #open session
            my ( $session, $error ) = Net::SNMP->session(
                -hostname => $ip,
                -community => $community,
                -port => 161
            );
            my $response;

            #goto index table and hash index
            if ( defined( $response = $session->get_table($ifIndex) ) ) {

                #get serial numbers
                foreach my $index ( values %{$response} ) {
                    my $this_desc = "$ifDescr.$index";
                    my $description;
                    if ( defined( $description = $session->get_request($this_desc) ) ) {
                        #print serial no. to file
                        my @serial = values %{$description};
                        open OUTPUT1, ">>serlist.txt";
                        print OUTPUT1 @serial, "\n";
                        close OUTPUT1;
                    }
                }

                #get description
                foreach my $index ( values %{$response} ) {
                    my $this_desc = "$ifDescr2.$index";
                    my $description2;
                    if ( defined( $description2 = $session->get_request($this_desc) ) ) {
                        #print description to file
                        my @desc = values %{$description2};
                        open OUTPUT2, ">>desclist.txt";
                        print OUTPUT2 @desc, "\n";
                        close OUTPUT2;
                    }
                }

                #create final file
                open(OUT, "serlist.txt");
                open(OUT2, "desclist.txt");
                my @outlist;
                while (<OUT>) {
                    my %hash;
                    my @temp = split(/;/,$_);
                    $hash{'file1'} = $temp[0];
                    $hash{'file2'} = <OUT2>;
                    #delete Return
                    foreach (values %hash) {
                        $_ =~ s/\n//g;
                    }
                    push(@outlist,\%hash);
                }
                close(OUT);

                #print to file
                open FINOUT, ">>finallist.txt";
                print FINOUT "$ip\n";
                foreach my $hashref (@outlist) {
                    print FINOUT "$hashref->{'file1'};$hashref->{'file2'}\n";
                }
                print FINOUT "\n";
                close FINOUT;
            }

            #close session
            $session->close();
        }

        It reads all IPAddress-tagged entries from an XML file and then uses those to get information from MIBs via SNMP, saves it in different files, and then creates another file with the result in a certain order, like a CSV file, so it can be read automatically by programs like CiscoWorks 2000. Thanks again, everybody, for the help provided.
        Thanks, I also worked it out myself after some lucky tries :-) BTW, I put a -> between Layer2Details and Device.
Re: Extracting tagged data from a XML file
by GaijinPunch (Pilgrim) on Aug 31, 2004 at 09:06 UTC
    Well, that's a pretty general question, but a multi-step process. You'd first need to open the file, run through line by line, and then grab the value within the tags. Assuming your XML file would look something like this "<IP-NEIGHBOUR>192.168.1.1</IP-NEIGHBOUR>" then the code would be something like this:
    #!/usr/bin/perl
    #
    use strict;
    use warnings;

    # open file for reading
    open ( XML, "/path/to/file" ) || die "Can't open file $!";

    while ( <XML> ) {    # go through line by line
        $_ =~ ( /<IP-(ADDRESS|NEIGHBOUR)>(.+?)<\/IP.+?/ );
        my $ip = $2;     # IP is the value in 2nd parenthesis
    }

    The essence of the problem lies within the regular expression -- I'd suggest some reading up on them if you want to get serious with data extraction. Of course, the regex might change based on your XML file.
      Howdy!

      You'd first need to open the file, run through line by line, and then grab the value within the tags.

      That's not reliable. What assures you that the open and close tags are on the same line? You cannot make that assumption.

      Far better is to use an XML parser of some sort.

      yours,
      Michael
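To make the objection concrete, here is a small sketch with hypothetical input in which the second element has its closing tag on the next line. `match_per_line` is an illustrative helper that mimics the line-by-line matching of the script above, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical input: the second element is split across two lines.
my $xml = "<IP-ADDRESS>10.0.0.1</IP-ADDRESS>\n<IP-ADDRESS>10.0.0.2\n</IP-ADDRESS>\n";

# Line-by-line matching: each line is tested on its own.
sub match_per_line {
    my ($text) = @_;
    my @found;
    for my $line ( split /\n/, $text ) {
        push @found, $1 if $line =~ /<IP-ADDRESS>(.+?)<\/IP-ADDRESS>/;
    }
    return @found;
}

my @found = match_per_line($xml);
print scalar(@found), " of 2 addresses found: @found\n";
```

Only the first address is reported, because no single line contains both the opening and the closing tag of the second element. An XML parser does not care where the line breaks fall.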
      For some reason this works, but when I leave the NEIGHBOUR out, it doesn't anymore. How come? I know regular expressions, and from my point of view "$_ =~ ( /<IPAddress>(.+?)<\/IP.+?/ );" should work fine. Where is the mistake? (I am leaving the rest as it is.) The problem is that I only need the IPAddress and not the NEIGHBOURIPAddress, and the IPAddress string is in both, so how can I get around that and only get the first one? (Otherwise I have all IPs in that list a thousand times.) Sorry for only writing that now; I just noticed it.
        Here is a naughty one-liner to extract all occurrences of IP addresses between <IP-ADDRESS> tags, in any case and across lines. It is very inefficient for a large file, as it reads it all into memory. Change the word data to the name of your file.
        perl -le 'local$/;open F,data;$_=<F>;s/\n//g;while(/<ip-address>(.+?)<\/ip/gi){print $1}'
        Here is the same, but to grab IP addresses from either <IP-ADDRESS> or <IP-NEIGHBOUR> tags.
        perl -le 'local$/;open F,data;$_=<F>;s/\n//g;while(/<ip-(neighbour|address)>(.+?)<\/ip/gi){print $2}'

        update

        For some reason I got marked -1 on this; if anyone can explain what I did wrong here, I'd love to know. I realise the code is naughty for eating the file in one gulp, but if the file is small this can't do much harm, and it makes for a very simple solution to the possible problem of the addresses being broken across line breaks.

        Anyway, looking at this thread the OP looks to have changed his mind and not want to capture the <IP-NEIGHBOUR> addresses, as well as wanting only unique addresses returned, I will update this space soon with a version that is more mem friendly and possibly redeem myself in the eyes of the monastery.

        further update

        OK, I have had the error of my ways pointed out, thou shalt not parse XML with regexp. I shall stop sinning now, no further unholy code shall follow.
        R.

Node Type: perlquestion [id://387134]
Approved by Corion