PerlMonks  

Re: Extracting tagged data from a XML file

by GaijinPunch (Pilgrim)
on Aug 31, 2004 at 09:06 UTC


in reply to Extracting tagged data from a XML file

Well, that's a pretty general question, and it's a multi-step process. You'd first need to open the file, run through it line by line, and then grab the value within the tags. Assuming your XML file looks something like this: "<IP-NEIGHBOUR>192.168.1.1</IP-NEIGHBOUR>", the code would be something like this:
#!/usr/bin/perl
use strict;
use warnings;

# open file for reading
open( XML, "/path/to/file" ) || die "Can't open file $!";

while ( <XML> ) {    # go through line by line
    # only read the capture when the line actually matched
    if ( $_ =~ /<IP-(ADDRESS|NEIGHBOUR)>(.+?)<\/IP.+?/ ) {
        my $ip = $2;    # IP is the value in 2nd parenthesis
    }
}
The essence of the problem lies within the regular expression. I'd suggest reading up on them if you want to get serious about data extraction. Of course, the regex might change based on your XML file.

Replies are listed 'Best First'.
Re^2: Extracting tagged data from a XML file
by herveus (Prior) on Aug 31, 2004 at 11:52 UTC
    Howdy!

    You'd first need to open the file, run through line by line, and then grab the value within the tags.

    That's not reliable. What assures you that the open and close tags are on the same line? You cannot make that assumption.

    Far better is to use an XML parser of some sort.
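
As a rough illustration of the parser approach, here is a minimal sketch using XML::LibXML. The choice of module is an assumption (the reply names no specific parser, and XML::LibXML must be installed from CPAN), and the sample document and the helper name element_text are made up for the example. Note that the first element's close tag sits on a different line, which a line-by-line regex would miss.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module, assumed to be installed

# Hypothetical document; the close tag of the first element sits on a later line.
my $xml = join "\n",
    '<hosts>',
    '  <IP-ADDRESS>192.168.1.1',
    '  </IP-ADDRESS>',
    '  <IP-NEIGHBOUR>192.168.1.2</IP-NEIGHBOUR>',
    '</hosts>';

# Return the trimmed text of every element matched by an XPath expression.
sub element_text {
    my ( $source, $xpath ) = @_;
    my $doc = XML::LibXML->load_xml( string => $source );
    my @values;
    for my $node ( $doc->findnodes($xpath) ) {
        my $text = $node->textContent;
        $text =~ s/^\s+|\s+$//g;    # drop whitespace around the value
        push @values, $text;
    }
    return @values;
}

print "$_\n" for element_text( $xml, '//IP-ADDRESS' );
```

To read from a file instead of a string, load_xml also accepts location => $path.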

    yours,
    Michael
Re^2: Extracting tagged data from a XML file
by theroninwins (Friar) on Aug 31, 2004 at 09:45 UTC
    For some reason this works, but when I leave the NEIGHBOUR out, it doesn't anymore. How come? I know regular expressions, and from my point of view "$_ =~ ( /<IPAddress>(.+?)<\/IP.+?/ );" should work fine. Where is the mistake? (I am leaving the rest as it is.) The problem is that I only need the IPAddress and not the NEIGHBOURIPAdress, and the IPAddress string is in both, so how can I get around that and only get the first one? Otherwise I have all IPs in that list a thousand times. Sorry for writing that now, I just noticed it.
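
One likely cause, assuming the rest of the script still reads $2: once the (ADDRESS|NEIGHBOUR) alternation is removed, the pattern has only one capture group, so the address ends up in $1 and $2 is undefined. The sketch below shows that, plus a %seen hash to keep each address only once. The sample lines, the tag spellings, and the helper name unique_ips are assumptions for illustration, and this is still a regex shortcut, not a real XML parse.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real tag names may differ.
my @lines = (
    '<IPAddress>192.168.1.1</IPAddress>',
    '<NEIGHBOURIPAddress>192.168.1.2</NEIGHBOURIPAddress>',
    '<IPAddress>192.168.1.1</IPAddress>',
);

# Return each distinct address from <IPAddress> elements, in first-seen order.
sub unique_ips {
    my @input = @_;
    my ( %seen, @ips );
    for my $line (@input) {
        # The literal "<" right before IPAddress keeps <NEIGHBOURIPAddress> out.
        # There is only one capture group, so the value is in $1, not $2.
        next unless $line =~ /<IPAddress>(.+?)<\/IPAddress>/;
        push @ips, $1 unless $seen{$1}++;
    }
    return @ips;
}

print "$_\n" for unique_ips(@lines);
```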
      Here is a naughty one-liner to extract all occurrences of IP addresses between <IP-ADDRESS> tags, in any case and across lines. It is very inefficient for a large file, as it reads it all into memory. Change the word data to the name of your file.
      perl -le 'local$/;open F,data;$_=<F>;s/\n//g;while(/<ip-address>(.+?)<\/ip/gi){print $1}'
      Here is the same, but to grab IP addresses from either <IP-ADDRESS> or <IP-NEIGHBOUR> tags.
      perl -le 'local$/;open F,data;$_=<F>;s/\n//g;while(/<ip-(neighbour|address)>(.+?)<\/ip/gi){print $2}'

      update

      For some reason I got marked -1 on this. If anyone can explain what I did wrong here, I'd love to know. I realise the code is naughty for eating the file in one gulp, but if the file is small this can't do much harm, and it makes for a very simple solution to the possible problem of the addresses being broken across line breaks.

      Anyway, looking at this thread, the OP seems to have changed his mind: he no longer wants to capture the <IP-NEIGHBOUR> addresses, and he wants only unique addresses returned. I will update this space soon with a version that is more memory friendly and possibly redeem myself in the eyes of the monastery.

      further update

      OK, I have had the error of my ways pointed out: thou shalt not parse XML with regexp. I shall stop sinning now; no further unholy code shall follow.
      R.
        if anyone can explain what I did wrong here I'd love to know

        You tried to parse XML with regular expressions.

        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg
