
Best Way to parse this XML?

by sirius98 (Novice)
on Mar 14, 2005 at 09:04 UTC ( #439201=perlquestion )
sirius98 has asked for the wisdom of the Perl Monks concerning the following question:

The XML is below. I need to read all the entries under <CasualAdvertiser> and not those under <NewspaperAdvertiser>. I've looked at XML::Parser, but I don't think I will be able to do things like exclude the <NewspaperAdvertiser> results, at least not in a good way. So I was wondering which parser to use? Also, could you give me a little rundown on how to use it, or point me to a good example of its use? Thanks (the format didn't come out exactly right, but it is correctly formatted XML). Pete
<?xml version="1.0" standalone="yes" ?>
- <BCCFeed>
- <CasualAdvertiser>
  <AdvertisementID>1580590</AdvertisementID>
  <FirstName />
  <LastName />
  <EveningPhone />
  <MobilePhone />
  <Make>MERCEDES-BENZ</Make>
  <Family>E220</Family>
  <ModelVariant>SPECIAL-EDITION</ModelVariant>
  <Series />
  <BodyType>4D SEDAN</BodyType>
  <SeriesYear>1994</SeriesYear>
  <Rego>ORV215</Rego>
  <Price>21500.0000</Price>
  <Kilometres>175000</Kilometres>
  <Colour>Almandine</Colour>
  <InteriorColour />
  <Transmission>A</Transmission>
  <EngineSize>2.2</EngineSize>
  <Fuel />
  <FeaturesVerbose>Air Conditioning, Power Steering, ABS Braking, Power Windows, Cruise Control, Towbar, Air Bags, CD Player, Sunroof, Metalic Paint, Full service history</FeaturesVerbose>
  <Condition>excel</Condition>
  <Comments>Garaged, leather trim Must sell</Comments>
  <RegoExpire>Aug 2004</RegoExpire>
  <Location>VIC - South Eastern</Location>
  </CasualAdvertiser>
- <CasualAdvertiser>
  <AdvertisementID>1607551</AdvertisementID>
  <FirstName />
  <LastName />
  <EveningPhone>0408898011</EveningPhone>
  <MobilePhone>0408898011</MobilePhone>
  <Make>MERCEDES-BENZ</Make>
  <Family>ML</Family>
  <ModelVariant>320-LUXURY-4x4</ModelVariant>
  <Series />
  <BodyType>4D WAGON</BodyType>
  <SeriesYear>1999</SeriesYear>
  <Rego>AHV26M</Rego>
  <Price>42900.0000</Price>
  <Kilometres>123000</Kilometres>
  <Colour>Silver</Colour>
  <InteriorColour />
  <Transmission>A</Transmission>
  <EngineSize />
  <Fuel>Unleaded</Fuel>
  <FeaturesVerbose>Air Conditioning, Power Steering, ABS Braking, Power Windows, Cruise Control, Air Bags, CD Player, Central Locking Remote, Metalic Paint, Roadworthy certificate, Alloy Wheels, Full service history</FeaturesVerbose>
  <Condition>immac</Condition>
  <Comments>Immaculate luxury 4wd with all the usual Merc' quality and extras. It's fully serviced & optioned up with the luxury pack plus has quite new tyres all round, long rego, leather interior, woodgrain inserts, CD stacker, traction control, brake assist, Merc' roof racks etc etc.
Extremely well looked after - always garaged and no off road use. Beautiful vehicle in superb condition. Will listen to all realistic offers.</Comments>
  <RegoExpire>Oct 2005</RegoExpire>
  <Location>QLD - South Eastern QLD</Location>
  </CasualAdvertiser>
  <NewspaperAdvertiser>
  <AdvertisementID>2065338</AdvertisementID>
  <Make>KIA</Make>
  <Family>CARNIVAL</Family>
  <ModelVariant />
  <Series />
  <BodyType />
  <SeriesYear>2004</SeriesYear>
  <Rego>NF361</Rego>
  <Price>27500.0000</Price>
  <Kilometres>19000</Kilometres>
  <Colour />
  <InteriorColour />
  <Transmission />
  <EngineSize>0.0</EngineSize>
  <Fuel />
  <FeaturesVerbose />
  <Condition />
  <Comments>KIA CARNIVAL 3/2004, 19,000kms, blue, many extras. $27,500. NF361. 0411 090 456 or 02 4626 8620.</Comments>
  <Location>NSW - Sydney</Location>
  </NewspaperAdvertiser>
  </BCCFeed>

Re: Best Way to parse this XML?
by borisz (Canon) on Mar 14, 2005 at 09:11 UTC
Re: Best Way to parse this XML?
by mirod (Canon) on Mar 14, 2005 at 10:13 UTC

    You know that this XML is invalid, right? First there's the '-' at the beginning, which might be an artifact of the way you created the post; then there is an invalid '&' in the middle of the data, which should be escaped as &amp;.

    But if you manage to get real XML, here is an example of how you could do it with XML::Twig:

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;

    my $t= XML::Twig->new( twig_roots => { CasualAdvertiser => 1 }) # so we only load CasualAdvertiser elements
                    ->parsefile( "bcc.xml");

    foreach my $ad ( sort { $a->field( 'Price') <=> $b->field( 'Price') } $t->root->children( 'CasualAdvertiser'))
      { printf "price: %8.2d - %-30s - %6d: ID%s\n",
               $ad->field( 'Price'),
               $ad->field( 'Make') . " " . $ad->field( 'Family'),
               $ad->field( 'Kilometres'),
               $ad->field( 'AdvertisementID'),
               ;
      }

    exit;
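
    The two problems pointed out above (the '-' markers and the bare '&') could be cleaned up before parsing with something like the sketch below. It is only an illustration: clean_feed is a hypothetical helper, it assumes each '-' marker sits at the start of a line directly before a tag, and it would also rewrite a bare '&' inside a CDATA section, which this feed does not appear to use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): tidy a feed that was
# copied out of a browser's collapsible XML view.
sub clean_feed {
    my ($xml) = @_;

    # Drop a '-' expand/collapse marker at the start of a line when it
    # sits directly before a tag; keep the line's indentation.
    $xml =~ s/^(\s*)-\s*(?=<)/$1/mg;

    # Escape any '&' that does not already start an entity or a
    # character reference such as &amp; &#38; or &#x26;.
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    return $xml;
}

my $dirty = "- <BCCFeed>\n- <Comments>serviced & optioned</Comments>\n</BCCFeed>\n";
print clean_feed($dirty);
```

    The cleaned string can then be handed to the parser in place of the original file contents.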
Re: Best Way to parse this XML?
by perlsen (Chaplain) on Mar 14, 2005 at 11:12 UTC

    Hi, if you need to get the text between the CasualAdvertiser tags, try the code below.
    Parser error:
    in your file, & should be changed to the &amp; entity.
    Then the following code gives the output text:

    use XML::Twig;

    my $twig = new XML::Twig( TwigRoots => { 'CasualAdvertiser' => \&output_title });
    $twig->parsefile( shift @ARGV );

    sub output_title {
        ( $tree, $elem) = @_;
        print $elem->text, "\n\n";
    }

    output:
    **********
    1580590MERCEDES-BENZE220SPECIAL-EDITION4D SEDAN1994ORV21521500.0000175000Almandi neA2.2Air Conditioning, Power Steering, ABS Braking, Powe r Windows, Cruise Control, Towbar, Air Bags, CD Player, Sunroof, Meta lic Paint, Full service historyexcelGaraged, leather trim Must sellAug 2004VIC - South Eastern

    160755104088980110408898011MERCEDES-BENZML320-LUXURY-4x44D WAGON1999AHV26M42900. 0000123000SilverAUnleadedAir Conditioning, Power Steering, ABS Braking, Powe r Windows, Cruise Control, Air Bags, CD Player, Central Locking Remot e, Metalic Paint, Roadworthy certificate, Alloy Wheels, Full service historyimmacImmaculate luxury 4wd with all the usual Merc' quality and extras. It's fully serviced & optioned up with the luxury pack plus has quite new tyres all round, long rego, leather interior, woodgrain inserts, CD stacker, traction control, brake assist, Merc' roof rack s etc etc. Extremely well looked after - always garaged and no off ro ad use. Beautiful vehicle in superb condition. Will listen to all rea listic offers.Oct 2005QLD - South Eastern QLD
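
    Since ->text runs all the fields together, a variation that keeps each field separate may be easier to work with. The following is a sketch only, assuming XML::Twig is installed and the feed is already well-formed; casual_ads is a hypothetical helper name, and the sample data is a cut-down copy of the posted feed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: return one hash reference per <CasualAdvertiser>,
# keyed by child element name. Empty elements become empty strings.
sub casual_ads {
    my ($xml) = @_;
    my @ads;

    my $twig = XML::Twig->new(
        twig_roots => {
            # Only CasualAdvertiser subtrees are built; NewspaperAdvertiser
            # elements are skipped by the parser.
            CasualAdvertiser => sub {
                my ( $t, $ad ) = @_;
                my %record;
                $record{ $_->tag } = $_->text for $ad->children('#ELT');
                push @ads, \%record;
                $t->purge;    # release the elements handled so far
            },
        },
    );
    $twig->parse($xml);

    return @ads;
}

my $xml = <<'XML';
<BCCFeed>
  <CasualAdvertiser>
    <AdvertisementID>1580590</AdvertisementID>
    <Make>MERCEDES-BENZ</Make>
    <Family>E220</Family>
    <Price>21500.0000</Price>
  </CasualAdvertiser>
  <NewspaperAdvertiser>
    <AdvertisementID>2065338</AdvertisementID>
    <Make>KIA</Make>
  </NewspaperAdvertiser>
</BCCFeed>
XML

for my $ad ( casual_ads($xml) ) {
    printf "%s: %s %s, price %.2f\n",
        @{$ad}{qw(AdvertisementID Make Family)}, $ad->{Price};
}
```

    Purging inside the handler keeps memory use flat on a large feed, because each ad is discarded once its fields have been copied into the hash.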
      Thank you all for your responses. It seems XML::Twig is exactly what I was looking for. I haven't tried it yet, but from what I've read here it will do the job. I will get back to you and let you know how it goes. Pete
