Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

XML Parsing,

by dmsparts (Sexton)
on Sep 09, 2009 at 09:35 UTC ( #794307=perlquestion: print w/replies, xml ) Need Help??

dmsparts has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that parses XML data with details of vehicles into many CSV files(one for each vehicle). i Have posted here before when i was developing the script. it has been working faultlessly until the other day when there was only one vehicle in the XML file. I cannot work out why it wont work, can anyone help. Here is a copy of the script

#!/usr/bin/perl use XML::Simple; print "Content-type: text/html\n"; print "\n"; ######XML PARSE########## $ipath = "e:/cars/"; $str = ".xml"; opendir THEDIR, "$ipath"; @xmlfile = grep (/$str/, readdir THEDIR); closedir THEDIR; foreach $file (@xmlfile){ $xml = $ipath.$file; } if (lc($xml) =~ "don"){&doncaster} else { if (lc($xml) =~ "ayl"){$holding = "asmholding"}; if (lc($xml) =~ "cartrans"){$holding = "ctpholding"}; if (lc($xml) =~ "hills"){$holding = "hilholding"}; if (lc($xml) =~ "windley"){$holding = "wlsholding"}; if (lc($xml) =~ "ppm"){$holding = "ppmholding"}; print "Other Cars<br>"; $config = XMLin( $xml, SuppressEmpty => "" ); $exportid = "$config->{Header}->{ExportID}"; $numveh = "$config->{Summary}->{NumberOfVehicles}"; $memid = "$config->{Header}->{MemberID}"; $lineno = 10; $vehicleno = 1; foreach $vehicle ( @{ $config->{'Vehicle'} } ) { $auctionid = "$vehicle->{AuctionID}"; $make = "$vehicle->{Manufacturer}"; $model = "$vehicle->{Model}"; $reg = "$vehicle->{RegNo}"; $year = "$vehicle->{RegYear}"; $colour = "$vehicle->{Colour}"; $fueltype = "$vehicle->{Fuel}"; $damage = "$vehicle->{Damage}"; $doors = "$vehicle->{Doors}"; $cc = "$vehicle->{CC}"; $mileage = "$vehicle->{Speedo}"; $transpeed = "$vehicle->{TransSpeed}"; $trantype = "$vehicle->{TransType}"; $trim = "$vehicle->{TrimLevel}"; $abicat = "$vehicle->{Cat}"; $reserve = "$vehicle->{Reserveprice}"; $vat = "$vehicle->{HasVAT}"; if ($vat eq 1){$hasvat = "Yes"} else {$hasvat = "No"}; $xmlstarts = "$vehicle->{Starts}"; if ($xmlstarts eq 1){$starts = "Starts"} else {$starts = ""}; $xmldrives = "$vehicle->{Drives}"; if ($xmldrives eq 1){$drives = "Drives"} else {$drives = ""}; $xmlkeys = "$vehicle->{Keys}"; if ($xmlkeys eq 1){$keyssupplied = "yes"} else {$keyssupplied = "No"}; $image1 = "$vehicle->{Images}->{Image_1}"; $image2 = "$vehicle->{Images}->{Image_2}"; $image3 = "$vehicle->{Images}->{Image_3}"; $image4 = "$vehicle->{Images}->{Image_4}"; $image5 = "$vehicle->{Images}->{Image_5}"; $image6 = "$vehicle->{Images}->{Image_6}"; $image7 = "$vehicle->{Images}->{Image_7}"; $image8 = "$vehicle->{Images}->{Image_8}"; $image9 = "$vehicle->{Images}->{Image_9}"; $image10 = "$vehicle->{Images}->{Image_10}"; $image11 = "$vehicle->{Images}->{Image_11}"; $image12 = "$vehicle->{Images}->{Image_12}"; ###########Write CSV############ $opath = "e:/out/" ; $add='0'; $add2='xx.csv'; $add5 = $exportid.$lineno ; $add3 = $opath.$exportid.$lineno.$add2 ; $add6 = "xx"; $filename = $exportid.$lineno.$add2; $mileage =~ tr/,//d ; $damage =~ tr/,//d ; $trim =~ tr/,//d ; open (FILENAME,">$add3"); print FILENAME "Ref No,Make,Model,Trim,Vehicle Sub Class,Colour,Engine + Size,Fuel Type,Transmission,Year,Mileage,Registration No,Keys Suppli +ed,Damage Report,VAT Applicable,Condition,FSH,ABI Category,Reserve,Ch +assis No,Vehicle Source,Agent,Region,Vehicle Location,Source Name,Dat +e Approved\n"; print FILENAME "$add5,$make,$model,$trim,$doors,$colour,$cc,$fueltype, +$transpeed $trantype,$year,$mileage,$reg,$keyssupplied,$damage,$hasva +t,$starts $drives,No,$abicat,$reserve,na,$memid,$memid,$memid,$memid, +$auctionid,na,\n"; close FILENAME ; print "<table width=400>"; print "<tr><td>$vehicleno. "; print "$filename Created </td>"; #############Change IMG Name############## $pga ="a.jpg"; $pgb ="b.jpg"; $pgc ="c.jpg"; $pgd ="d.jpg"; $pge ="e.jpg"; $pgf ="f.jpg"; $pgg ="g.jpg"; $pgh ="h.jpg"; $pgi ="i.jpg"; $pgj ="j.jpg"; $pgk ="k.jpg"; $pgl ="l.jpg"; if ($image1 ne ""){$jpg1 ="$add5$add6$pga"; rename ($ipath.$image1, $ipath.$jpg1); print "<td colspan=2>Image 1 Renamed</td></tr>"; } if ($image2 ne ""){$jpg2 ="$add5$add6$pgb"; rename ($ipath.$image2, $ipath.$jpg2); print "<tr><td>&nbsp</td><td colspan=2>Image 2 Renamed</td></tr>"; } if ($image3 ne ""){$jpg3 ="$add5$add6$pgc"; rename ($ipath.$image3, $ipath.$jpg3); print "<tr><td>&nbsp</td><td colspan=2>Image 3 Renamed</td></tr>"; } if ($image4 ne ""){$jpg4 ="$add5$add6$pgd"; rename ($ipath.$image4, $ipath.$jpg4); print "<tr><td>&nbsp</td><td colspan=2>Image 4 Renamed</td></tr>"; } if ($image5 ne ""){$jpg5 ="$add5$add6$pge"; rename ($ipath.$image5, $ipath.$jpg5); print "<tr><td>&nbsp</td><td colspan=2>Image 5 Renamed</td></tr>"; } if ($image6 ne ""){$jpg6 ="$add5$add6$pgf"; rename ($ipath.$image6, $ipath.$jpg6); print "<tr><td>&nbsp</td><td colspan=2>Image 6 Renamed</td></tr>"; } if ($image7 ne ""){$jpg7 ="$add5$add6$pgg"; rename ($ipath.$image7, $ipath.$jpg7); print "<tr><td>&nbsp</td><td colspan=2>Image 7 Renamed</td></tr>"; } if ($image8 ne ""){$jpg8 ="$add5$add6$pgh"; rename ($ipath.$image8, $ipath.$jpg8); print "<tr><td>&nbsp</td><td colspan=2>Image 8 Renamed</td></tr>"; } if ($image9 ne ""){$jpg9 ="$add5$add6$pgi"; rename ($ipath.$image9, $ipath.$jpg9); print "<tr><td>&nbsp</td><td colspan=2>Image 9 Renamed</td></tr>"; } if ($image10 ne ""){$jpg10 ="$add5$add6$pgj"; rename ($ipath.$image10, $ipath.$jpg10); print "<tr><td>&nbsp</td><td colspan=2>Image 10 Renamed</td></tr>"; } if ($image11 ne ""){$jpg11 ="$add5$add6$pgk"; rename ($ipath.$image11, $ipath.$jpg11); print "<tr><td>&nbsp</td><td colspan=2>Image 11 Renamed</td></tr>"; } if ($image12 ne ""){$jpg12 ="$add5$add6$pgl"; rename ($ipath.$image12, $i2path.$jpg12); print "<tr><td>&nbsp</td><td colspan=2>Image 12 Renamed</td></tr>"; } print "<BR>"; print "</table>"; $vehicleno = $vehicleno +1; $lineno = $lineno+1 } $str = ".jpg"; opendir THEDIR, "$ipath"; @jpgfile = grep (/$str/, readdir THEDIR); closedir THEDIR; print "Creating Thumbnails........."; foreach $file (@jpgfile){ $nconv = "c:\\nconvert"; $jpgin = "$ipath/$file"; $jpgout = "$ipath/t$file"; $cmd = "$nconv -quiet -out jpeg -resize 100 75 -o $jpgout $jpgin"; system $cmd; } print "Thumbnails Created<BR>"; unlink $xml; };

Thanks Michael

Replies are listed 'Best First'.
Re: XML Parsing,
by moritz (Cardinal) on Sep 09, 2009 at 09:45 UTC

    So what's the input data that's causing you troubles, and what kind of troubles do you have? A run time error? If so, which?

    My magic crystal ball tells me that you might need some ForceArray options as explained in the XML::Simple documentation, but my crystal ball is not the most reliable diagnosis tool out there.

    Update: Don't you find the repetitions in your program a bit painful? Perl has loops, arrays and hashes - they can save you lots of typing, and make your code more general.

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: XML Parsing,
by Sewi (Friar) on Sep 09, 2009 at 10:48 UTC
    I don't know which error you get or what isn't happening, but I'ld like to give some comments on your code:
    #!/usr/bin/perl use XML::Simple; print "Content-type: text/html\n"; print "\n"; ######XML PARSE########## $ipath = "e:/cars/"; $str = ".xml"; opendir THEDIR, "$ipath"; @xmlfile = grep (/$str/, readdir THEDIR); closedir THEDIR; foreach $file (@xmlfile){ $xml = $ipath.$file; }
    This bracket above confuses me. At this place, it would make the loop go through the directory getting a random filename (the last one readdir returns, but you can't say which one would be the last returned). The order could change over time, reboots, fsck's, OS-changes, directory copies...

    Okay, I'll assume that you know what you're doing :-)

    if (lc($xml) =~ "don"){&doncaster} else { if (lc($xml) =~ "ayl"){$holding = "asmholding"}; if (lc($xml) =~ "cartrans"){$holding = "ctpholding"}; if (lc($xml) =~ "hills"){$holding = "hilholding"}; if (lc($xml) =~ "windley"){$holding = "wlsholding"}; if (lc($xml) =~ "ppm"){$holding = "ppmholding"};
    The following processing only takes for file with doesn't include don.
    Using elsif for the last if-block might be better.
    print "Other Cars<br>"; $config = XMLin( $xml, SuppressEmpty => "" ); $exportid = "$config->{Header}->{ExportID}"; $numveh = "$config->{Summary}->{NumberOfVehicles}"; $memid = "$config->{Header}->{MemberID}";
    exportid and memid are being used, but not modified and numveh isn't used at all. You may want to use their long form (config-header-exportid, like you wrote) instead of copiing the values without need.
    $lineno = 10; $vehicleno = 1; foreach $vehicle ( @{ $config->{'Vehicle'} } ) { $auctionid = "$vehicle->{AuctionID}"; $make = "$vehicle->{Manufacturer}"; $model = "$vehicle->{Model}"; [...]
    Same as above, there is no need to copy the values, it just makes your script much longer than needed.
    $image1 = "$vehicle->{Images}->{Image_1}"; $image2 = "$vehicle->{Images}->{Image_2}"; $image3 = "$vehicle->{Images}->{Image_3}"; $image4 = "$vehicle->{Images}->{Image_4}"; $image5 = "$vehicle->{Images}->{Image_5}"; $image6 = "$vehicle->{Images}->{Image_6}"; $image7 = "$vehicle->{Images}->{Image_7}"; $image8 = "$vehicle->{Images}->{Image_8}"; $image9 = "$vehicle->{Images}->{Image_9}"; $image10 = "$vehicle->{Images}->{Image_10}"; $image11 = "$vehicle->{Images}->{Image_11}"; $image12 = "$vehicle->{Images}->{Image_12}";
    Besides the previous comment, this is a wonderful loop example. Here is my suggestion:
    for my $ImgNo (1..12) { # Storing them into an array is easy: $image[$_] = $vehicle->{Images}->{'Image_'.$ImgNo}; # or you could also skip empty values: push @image,$vehicle->{Images}->{'Image_'.$ImgNo} if $vehicle->{Imag +es}->{'Image_'.$ImgNo} ne ''; # or you may want to keep the variable names: eval('$image'.$ImgNo.'=$vehicle->{Images}->{"Image_".$ImgNo};'); }
    This makes three lines which were 12 before.
    ###########Write CSV############ $opath = "e:/out/" ; $add='0'; $add2='xx.csv'; $add5 = $exportid.$lineno ; $add3 = $opath.$exportid.$lineno.$add2 ; $add6 = "xx"; $filename = $exportid.$lineno.$add2; $mileage =~ tr/,//d ; $damage =~ tr/,//d ; $trim =~ tr/,//d ; open (FILENAME,">$add3");
    Just a text issue: I prefer calling it FILEHANLDE, because FILENAME is not what is inside, but it doesn't make any difference to Perl...
    print FILENAME "Ref No,Make,Model,Trim,Vehicle Sub Class,Colour,Engine + Size,Fuel Type,Transmission,Year,Mileage,Registration No,Keys Suppli +ed,Damage Report,VAT Applicable,Condition,FSH,ABI Category,Reserve,Ch +assis No,Vehicle Source,Agent,Region,Vehicle Location,Source Name,Dat +e Approved\n"; print FILENAME "$add5,$make,$model,$trim,$doors,$colour,$cc,$fueltype, +$transpeed $trantype,$year,$mileage,$reg,$keyssupplied,$damage,$hasva +t,$starts $drives,No,$abicat,$reserve,na,$memid,$memid,$memid,$memid, +$auctionid,na,\n";
    This last print could be made much more readable:
    print FILENAME join(',', $exportid.$lineno, #$add5 is defined this way some lines before $vehicle->{Manufacturer}, $vehicle->{Model}, [...] @image, )."\n";
    This one uses join and the original names for the fields.
    close FILENAME ; print "<table width=400>"; print "<tr><td>$vehicleno. "; print "$filename Created </td>"; #############Change IMG Name############## $pga ="a.jpg"; $pgb ="b.jpg"; $pgc ="c.jpg"; $pgd ="d.jpg"; $pge ="e.jpg"; $pgf ="f.jpg"; $pgg ="g.jpg"; $pgh ="h.jpg"; $pgi ="i.jpg"; $pgj ="j.jpg"; $pgk ="k.jpg"; $pgl ="l.jpg"; if ($image1 ne ""){$jpg1 ="$add5$add6$pga"; rename ($ipath.$image1, $ipath.$jpg1); print "<td colspan=2>Image 1 Renamed</td></tr>"; }
    Here we could you really cool Perl features:
    $Char = "a"; for (@images) { next if $_ eq ''; # if you didn't filter them above rename $ipath.$_,$ipath.$add5.$add6.($Char++).'.jpg'; }
    First, I defined a variable for the current char, the first one is "a".
    I assumed that you switched to an array before which is easier to work with. This loop goes through the array and (next line) skips empty values if there is a risk that they may make it into the array.
    The next line is using the current loop item $_ and renaming it to $Char.jpg
    Like (nearly) any programming language, Perl could count up, but in Perl this also works for letters. The "a" defined before is counted up to a "b" here, but - and this is why the ++ are BEHIND the variablen name - the old value is returned and used in the loop run. Pretty cool, isn't it?
    I skiped the print, but I guess that you could handle this yourself...

    Let's stop here as I have nothing to say about the rest.

    One thing left for reading directories:

    opendir THEDIR, "$ipath"; for $file (grep (/$str/, readdir THEDIR)) { # Do the work } closedir THEDIR;
    This is shorter and you save the creattion of a temporary array.

    Don't worry, I also wrote code like this when I was a Perl beginner. Please don't treat this as critic, I'm very happy that people are still learning Perl! This post should be a review of things you could do to make your code more readable and easier to maintain.

      Hi, thanks for the reply, as you have pointed out, i do not know a lot of perl, hence why there is a lot of repetition. a little more info (that i should have said before). I am sent a zip file with lots of jpgs and one xml file. i extract them to the folder E:\cars on my pc and then run the script. usually this works fine, renames all the images and then creates a csv file for each vehicle. The problem i have is that when i try to process an xml file that has details of only one vehicle in, ie
      <?xml version="1.0"?> <!--DMS Auction Export - Version 1.2--> <root> <Header> <MemberID>005</MemberID> <ExportID>7908</ExportID> </Header> <Vehicle> <AuctionID>21261</AuctionID> <VehicleID>149611</VehicleID> <Ref></Ref> <Manufacturer>FORD</Manufacturer> <Model>FIESTA</Model> <RegNo>EA05BYM</RegNo> <RegYear>2005</RegYear> <Colour>Red</Colour> <Fuel>Petrol</Fuel> <Damage></Damage> <Doors>3</Doors> <Body>Hatchback</Body> <CC>1999</CC> <Speedo></Speedo> <TransSpeed>5 Speed</TransSpeed> <TransType>Manual</TransType> <TrimLevel>CLOTH</TrimLevel> <Engine></Engine> <Cat>C</Cat> <ReservePrice>2565</ReservePrice> <StartPrice>0</StartPrice> <HasVAT>0</HasVAT> <Keys>1</Keys> <Starts>0</Starts> <Drives>0</Drives> <Stereo>0</Stereo> <VINPlate>1</VINPlate> <LogBook>0</LogBook> <DateAuction>27.03.2009</DateAuction> <Location>BURSCOUGH</Location> <CanBeViewed>YES</CanBeViewed> <CostExVAT>0</CostExVAT> <PAV>0</PAV> <Images> <Image_1>650161.jpg</Image_1> <Image_2>650162.jpg</Image_2> <Image_3>650163.jpg</Image_3> <Image_4>650164.jpg</Image_4> <Image_5>650165.jpg</Image_5> <Image_6></Image_6> <Image_7></Image_7> <Image_8></Image_8> <Image_9></Image_9> <Image_10></Image_10> <Image_11></Image_11> <Image_12></Image_12> </Images></Vehicle> <Summary> <NumberOfVehicles>1</NumberOfVehicles> <DateExport>19.03.2009 14:12:50</DateExport> </Summary> </root>
      the script stops. it doesn't give an error. it prints "other cars" than nothing else. i have determined that the script stops at the point where the XML fields are read into perl variables. making print statements at various points. It seems to be that because there is only one item in the foreach loop it is not going to work. I will also look at your suggestions, however i do not understand some of them. Thanks Again Michael
        the script stops. it doesn't give an error. ...

        That's odd... when I ran the OP script on the xml file in your reply, I got an error message:

        Not an ARRAY reference at 794307.pl line 46.
        In my copy of the script, line 46 was this one:
        foreach $vehicle ( @{ $config->{'Vehicle'} } ) {
        Apparently, when the xml file contains only one "Vehicle", the default behavior of XML::Simple is to provide you with something that is not an array reference (e.g. maybe it's a reference to a hash instead of a reference to an array of hashes).

        That is why moritz, in the first reply, suggested that you include the ForceArray=>1 parameter when you invoke XMLin to create your $config object, like this:

        $config = XMLin( $xml, ForceArray => 1, SuppressEmpty => "" );
        When I made that change and ran it again, I didn't get the error message, and I got more output in addition to just Other Cars<br>.

        Anyway, as others have pointed out, there seem to be quite a few points where your code could use some improvement, and where it may even be doing things you don't intend (and/or not doing things you meant to do). Using more data structures, more loops, and better indentation will help a lot.

        (updated to fix a mistake in wording)

        Hi Michael,

        please ask for everything you didn't understand. I think you're on a very good way and I don't want to stop you because of too short or bad written comments.

        Now I understand, why you're only parsing one xml file. Here is a really complicated, but short solution for this:

        opendir THEDIR, "$ipath"; $xml = $ipath.(grep (/$str/, readdir THEDIR))[0]; closedir THEDIR;
        It is your grep (which returns an array of matching items) but as you only expect one element in the array, the grep-result is forced to be an array using the ( ) around grep. The combination of ( ) and &91; &93; allows you to fetch a specific array item without writing it to an array first.

        Another comment on your original article already told you that you might not get a array from the XML parser if only one inside object is found. There was a hint for a solution. To check this, you should print the thing you use as a array reference:

        print STDERR $config->{'Vehicle'}."\n"; foreach $vehicle ( @{ $config->{'Vehicle'} } ) {
        If this isn't an array reference (shown as ARRAY(0x12345)), you could force it to be one:
        $config->{'Vehicle'} = [$config->{'Vehicle'}] if ref($config->{'Vehicl +e'}) ne 'ARRAY';
        ref() tests if the parameter is a reference and returns the type of reference (in this case, ARRAY). If your value isn't what you expect (an ARRAY - reference), it creates one and fills in the old value into the new array.
Re: XML Parsing,
by biohisham (Priest) on Sep 09, 2009 at 19:26 UTC
    Hello there, you're doing some serious stuff out there, you're employing Perl to solve issues and all that, I'm yet to learn all these modules required but hey, here is a tip, you've to stick to, it saves you a lot of unnecessary headaches, my tip is on the box below:

    use strict; use warnings;
    I was not into this habit before but today, I appreciate it so much and I share it with you comrade, no one here at PM would advice you against using either these two lines, esp the first one because we all know its worth.
    Take care...

    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
Re: XML Parsing,
by saberworks (Curate) on Sep 09, 2009 at 22:37 UTC
    Use Data::Dumper to print out the $config variable after you parse it. Do this with the bad file and with a good file so you can spot the difference. As the first commenter mentioned, you can likely pass an option to XMLIn called ForceArray so the Vehicle element is always an array (but will likely do it to other elements as well).
    use Data::Dumper; ... $config = XMLin( $xml, SuppressEmpty => "" ); warn Dumper($config);

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://794307]
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (5)
As of 2021-04-21 20:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?