by ultibuzz (Monk)
on Nov 20, 2006 at 15:58 UTC ( #585076=perlquestion )
ultibuzz has asked for the wisdom of the Perl Monks concerning the following question:

Situation: HUGE XML files containing lots of information that is not needed; we are talking about millions of rows and many GB. I also get confused by the Dumper output and don't understand it.
Example XML file (test.xml)


Dumper output
$VAR1 = {
  'BL_DPLI_RECORD' => {
    'REASON' => {
      'DATE_OCCURRED' => '2006-01-30',
      'ENTRY' => 'highspender limit exceeded'
    },
    'BL_LI_RECORD' => {
      'MONETARYVALUE' => {
        'C' => 'EUR',
        'V' => '215.84'
      },
      'DATE_LAST_UPDATE' => '2006-01-30'
    },
    'BL_DP_TOTAL' => {
      'DATE_OLDEST_INVOICE' => {},
      'MONETARYVALUE' => {
        'C' => 'EUR',
        'V' => '0.00'
      },
      'NUMBER_DENIED_PAYMENTS_TOTAL' => '0'
    },
    'DATE_LAST_UPDATE' => '2006-01-30',
    'DATE_CREATED' => '2006-01-31'
  },
  'BL_USER' => {
    'USERID' => '22069332',
    'ADDRESS' => {
      'COUNTRY' => 'D',
      'MAIDENNAME' => {},
      'CITY' => 'Entenhausen',
      'ZIP' => '660666',
      'LASTNAME' => 'Müller',
      'FIRSTNAME' => 'heiner',
      'STREET' => 'Entenhausenerweg',
      'STREET_NO' => '15'
    },
    'PHONENO' => [
      { 'ENTRY' => '877465535' },
      { 'ENTRY' => '86719273704' },
      { 'ENTRY' => '8671881760' },
      { 'ENTRY' => '8671969876' }
    ]
  }
};

I need every phone number together with the reason (the REASON entry) behind it:

__OUT__
877465535;highspender limit exceeded
86719273704;highspender limit exceeded
8671881760;highspender limit exceeded
8671969876;highspender limit exceeded

So far I have tried XML::Simple with Data::Dumper, but I am confused about how to access the record for each customer, because they always start with BL_DPLI_RECORD.
Tips and hints are definitely needed.

kd ultibuzz

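#!"C:\perl\bin\perl.exe"
use warnings;
use strict;
use XML::Simple;
use Data::Dumper;

# Load the whole document into a nested Perl data structure.
my $config = XMLin('test.xml');

# Dump the structure to out.txt for inspection.
open( my $out, '>', 'out.txt' ) or die "Cannot open out.txt: $!";
print {$out} Dumper($config);
close($out) or die "Cannot close out.txt: $!";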


Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by Joost (Canon) on Nov 20, 2006 at 16:08 UTC
    Forget about using XML::Simple for large XML files. Loading XML via XML::Simple takes considerably more memory than the size of the file.

    I'd recommend XML::Twig (and keep the flush() method in mind).

    update: XML::Twig also has much better search methods for finding the tags you're interested in (and ignoring the rest). XML::Simple is more useful if the XML is fairly small and already more or less matches your desired data structure.

Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by monkey_boy (Priest) on Nov 20, 2006 at 17:00 UTC
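    A little snippet to get you going; you could easily post-process its output:

    use strict;
    use warnings;
    use XML::Twig;

    # Path of the XML file to process, taken from the command line.
    my $your_xml_file = shift @ARGV or die "usage: $0 file.xml\n";

    my $t = XML::Twig->new(
        twig_handlers => {
            'PHONENO/ENTRY' => \&print_n_purge,
            'REASON/ENTRY'  => \&print_n_purge,
        }
    );
    $t->parsefile($your_xml_file);

    # Print "PARENT:text" for each matched element, then free the
    # memory used by everything parsed so far.
    sub print_n_purge {
        my ( $t, $elt ) = @_;
        print $elt->parent->name, ":", $elt->text, "\n";
        $t->purge;
    }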

    This is not a Signature...

      XML::Twig sounds good; if the VPN is working right I will test it now.
      I just came home from a 12-hour working day.
      Big thanks to Joost and you for the hint about XML::Twig.

      UPDATE: I have tried this snippet and this is the output:

      PHONENO:877465535
      PHONENO:86719273704
      PHONENO:8671881760
      PHONENO:8671969876
      not well-formed (invalid token) at line 17, column 14, byte 313 at C:/Perl/site/lib/XML/ line 187

      I added 'REASON/DATE_OCCURRED' => \&print_n_purge because I thought the error came from not handling that child element of REASON, but this didn't help.

      UPDATE2: Now I don't understand this. I just copied the data into a file called test2.xml, started over, and got this:

      PHONENO:867112593
      PHONENO:86719273704
      PHONENO:8671881760
      PHONENO:8671969876
      REASON:highspender limit exceeded
      no element found at line 52, column 16, byte 1252 at C:/Perl/site/lib/XML/ line 187

      As you can see, the reason is now printed, but there is still an error.

        Looks like your "XML" file isn't well-formed (note the line and column numbers in the error messages refer to the line and column in the XML file). In effect that means it's not valid XML and you should return it to whoever created it and let them figure out how to write well-formed XML first.

        update: it appears the problem is the ü character. Adding an xml prolog that indicates the correct encoding should fix it. Add something like this at the very beginning of the file:

        <?xml version="1.0" encoding="ISO-8859-1"?>

        Your encoding is probably iso-8859-1 (latin-1), CP1252 (latin windows encoding) or utf-8 (one of the unicode encodings)
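A minimal sketch of the fix suggested above, using only core Perl: it adds an XML declaration to a document that lacks one. The helper name ensure_xml_declaration and the tiny sample document are made up for illustration, and ISO-8859-1 is only the first of the candidate encodings named above; it is an assumption until the real encoding of the file is confirmed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return $xml with an XML
# declaration in front, unless the text already starts with one.
sub ensure_xml_declaration {
    my ( $xml, $encoding ) = @_;
    return $xml if $xml =~ /\A<\?xml\s/;
    return qq{<?xml version="1.0" encoding="$encoding"?>\n} . $xml;
}

# Tiny stand-in document; \xFC is the Latin-1 byte for "ü".
# The encoding name is an assumption, see the note above.
my $raw   = "<BL_USER><LASTNAME>M\xFCller</LASTNAME></BL_USER>";
my $fixed = ensure_xml_declaration( $raw, 'ISO-8859-1' );
print $fixed, "\n";
```

For files of many GB it would be more practical to have the producer emit the declaration, since rewriting the file means copying all of it once.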

Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by madbombX (Hermit) on Nov 20, 2006 at 17:05 UTC
    I agree with Joost above, so I will therefore not restate his suggestion on using XML::Twig. However, if you are bent (for one reason or another) on using XML::Simple, then you need to look into the "ForceArray => 1" config item:
    my $config = XMLin('test.xml', ForceArray => 1);

      I tried it with ForceArray, but the output looks nearly the same, so I stopped using it.

Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by inman (Curate) on Nov 20, 2006 at 20:15 UTC
    You can use a combination of XML::Twig to process the file record by record and XML::Simple for a convenient data structure. ForceArray is strongly advised for XML::Simple.
Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by Smaug (Pilgrim) on Nov 20, 2006 at 17:52 UTC

    Untested but you're welcome to try:
    my $reason = $config->{'BL_DPLI_RECORD'}->{'REASON'}->{'ENTRY'};

    That should give you a good starting point to move on from.
    Update: In fact, if you read "getting required format output from a xml file", I believe it will solve your confusion.
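Extending the access path above to the phone numbers, here is a minimal sketch in core Perl that walks a trimmed copy of the structure from the Dumper output in the question. The helper name phone_reason_lines is made up for illustration. The sketch assumes a single record shaped exactly like that dump; a file with many customers will nest differently, so the hash paths are assumptions to check against your own Dumper output.

```perl
use strict;
use warnings;

# Trimmed copy of the structure from the Dumper output in the question.
my $config = {
    'BL_DPLI_RECORD' => {
        'REASON' => {
            'DATE_OCCURRED' => '2006-01-30',
            'ENTRY'         => 'highspender limit exceeded',
        },
    },
    'BL_USER' => {
        'USERID'  => '22069332',
        'PHONENO' => [
            { 'ENTRY' => '877465535' },
            { 'ENTRY' => '86719273704' },
            { 'ENTRY' => '8671881760' },
            { 'ENTRY' => '8671969876' },
        ],
    },
};

# Hypothetical helper: return one "phone;reason" string per phone number.
sub phone_reason_lines {
    my ($data) = @_;
    my $reason = $data->{'BL_DPLI_RECORD'}{'REASON'}{'ENTRY'};
    my $phones = $data->{'BL_USER'}{'PHONENO'};
    return unless defined $phones;

    # Without ForceArray, a single PHONENO arrives as a hash reference
    # instead of an array reference, so normalise both cases to a list.
    my @phones = ref $phones eq 'ARRAY' ? @$phones : ($phones);

    return map { $_->{'ENTRY'} . ";$reason" } @phones;
}

print "$_\n" for phone_reason_lines($config);
```

The ref check is there because a customer with exactly one phone number produces a different shape than a customer with several; loading with ForceArray would make that check unnecessary.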

      This definitely sorts out some confusion ;)
      But I still don't get how to retrieve all the phone numbers, and this is also just an example.
      I can have anywhere from one phone number to 1000.

Re: Confusion ,XML::SIMPLE with DATA:DUMPER
by ultibuzz (Monk) on Nov 20, 2006 at 21:39 UTC

    So I used this snippet to fill an array, then I scan the array until the reason text is found, and every element passed so far gets that text appended.
    Well, even if this is not the nice way, it works.

    I can't find another way from reading the XML::Twig documentation.
    Tomorrow I will have another look; I think the brainpower is used up for today.

    kd ultibuzz
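The approach described above (collect phone numbers, then attach the reason once it shows up) can be sketched in core Perl as a post-processing step over the "PHONENO:..." and "REASON:..." lines printed by the earlier handler. The helper name pair_phones_with_reason is made up for illustration, and the sketch assumes that each customer's phone numbers come before that customer's reason, as in the output shown earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "PHONENO:..." / "REASON:..." lines into
# "phone;reason" lines. Phone numbers are buffered until a reason arrives.
sub pair_phones_with_reason {
    my @input = @_;
    my ( @pending, @out );
    for my $line (@input) {
        if ( $line =~ /\APHONENO:(.*)\z/ ) {
            push @pending, $1;
        }
        elsif ( $line =~ /\AREASON:(.*)\z/ ) {
            my $reason = $1;
            push @out, "$_;$reason" for @pending;
            @pending = ();    # start collecting for the next customer
        }
        # Any other line (for example a parser error) is ignored.
    }
    return @out;
}

my @sample = (
    'PHONENO:877465535',
    'PHONENO:86719273704',
    'REASON:highspender limit exceeded',
);
print "$_\n" for pair_phones_with_reason(@sample);
```

Only the phone numbers of the current customer are held in memory at any time, which matters for input files of many GB.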

      Considering the data you need, there is no way to avoid storing the phone numbers and then outputting them once you find the reason. You could, though, leave these elements (and only these elements) in the tree by using twig_roots, so that the twig is built only for them:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use XML::Twig;

      XML::Twig->new(
          # the twig will only contain PHONENO/ENTRY and REASON/ENTRY elements
          # (plus the root or it would not be a tree)
          twig_roots => {
              'BL_USER/PHONENO/ENTRY'       => 1,
              'BL_DPLI_RECORD/REASON/ENTRY' => \&reason,
          }
      )->parsefile('phone_data.xml');

      sub reason {
          my ( $t, $reason ) = @_;
          foreach my $phone_no ( $reason->prev_siblings ) {
              print $phone_no->text, ";", $reason->text, "\n";
          }
          $t->purge;
      }

        I have some questions left before I understand this.
        With twig_roots, will it build an array for me if more than one entry is present?
        Does $reason->prev_siblings point to that array, so that the script processes it one by one?
        What does prev_siblings mean?
        kd ultibuzz

by ultibuzz (Monk) on Nov 24, 2006 at 08:11 UTC

    I want a third value to be printed, so I added it to twig_roots.

    I wanted to access the value with next_siblings, but that didn't work.
    When I set the root to => 1, it is printed as if it were a phone number element.
    So what is wrong in my thinking, what did I screw up?

    kd ultibuzz
