PerlMonks  

Perl XML::Smart Out of memory! error

by nasaa (Novice)
on Apr 28, 2014 at 14:48 UTC (node #1084138)
nasaa has asked for the wisdom of the Perl Monks concerning the following question:

I have a small Perl script that downloads an XML file, parses it with XML::Smart, and copies the contents into a MySQL database by dropping and recreating a table. The script used to work fine on CentOS 5, but the disk recently crashed and the new drive has CentOS 6 on it. Now the script dies with "Out of memory!". The XML file is 21.5MB in size. I know it gets stuck while parsing the file, because the database table is never dropped or recreated.

use 5.008;
use strict;
use warnings;
use DBI ();
use XML::Smart;
use Data::Dumper;    # for debugging purposes only

BEGIN { $| = 1 }

my $location = 'xxxxxx';

# Placeholder connection: the snippet as posted used $dbh without creating it.
my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
    'user', 'password', { RaiseError => 1 });

# This is the step that runs out of memory on the 21.5MB file:
# XML::Smart builds the whole document as a tree in memory.
my $XML = XML::Smart->new($location . 'CategoriesList.xml')
    or die("Unable to parse CategoriesList.xml: $!");
$XML = $XML->cut_root();    # drop the <ICECAT-interface> level
$XML = $XML->cut_root();    # drop the <Response> level

$dbh->do("DROP TABLE IF EXISTS ice_categories");
$dbh->do("CREATE TABLE ice_categories (
    category_id          int(11)      not null,
    parent_cat_id        int(11)      not null,
    category_name        varchar(100) not null default '',
    category_description varchar(100) not null default '',
    category_image       varchar(100) not null default '',
    category_thumb       varchar(100) not null default '',
    KEY (category_id),
    KEY (parent_cat_id)
) CHARACTER SET utf8 COLLATE utf8_unicode_ci");

my @Categories = @{ $XML->{CategoriesList}{Category} };
my $c_categories = 0;
foreach my $category (@Categories) {
    my $cat_name   = ucwords($category->{Name}('langid', 'eq', '1')->{Value});
    my $cat_desc   = $category->{Description}('langid', 'eq', '1')->{Value};
    my $cat_parent = $category->{ParentCategory}{ID};    # was never set in the post
    $dbh->do("INSERT INTO ice_categories (category_id, parent_cat_id, category_name,"
        . " category_description, category_image, category_thumb) VALUES ("
        . $category->{ID} . ", " . $cat_parent . ", "
        . $dbh->quote($cat_name) . ", " . $dbh->quote($cat_desc) . ", "
        . $dbh->quote($category->{LowPic}) . ", "
        . $dbh->quote($category->{ThumbPic}) . ")");
    $c_categories++;
}
print "$c_categories categories imported.\n";

# ucwords() is not a Perl built-in and its definition is not shown in the
# post; this is a hypothetical stand-in that upper-cases each word's first letter.
sub ucwords { join ' ', map { ucfirst } split / /, shift }
Example of the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ICECAT-interface SYSTEM "http://data.icecat.biz/dtd/ICECAT-interface_response.dtd">
<ICECAT-interface>
  <Response Date="Sat Apr 26 14:46:53 2014" ID="13219513" Request_ID="1398516412" Status="1">
    <CategoriesList>
      <Category ID="127" LowPic="http://images.icecat.biz/img/low_pic/127-563.jpg" Score="9725" Searchable="0" ThumbPic="http://images.icecat.biz/thumbs/CAT127.jpg" UNCATID="43171520" Visible="0">
        <Description ID="548838" Value="Device or stand where you can rest your mobile or fixed telephone." langid="1"/>
        <Description ID="8310" Value="" langid="2"/>
        <Keywords ID="3274" Value="" langid="1"/>
        <Keywords ID="3275" Value="" langid="2"/>
        <Keywords ID="3276" Value="" langid="3"/>
        <Keywords ID="3277" Value="" langid="4"/>
        <Keywords ID="3278" Value="" langid="5"/>
        <Name ID="255" Value="telephone rests" langid="1"/>
        <Name ID="471173" Value="telefoon steunen" langid="2"/>
        <Name ID="343915" Value="autres téléphones" langid="3"/>
        <ParentCategory ID="242">
          <Names>
            <Name ID="485" langid="1">networking</Name>
            <Name ID="471244" langid="2">netwerken</Name>
          </Names>
        </ParentCategory>
      </Category>
    </CategoriesList>
  </Response>
</ICECAT-interface>

Re: Perl XML::Smart Out of memory! error
by BrowserUk (Pope) on Apr 28, 2014 at 15:41 UTC
    The XML file is 21.5MB in size. I know it gets stuck at the point of parsing the file

    Does it still fail if you try it on a smaller file? (Like the sample you posted.)

    If not, it suggests that parsing the file and building the complex, nested data structure that results from it consumes more memory than is available.

    Suggestions:

    1. Install a 64-bit Perl.

      A 32-bit perl can address at most 4GB, and in practice usually only 2 to 3GB depending on the OS. Most machines these days have much more memory than this available.

      A 64-bit Perl will allow you to use all the memory your machine has available.

    2. Use a different XML parser.

      XML::Smart is a particularly memory-hungry module because of its design. Another XML parser might be able to parse the whole file whilst using less memory.

      Some XML modules, e.g. XML::Twig, are explicitly designed to handle files that are too big for memory by parsing and retaining only a subset of the file in memory at any given time.
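For suggestion 1, a quick way to tell whether the installed perl is a 32-bit or 64-bit build is the core Config module. This is a minimal sketch: `pointer_bits` is a hypothetical helper name, and it relies only on `$Config{ptrsize}` (the C pointer size in bytes of the running perl) and `$Config{archname}`.

```perl
use strict;
use warnings;
use Config;

# $Config{ptrsize} is the size in bytes of a C pointer for the perl
# running this script: 4 on a 32-bit build, 8 on a 64-bit build.
sub pointer_bits { return 8 * $Config{ptrsize} }

printf "%s is a %d-bit perl (archname %s)\n",
    $^X, pointer_bits(), $Config{archname};
```

From the shell, `perl -V:ptrsize` reports the same value.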


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi, thanks for replying. It works with a smaller file. My machine only has 2GB of RAM. I looked at XML::Twig but can't figure out what I need to change. I made the following changes and then got lost in the Twig docs.
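      use XML::Twig;
      # Note: XML::Twig->new() takes options, not a file name; the file is
      # read afterwards with $twig->parsefile(...).
      my $XML = XML::Twig->new($location.'CategoriesList.xml') or die("Unable to parse CategoriesList.xml: $!");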
        #!perl
        use strict;
        use warnings;
        use DBI;
        use XML::Twig;

        my $dbh = dbh();
        $dbh->do("DROP TABLE IF EXISTS ice_categories");
        $dbh->do("CREATE TABLE ice_categories (
            category_id int(11) not null,
            parent_cat_id int(11) not null,
            category_name varchar(100) not null default '',
            category_description varchar(100) not null default '',
            category_image varchar(100) not null default '',
            category_thumb varchar(100) not null default '',
            KEY (category_id),
            KEY (parent_cat_id)
        ) CHARACTER SET utf8 COLLATE utf8_unicode_ci");

        my $sql = 'INSERT INTO ice_categories (
            category_id, parent_cat_id, category_name,
            category_description, category_image, category_thumb)
            VALUES (?,?,?,?,?,?)';
        my $sth = $dbh->prepare($sql);

        # Category() is called each time a complete <Category> element is parsed
        my $t = XML::Twig->new( twig_handlers => { 'Category' => \&Category } );
        $t->parsefile( 'file.xml' );

        sub Category {
            my ($t, $elt) = @_;
            my @f = ( $elt->att('ID') );
            $f[4] = $elt->att('LowPic');
            $f[5] = $elt->att('ThumbPic');
            $f[1] = $elt->first_child('ParentCategory')->att('ID');
            $f[2] = $elt->first_child('Name[@langid="1"]')->att('Value');
            $f[3] = $elt->first_child('Description[@langid="1"]')->att('Value');
            print "@f\n";
            $sth->execute(@f);
            # Free everything parsed so far (including this element); without
            # this, XML::Twig still keeps the whole tree in memory.
            $t->purge;
        }

        # connect
        sub dbh {
            my $dsn = "DBI:mysql:database=test;host=localhost";
            my $dbh = DBI->connect($dsn, 'user', 'password',
                {RaiseError => 1, PrintError => 1})
                or die("Error connecting: $DBI::errstr");
            return $dbh;
        }
        poj
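A stripped-down sketch of the same XML::Twig approach, without the database, run against a trimmed copy of the sample XML from the question. The sub name `categories_from_xml` is hypothetical. The memory-saving step is `$t->purge` inside the handler, which discards everything parsed so far; without it, XML::Twig still builds the whole document tree.

```perl
use strict;
use warnings;
use XML::Twig;

# Returns one array ref per <Category>:
# [ID, parent ID, name, description, LowPic, ThumbPic]
sub categories_from_xml {
    my ($xml) = @_;
    my @rows;
    my $twig = XML::Twig->new(
        twig_handlers => {
            Category => sub {
                my ($t, $cat) = @_;
                # langid="1" children hold the English text
                my $name   = $cat->first_child('Name[@langid="1"]');
                my $desc   = $cat->first_child('Description[@langid="1"]');
                my $parent = $cat->first_child('ParentCategory');
                push @rows, [
                    $cat->att('ID'),
                    $parent ? $parent->att('ID')  : undef,
                    $name   ? $name->att('Value') : '',
                    $desc   ? $desc->att('Value') : '',
                    $cat->att('LowPic'),
                    $cat->att('ThumbPic'),
                ];
                # Drop the parsed tree up to and including this <Category>,
                # so memory use does not grow with the size of the file.
                $t->purge;
            },
        },
    );
    $twig->parse($xml);    # for a file on disk: $twig->parsefile($path)
    return @rows;
}

my $sample = <<'XML';
<ICECAT-interface>
  <Response ID="13219513" Status="1">
    <CategoriesList>
      <Category ID="127" LowPic="http://images.icecat.biz/img/low_pic/127-563.jpg" ThumbPic="http://images.icecat.biz/thumbs/CAT127.jpg">
        <Description ID="548838" Value="Device or stand where you can rest your mobile or fixed telephone." langid="1"/>
        <Name ID="255" Value="telephone rests" langid="1"/>
        <Name ID="471173" Value="telefoon steunen" langid="2"/>
        <ParentCategory ID="242"/>
      </Category>
    </CategoriesList>
  </Response>
</ICECAT-interface>
XML

print join(' | ', map { $_ // '' } @$_), "\n"
    for categories_from_xml($sample);
```

In the real script the `push` would be replaced by `$sth->execute(...)`, as in the code above.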
        then i am lost in the twig docs.

        Sorry, I can't help with that. I've never had occasion to use Twig.

        Perhaps you should start another thread asking for help doing the conversion.


Node Type: perlquestion [id://1084138]
Approved by GotToBTru
Front-paged by GotToBTru