PerlMonks  

Re: Perl XML::Smart Out of memory! error

by BrowserUk (Pope)
on Apr 28, 2014 at 15:41 UTC (#1084144)


in reply to Perl XML::Smart Out of memory! error

The XML file is 21.5MB in size. I know it gets stuck at the point of parsing the file

Does it still fail if you try it on a smaller file? (Like the sample you posted.)

If not, it suggests that the process of parsing the file and building the complex, nested data structure that results from it is consuming more memory than is available.

Suggestions:

  1. Install a 64-bit Perl.

    32-bit perls are typically limited to about 2GB of memory (at most 4GB of address space, depending on the operating system). Most machines these days have much more than this available.

    A 64-bit Perl will allow you to use all the memory your machine has available.

  2. Use a different XML parser.

    XML::Smart is a particularly memory hungry module because of its design. Another XML parser might be able to parse the whole file whilst using less memory.

    Some XML modules are explicitly designed to handle files that are too big for memory, by only parsing/retaining a subset of the file in memory at any given time, e.g. XML::Twig.
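
Before reinstalling anything, it may be worth confirming which kind of perl is currently in use. A minimal sketch using the core Config module; it only reports the pointer size the interpreter was built with and says nothing about how much memory the machine has:

```perl
use strict;
use warnings;
use Config;

# ptrsize is 8 on a 64-bit perl and 4 on a 32-bit one.
my $bits = $Config{ptrsize} * 8;
print "This perl is a $bits-bit build ($Config{archname})\n";
```

The same value is available from the command line with `perl -V:ptrsize`.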


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^2: Perl XML::Smart Out of memory! error
by nasaa (Novice) on Apr 28, 2014 at 16:21 UTC
    Hi, thanks for replying. It works with a smaller file. My machine only has 2 GB of RAM. I looked at XML::Twig but can't figure out what I need to change. I made the following change, and then I am lost in the twig docs.
    use XML::Twig;
    my $XML = XML::Twig->new($location.'CategoriesList.xml') or die("Unable to parse CategoriesList.xml: $!");
      then i am lost in the twig docs.

      Sorry, I can't help with that. I've never had occasion to use Twig.

      Perhaps you should start another thread asking for help doing the conversion.


      #!perl
      use strict;
      use DBI;
      use XML::Twig;

      my $dbh = dbh();
      $dbh->do("DROP TABLE IF EXISTS ice_categories");
      $dbh->do("CREATE TABLE ice_categories (
        category_id int(11) not null,
        parent_cat_id int(11) not null,
        category_name varchar(100) not null default '',
        category_description varchar(100) not null default '',
        category_image varchar(100) not null default '',
        category_thumb varchar(100) not null default '',
        KEY (category_id),
        KEY (parent_cat_id))
        CHARACTER SET utf8 COLLATE utf8_unicode_ci;");

      my $sql = 'INSERT INTO ice_categories (
        category_id, parent_cat_id, category_name,
        category_description, category_image, category_thumb)
        VALUES (?,?,?,?,?,?)';
      my $sth = $dbh->prepare($sql);

      # Call Category() each time a complete <Category> element has been parsed
      my $t = XML::Twig->new(
        twig_handlers => { 'Category' => \&Category }
      );
      $t->parsefile( 'file.xml' );

      # Build one row per <Category>, in the column order of the INSERT above
      sub Category{
        my ($t, $elt) = @_;
        my @f = ( $elt->att('ID') );
        $f[4] = $elt->att('LowPic');
        $f[5] = $elt->att('ThumbPic');
        $f[1] = $elt->first_child('ParentCategory')->att('ID');
        $f[2] = $elt->first_child('Name[@langid="1"]')->att('Value');
        $f[3] = $elt->first_child('Description[@langid="1"]')->att('Value');
        print "@f\n";
        $sth->execute(@f);
      }

      # connect
      sub dbh {
        my $dsn = "DBI:mysql:database=test;host=localhost";
        my $dbh = DBI->connect($dsn, 'user', 'password',
          {RaiseError => 1, PrintError => 1})
          or die "Error connecting: $DBI::errstr";
        return $dbh;
      }
      poj
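
Since the original problem was running out of memory, it may help to release each element once its handler has finished with it; otherwise the twig keeps the whole tree as parsing proceeds. A minimal sketch, assuming nothing else needs the earlier elements afterwards. The inline sample XML and the @ids array are made up for illustration; a real run would call parsefile() on the actual file:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input standing in for the real 21.5MB file.
my $xml = <<'XML';
<Categories>
  <Category ID="1"><Name Value="A" langid="1"/></Category>
  <Category ID="2"><Name Value="B" langid="1"/></Category>
</Categories>
XML

my @ids;
my $twig = XML::Twig->new(
    twig_handlers => {
        Category => sub {
            my ($t, $elt) = @_;
            push @ids, $elt->att('ID');   # use the element ...
            $t->purge;                    # ... then free everything parsed so far
        },
    },
);
$twig->parse($xml);
print "Processed categories: @ids\n";
```

With purge in the handler, memory use should stay roughly proportional to one Category element rather than to the whole file.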
        Hi, many thanks for the code. I ran it on the full file and it showed an error while processing:
        Can't call method "att" on an undefined value at import_icecat.pl line 107.
        Line 107 is
        $f[3] = $elt->first_child('Description[@langid="1"]')->att('Value');
        Looking at the XML, I find that there is a block with missing elements. It has the opening and closing Category tags but no Description. How do I get it to ignore missing tags and keep processing?
        <Category ID="1" LowPic="" Score="0" Searchable="0" ThumbPic="" UNCATID="" Visible="0">
          <Name ID="0" Value="" langid="1"/>
          <ParentCategory ID="1"/>
        </Category>
        I tried
        $f[3] = eval { $elt->first_child('Description[@langid="1"]')->att('Value'); };
        and I get an error:
        Use of uninitialized value $f[3] in join or string at import_icecat.pl line 108.
        1 1
        DBD::mysql::st execute failed: Column 'category_description' cannot be null at import_icecat.pl line 109.
        lol.. it's 2 am here.. your joke cracked me up :) Thank you. My question is: how do I get the script to ignore places where the tags are not present? I've been looking at the XML, and in some cases the Description or another tag is not present. I want the script to carry on and ignore missing tags.
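
One possible way to carry on past missing elements is to check what first_child returns before calling att on it, and fall back to a default that the NOT NULL columns will accept. The sketch below is an illustration only: child_att is a hypothetical helper (not part of XML::Twig), the sample XML is made up around the block quoted above, and rows are collected in @rows where the real script would call $sth->execute(@row).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: attribute $att of the first child matching $cond,
# or $default when the child or the attribute is missing.
sub child_att {
    my ($elt, $cond, $att, $default) = @_;
    my $child = $elt->first_child($cond);
    return $default unless defined $child;
    return $child->att($att) // $default;
}

# Made-up sample: the first Category has no Description element.
my $xml = <<'XML';
<Categories>
  <Category ID="1" LowPic="" ThumbPic="">
    <Name ID="0" Value="" langid="1"/>
    <ParentCategory ID="1"/>
  </Category>
  <Category ID="2" LowPic="a.jpg" ThumbPic="b.jpg">
    <Description ID="5" Value="Laptops" langid="1"/>
    <Name ID="6" Value="Notebooks" langid="1"/>
    <ParentCategory ID="1"/>
  </Category>
</Categories>
XML

my @rows;
my $twig = XML::Twig->new(
    twig_handlers => {
        Category => sub {
            my ($t, $elt) = @_;
            my @row = (
                $elt->att('ID'),
                child_att($elt, 'ParentCategory',           'ID',    0),
                child_att($elt, 'Name[@langid="1"]',        'Value', ''),
                child_att($elt, 'Description[@langid="1"]', 'Value', ''),
                $elt->att('LowPic')   // '',
                $elt->att('ThumbPic') // '',
            );
            push @rows, \@row;   # the real script would $sth->execute(@row) here
            $t->purge;           # release the element once it has been handled
        },
    },
);
$twig->parse($xml);
print join('|', @$_), "\n" for @rows;
```

If rows with missing elements should be skipped instead of stored with empty values, the handler could return early when first_child yields undef. The eval attempt above stops the crash but leaves $f[3] undefined, which is why the insert into a NOT NULL column then fails.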
