PerlMonks  

XML::CSV out of memory

by slugger415 (Monk)
on Mar 26, 2015 at 16:03 UTC ( [id://1121411]=perlquestion )

slugger415 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am using XML::CSV to parse a very large (500MB) CSV file and convert it to XML. I keep getting an "out of memory" error during the parse.

Is there a way to "purge" data to free up memory during this process, like you can with XML::Twig? (Though I don't see anything about handlers in the doc....)

FWIW:

my $csv_obj = XML::CSV->new( error_out => 1 );
$csv_obj->{column_headings} = \@heads;
my $status = $csv_obj->parse_doc($file);

thanks!

Replies are listed 'Best First'.
Re: XML::CSV out of memory
by stonecolddevin (Parson) on Mar 26, 2015 at 16:19 UTC

    In all honesty if you have a modern amount of RAM on your system, 500MB isn't that big.

    That said, in the docs, it looks like you can do this:

    use XML::CSV;
    $default_obj_xs = Text::CSV_XS->new({quote_char => '"'});
    $csv_obj = XML::CSV->new({csv_xs => $default_obj_xs});
    $csv_obj->{column_headings} = \@arr_of_headings;
    $csv_obj->{column_data} = \@arr_of_data;
    $csv_obj->print_xml("out.xml");

    Basically, notice that you can pass a Text::CSV_XS object to XML::CSV, which would allow you to read in $n lines of the CSV file, and then pass them in as an array to XML::CSV. This would be fairly trivial to do, but if you can't get it figured out, come back and we can help you through it.
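
    For what it's worth, the "read $n lines at a time" part could look roughly like the sketch below. It only covers the batching and uses Text::CSV_XS on its own; the names for_each_batch, $batch_size and $on_batch are made up for illustration and are not part of XML::CSV.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Read complete CSV records from $fh and hand them to $on_batch in groups of
# at most $batch_size. Each batch is an array ref of rows (array refs).
# for_each_batch, $batch_size and $on_batch are illustrative names.
sub for_each_batch {
    my ($fh, $batch_size, $on_batch) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, quote_char => '"' });

    my $batch = [];
    while (my $row = $csv->getline($fh)) {
        push @$batch, $row;
        if (@$batch >= $batch_size) {
            $on_batch->($batch);
            $batch = [];    # start a fresh batch so the old rows can be freed
        }
    }
    $on_batch->($batch) if @$batch;    # final, possibly partial, batch
    return;
}
```

    Inside the callback each batch could then be assigned to $csv_obj->{column_data} as in the snippet from the docs. Whether print_xml can be called repeatedly without overwriting the output is a separate question that this sketch does not answer.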

    Three thousand years of beautiful tradition, from Moses to Sandy Koufax, you're god damn right I'm living in the fucking past

      I guess we don't know how much RAM is required without seeing the input CSV data and the derived XML; the in-memory parse of the CSV will be many times larger than the original file.
      That said, if you have access to a machine with a lot more RAM this can't hurt.


      This is not a Signature...

        I agree, but even the smallest Linode and EC2 instances have a gig of RAM, and DigitalOcean's has 512MB, which squeaks by. I digress, but I'm just saying it would be a big difference if it were a 500GB file being parsed.


      Thanks for the replies and suggestions. Yes it is strange, the system now has 12 gigs of memory. My own workstation has 16 gigs and I'm not getting the out of memory error running the same script. Anyway.

      So it's probably a dumb question, but do I even need Text::CSV for your suggestion? Couldn't the script just put $n lines of text into an array and feed those to @arr_of_data? And how would I do $csv_obj->print_xml("out.xml") at each iteration without clobbering the previous iteration? (ok, you can tell I'm still learning here - begging patience...)

      Thanks.

        do I even need Text::CSV for your suggestion? Couldn't the script just put $n lines of text into an array and feed those to @arr_of_data?

        Text::CSV will ensure you are reading complete records. If you can be sure that your input files have no embedded newlines in the CSV records, then you could skip Text::CSV.
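
        A small illustration of why that matters, using made-up sample data and Text::CSV_XS: a quoted field that contains a newline makes the number of physical lines differ from the number of records, so reading "lines" would cut a record in half.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample: two CSV records, the first with a newline inside a quoted field.
my $data = qq{1,"first line\nsecond line"\n2,plain\n};

# Naive view: count the physical lines.
my $physical_lines = () = $data =~ /\n/g;

# CSV-aware view: binary => 1 lets quoted fields contain newlines.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my $records = 0;
$records++ while $csv->getline($fh);
close $fh;

print "physical lines: $physical_lines, CSV records: $records\n";
```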

        And how would I do $csv_obj->print_xml("out.xml") at each iteration without clobbering the previous iteration?

        Try (untested):

        open(my $outFH, '>', "out.xml") or die "Cannot open out.xml: $!";
        while (<>) {
            ...;
            $csv_obj->print_xml($outFH);
        }

        The example seems to call for Text::CSV_XS in this instance:

        $default_obj_xs = Text::CSV_XS->new({quote_char => '"'});
        $csv_obj = XML::CSV->new({csv_xs => $default_obj_xs});

        It's really not hurting you to use it, so you might as well honestly.


Re: XML::CSV out of memory
by monkey_boy (Priest) on Mar 26, 2015 at 16:21 UTC

    From the XML::CSV docs:
    At this point it does not allow for a write as you parse interface but is the first upgrade for the next release.
    So either wait for the next release, or I'd attempt dividing the large file into many smaller chunks and chuck these through XML::CSV, then combine output if required.
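
    A rough sketch of the splitting step, in core Perl only. It assumes every record sits on exactly one physical line (no embedded newlines) and that the file has no header row; split_csv and the ".partN" file naming are invented for the example.

```perl
use strict;
use warnings;

# Split $in_path into files holding at most $max_rows lines each and return
# the list of chunk paths.
# ASSUMPTION: one record per physical line, no header row to repeat.
# split_csv and the "<input>.partN" naming are illustrative choices.
sub split_csv {
    my ($in_path, $max_rows) = @_;
    open my $in, '<', $in_path or die "Cannot read $in_path: $!";

    my (@chunks, $out);
    my $rows = 0;
    while (my $line = <$in>) {
        # Open the next chunk file when there is none yet or the current one is full.
        if (!$out || $rows >= $max_rows) {
            close $out if $out;
            my $path = sprintf '%s.part%d', $in_path, scalar(@chunks) + 1;
            open $out, '>', $path or die "Cannot write $path: $!";
            push @chunks, $path;
            $rows = 0;
        }
        print {$out} $line;
        $rows++;
    }
    close $out if $out;
    close $in;
    return @chunks;
}
```

    Each chunk could then be fed to parse_doc in turn. If the data can contain quoted newlines, the line-based read would need to be replaced with a CSV parser.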



      So either wait for the next release, ...

      Ha ha, wait another 14 years? XML-CSV-0.15 was released 15 May 2001.

      Grab XML::Writer and Text::CSV_XS and write something "original" (or just modify csv2xls)

      Grab DBD::AnyData and write something shorter
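
      A minimal sketch of the first option (Text::CSV_XS plus XML::Writer), writing each record as soon as it is read so memory use does not grow with the file. The sub name csv_to_xml and the element names "records" and "record" are assumptions for the example, and the headings are assumed to be valid XML element names.

```perl
use strict;
use warnings;
use Text::CSV_XS;
use XML::Writer;

# Convert CSV read from filehandle $in to XML written to filehandle $out,
# one record at a time. $headings is an array ref of column names.
# csv_to_xml, "records" and "record" are illustrative names; $headings must
# contain valid XML element names.
sub csv_to_xml {
    my ($in, $out, $headings) = @_;
    my $csv    = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    my $writer = XML::Writer->new(OUTPUT => $out, DATA_MODE => 1, DATA_INDENT => 2);

    $writer->xmlDecl('UTF-8');
    $writer->startTag('records');
    while (my $row = $csv->getline($in)) {
        $writer->startTag('record');
        for my $i (0 .. $#$headings) {
            # dataElement escapes &, < and > in the field value;
            # missing trailing fields are written as empty elements.
            my $value = defined $row->[$i] ? $row->[$i] : '';
            $writer->dataElement($headings->[$i], $value);
        }
        $writer->endTag('record');
    }
    $writer->endTag('records');
    $writer->end;
    return;
}
```

      Calling it with an input handle on the CSV file, an output handle on out.xml, and \@heads would replace the parse_doc step entirely; only one row is held in memory at a time.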

        Yea I actually agree with this the most. There are far too many options for this to dick around with something that's not immediately easy.


Re: XML::CSV out of memory
by sundialsvc4 (Abbot) on Mar 27, 2015 at 16:49 UTC

    You could be running into a memory problem in any one of three places:

    1. The CSV import.
    2. The XML translation.
    3. An unknown bug in a module that hasn’t been attended to in a long time.

    In this case, I think I would just resort to my own logic ... using CPAN modules to do all the heavy lifting but not to perform the entire task. For instance, use a CSV module to read the file line by line. Now, build an in-memory hash structure conforming to the XML structure you want to build. (Pause and verify that everything works so far ...) Then, use an XML module to write it out.

    With the amount of RAM that you say you have, the only really plausible explanation for what you are seeing is ... a bug. Somewhere. And, in this case, I would just drive around it. The obvious path to “get ’er done” is plain, and does not obligate the use of a module that might in fact be buggy.

Node Type: perlquestion [id://1121411]
Front-paged by GotToBTru