PerlMonks  

Re: xml_split - split huge XML documents into smaller chunks

by grantm (Parson)
on Feb 11, 2005 at 08:11 UTC (#430015)


in reply to xml_split - split huge XML documents into smaller chunks

You're right, this question does arise frequently. In fact it's become a frequently asked question.

Re^2: xml_split - split huge XML documents into smaller chunks
by mirod (Canon) on Feb 11, 2005 at 09:46 UTC

    OK, I get it, let me run the tests on all my machines here and I will upload 3.16 later today ;--)

    BTW, if someone could test it on Windows, I would appreciate it; I don't have any Win32 machine for testing at the moment. If someone could also check "Bad newline interpretation by XML-Twig on Windows", that would be even better.

    Thanks

      On splitting and merging on Windows: after merging, comparing the XML with the original showed bad newline interpretation in the first file (foo-00.xml) when it is about 30 KB in size. The smaller chunks have no newline issue, but differ as follows:

      from Twig:

      { <sometag></sometag> </end_tag_before_splitingone> }

      right one:

      { <sometag><![CDATA[]]></sometag> </end_tag_before_splitingone> }

      updated 2005-02-11 by mirod: added tags

Re^2: xml_split - split huge XML documents into smaller chunks
by Anonymous Monk on Jan 20, 2009 at 10:42 UTC
    Hi, instead of putting each section into its own file, can I put the first 1000 sections in one file, the next 1000 in another file, and so on? When there are 16000 sections, so many files get created that they are hard to handle.

      Sorry I did not see this follow-up. Yes you can. In recent versions of xml_split, the -g or the -s options should give you what you need:

        -s <size>
             generates files of (approximately) <size>. The
             content of each file is enclosed
             in a new element ("xml_split::root"), so it's
             well-formed XML.  The size can be given in bytes,
             Kb, Mb or Gb.

        -g <nb>
             groups <nb> elements in a single file. The content
             of each file is enclosed in a new element
             ("xml_split::root"), so it's well-formed XML.
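
      To make the grouping idea concrete, here is a minimal Python sketch of what "put <nb> elements in each file, wrapped in a new root" amounts to. It is only an illustration, not how xml_split itself works: the function name split_in_groups and the wrapper tag chunk_root are hypothetical, and unlike xml_split this sketch parses the whole document in memory, so it would not suit truly huge files.

```python
import xml.etree.ElementTree as ET


def split_in_groups(xml_text, group_size, wrapper_tag="chunk_root"):
    """Return a list of XML strings, each wrapping up to group_size
    child elements of the document root in a new wrapper element.

    wrapper_tag is a hypothetical name chosen for this sketch.
    """
    root = ET.fromstring(xml_text)
    children = list(root)  # direct children of the root, in document order
    chunks = []
    for start in range(0, len(children), group_size):
        wrapper = ET.Element(wrapper_tag)
        # Move one group of elements under the new wrapper root.
        wrapper.extend(children[start:start + group_size])
        chunks.append(ET.tostring(wrapper, encoding="unicode"))
    return chunks


# Build a small test document with 2500 empty <section> elements.
doc = "<doc>" + "".join(f"<section id='{i}'/>" for i in range(2500)) + "</doc>"
chunks = split_in_groups(doc, 1000)
print(len(chunks))  # 3 chunks: 1000, 1000 and 500 sections
```

      Each string in chunks is well-formed on its own because of the wrapper element, which is the same reason the -s and -g output described above can be parsed file by file.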
      
