PerlMonks  

Re: xml_split - split huge XML documents into smaller chunks

by grantm (Parson)
on Feb 11, 2005 at 08:11 UTC (#430015)


in reply to xml_split - split huge XML documents into smaller chunks

You're right, this question does arise frequently. In fact it's become a frequently asked question.


Re^2: xml_split - split huge XML documents into smaller chunks
by mirod (Canon) on Feb 11, 2005 at 09:46 UTC

    OK, I get it, let me run the tests on all my machines here and I will upload 3.16 later today ;--)

    BTW, if someone could test it on Windows, I would appreciate it; I don't have any Win32 machine for testing at the moment. If someone could also check "Bad newline interpretation by XML-Twig on Windows", that would be even better.

    Thanks

      On splitting and merging on Windows: after merging, comparing the XML with the original showed bad newline interpretation in the first file (foo-00.xml) when it is about 30 KB in size. The smaller chunks have no newline problem, but they differ in another way:

      From Twig:

      { <sometag></sometag> </end_tag_before_splitingone> }

      Expected:

      { <sometag><![CDATA[]]></sometag> </end_tag_before_splitingone> }

      updated 2005-02-11 by mirod: added tags

Re^2: xml_split - split huge XML documents into smaller chunks
by Anonymous Monk on Jan 20, 2009 at 10:42 UTC
    Hi. Instead of putting each section into its own file, can I put the first 1000 sections in one file, the next 1000 in another file, and so on? With 16000 sections, so many files get created that they are hard to handle.

      Sorry I did not see this follow-up. Yes you can. In recent versions of xml_split, the -g or the -s options should give you what you need:

        -s <size>
             generates files of (approximately) <size>. The
             content of each file is enclosed
             in a new element ("xml_split::root"), so it's
             well-formed XML.  The size can be given in bytes,
             Kb, Mb or Gb.

        -g <nb>
             groups <nb> elements in a single file. The content
             of each file is enclosed in a new element
             ("xml_split::root"), so it's well-formed XML.
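
      To make the grouping idea behind -g concrete, here is a minimal sketch in Python. It is not how xml_split is implemented: the function name group_children and the wrapper tag "chunk_root" are made up for illustration, and unlike a streaming tool it loads the whole document into memory, so it only suits documents that fit in RAM.

```python
import xml.etree.ElementTree as ET


def group_children(xml_text, group_size, wrapper_tag="chunk_root"):
    """Split the root's children into groups of at most group_size.

    Each group is wrapped in a new element (hypothetical tag name) so that
    every returned string is a well-formed XML document on its own.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    children = list(ET.fromstring(xml_text))
    chunks = []
    for start in range(0, len(children), group_size):
        wrapper = ET.Element(wrapper_tag)
        wrapper.extend(children[start:start + group_size])
        chunks.append(ET.tostring(wrapper, encoding="unicode"))
    return chunks


if __name__ == "__main__":
    # 16000 sections grouped by 1000, as in the question above
    sections = "".join("<section id='%d'/>" % i for i in range(16000))
    chunks = group_children("<doc>" + sections + "</doc>", 1000)
    print(len(chunks))  # 16
```

      Each returned string could then be written to its own file, giving 16 files instead of 16000.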
      
