PerlMonks  

Re: xml_split - split huge XML documents into smaller chunks

by grantm (Parson)
on Feb 11, 2005 at 08:11 UTC


in reply to xml_split - split huge XML documents into smaller chunks

You're right, this question does arise frequently. In fact it's become a frequently asked question.

Re^2: xml_split - split huge XML documents into smaller chunks
by mirod (Canon) on Feb 11, 2005 at 09:46 UTC

    OK, I get it, let me run the tests on all my machines here and I will upload 3.16 later today ;--)

    BTW, if someone could test it on Windows, I would appreciate it; I don't have any Win32 machine for testing at the moment. If someone could also check the "Bad newline interpretation by XML-Twig on Windows" issue, that would be even better.

    Thanks

      On splitting and merging on Windows: after merging, I compared the XML with the original and got a bad newline interpretation in the first chunk (foo-00.xml) when it is about 30 KB in size. The smaller chunks have no newline problem, but:

      From Twig:

          <sometag></sometag> </end_tag_before_splitingone>

      The correct one:

          <sometag><![CDATA[]]></sometag> </end_tag_before_splitingone>

      updated 2005-02-11 by mirod: added tags

Re^2: xml_split - split huge XML documents into smaller chunks
by Anonymous Monk on Jan 20, 2009 at 10:42 UTC
    Hi, instead of putting each section into its own file, can I put the first 1000 sections in one file, the next 1000 in another file, and so on? With 16000 sections, so many files get created that they are hard to handle.

      Sorry, I did not see this follow-up. Yes, you can. In recent versions of xml_split, the -g or -s option should give you what you need:

        -s <size>
             generates files of (approximately) <size>. The
             content of each file is enclosed in a new element
             ("xml_split::root"), so it's well-formed XML. The
             size can be given in bytes, Kb, Mb or Gb.

        -g <nb>
             groups <nb> elements in a single file. The content
             of each file is enclosed in a new element
             ("xml_split::root"), so it's well-formed XML.
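      For anyone curious what grouping amounts to, here is a minimal Python sketch of the same idea as -g: stream the document, collect up to N children of the root, and wrap each batch in a new root element so every chunk is well-formed. This is not xml_split's implementation (xml_split is Perl, built on XML::Twig). The function names and the "chunk_root" wrapper tag are hypothetical, and it assumes the sections to group are the direct children of the document root.

```python
import io
import xml.etree.ElementTree as ET


def split_into_groups(source, group_size, wrapper_tag="chunk_root"):
    """Yield XML strings, each wrapping up to `group_size` children of the
    document root in a new `wrapper_tag` element (hypothetical name).

    `source` is a filename or binary file object, as accepted by
    ET.iterparse, so large inputs are parsed incrementally."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    depth = 0
    root = None
    group = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem  # remember the document root
            continue
        depth -= 1
        if depth == 1:
            # `elem` is a complete direct child of the root: detach it so
            # the in-memory tree does not keep growing with the input.
            root.remove(elem)
            group.append(elem)
            if len(group) == group_size:
                yield _wrap(group, wrapper_tag)
                group = []
    if group:  # last, possibly smaller, group
        yield _wrap(group, wrapper_tag)


def _wrap(elements, wrapper_tag):
    """Serialize `elements` inside a single new wrapper element."""
    wrapper = ET.Element(wrapper_tag)
    for e in elements:
        e.tail = None  # drop whitespace that sat between sections
    wrapper.extend(elements)
    return ET.tostring(wrapper, encoding="unicode")


if __name__ == "__main__":
    sample = b"<doc><item>1</item><item>2</item><item>3</item></doc>"
    for i, chunk in enumerate(split_into_groups(io.BytesIO(sample), 2)):
        print(i, chunk)
```

      With 16000 sections and a group size of 1000 this would produce 16 chunks instead of 16000 files; writing each yielded string to its own file is left out to keep the sketch short.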
