PerlMonks  

Hi, I have a mega XSLT batch processing job to perform. Somewhere in the bowels of the building sits an AIX box with Informix/SAP on it. Stored in there are ~1600 product descriptions. The SAP Business Connector web interface can spit the product descriptions out as a single nested XML file.

On a Linux or NT box, a simple Perl script parses this XML file using XML::Parser in stream mode. Using the nesting and some simple rules, it generates a nested set of directories, with various descriptive text and XML files spread over the newly created directory tree.
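A rough sketch of the directory-building part of this pass, under stated assumptions: the element name `group` and the attribute `name` are hypothetical stand-ins for whatever the real export uses, and the sketch uses plain Start/End handlers rather than the Stream style. It only creates the nested directories; writing the text and XML files is left out.

```perl
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Spec;
use XML::Parser;

# Hypothetical names: the real SAP export will use different ones.
my $GROUP_TAG = 'group';
my $NAME_ATTR = 'name';

# Create one directory per nested <group>, mirroring the XML nesting.
# Returns the list of directories created, in document order.
sub split_into_dirs {
    my ( $xml_file, $out_root ) = @_;
    my @path = ($out_root);    # current location as a stack of components
    my @made;

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ( $expat, $tag, %attr ) = @_;
                return unless $tag eq $GROUP_TAG;
                # Real names would need sanitising for the file system.
                my $name = defined $attr{$NAME_ATTR} ? $attr{$NAME_ATTR} : 'unnamed';
                push @path, $name;
                my $dir = File::Spec->catdir(@path);
                mkpath($dir);
                push @made, $dir;
            },
            End => sub {
                my ( $expat, $tag ) = @_;
                pop @path if $tag eq $GROUP_TAG;
            },
        },
    );
    $parser->parsefile($xml_file);
    return @made;
}
```

Because the parser is event driven, the whole export never has to be held in memory, whatever its size.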

I now traverse the directory tree, find all the text files, and create appropriate HTML pages from them. That bit is easy; the problem I face is running XSLT on ~1600 XML files to get HTML.

I did a simple benchmark on NT of Instant Saxon 6.5, Xalan 1.3 (C++) and XML::LibXSLT, and found that the Java start-up cost makes Saxon massively slower to run than Xalan or the LibXSLT solution, assuming that the XSLT job is small and simple. Xalan is twice as fast as LibXSLT when both are called via a system call, but when called in-line, LibXSLT is much faster.

Given that I have ~1600 XML files to transform to HTML, I can do this one of two ways: build a list and pass the files one at a time to Xalan (via a system call it was faster than either Saxon or LibXSLT), or use LibXSLT from within the script that finds them (which should be the fastest method given simple transformations).
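A minimal sketch of the second, in-line option, assuming XML::LibXML and XML::LibXSLT are installed and that a single stylesheet applies to every file. The sub name `transform_tree` and the `.xml` to `.html` naming are illustrative assumptions, not part of any existing script. The stylesheet is parsed once and reused, which is where the in-line approach should gain over one system call per file.

```perl
use strict;
use warnings;
use File::Find;
use XML::LibXML;
use XML::LibXSLT;

# Walk $root, transform every *.xml file with $xsl_file, and write the
# result next to it as *.html. Returns (succeeded, failed) counts.
sub transform_tree {
    my ( $root, $xsl_file ) = @_;

    my $parser = XML::LibXML->new();
    my $xslt   = XML::LibXSLT->new();

    # Parse and compile the stylesheet once, then reuse it for every file.
    my $stylesheet = $xslt->parse_stylesheet( $parser->parse_file($xsl_file) );

    my ( $ok, $failed ) = ( 0, 0 );
    find(
        {
            no_chdir => 1,    # $_ then holds the full path of each entry
            wanted   => sub {
                return unless -f $_ && /\.xml\z/i;
                my $in = $_;
                ( my $out = $in ) =~ s/\.xml\z/.html/i;

                # Trap per-file errors so one bad document does not
                # abort the whole batch.
                eval {
                    my $result = $stylesheet->transform( $parser->parse_file($in) );
                    $stylesheet->output_file( $result, $out );
                    $ok++;
                    1;
                } or do {
                    warn "Failed on $in: $@";
                    $failed++;
                };
            },
        },
        $root
    );
    return ( $ok, $failed );
}

# Usage: perl pass3.pl <tree-root> <stylesheet.xsl>
if ( @ARGV == 2 ) {
    my ( $ok, $failed ) = transform_tree(@ARGV);
    print "Transformed $ok file(s), $failed failure(s)\n";
}
```

Counting failures instead of dying on the first one means a single malformed product file costs one page rather than the whole run.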

I'm not worried about raw speed, since it will run in batch mode, but I would like it to finish in hours rather than days.

I also would not like a pure Perl solution to fail from a leak of some sort; ~1600 XSLT calls in one process is a lot, and I'd rather not have to do it several times.
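One possible way to bound the effect of any leak, sketched under assumptions: split the file list into batches and run each batch in a fresh child process, so whatever memory a batch accumulates is released when its child exits. The names `chunk` and `run_in_batches` and the `$worker` callback are hypothetical. The sketch relies on a real `fork`, as on Linux; on NT, where Perl emulates `fork` inside a single process, the isolation may not hold and one `system` call per batch might be the safer choice.

```perl
use strict;
use warnings;

# Split a list into chunks of at most $size items.
sub chunk {
    my ( $size, @items ) = @_;
    die "chunk size must be at least 1\n" if $size < 1;
    my @chunks;
    push @chunks, [ splice @items, 0, $size ] while @items;
    return @chunks;
}

# Run $worker->(@batch) in a fresh child process for each batch, one
# batch at a time. Returns the number of batches whose child failed.
sub run_in_batches {
    my ( $size, $worker, @files ) = @_;
    my $bad = 0;
    for my $batch ( chunk( $size, @files ) ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: do the work, then exit so its memory goes back to the OS.
            my $ok = eval { $worker->(@$batch); 1 };
            warn $@ unless $ok;
            exit( $ok ? 0 : 1 );
        }
        waitpid( $pid, 0 );    # parent waits before starting the next batch
        $bad++ if $? != 0;
    }
    return $bad;
}
```

With ~1600 files and a batch size of, say, 200, only eight child processes are started, so process start-up cost stays small while no single process ever handles more than 200 transformations.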

In summary I plan to:

  • Parse one big XML file into a series of directories and smaller XML and control text files - PASS 1
  • Convert the text files into HTML pages (the indexes of each folder) - PASS 2
  • Run XSLT on the ~1600 XML files, to generate HTML - PASS 3

I'll do this several times per language and, assuming all works, probably once per week as the product database underneath changes.

I know this is very brute-force. Are there better approaches to the problem, given that I'm not allowed to use DBI to get data directly from the underlying database?

Hints, tips and suggestions, warmly accepted,

As ever, my humble thanks in advance...


In reply to Mega XSLT Batch job - best approach? by ajt
