Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Hi, I have a mega XSLT batch processing job to perform. Somewehere in the bowels of the building sits an AIX box with Informix/SAP on it. Stored in there are ~1600 product descriptions. The SAP Business Connector web interface can spit the product descriptions out as a single nested XML file.

On a Linux or NT box, a simple Perl script uses the nesting, and some simple rules, parses ths XML file using XML::Parser in stream mode and generates a nested set of directorites, and various descriptive text and XML files over the newly created directory tree.

I now traverse the director tree and find all the text files, and create appropiate HTML pages from them. That bit is easy, the problem I face is running XSLT on ~1600 xml files to get HTML.

I did a simple bench mark on NT: Instant Saxon 6.5; Xalan 1.3 (c++) and XML::LibXSLT, and found that the Java start up on Saxon makes it massivly slower to run than Xalan or the LibSXLT solution - assuming that the XSLT job is small and simple. Xalan is twice as fast as LibXSLT when both are called via a system call, but when in-line, LibXSLT is much faster.

Given that I have ~1600 XML files to transofrm to HTML, I can do this one of two ways, build a list and pass them one at a time to Xalan (it was faster than either Saxon or LibXSLT), or use LibXSLT from within the scipt that finds them (which should be the fastest method given simple transformations).

I'm not worried about raw speed, it will run in batch mode, but I would like it to finish in hours rather than days.

I'm also would not like a pure Perl solution to fail from a leak of some sorts, ~1600 XSLT calls in one process is a lot, and I'd rather not have to do it several times.

In summary I plan to:

  • Parse one big XML file into a series of directories and smaller XML and control text files - PASS 1
  • Convert the text files into HTML pages (the indexes of each folder) - PASS 2
  • Run XSLT on the ~1600 XML files, to generate HTML - PASS 3

I'll do this serveral times per language, and assuming all works, probably once per week as the product database underneath changes.

I know this is very brute-force, are there better approaches to the problem? give than I'm not allowed to use DBI to get data directly from the underlying database.

Hints, tips and suggestions, warmly accepted,

As every, my humble thanks in advance...


In reply to Mega XSLT Batch job - best approach? by ajt

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (4)
As of 2022-10-03 22:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My preferred way to holiday/vacation is:











    Results (15 votes). Check out past polls.

    Notices?