Looking for parser of "bookmaster" script files

by slugger415 (Scribe)
on Nov 29, 2011 at 15:56 UTC ( #940647=perlquestion )
slugger415 has asked for the wisdom of the Perl Monks concerning the following question:

I have been given a set of "SCRIPT" files in an archaic markup language (IBM called it "bookmaster") that I need to parse and ultimately convert to XML. Is anyone aware of such a parser?

Here's some sample code with a paragraph (p), an explanation (xpl), and an unordered list (ul/li):

:p.You have done something wrong.
:xpl. This error occurs when the input entered on the QMF command line:
:ul compact.
:li.Is not a valid QMF command
:li.Is a valid QMF command, but is one that cannot be issued from the QMF command line
:eul.

It uses the markup syntax colon-name-period, with an attribute sometimes preceding the period (as in :ul compact. above). Sometimes there are opening and closing elements (as with :ul. and :eul.) but mostly only opening elements.
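
For illustration, here is a minimal sketch of how the colon-name-period syntax described above could be split into tokens with a single regex. It is not an existing BookMaster parser. The `tokenize` helper and the token hash layout are hypothetical, and the sketch assumes that attribute text never contains a period and that a colon followed directly by a letter only ever starts a tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split BookMaster-style source into tag and text tokens.
# Assumptions: tags look like ":name." or ":name attributes.", attribute text
# contains no period, and ordinary text never has a colon followed by a letter.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    while ($src =~ m{
        \G (?:
            : ([A-Za-z]\w*)                 # tag name after the colon
            (?: [ \t]+ ([^.\n]*) )?         # optional attribute text
            \.                              # period that ends the tag
          | ( (?: [^:] | :(?![A-Za-z]) )+ ) # text up to the next tag
        )
    }gcx) {
        # Copy the captures before running any other regex.
        my ($name, $attr, $text) = ($1, $2, $3);
        if (defined $name) {
            push @tokens, { type => 'tag', name => lc $name, attr => $attr // '' };
        }
        elsif ($text =~ /\S/) {
            $text =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
            push @tokens, { type => 'text', text => $text };
        }
    }
    # Anything the pattern could not classify is kept as plain text.
    my $rest = substr($src, pos($src) // 0);
    push @tokens, { type => 'text', text => $rest } if $rest =~ /\S/;
    return @tokens;
}

my $sample = ':p.You have done something wrong. :ul compact. '
           . ':li.Is not a valid QMF command :eul.';
for my $tok (tokenize($sample)) {
    if ($tok->{type} eq 'tag') {
        print "TAG  $tok->{name}", (length $tok->{attr} ? " [$tok->{attr}]" : ''), "\n";
    }
    else {
        print "TEXT $tok->{text}\n";
    }
}
```

Because most elements have no closing tag, a real converter would still need to decide where each element ends; this sketch only separates markup from text.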

I realize this is a long shot but I'm trying to avoid having to write my own parser. Thanks for any tips!


Replies are listed 'Best First'.
Re: Looking for parser of "bookmaster" script files
by Khen1950fx (Canon) on Nov 29, 2011 at 17:48 UTC
    XML::TMX::Writer has a DATATYPE option 'ipf': IPF/Bookmaster. This was my first try at using it:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::TMX::Writer;

    my $tmx = XML::TMX::Writer->new();
    $tmx->start_tmx(
        OUTPUT   => '/root/Desktop/log.tmx',
        # TMX segtype is a single value (block, paragraph, sentence, or phrase);
        # a qw[] list here would spill into extra key/value pairs.
        SEGTYPE  => 'sentence',
        DATATYPE => 'ipf',
    );
    $tmx->end_tmx();
    Since this was my first time, it probably will need considerable tweaking :). It produced this doc:
    <?xml version="1.0" encoding="UTF-8"?>
    <tmx version="1.4">
    <header o-tmf="plain text" adminlang="en" creationdate="20111129T173129Z" creationtoolversion="0.23" creationtool="XML::TMX::Writer" srclang="en" segtype="sentence" datatype="ipf">
    </header>
    <body>
    </body>
    </tmx>
      Ah yes, I did see that but wasn't sure if it was Bookmaster -- I appreciate the example; this is helpful. Thanks so much to all!


Re: Looking for parser of "bookmaster" script files
by Old_Gray_Bear (Bishop) on Nov 29, 2011 at 17:45 UTC
    Doing a quick Google search (bookmaster +xml) turns up a Yahoo Groups thread from 2002. In addition, there are several for-hire firms that will do the conversion for you, depending on your time-frame and budget.

    As I recall from playing around with Script (SGML's predecessor) conversions, the language was simple to parse: look for the colon in position 1, extract the next three characters, do a table lookup to get the appropriate subroutine (all written in Basic Assembler, mind you), and go do it. Where we got ourselves really knotted up was with 'user extensions': macros that looked like markup but weren't (:xpl in your example). We finally ended up running the Script Processor with the command option that expanded all macros inline, and then we converted the result. It really bulked up some of the documents; 500 lines going in and 5000 lines of output was not unusual.
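
    The table-lookup approach described above might look roughly like this in Perl. This is only a sketch: it uses a regex instead of fixed column positions, the `%handler` table, `convert`, `unknown_tag`, and the output element names are all hypothetical, and :xpl is deliberately left out of the table to show what happens to a tag (or user macro) that has no handler.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical dispatch table: tag name => handler($name, $attr, $text).
# The tag set and the XML produced are assumptions for illustration only.
my %handler = (
    p   => sub { '<p>' . xml_escape($_[2]) . '</p>' },
    li  => sub { '<li>' . xml_escape($_[2]) . '</li>' },
    ul  => sub { length $_[1] ? '<ul class="' . xml_escape($_[1]) . '">' : '<ul>' },
    eul => sub { '</ul>' },
);

# Fallback for tags missing from the table, such as user-defined macros.
sub unknown_tag {
    my ($name, $attr, $text) = @_;
    return "<!-- unhandled tag :$name -->" . xml_escape($text);
}

# Walk the source one tag at a time; each chunk is a tag plus the text
# that follows it up to the next tag. Text before the first tag is ignored.
sub convert {
    my ($src) = @_;
    my @out;
    while ($src =~ /:([A-Za-z]\w*)(?:[ \t]+([^.\n]*))?\.((?:[^:]|:(?![A-Za-z]))*)/g) {
        my ($name, $attr, $text) = (lc($1), $2 // '', $3);
        $text =~ s/^\s+|\s+$//g;
        my $code = $handler{$name} // \&unknown_tag;
        push @out, $code->($name, $attr, $text);
    }
    return join "\n", @out;
}

print convert(
    ':p.You have done something wrong. :xpl. This error occurs. '
  . ':ul compact. :li.Is not a valid QMF command :eul.'
), "\n";
```

    The output is an XML fragment rather than a complete document, and the fallback comment makes unhandled macros easy to find before deciding whether to expand them first.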

    I Go Back to Sleep, Now.


Re: Looking for parser of "bookmaster" script files
by Anonymous Monk on Nov 29, 2011 at 16:47 UTC
    I can't imagine you're looking for a "parser" | IBM: Description of B2H

Node Type: perlquestion [id://940647]
Approved by JSchmitz