
Re: Parsing BNF syntax diagrams..

by kvale (Monsignor)
on Aug 20, 2004 at 15:22 UTC (#384619)

in reply to Parsing BNF syntax diagrams..

Programs in most computer languages, Perl included, are one-dimensional linear strings. The 'language' above (these used to be called railroad diagrams) is two-dimensional, and parsing such nonlinear languages is known to be difficult in general.

Another example of a two-dimensional language is Befunge, a language designed to be as hard to compile as possible :-) You might try looking at Befunge interpreters to get an idea of how to walk a two-dimensional program.

I have not tried to understand your compiler, but if I were to go about this, I would split each diagram into a two-dimensional array of single characters. Then I would 'walk' the diagram along its rails, building up a graph of tokens and the connections between them. Finally, I would convert that graph (in effect an NFA) into a grammar.
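The first two steps might be sketched roughly as follows, in Python with the standard library only. The sample diagram, the function names, and the shortcut of grouping tokens by starting column are all assumptions made for illustration. The code does not follow the rails character by character, so it only copes with a flat sequence of simple choice stacks (no loops, optional bypasses, or nesting), and hyphenated names would need extra handling.

```python
import re

# Hypothetical sample in the ASCII railroad style; not taken from the original post.
DIAGRAM = "\n".join([
    ">>-SELECT--+-ALL------+--item--><",
    "           '-DISTINCT-'",
])

def to_grid(diagram):
    """Split the diagram into a rectangular 2-D array of single characters."""
    lines = diagram.splitlines()
    width = max((len(line) for line in lines), default=0)
    return [list(line.ljust(width)) for line in lines]

def find_tokens(grid):
    """Return (row, column, text) for every word found in the grid."""
    tokens = []
    for row, cells in enumerate(grid):
        # Words only; the rail characters (-, +, ', >, <) are skipped.
        for match in re.finditer(r"[A-Za-z_][A-Za-z0-9_]*", "".join(cells)):
            tokens.append((row, match.start(), match.group()))
    return tokens

def group_into_slots(tokens):
    """Assumption: tokens starting in the same column are alternatives."""
    by_column = {}
    for row, col, text in sorted(tokens, key=lambda t: (t[1], t[0])):
        by_column.setdefault(col, []).append(text)
    return [by_column[col] for col in sorted(by_column)]

def build_edges(slots):
    """Connect every alternative in one slot to every alternative in the next."""
    edges = []
    for here, after in zip(slots, slots[1:]):
        edges.extend((a, b) for a in here for b in after)
    return edges
```

Calling `build_edges(group_into_slots(find_tokens(to_grid(DIAGRAM))))` gives a list of token pairs that can serve as the edges of the graph. A real walker would replace `group_into_slots` with code that follows the branch characters from cell to cell.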

But the fastest method might just be to convert the diagrams manually yourself.


Re^2: Parsing BNF syntax diagrams..
by dazzle (Sexton) on Aug 20, 2004 at 21:06 UTC
    You can find a dotted-decimal representation of the same syntax diagrams (meant for screen readers) in the DB2 for zSeries online documentation, which should be much easier to parse. Unfortunately, getting to the dotted-decimal representation isn't all that easy... here's a quick summary.
    1. Open the DB2 for zSeries information center in your Web browser.
    2. Find the statement or command you're interested in.
    3. View the source of the frame to find <img src="c.gif" alt="Read syntax diagram" longdesc="syntax.htm" border="0" />. syntax.htm will be some long HTML filename that contains the dotted-decimal version of the syntax.
    4. View the dotted-decimal syntax by copying the frame URL and changing the filename to point to the dotted-decimal filename. The DB2 for zSeries information center uses frames heavily, so the base URL doesn't change to reflect the content of the page you're looking at.
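Steps 3 and 4 could be automated along these lines. This is only a sketch in Python using the standard library; the same two steps should map onto LWP plus an HTML parsing module in Perl. The class and function names are made up for illustration, and any frame URL passed in is a placeholder rather than a real information center address. Fetching the pages is left out; the code only extracts the `longdesc` value and resolves it against the frame URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LongdescFinder(HTMLParser):
    """Collect longdesc values from 'Read syntax diagram' images."""

    def __init__(self):
        super().__init__()
        self.longdescs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        attr = dict(attrs)
        if attr.get("alt") == "Read syntax diagram" and attr.get("longdesc"):
            self.longdescs.append(attr["longdesc"])

def dotted_decimal_urls(frame_url, frame_html):
    """Resolve each longdesc filename relative to the frame's own URL."""
    finder = LongdescFinder()
    finder.feed(frame_html)
    return [urljoin(frame_url, name) for name in finder.longdescs]
```

Given the source of a frame and the URL it was loaded from, `dotted_decimal_urls` returns the URLs of the dotted-decimal pages, which can then be fetched and parsed.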

    Here's an example of two "normal" syntax diagrams and their dotted-decimal equivalents (1 (subselect) and 2 (select-clause)).

    These should be much easier to parse if you can get LWP to grab the dotted-decimal syntax files.

    Update: Duh, here's a page that describes the dotted-decimal syntax format.

    Update 2: If you need the DB2 for Linux, UNIX, Windows version of the syntax, you can install the DB2 8.2 Information Center locally, unzip the files in eclipse/plugins/, and work with the HTML files directly rather than going through LWP and the Web. Nice way of avoiding the relative URI problems with the framesets, too. Unfortunately, DB2 8.2 for Linux, UNIX, and Windows won't be released for a little while -- in the meantime, if you can still get your hands on the DB2 Stinger beta information center that will also do the job. Versions of DB2 prior to 8.2 don't include the dotted-decimal syntax diagrams.

      TIMTOWTSAC++ (There's more than one way to skin a cat.)