Re: sourcefilter with complete parser?
by roboticus (Chancellor) on Dec 10, 2013 at 20:10 UTC
I've worked on several translation systems. The first converted programs from Z80 assembler on a Zilog development system to 8086 code, including the operating system calls. Most of that was trivial, as they're very compatible at the source code level. We also had translations for operating system calls, and recognition for various features. We fully automated it so that development/maintenance could continue on the original MC/Z system.
It's tremendously fun to work on a project like that. It's amazing how simple and straightforward the majority of the translation task is. Then as you get into the dusty corners, you find yourself doing much more work to get the next little bit done. In a project like this, I've found that the most common 80% of the syntax takes about 50% of the time to implement. The next 10% takes about 75% of the project time. Then the next 5% takes about 75% more of the project time, and the final 5% takes the remaining 100% of the project time. That's why the second translator, a COBOL to C translation system, wasn't completed. It looked like it was going to take well over the 330% of time allotted to the project. ;^) We did a proof-of-concept system pretty quickly. COBOL's variables were simple enough that variable management wasn't terribly difficult.
So the project started, and we moved along quickly, for a while. But the PERFORM statement (function/subroutine call) in COBOL is an ugly beast. For example, one variation is "PERFORM <A> THROUGH <X>", which executes paragraphs A through X, including all the intermediates, and then returns. So we backed out the paragraph <==> function mapping, and instead coded everything into one large straight-line function. Then we had to add a funky data structure to the top of the stack telling us where the current PERFORM was expected to end. At the end of each paragraph we checked it: for a simple "PERFORM <A>" statement, we would return on reaching the end of paragraph <A>. For the "PERFORM <A> THROUGH <X>" version, we would ignore the paragraph boundaries until we reached the end of paragraph <X>.
Of course, we also had to add more data structures to handle the TIMES, UNTIL, VARYING, etc. clauses.
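A minimal sketch of that end-of-paragraph check, written in Python rather than the C the translator emitted. Everything here is an assumption made for illustration (the `Frame` record, the `run` function, the tuple-based statement format); it is not the original system's design. It covers only the plain and THROUGH forms of PERFORM and leaves out the TIMES, UNTIL and VARYING clauses. A simple PERFORM is modeled as a range whose first and last paragraph are the same.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical record pushed by each PERFORM."""
    exit_para: int    # index of the paragraph whose end finishes this PERFORM
    resume_para: int  # where to continue once the PERFORM is done
    resume_stmt: int


def run(program):
    """Execute an ordered list of (name, statements) paragraphs.

    Statements are tuples: ("DISPLAY", text), ("PERFORM", first, last)
    or ("STOP",). Returns the list of displayed strings.
    """
    index = {name: i for i, (name, _) in enumerate(program)}
    stack, output = [], []
    para, stmt = 0, 0
    while para < len(program):
        statements = program[para][1]
        if stmt >= len(statements):
            # End of a paragraph: return only if the innermost PERFORM
            # was waiting for exactly this paragraph to finish.
            if stack and stack[-1].exit_para == para:
                frame = stack.pop()
                para, stmt = frame.resume_para, frame.resume_stmt
            else:
                para, stmt = para + 1, 0  # otherwise fall through
            continue
        op = statements[stmt]
        if op[0] == "DISPLAY":
            output.append(op[1])
            stmt += 1
        elif op[0] == "PERFORM":
            first, last = op[1], op[2]
            stack.append(Frame(index[last], para, stmt + 1))
            para, stmt = index[first], 0
        elif op[0] == "STOP":
            break
        else:
            raise ValueError(f"unknown statement: {op!r}")
    return output


program = [
    ("MAIN", [("PERFORM", "B", "C"),      # PERFORM B THROUGH C
              ("DISPLAY", "back in MAIN"),
              ("PERFORM", "B", "B"),      # simple PERFORM B
              ("STOP",)]),
    ("B", [("DISPLAY", "in B")]),
    ("C", [("DISPLAY", "in C")]),
    ("D", [("DISPLAY", "in D")]),
]

print(run(program))
```

In this model the same paragraph B returns immediately under the simple PERFORM but falls through into C under the THROUGH form, which is why a one-function-per-paragraph mapping does not work by itself.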
I think one of the difficulties with that job was due to coming up with a proof-of-concept so quickly. It led to expectations that it would be a much simpler project than it actually was. It was also the project that made me start looking for a simple example of the most difficult construction when approaching a proof-of-concept. When the project was cancelled, we had pretty good coverage over most of the code base. Management asked for a "hands free" translator, and at that time, I didn't realize that we might have been able to negotiate them down to a 99+% translator with some manual intervention required.
The third one, translating one proprietary robot control language to another, was much simpler. The language wasn't general purpose, so it was much less trouble.
In general, it seems that it really is pretty simple to get a good chunk of code handled automatically. If you insist on fully automatic conversion with the entire syntax of the language enabled, though, things can get sticky. One problem is that handling the general case of some peculiar bit of syntax may complicate other bits of code. If you can recognize the things that introduce sticky problems, then you can use the simpler code most of the time, and revert to the more complex code only when you detect the situation.
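As a rough illustration of that "detect the situation" idea, a translator could scan the input for the troublesome construct before picking a code-generation strategy. The sketch below is hypothetical: the `pick_strategy` name and the two strategy labels are invented for this example, and a bare regular expression over source text will be fooled by comments and string literals, so a real tool would check the parsed program instead.

```python
import re

# Matches the ranged form "PERFORM <name> THROUGH <name>".
# COBOL accepts THRU as a synonym for THROUGH.
RANGE_PERFORM = re.compile(
    r"\bPERFORM\s+[\w-]+\s+(?:THROUGH|THRU)\b", re.IGNORECASE
)


def pick_strategy(cobol_source: str) -> str:
    """Return the (hypothetical) strategy label for this source text."""
    if RANGE_PERFORM.search(cobol_source):
        # Ranged PERFORM present: fall back to the general,
        # more complex straight-line translation.
        return "straight-line"
    # Only simple PERFORMs seen: the simple mapping is enough here.
    return "function-per-paragraph"


print(pick_strategy("PERFORM INIT-PARA."))
print(pick_strategy("PERFORM READ-REC THRU WRITE-REC."))
```

A real check would also need to look for other constructs that break the simple mapping, such as control falling from one paragraph into the next without any PERFORM.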
These jobs are high on my list of the most fun. (#1=robotics control, #2=custom industrial automation applications, #3=language translation/compilation, .... #65472=financial applications.)
It was quite a lot of fun, so if you're wanting to chat about some of those sorts of things, PM me and we can chat. Also, if you haven't yet, find a good compiler book and read it. I read the old Dragon book (Aho, Sethi and Ullman), and would recommend it--it was a fun read--though the optimization section was tough to read through, IIRC. There are probably a mess of more contemporary books, but I haven't read them.
When your only tool is a hammer, all problems look like your thumb.