http://www.perlmonks.org?node_id=231591

Elgon has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks,

I've got a quick query: I'm writing an assembler for the venerable Zilog Z80 microprocessor (nostalgia caught up with me) and I've been doing it in my typically brute-force manner. It will take a file of assembly code, with comments and so forth, and convert it into a file of hex object code. This isn't really much of a problem per se, but I'd like to try to do it well, or even stylishly, rather than by the BF&I approach.

The first pass looks up instructions that never vary (i.e. ones which don't include any user data or addresses), such as im 0 or ex (sp),ix, in a two-dimensional array to find the hex.

The second pass will, when I get around to writing it, convert the remaining instructions to their hex equivalents using a series of nested if(){} constructions containing regexps (mostly.)

Now that I've done the explanation, here's the query: Is there a better way of doing this? Although the Z80 is an 8-bit CPU, it has rather a lot of instructions in its set, and the array holding the data for the first pass is going to be pretty big (something like 500 * 2 elements.)

Secondly, I would prefer to use something like a hash but many of the instructions contain spaces, commas etc... Should I just use a regexp to turn these into underscores or similar and just turn the array into a hash? Is this frowned upon?

I suppose that what I'm really asking is this - can anyone point me to a tutorial or short primer on assembler writing?

Thanks, Elgon.

"What this book tells me is that goose-stepping morons, such as yourself, should read books instead of burning them."
       - Dr. Jones Snr, Indiana Jones and the Last Crusade

Re: Z80 Assembler Questions
by Aristotle (Chancellor) on Jan 31, 2003 at 13:56 UTC

    This is all a bit vague, but I suppose there is a system behind the way the various variants of an instruction translate into bits. In that case, I'd look up the translation for the instruction and the translation for its parameters separately and then combine them; this should shrink the table by an order of magnitude.
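A minimal sketch of that idea, assuming the standard Z80 encoding in which most 8-bit instructions carry a 3-bit register field: ld r,r' is 01 ddd sss and the ALU group is 10 ooo sss. The sub names encode_ld_r_r and encode_alu_r are made up for illustration; two small tables stand in for well over a hundred entries of a flat lookup.

```perl
use strict;
use warnings;

# 3-bit register field shared by most 8-bit Z80 instructions.
my %REG = ('b' => 0, 'c' => 1, 'd' => 2, 'e' => 3,
           'h' => 4, 'l' => 5, '(hl)' => 6, 'a' => 7);

# 3-bit operation field of the 8-bit ALU group (opcodes 0x80-0xBF).
my %ALU = ('add' => 0, 'adc' => 1, 'sub' => 2, 'sbc' => 3,
           'and' => 4, 'xor' => 5, 'or'  => 6, 'cp'  => 7);

# ld dst,src  ->  01 ddd sss
sub encode_ld_r_r {
    my ($dst, $src) = map { lc $_ } @_;
    die "unknown register\n"
        unless exists $REG{$dst} && exists $REG{$src};
    die "ld (hl),(hl) does not exist; that slot is halt\n"
        if $dst eq '(hl)' && $src eq '(hl)';
    return 0x40 | ($REG{$dst} << 3) | $REG{$src};
}

# <op> src (acting on the accumulator)  ->  10 ooo sss
sub encode_alu_r {
    my ($op, $src) = map { lc $_ } @_;
    die "unknown ALU operation\n" unless exists $ALU{$op};
    die "unknown register\n"      unless exists $REG{$src};
    return 0x80 | ($ALU{$op} << 3) | $REG{$src};
}

printf "%02X\n", encode_ld_r_r('a', 'b');   # ld a,b
printf "%02X\n", encode_alu_r('xor', 'a');  # xor a
```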

    I'd parse the instructions into their tokens and then use some form of dispatch hash on those, but lacking a specification to work with, I have no idea what it would have to look like in your case.
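One possible shape for such a dispatch hash, as a rough sketch only: each mnemonic maps to a code reference that receives the operand tokens and returns a list of bytes. The names tokenise, parse_number, %ENCODE and assemble_line are hypothetical, the number syntaxes (decimal, 0x2a, 2ah) are an assumption, and only a handful of instructions are covered.

```perl
use strict;
use warnings;

my %REG = ('b' => 0, 'c' => 1, 'd' => 2, 'e' => 3,
           'h' => 4, 'l' => 5, '(hl)' => 6, 'a' => 7);

# Split a source line into a lower-cased mnemonic and its operands.
sub tokenise {
    my ($line) = @_;
    $line =~ s/;.*//;            # drop a trailing comment
    $line =~ s/^\s+|\s+$//g;     # trim
    return unless length $line;
    my ($mnemonic, $rest) = split /\s+/, lc($line), 2;
    my @operands = defined $rest ? split(/\s*,\s*/, $rest) : ();
    return ($mnemonic, @operands);
}

# Assumed literal syntaxes: 42, 0x2a, 2ah.
sub parse_number {
    my ($text) = @_;
    return hex $1 if $text =~ /^0x([0-9a-f]+)$/;
    return hex $1 if $text =~ /^([0-9a-f]+)h$/;
    return $text  if $text =~ /^\d+$/;
    die "bad number '$text'\n";
}

# Dispatch table: mnemonic => encoder returning a list of bytes.
my %ENCODE = (
    nop  => sub { return (0x00) },
    halt => sub { return (0x76) },
    jp   => sub {                       # jp nn -> C3 lo hi
        my $n = parse_number($_[0]);
        return (0xC3, $n & 0xFF, ($n >> 8) & 0xFF);
    },
    ld   => sub {                       # ld r,n -> 00 rrr 110, n
        my ($dst, $src) = @_;
        die "only 'ld r,n' is handled in this sketch\n"
            unless defined $dst && exists $REG{$dst};
        return (0x06 | ($REG{$dst} << 3), parse_number($src) & 0xFF);
    },
);

sub assemble_line {
    my ($line) = @_;
    my ($mnemonic, @operands) = tokenise($line);
    return () unless defined $mnemonic;     # blank or comment-only line
    my $handler = $ENCODE{$mnemonic}
        or die "unknown mnemonic '$mnemonic'\n";
    return $handler->(@operands);
}

print join(' ', map { sprintf '%02X', $_ } assemble_line('ld a,0x2a ; load')), "\n";
```

Adding an instruction then means adding one entry to %ENCODE rather than another branch in a nest of if(){} tests.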

    Makeshifts last the longest.

      Thanks Aristotle, I've been trying to see if I could find a specification online to see how instructions are built up from bit fields - I may have found something, in which case I'll go via the parsing and tokenising method, building up each byte from its components. Otherwise I may have to go via the hash route.

      Elgon

      UPDATE: Unfortunately, the information I found on how opcodes are built up was no good. On the other hand, after tokenisation I've worked out how to use a system of hashes-of-hashes to make it a bit simpler.
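The post does not show the actual structure, so the following is only a guess at what a hash-of-hashes for the fixed instructions might look like: the outer key is the mnemonic, the inner key is the operand string with whitespace removed, and the value is the hex object code. %OPCODE and lookup_fixed are hypothetical names.

```perl
use strict;
use warnings;

# mnemonic => { operand string => hex object code }
my %OPCODE = (
    nop  => { '' => '00' },
    halt => { '' => '76' },
    im   => { '0' => 'ED46', '1' => 'ED56', '2' => 'ED5E' },
    ex   => {
        'de,hl'   => 'EB',
        "af,af'"  => '08',
        '(sp),hl' => 'E3',
        '(sp),ix' => 'DDE3',
        '(sp),iy' => 'FDE3',
    },
);

# Returns the hex string for a fixed instruction, or undef so that the
# caller can leave the line for a later pass.
sub lookup_fixed {
    my ($line) = @_;
    my ($mnemonic, $operands) = split ' ', lc($line), 2;
    return undef unless defined $mnemonic && exists $OPCODE{$mnemonic};
    $operands = '' unless defined $operands;
    $operands =~ s/\s+//g;      # "de, hl" and "de,hl" share one key
    return $OPCODE{$mnemonic}{$operands};
}

print lookup_fixed('ex (sp), ix'), "\n";
```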


        I remember the first Z80 assembler I wrote, about 20 years ago. Back then we had very limited resources (on the computer), so we were forced to do a fair amount of work on paper.

        The way to identify the construction of the instruction formats is to create a 16x16 grid (one axis for each hex digit of the basic instructions). When you place the instructions on this grid, it will become very obvious how the instructions are structured. You'll even see how the designers used a few meaningless instructions (e.g. ld (hl),(hl)) to find encodings for other instructions (e.g. HALT). Once you've done the basic instructions, overlay the extension opcode tables: you'll find that the IX/IY instructions closely map onto the HL register instructions. You'll also find that you can guess a few "undocumented" instructions in the CB extension set (there's one empty column, IIRC).
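To illustrate the grid on screen rather than on paper, here is a small sketch that fills in only the 0x40-0x7F quarter from the standard "01 ddd sss" load pattern and prints it with one row per high hex digit. %MNEMONIC and grid_row are names invented for this example; a real table would cover all 256 slots.

```perl
use strict;
use warnings;

my @REG = ('b', 'c', 'd', 'e', 'h', 'l', '(hl)', 'a');

# Fill the 0x40-0x7F quarter of the grid from the "01 ddd sss" pattern.
my %MNEMONIC;
for my $opcode (0x40 .. 0x7F) {
    my $dst = $REG[ ($opcode >> 3) & 7 ];
    my $src = $REG[ $opcode & 7 ];
    $MNEMONIC{$opcode} = "ld $dst,$src";
}
$MNEMONIC{0x76} = 'halt';   # the slot the pattern would give to ld (hl),(hl)

# One text row per high hex digit, one column per low hex digit.
sub grid_row {
    my ($high) = @_;
    my @cells = map { $MNEMONIC{ ($high << 4) | $_ } // '' } 0 .. 15;
    return sprintf '%X_ ' . ('%-11s' x 16), $high, @cells;
}

print grid_row($_), "\n" for 4 .. 7;
```

In the printed rows the destination register changes every eight cells and the source register cycles within them, which is the structure the grid is meant to expose.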

        I'm sorry I can't remember more of the details: it was a long time ago, and I'm suddenly feeling old. --Dave

Re: Z80 Assembler Questions
by Anonymous Monk on Jan 31, 2003 at 14:46 UTC
    Go read perldata again.
    $HASH{qq[ ANY THING CAN GO HERE, pack AWAY]}=1;
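To spell that out for the question asked: any string is a valid hash key, so spaces, commas, parentheses and quotes need no conversion to underscores. A small sketch follows, in which %FIXED, canonical and fixed_opcode are illustrative names; the only preprocessing is normalising whitespace and case so that differently spaced source lines hit the same key.

```perl
use strict;
use warnings;

# Keys are the instruction text itself, punctuation and all.
my %FIXED = (
    'im 0'       => 'ED46',
    'ex de,hl'   => 'EB',
    "ex af,af'"  => '08',
    'ex (sp),ix' => 'DDE3',
    'ld sp,hl'   => 'F9',
);

# Reduce a source line to the canonical spelling used for the keys.
sub canonical {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/;.*//;            # drop a trailing comment
    $text =~ s/^\s+|\s+$//g;     # trim both ends
    $text =~ s/\s*,\s*/,/g;      # no spaces around commas
    $text =~ s/\s+/ /g;          # single space elsewhere
    return $text;
}

# Returns the hex string, or undef if the line is not a fixed instruction.
sub fixed_opcode {
    my ($line) = @_;
    return $FIXED{ canonical($line) };
}

print fixed_opcode('  EX  (SP) , IX   ; swap'), "\n";
```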
