Building Machines In Code – Part 9

This entry is part 9 of 9 in the series Building Machines in Code

Tooling for the Tiny-T

When we completed the console in the last installment, I said I was unsure what I would cover next. I really want to begin implementing our audio device, but adding a GUI for the Tiny-T system was a much more achievable target in the limited time I had. However, before we can create a GUI for the Tiny-T, we need an assembler, loader, and disassembler for our new CPU. The GUI's main window will provide an assembly listing of our program and display a range of memory addresses and their contents, along with the CPU status flags and registers. The GUI will also give the user a platform to write a program, assemble it, and either step through it or run it to completion. Another option I want to add is the ability to load a binary file and display its corresponding assembly listing. This is why we need a toolset before we can create the GUI.

Our goal today is to build our toolset, beginning with our assembler. The assembler for the Tiny-T is very similar to the one we created back in part 5 for the Tiny-P. However, both our instructions and their encoding have changed, so we will need to rewrite parts of our assembler. Since we covered the workings of the assembler back in part 5, I'm just going to present the code and then discuss the changes from the Tiny-P's assembler. Here's the code for the Tiny-T's assembler:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# File: assembler.py
""" Tiny-T Assembler

...

# Opcode table relates mnemonics
# to the corresponding opcode value.
OPCODE_TABLE = {
    'hlt': 0x0,   # halt
    'lda': 0x1,
    'sta': 0x2,
    'add': 0x3,
    'sub': 0x4,
    'and': 0x5,
    'or': 0x6,
    'xor': 0x7,
    'not': 0x8,
    'shl': 0x9,
    'shr': 0xA,
    'bra': 0xB,
    'brp': 0xC,
    'brz': 0xD,
    'inp': 0xE,
    'out': 0xF
}


class Lexer:
    def __init__(self):
        self.line = None
        self.tokens = []

    def set_text(self, line: str):
        self.line = line
        self.tokens = line.split()

    def next_token(self):
        if not self.tokens:
            return None
        tok = self.tokens.pop(0)
        return tok


class Assembler:
    def __init__(self, lexer: Lexer, _text: str):
        self.text = _text
        self.lines = self.text.split('\n')
        self.current_address = 0
        self.opcode = 0
        self.operand = 0
        self.lexer = lexer
        self.symbol_table = {}
        self.code = []

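    def skip_spaces(self, tok: str) -> str:
        # Lexer.set_text() splits on whitespace, so tokens are normally
        # never pure whitespace; guard against None and return the result.
        while tok is not None and tok.isspace():
            tok = self.lexer.next_token()
        return tok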

    def skip_comment(self, tok: str):
        if tok == '#':
            while tok:
                tok = self.lexer.next_token()

    def is_hex(self, tok: str) -> bool:
        if tok.startswith('0x') or tok.startswith('0X'):
            try:
                op = int(tok[2:], 16)
            except ValueError:
                return False
            return True
        return False

    def from_hex(self, tok: str)-> str:
        if self.is_hex(tok):
            val = str(int(tok[2:], 16))
            return val

        msg = f"Can not convert {tok} to integer value"
        raise ValueError(msg)



    def fixup(self):
        text_ = ''
        for line in self.code:
            parts = line.split(':')
            addr = parts[0]
            sub_parts = parts[1].split('-')
            opcode = sub_parts[0]
            operand = sub_parts[1]

            if operand.isalnum() and not operand.isnumeric() and not self.is_hex(operand):
                if operand in self.symbol_table:
                    operand = self.symbol_table[operand]
                else:
                    msg = f"Undefined Symbol: {operand}"
                    raise ValueError(msg)
            elif self.is_hex(operand):
                operand = self.from_hex(operand)

            bin_code = (int(opcode) << 12) + int(operand)
            if bin_code > 0xFFFF:
                raise ValueError(f"Illegal Machine Code Value {bin_code}")
            code_line = f'{addr.zfill(4)} {bin_code}\n'
            text_ += code_line

        return text_



    def parse(self):
        for line in self.lines:
            line = line.lower()
            self.opcode = 0
            self.operand = 0

            self.lexer.set_text(line)

            tok = self.lexer.next_token()
            while tok is not None:
                self.skip_spaces(tok)

                if not tok:
                    break

                elif tok.endswith(':'):
                    # LABEL_DECL: remember the address of this label
                    key = tok[:-1]
                    self.symbol_table[key] = self.current_address

                elif tok == '#':
                    # COMMENT: ignore the rest of the line
                    self.skip_comment(tok)
                    break

                elif tok.endswith('.'):
                    # DIRECTIVE
                    if tok[:-1] == 'org':
                        operand = self.lexer.next_token()
                        if operand is not None and operand.isnumeric():
                            self.current_address = int(operand)
                        elif operand is not None and self.is_hex(operand):
                            # Hex origin, e.g. ORG. 0x0000 (from_hex returns a decimal string)
                            self.current_address = int(self.from_hex(operand))
                        else:
                            msg = f'Illegal Origin. Expected: integer or hex, Found {operand}'
                            raise ValueError(msg)
                        break

                elif tok in OPCODE_TABLE:
                    # INSTRUCTION
                    self.opcode = OPCODE_TABLE[tok]
                    operand = self.lexer.next_token()

                    if operand is None:
                        # Instruction written without an operand: encode it as 0
                        self.operand = 0
                    elif operand.isnumeric():
                        self.operand = operand
                    elif self.is_hex(operand):
                        self.operand = self.from_hex(operand)
                    elif operand.isalnum():
                        # Known label -> its address; otherwise keep the name
                        # so fixup() can resolve a forward reference later
                        self.operand = self.symbol_table.get(operand, operand)
                    elif operand.startswith('#'):
                        # Comment right after the mnemonic: no operand
                        self.operand = 0
                        self.skip_comment(operand)

                    self.code.append(f"{self.current_address} : {self.opcode}-{self.operand}")
                    self.current_address += 1

                tok = self.lexer.next_token()

        code_text = self.fixup()

        return code_text


import getopt
import os
import sys


def main(argv):
    inputfile = ''
    outputfile = ''
    usage_message = "Usage: assembler.py -i <inputfile> -o <outputfile>"

    try:
        # Short options: -h, -i <file>, -o <file> (letter o, not zero)
        opts, args = getopt.getopt(argv, "hi:o:", ["help", "ifile=", "ofile="])
    except getopt.GetoptError:
        print(usage_message)
        sys.exit(2)

    for opt, arg in opts:
        if opt in ('-h', '--help'):
            print(usage_message)
            sys.exit()
        elif opt in ('-i', '--ifile'):
            inputfile = arg
        elif opt in ('-o', '--ofile'):
            outputfile = arg

    if not inputfile:
        print(usage_message)
        sys.exit(2)

    # If only an input file is given, default the output file to <inputfile>.bin
    if not outputfile:
        outputfile = os.path.splitext(inputfile)[0] + '.bin'

    # The with-block closes the file; no explicit close() is needed
    with open(inputfile, 'r') as ifh:
        program_text = ifh.read()

    # Assemble program (the Lexer is injected into the Assembler)
    assembler = Assembler(Lexer(), program_text)
    machine_text = assembler.parse()

    # Write output file
    if machine_text:
        with open(outputfile, 'w') as ofh:
            ofh.write(machine_text)
    else:
        msg = f'Unable to assemble {inputfile}'
        raise AssertionError(msg)

    # Exit message
    print(f"Assembled: {inputfile} and wrote machine code to {outputfile}")


if __name__ == '__main__':
    main(sys.argv[1:])

The first thing you will notice is that I have removed the MNEMONICS list. This was not needed as we can simply use the OPCODE_TABLE keys. In addition, the OPCODE_TABLE’s contents had to change to support our new instruction set.

Instead of creating the lexer in the assembler class, I did a little dependency injection and passed a fully instantiated lexer into the assembler’s __init__() method.

Since our new assembly language allows hexadecimal values, we need two additional methods: is_hex() and from_hex(). The first returns True if the string passed in represents a hexadecimal value (a 0x or 0X prefix followed by valid hex digits). The second converts a hex string to a decimal integer string.
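To make the behavior of these two helpers concrete, here is a standalone sketch that mirrors the logic of is_hex() and from_hex() as free functions. Pulling them out of the Assembler class is purely for illustration; the example tokens come from echo.asm after the parser lower-cases them.

```python
def is_hex(tok: str) -> bool:
    """Return True if tok is 0x/0X followed by valid hex digits."""
    if tok.startswith(('0x', '0X')):
        try:
            int(tok[2:], 16)          # only checking that it parses
        except ValueError:
            return False
        return True
    return False


def from_hex(tok: str) -> str:
    """Convert a hex token such as '0x0fe' to a decimal string such as '254'."""
    if is_hex(tok):
        return str(int(tok[2:], 16))
    raise ValueError(f"Can not convert {tok} to integer value")


print(is_hex('0x0fe'))    # True
print(is_hex('start'))    # False
print(from_hex('0x0ff'))  # 255
```

Note that from_hex() returns a string rather than an int, which is why the assembler still wraps its operands in int() when it builds the final machine word.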

Under Python 3.11, my f-strings weren't working inside the exception calls, so I moved the message composition out of the exception calls and assigned each message to a variable. I'll figure this out and fix it later.

Inside fixup() we need to make a few changes. The first if statement inside the for loop needs to have a new condition added to it. Change this line:

if operand.isalnum() and not operand.isnumeric():

to this:

if operand.isalnum() and not operand.isnumeric() and not self.is_hex(operand):

Then we need to add a new elif branch:

            ...
            elif self.is_hex(operand):
                operand = self.from_hex(operand)
            ...
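Why does fixup() resolve labels at all? parse() makes a single pass over the source, so a label used before it is declared (a forward reference) is still just a name when the instruction is recorded. The sketch below isolates the operand rules fixup() applies; resolve_operand() is a hypothetical helper name used only for illustration, not a function in the repository.

```python
def is_hex(tok: str) -> bool:
    """Same rule as Assembler.is_hex(): 0x/0X prefix plus valid hex digits."""
    if tok.startswith(('0x', '0X')):
        try:
            int(tok[2:], 16)
        except ValueError:
            return False
        return True
    return False


def resolve_operand(operand: str, symbol_table: dict) -> int:
    """Hypothetical helper mirroring the operand checks in fixup()."""
    if operand.isalnum() and not operand.isnumeric() and not is_hex(operand):
        # A bare name: it must be a label recorded during parse()
        if operand in symbol_table:
            return int(symbol_table[operand])
        raise ValueError(f"Undefined Symbol: {operand}")
    if is_hex(operand):
        return int(operand[2:], 16)
    return int(operand)


symbols = {'start': 0, 'done': 7}          # example symbol table
print(resolve_operand('done', symbols))    # 7, a label declared later in the source
print(resolve_operand('0x0ff', symbols))   # 255
print(resolve_operand('42', symbols))      # 42
```

The extra "not is_hex" condition matters because a token like 0x0ff is alphanumeric and not numeric, so without it a hex literal would be looked up as a label and reported as undefined.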

In the line that assembles our opcode and operand into a single value and stores it in bin_code, we originally multiplied our opcode value by 100 to shift it left two decimal digits. In our new assembler, we are dealing with a 16-bit value, so we need to shift our opcode left three hexadecimal digits, or 12 bits. So, change:

            ...
            bin_code = (int(opcode) * 100) + int(operand)
            ...

to:

            ...
            bin_code = (int(opcode) << 12) + int(operand)
            ...
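As a worked check of the new encoding: the opcode occupies the top 4 bits and the operand the low 12 bits, so a left shift by 12 is the binary counterpart of the old multiply-by-100. The sketch below uses illustrative names, encode() and decode(), that are not from the repository, and round-trips the INP 0x0FE instruction from the echo program used later in this post.

```python
def encode(opcode: int, operand: int) -> int:
    """Pack a 4-bit opcode and a 12-bit operand into one 16-bit word."""
    if not 0 <= opcode <= 0xF:
        raise ValueError(f"Opcode out of range: {opcode}")
    if not 0 <= operand <= 0xFFF:
        raise ValueError(f"Operand out of range: {operand}")
    return (opcode << 12) + operand


def decode(word: int) -> tuple:
    """Split a 16-bit word back into (opcode, operand)."""
    return word >> 12, word & 0xFFF


word = encode(0xE, 0x0FE)   # INP 0x0FE
print(word, hex(word))      # 57598 0xe0fe
print(decode(word))         # (14, 254)
```

The per-field range checks are my addition. The assembler's fixup() only rejects words above 0xFFFF, so an operand larger than 0xFFF would silently spill into the opcode bits.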

Next, to allow hexadecimal values in our assembler directives (ORG.), we need to parse them. This means making calls to is_hex() and from_hex() inside the directive branch of our parsing code.

When we create our assembler instance inside main(), we need to first create an instance of our Lexer() to pass to the Assembler() which now uses dependency injection.

Lastly, we add a check that assembly actually produced some output before we attempt to write out our output file.

Hopefully, I haven’t missed anything. You can pull the new code from the repo and diff the files to be sure.

Testing the Assembler

With the changes made to our new assembler, we are now ready to test it. We won't write a full test suite, but we will write a simple four-line assembly program, assemble it, and run the binary.

To begin, create a new sub-directory in the part-9 folder named “asm”. Then add a new file named “echo.asm”. Inside this file add the following assembly code:

        ORG.    0x0000

start:  INP 0x0FE   # Read console input        0xE0FE
        OUT 0x0FF   # Write back to display     0xF0FF
        BRA start   # Loop                      0xB000

As you can see, our little program simply echoes anything we type into the console back out to the console.

Now assemble the program:

> python3 assembler.py -i asm/echo.asm

This should create a new file, “asm/echo.bin” with the following contents:

000  57598
001  61695
002  45056
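If you want to check the file by eye, it helps to convert each decimal word back to hex and compare it with the values in the echo.asm comments. Here is a small sketch, assuming the two-column address/value layout shown above (split() tolerates the extra spaces between the columns):

```python
ECHO_BIN = """000  57598
001  61695
002  45056
"""

rows = []
for line in ECHO_BIN.splitlines():
    fields = line.split()              # e.g. ['000', '57598']
    if len(fields) != 2:
        continue                       # skip blank or malformed lines
    addr, word = int(fields[0]), int(fields[1])
    rows.append(f"{addr:04d}  0x{word:04X}  opcode=0x{word >> 12:X}  operand=0x{word & 0xFFF:03X}")

print("\n".join(rows))
# 0000  0xE0FE  opcode=0xE  operand=0x0FE
# 0001  0xF0FF  opcode=0xF  operand=0x0FF
# 0002  0xB000  opcode=0xB  operand=0x000
```

The three hex words match the 0xE0FE, 0xF0FF and 0xB000 noted in the comments of echo.asm.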

This is the machine code for our program. If your file doesn’t contain exactly this, then you need to check your assembler. Don’t move on until you have the assembler working correctly.

The Loader

The loader is the next tool in our small arsenal. As you recall from Part-6, our loader is responsible for reading our machine code file and loading its contents into memory. Because we covered loaders in Part-6, I won't cover them again here. I'll just show the code and let you diff it against the loader from Part-6.

Most of the changes have to do with the Tiny-P having a program() method while the Tiny-T uses a write() method. The main() function had to change considerably due to the fact that the CPU is no longer a stand-alone device: it now has to be connected to memory and the console through a bus. Here's the code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
""" Tiny-T CPU Simulator.
    ...
"""

# Tiny-T Machine Code Loader
# Assumes machine code is stored in
# a *.bin file and is formatted as:
# <address> <instruction>
# Where the address is a zero-padded
# decimal value and the instruction is
# the full 16-bit word (opcode and
# operand) written in decimal, e.g. 57598.

import sys
import getopt

from cpu import CPU
from bus import Bus
from memory import Memory
from console import Console


class Loader:
    def __init__(self, cpu: CPU, code_text: str):
        self.machine_code = code_text
        self.code = self.machine_code.split('\n')
        self.cpu = cpu

    def load(self):
        for line in self.code:
            code = line.split()
            if len(code) == 2:
                addr = int(code[0])
                opcode = int(code[1])
                self.cpu.write(addr, opcode)


def dump(cpu: CPU):
    print(f"ACC: {cpu.accumulator}, PC: {cpu.program_counter}, Z: {cpu.z_flag}, P: {cpu.p_flag}")
    print('\n')


def dump_mem(mem: list):
    for i, data in enumerate(mem):
        if i % 16 == 0: print(f"\n{i} : ", end='')
        print(f" {data}, ", end='')
    print()


def main(argv):
    inputfile = ''
    usage_message = "Usage: assembler.py -i <inputfile> "

    try:
        opts, args = getopt.getopt(argv, "hi:", ["help", "ifile="])
    except getopt.GetoptError:
        print(usage_message)
        sys.exit(2)
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            print(usage_message)
            sys.exit()
        elif opt in ('-i', '--ifile'):
            inputfile = arg

    if not inputfile:
        print(usage_message)
        sys.exit(2)

    # The with-block closes the file; no explicit close() is needed
    with open(inputfile, 'r') as ifh:
        program_text = ifh.read()

    # Build up the computer system: RAM and console are both bus handlers
    ram = Memory(64, 16)
    con = Console()
    bus = Bus()
    bus.register_handler(ram)
    bus.register_handler(con)
    cpu = CPU(bus)

    # Load the program through the CPU's write() method
    loader = Loader(cpu, program_text)
    loader.load()

    # Exit message
    print(f"Loader: {inputfile} loaded into the CPU.")
    print("Ready to run!")

    # Run the program
    cpu.run()


if __name__ == "__main__":
    main(sys.argv[1:])

As you can see, the Loader class itself differs very little from the one presented in Part-6.

You should be able to run the loader with the -i option, passing it your echo.bin file, and get an operating Tiny-T system waiting for your input.
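Because the real CPU, Bus, Memory, and Console classes live in their own modules, one way to check the Loader in isolation is to hand it a stand-in object that only records write() calls. StubCPU below is a hypothetical test double, not part of the repository. The Loader class is copied from the listing above, with the CPU type hint dropped so the snippet runs on its own.

```python
class Loader:
    """Copy of the Tiny-T Loader from the listing above."""
    def __init__(self, cpu, code_text: str):
        self.machine_code = code_text
        self.code = self.machine_code.split('\n')
        self.cpu = cpu

    def load(self):
        for line in self.code:
            code = line.split()
            if len(code) == 2:            # skip blank or malformed lines
                addr = int(code[0])
                opcode = int(code[1])
                self.cpu.write(addr, opcode)


class StubCPU:
    """Hypothetical stand-in that just records what the loader writes."""
    def __init__(self):
        self.memory = {}

    def write(self, addr: int, value: int):
        self.memory[addr] = value


cpu = StubCPU()
Loader(cpu, "000  57598\n001  61695\n002  45056\n").load()
print(cpu.memory)   # {0: 57598, 1: 61695, 2: 45056}
```

Since Loader only ever calls write(), any object with that method can stand in for the CPU, which makes the loader easy to exercise without building the whole machine.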

Disassembler

In preparation for our GUI, I want to add a disassembler to our arsenal of tools. What is a disassembler? It’s a program that takes in machine code and spits out assembly code.

Our disassembler will take each word of machine code and split it into its corresponding opcode and operand. Then we only need to look up the opcode value in a table to locate the mnemonic. Most of our disassembler's code deals with handling command-line arguments. The meat of our disassembler is contained in two static methods. The disasm() method takes in the program text read from the input file in main() and splits it into lines. It then walks over each line, splitting it into its address and instruction components. Next, it calls the static method decode(), which converts the instruction code into its mnemonic and operand, formats them as a line of text, and returns that text to disasm(). The disasm() method collects these lines of decoded instructions in the asm_text variable and returns this text to main(). The main() function then writes this text out to our output file.
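The disassembler listing itself isn't reproduced in this text, so here is a minimal sketch that follows the description above: a decode() static method that splits a word into opcode and operand and looks up the mnemonic, and a disasm() static method that walks the address/value lines and collects the results in asm_text. The class name Disassembler, the OPCODE_NAMES list (including 'hlt' as the assumed halt mnemonic for opcode 0x0), and the output format are my assumptions, not necessarily what the repository uses.

```python
# Opcode value -> mnemonic, in the same order as the assembler's OPCODE_TABLE.
# 'hlt' for 0x0 is an assumed halt mnemonic.
OPCODE_NAMES = ['hlt', 'lda', 'sta', 'add', 'sub', 'and', 'or', 'xor',
                'not', 'shl', 'shr', 'bra', 'brp', 'brz', 'inp', 'out']


class Disassembler:
    @staticmethod
    def decode(instruction: int) -> str:
        """Turn one 16-bit word into 'MNEMONIC 0xNNN' text."""
        opcode = (instruction >> 12) & 0xF    # top 4 bits
        operand = instruction & 0xFFF         # low 12 bits
        return f"{OPCODE_NAMES[opcode].upper()} 0x{operand:03X}"

    @staticmethod
    def disasm(program_text: str) -> str:
        """Decode every '<address> <instruction>' line of a .bin file."""
        asm_text = ''
        for line in program_text.split('\n'):
            parts = line.split()
            if len(parts) != 2:
                continue                      # skip blank or malformed lines
            addr, instruction = int(parts[0]), int(parts[1])
            asm_text += f"{addr:04d}  {Disassembler.decode(instruction)}\n"
        return asm_text


print(Disassembler.disasm("000  57598\n001  61695\n002  45056\n"), end='')
# 0000  INP 0x0FE
# 0001  OUT 0x0FF
# 0002  BRA 0x000
```

Label names are not recoverable from machine code, so BRA start comes back as BRA 0x000. A fancier disassembler could invent labels for branch targets, but plain addresses are enough for a GUI listing.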

That was easy, right? A disassembler in about 20 lines of code. The rest of the program is just file handling. Since the command-line option handling is the same as in the assembler presented earlier, I won't discuss that part of the program.

Homework

Give the disassembler a try. Make sure it produces the proper output for your echo.bin file. Then make a test.bin file containing each instruction, run the disassembler on it, and inspect the output to ensure every instruction disassembles correctly.

Conclusion

In this post, we have prepared ourselves for developing a GUI to support our Tiny-T computer system. We implemented an assembler, loader, and disassembler. Much of this work was familiar to us and differed only slightly from some of our previous projects, so I didn't give a detailed explanation of the code.

I would recommend you play with the code and get familiar with it. Try to write these tools yourself, from scratch. In the future, we will create more complex tooling, and having a good foundation will help.

In our next installment, we will begin creating a graphical user interface for the Tiny-T. This GUI will most likely be built using PySimpleGUI or Qt; I haven't quite decided yet. In the past, I have found PySimpleGUI easy to use and quick for developing GUI applications.

Until next time: Happy Coding!

Resources

You can find the code for this post on GitHub at: https://github.com/Monotoba/Building-Machines-In-Code