- Implementing Stack Oriented Languages
- Implementing Stack Oriented Languages — Part 2
- Implementing Stack Oriented Languages – Part 3
- Implementing Stack Oriented Languages – Part 4
Post Stastics
- This post has 4325 words.
- Estimated read time is 20.60 minute(s).
Strings and Variable
So far we've added lots of stack operations and a couple I/O routines with the KEY and EMIT keywords. At the moment if we wanted to write a simple "Hello World" application we would need to place each character on the stack and pop them off using the EMIT keyword. This is messy and bit difficult to manage in a larger program. What we need is a way to store strings somewhere and then a way to tell our language where to find the string we wish to manipulate. So how can we do this?
Well, a simple straight-forward solution would be to place all strings in a string table. A string table is an area of memory where we store the strings. Then we need to tell our language how to find the string. In a compiled version of a Stack-Oriented language, we would use a pointer to the string, passing its address via the stack to any method that needed to manipulate the string. In an interpreted version of the language, we only need to pass an index into our list of strings. Sounds pretty easy, right? Well, it's a bit more complicated. When the parser encounters a string we will need to store it in the string table and then place the string's address (index) onto the stack. Since managing the string table is a bit more work than we want the parser to do, and in keeping with the Single Responsibility Principle, we will create a StringTable class to keep track of the strings and their addresses.
Our string table class will need a list to store strings into, a save method that will return the index of the string in the table. A get method to get the string by index and a length method to return the length of the string.
Our strings will need to store two peices of data. The actual string (value) and it's length. So we have to determine if we will return the object containing this information of just the printable string contents. I opted to provide methods for both. The get(idx) method will return the string data object while the get_string(idx) method will extract the string contents and return only the printable string.
Ok, it's time to see what this StringTable class looks like:
# string_table.py class StringTable: def __init__(self): self.data = [] # Insert string and meta data into string list def save(self, string): entry = {'len': len(string), 'val': string} self.data.append(entry) return (len(self.data) - 1) # Return string storage object (including length) def get(self, idx): return self.data[idx] # Return length of string def length(self, idx): return int(self.data[idx]['len']) # Return printable string only (no meta data). def get_str(self, idx): return str(self.data[idx]['val']) def __str__(self): return str(self.data)
As you can see the string table couldn't be much simpler. Now we need to handle inserting strings in the parser. Our parser will need to take the string from the token passed back from the lexer, save it in the string table, and push it's address (index) on to the stack. Then it will need to push the string length onto the stack. Once the string is encountered it is up to the next operation to handle the length-address pair.
Place the following code in the parser after the arithmetic and logic operations and before the section of code to handle keywords.
# parser.py ... # Handle Strings and Characters elif tok.type == TT_STRING: string = self.current_tok.value addr = self.string_table.save(string) self.data.push(addr) self.data.push(len(string)) # Handle KEYWORDS elif tok.type == TT_KEYWORD: ...
Now before we can run this code we need to first import it and add it to our parser object at the bottom of the __init__() method:
def __init__(self, lexer: Lexer, data_stack: Stack = None): ... # Setup string table if string_table is not None: self.string_table = string_table else: self.string_table = StringTable()
Next, we need to change the signature of the __init__() method to allow us to pass a new symbol table into the method when needed. You will see why we would want to do this later. For now, just know it's something we may need to do.
class Parser: def __init__(self, lexer: Lexer, data_stack: Stack = None, string_table=None): self.lexer = lexer # Setup data stack ...
With that change, any errors you were getting in the __init__() method should have disappeared. If not, then make sure you've imported the StringTable class at the top of the file.
We now need to change our source code in the "if" statement at the bottom of the parser file to:
if __name__ == '__main__': source = """ 16 2 >> "This is a string" dump """ lexer = Lexer(source) parser = Parser(lexer) parser.parse()
If we run this code now you should see the following:
[ '4', '0', '16' ]
These values come from the dump operation which displays the contents of the stack after the parser processed the string. Since we haven't put any other strings in the string table, the index for this string is 0 and it's length is 16.
Now that we can get the string stowed away in the string table and the address of it on the stack, what can we do with it? Well, at the moment not a lot. So let's fix that. The primary thing we want to do with strings is display them. So let's add some code to handle a new keyword "TYPE". Type will pop the length and address of a string off the stack and print the string out to the console.
Add the following code to the parser between the code to handle the "emit" and the code to handle the "key" keywords:
# parser.py ... elif tok.value == 'type': """<addr> <length> type Sends a string to standard out. The string is located in the string table at location <addr> and has a length of <length>. """ length = int(self.data.pop()) addr = int(self.data.pop()) string = self.string_table.get_str(addr) if length > len(string): raise IndexError(f'Error: String length of {length} was given but actual length is {len(string)}.') idx = 0 while idx < length: print(string[idx], end='') idx += 1 print() ...
Now change the code for the runner to:
if __name__ == '__main__': source = """ 16 2 >> "This is a string" dump type dump """ lexer = Lexer(source) parser = Parser(lexer) parser.parse()
When you run this you should get the following output:
[ '4', '0', '16' ] This is a string [ '4' ]
When the parser encounters the string it already has 4 on the stack from the previous operation. Then it places the strings address (index), on the stack followed by the string's length. We then call "type" which takes the length and address of the string off the stack and uses this information to display the string. Then we call "dump" again to show that only the string values were consumed, leaving 4 still on the stack when the program completes.
With this, we can now right that "Hello World" app. Give it a try!
Now that we have strings working, let's get characters working. Characters are single-quoted ASCII characters, and when encountered by the parser, they are converted to their ASCII values and placed on the stack. I bet by now you can add the code for this without my help. So, I'll just show you my implementation below:
... elif tok.type == TT_CHAR: char = self.current_tok.value ascii_code = ord(char) self.data.push(ascii_code) # Handle Keywords elif tok.type == TT_KEYWORD: ...
Here we simply get the character from the token value and convert it to its code-point using the ord() function from python. Then we push that value onto the stack.
I had wanted to include variable handling in this issue. However, I feel that it will make this post exceedingly long so I will postpone variables until next time.
As always, I have included the code as it now stands, below:
# keywords.py KEYWORDS = [ # Stack Operators 'SWAP', 'SWAP2', 'DUMP', 'DROP', 'ROL', 'ROR', 'OVER', 'DUP', 'DUP2', 'DUP3', 'DUP4', 'CLEAR', 'DUMP', # Conditional Operators 'IF', 'ELSE', 'THEN', 'NOT', # Iterative 'DO', 'LOOP', 'WHILE', 'WEND', # I/O Operators 'TYPE', # Outputs a string to stdout 'EMIT', # Outputs a character to stdout 'KEY', # Takes input character from stdin # Special Character Values # See ASCII Character Table for values 'CR', 'LF', 'NULL', 'BELL', 'BS', 'TAB', 'VT', 'FF', 'ESC', # Logical Operators 'NEQ', 'AND', 'OR', ]
# lexer.py from keywords import KEYWORDS from tokens import * class Lexer: def __init__(self, text, line=None, col=None): self.text = text self.pos = 0 self.line = line if line else 1 self.col = col if col else 1 self.current_char = self.text[0] ############################################# # Utility Methods # #-------------------------------------------# # These methods help manage the internal # # workings of the lexer. # ############################################# def advance(self): """advance(): Advances the current position in the character stream. """ if not self.is_eoi(): self.pos += 1 self.current_char = self.text[self.pos] self.update_position() else: self.current_char = '' def update_position(self): """update_position Maintains the line and column positions for the current token scan. """ if self.current_char == '\n': self.line += 1 self.col = 0 else: self.col += 1 def is_eoi(self): """is_eoi(): :return: True if current position is End Of Input stream. Returns False otherwise. """ return self.pos >= (len(self.text) - 1) def expected(self, expected, found): """expected Raises error is expected and found are different. """ if expected == found: return else: raise ValueError(f'[Error] Expected {expected} but found {found} in line: {self.line} position: {self.col}') def peek(self): """peek Returns next character in input stream after the current character. Note: If we are at EOI (End Of Input) then a space character is returned. """ if not self.is_eoi(): return self.text[self.pos + 1] else: return ' ' # space ############################################# # Recognizer Methods # #-------------------------------------------# # These methods collect the characters that # # belong to a single multi-character token # # into a single unit, and return that unit # # to the caller. # ############################################# def char(self): if self.current_char != "'": self.expected("'", self.current_char) self.advance() # Consume opening single quote char = self.current_char self.advance() # Consume character # Now we must have a closing single quote mark if self.current_char != "'": self.expected("'", self.current_char) self.advance() # Consume closing single quote return char def string(self): if self.current_char != '"': self.expected('"', self.peek()) self.advance() # Consume leading '"' string = '' while self.current_char != '"': string += self.current_char self.advance() self.advance() # Consume trailing '"' return string def number(self): number = '' while self.current_char.isdigit(): number += self.current_char self.advance() return number def is_ident(self): return self.current_char.isalpha() or self.current_char == '_' def _ident(self): _id = '' # identifiers begin with alpha or underscores if self.current_char.isalpha() or self.current_char == '_': while self.current_char.isalnum() or self.current_char == '_': _id += self.current_char self.advance() return _id def is_keyword(self, kw): return kw.upper() in KEYWORDS def eat_comment(self): if self.current_char == '(': while self.current_char != ')' and not self.is_eoi(): self.advance() self.advance() # skip trailing ) elif self.current_char == '\\': while self.current_char != '\n' and not self.is_eoi(): self.advance() self.advance() # skip new line character ############################################# # Tokenization # #-------------------------------------------# # The method next_token() is called by the # # client to obtain the next token in the # # input stream. # ############################################# def next_token(self) -> Token: if not self.is_eoi(): if self.current_char.isspace(): while self.current_char.isspace(): self.advance() if self.current_char.isdigit(): return Token(TT_NUMBER, self.number(), self.line, self.col) elif self.current_char == '+': val = self.current_char self.advance() return Token(TT_PLUS, val, self.line, self.col) elif self.current_char == '-': val = self.current_char self.advance() return Token(TT_MINUS, val, self.line, self.col) elif self.current_char == '*': val = self.current_char if self.peek() == '*': val += '*' self.advance() self.advance() return Token(TT_POW, val, self.line, self.col) self.advance() return Token(TT_MUL, val, self.line, self.col) elif self.current_char == '/': val = self.current_char self.advance() return Token(TT_DIV, val, self.line, self.col) elif self.current_char == '&': val = self.current_char self.advance() return Token(TT_AND, val, self.line, self.col) elif self.current_char == '|': val = self.current_char self.advance() return Token(TT_OR, val, self.line, self.col) elif self.current_char == '^': val = self.current_char self.advance() return Token(TT_XOR, val, self.line, self.col) elif self.current_char == '~': val = self.current_char self.advance() return Token(TT_INV, val, self.line, self.col) elif self.is_ident(): _id = self._ident() if self.is_keyword(_id): return Token(TT_KEYWORD, _id, self.line, self.col) else: return Token(TT_ID, _id, self.line, self.col) elif self.current_char == '<': val = self.current_char if self.peek() == '<': val += '<' self.advance() self.advance() return Token(TT_LSL, val, self.line, self.col) self.advance() return Token(TT_LESS, val, self.line, self.col) elif self.current_char == '>': val = self.current_char if self.peek() == '>': val += '>' self.advance() self.advance() return Token(TT_LSR, val, self.line, self.col) self.advance() return Token(TT_GREATER, val, self.line, self.col) elif self.current_char == '=': val = self.current_char if self.peek() == '=': val += '=' self.advance() self.advance() return Token(TT_EQ, val, self.line, self.col) self.advance() return Token(TT_ASSIGN, val, self.line, self.col) elif self.current_char == ':': val = self.current_char self.advance() return Token(TT_STARTDEF, val, self.line, self.col) elif self.current_char == ';': val = self.current_char self.advance() return Token(TT_ENDDEF, val, self.line, self.col) elif self.current_char == '(' or self.current_char == '\\': self.eat_comment() return self.next_token() elif self.current_char == "'": return Token(TT_CHAR, self.char(), self.line, self.col) elif self.current_char == '"': string = self.string() return Token(TT_STRING, string, self.line, self.col) elif self.is_eoi(): return Token(TT_EOF, None, self.line, self.col) else: raise ValueError(f'Unexpected character: {self.current_char} in input stream at position: {str(self.pos)}.') else: return Token(TT_EOF, None, self.line, self.col) def main(text: str): lexer = Lexer(text) tok = lexer.next_token() while tok.type != TT_EOF: print(tok) tok = lexer.next_token() if __name__ == "__main__": text = """ ( this is a multi-line comment. ) \ This ia a line comment." _myIdent """ main(text)
# parser.py import sys from lexer import Lexer from posts.post4.string_table import StringTable from stack import Stack from tokens import * ####################################### # Parser # ####################################### class Parser: def __init__(self, lexer: Lexer, data_stack: Stack = None, string_table=None): self.lexer = lexer # Setup data stack if data_stack: self.data = data_stack else: self.data = Stack() self.current_tok = Token(TT_START, None) # Setup string table if string_table is not None: self.string_table = string_table else: self.string_table = StringTable() def next_token(self): if self.current_tok != TT_EOF: self.current_tok = self.lexer.next_token() def parse(self): # get token and decide what to do with it tok = self.current_tok while tok.type != TT_EOF: self.next_token() tok = self.current_tok if tok.type == TT_START: continue if tok.type == TT_NUMBER: self.data.push(int(tok.value)) continue # Arithmetic elif tok.type == TT_PLUS: right = self.data.pop() left = self.data.pop() result = left + right self.data.push(result) elif tok.type == TT_MINUS: right = self.data.pop() left = self.data.pop() result = left - right self.data.push(result) elif tok.type == TT_MUL: right = self.data.pop() left = self.data.pop() result = left * right self.data.push(result) elif tok.type == TT_DIV: right = self.data.pop() left = self.data.pop() result = left // right self.data.push(result) elif tok.type == TT_POW: right = self.data.pop() left = self.data.pop() result = left ** right self.data.push(result) # Logical Operations elif tok.type == TT_INV: left = self.data.pop() left = ~left self.data.push(left) elif tok.type == TT_XOR: right = self.data.pop() left = self.data.pop() self.data.push((left ^ right)) elif tok.type == TT_AND: right = self.data.pop() left = self.data.pop() self.data.push((left & right)) elif tok.type == TT_OR: right = self.data.pop() left = self.data.pop() self.data.push((left | right)) elif tok.type == TT_LSL: right = self.data.pop() left = self.data.pop() self.data.push(int(left) << int(right)) elif tok.type == TT_LSR: right = self.data.pop() left = self.data.pop() self.data.push((left >> right)) elif tok.type == TT_STRING: string = self.current_tok.value addr = self.string_table.save(string) self.data.push(addr) self.data.push(len(string)) elif tok.type == TT_CHAR: char = self.current_tok.value ascii_code = ord(char) self.data.push(ascii_code) # Handle Keywords elif tok.type == TT_KEYWORD: # Process keyword if tok.value == 'dump': """dump Display the data stack contents in the console """ print(str(self.data)) elif tok.value == 'drop': """pop Remove the top value from the stack and discard it. """ self.data.pop() elif tok.value == 'clear': """clear Empty the stack. """ self.data.clear() elif tok.value.lower() == 'swap': """swap Swaps the position of the top two items on the stack. Ex: x y z --> x z y """ if self.data.count() < 2: raise IndexError( f'Attempt to SWAP on stack of size {self.data.count()}!\n\t\t\tStack must contain at least 2 elements to SWAP.') n1 = self.data.pop() n2 = self.data.pop() self.data.push(n1) self.data.push(n2) elif tok.value == 'swap2': """swap2 Swaps the position of the top two pairs of items on the stack. Ex: w x y z --> y z w x """ if self.data.count() < 4: raise IndexError( f'Attempt to SWAP on stack of size {self.data.count()}!\n\t\t\tStack must contain at least 4 elements to SWAP2.') n3 = self.data.pop() n2 = self.data.pop() n1 = self.data.pop() n0 = self.data.pop() self.data.push(n2) self.data.push(n3) self.data.push(n0) self.data.push(n1) elif tok.value == 'over': """over Copies the second item on the stack to the head position. Ex: x y z --> x y z y """ top = self.data.pop() next = self.data.pop() self.data.push(next) self.data.push(top) self.data.push(next) elif tok.value == 'over2': """over Copies the second item on the stack to the head position. Ex: x y z --> x y z y """ top = self.data.pop() sec = self.data.pop() next = self.data.pop() self.data.push(next) self.data.push(top) self.data.push(next) elif tok.value == 'dup2': """dup2 Duplicates the top two items on the stack. Ex: x y z --> x y z y z """ n1 = self.data.pop() n2 = self.data.pop() self.data.push(n2) self.data.push(n1) self.data.push(n2) self.data.push(n1) elif tok.value == 'dup3': """dup3 Duplicates the top three items on the stack. Ex: x y z --> x y z x y z """ n1 = self.data.pop() n2 = self.data.pop() n3 = self.data.pop() self.data.push(n3) self.data.push(n2) self.data.push(n1) self.data.push(n3) self.data.push(n2) self.data.push(n1) elif tok.value == 'dup4': """dup3 Duplicates the top three items on the stack. Ex: w x y z --> w x y z w x y z """ n1 = self.data.pop() n2 = self.data.pop() n3 = self.data.pop() n4 = self.data.pop() self.data.push(n4) self.data.push(n3) self.data.push(n2) self.data.push(n1) self.data.push(n4) self.data.push(n3) self.data.push(n2) self.data.push(n1) elif tok.value == 'rol': """ror Rotates the values in the stack by moving the top element to the bottom and shifting everything else up one position. Ex: rot x y z -> y z x """ item = self.data.pop() last = self.data.que(item) elif tok.value == 'ror': """rot Rotates the values in the stack by moving the bottom element to the top and pushing everything else down one position. Ex: rot x y z -> y z x """ last = self.data.tail() self.data.push(last) # Handle I/O Routines elif tok.value == 'emit': """emit The word EMIT takes a single ASCII representation on the stack, using the low-order byte only, and sends the character to stdout. For example, in decimal: 65 EMIT↵ A 66 EMIT↵ B 67 EMIT↵ C """ ch = self.data.pop() if ch > 0x110000 or ch < 0: raise ValueError(f'[Error] character popped by emit: {ch} is out of printable range!') char = chr(ch) print(char, end='') elif tok.value == 'type': """<addr> <length> type Sends a string to standard out. The string is located in the string table at location <addr> and has a length of <length>. """ length = int(self.data.pop()) addr = int(self.data.pop()) string = self.string_table.get_str(addr) if length > len(string): raise IndexError(f'Error: String length of {length} was given but actual length is {len(string)}.') idx = 0 while idx < length: print(string[idx], end='') idx += 1 print() elif tok.value == 'key': """key Pauses execution until a key is pressed. Once a key is pressed, the key code is pushed onto the stack. Ex: <A> --> 65 """ getch = sys.stdin.read key = None while not key: key = getch(1) self.data.push(ord(key)) # Helpful Constants elif tok.value == 'cr': # Carriage Return self.data.push(13) elif tok.value == 'lf': # Line Feed self.data.push(10) elif tok.value == 'sp': self.data.push(32) elif tok.value == 'null': # NUll self.data.push(0) elif tok.value == 'bell': # System Bell self.data.push(7) elif tok.value == 'bs': # Backspace self.data.push(8) elif tok.value == 'tab': # TAB self.data.push(9) elif tok.value == 'vt': # Vertical TAB self.data.push(11) elif tok.value == 'ff': # Form Feed self.data.push(12) elif tok.value == 'esc': # Escape self.data.push(27) if __name__ == '__main__': source = """ 16 2 >> "This is a string" dump type dump 'C' 'B' 'A' dump """ lexer = Lexer(source) parser = Parser(lexer) parser.parse() #print(parser.data)
#! usr/bin/env/python3 # sol.py from lexer import Lexer from parser import Parser import argparse def main(src): _lexer = Lexer(src) _parser = Parser(_lexer) _parser.parse() if __name__ == '__main__': argparse = argparse.ArgumentParser() argparse.add_argument("-i", "--infile", help="Input source file", required=True) argparse.add_argument("-v", "--verbose", help="Echo source file", action="store_true") args = argparse.parse_args() infile = args.infile sorce = '' with open(infile, 'r') as ifh: source = ifh.read() if args.verbose: print(source) main(source)
# stack.py class Stack: def __init__(self, report_errors=False): self.store = [] self.report_error = report_errors def push(self, item): self.store.append(item) def pop(self): if self.store: return self.store.pop() else: if self.report_error: raise IndexError('Attempt to pop a value from empty stack!') else: return None def tail(self): if self.store: return self.store.pop(0) else: return None def clear(self): self.store.clear() def count(self): return len(self.store) def is_empty(self): return not bool(self.store) def __str__(self): s = '[ ' for v in self.store: s += "'" + str(v) + "', " s = s[:-2] s += ' ]' return s def __repr__(self): return self.__str__()
class StringTable: def __init__(self): self.data = [] # Insert string and meta data into string list def save(self, string): entry = {'len': len(string), 'val': string} self.data.append(entry) return (len(self.data) - 1) # Return string storage object (including length) def get(self, idx): return self.data[idx] # Return length of string def length(self, idx): return int(self.data[idx]['len']) # Return printable string only (no meta data). def get_str(self, idx): return str(self.data[idx]['val']) def __str__(self): return str(self.data)
# tokens.py # Data TT_CHAR = 'CHAR' # 'c' TT_STRING = 'STRING' # "string" TT_NUMBER = 'NUMBER' # 1234 (integers only TT_ID = 'ID' # Arithmatic Operators TT_ASSIGN = 'ASSIGN' # '=' TT_PLUS = 'PLUS' # '+' TT_MINUS = 'MINUS' # '-' TT_MUL = 'MUL' # '*' TT_DIV = 'DIV' # '/' TT_POW = 'POW' # '**" # Bitwise Operators TT_LSR = 'LSR' # '>>' TT_LSL = 'LSL' # '<<' TT_AND = 'AND' # '&' Bitwise AND TT_OR = 'OR' # '|' Bitwise OR TT_XOR = 'XOR' # '^' Bitwise XOR TT_INV = 'INV' # '~' Bitwise NOT/Invert # Relational Operators TT_EQ = 'EQ' # '==' Is Equal? TT_LESS = 'LESS' # '<' Is Less? TT_GREATER = 'GREATER' # '>' Is greater? # Logical Operators TT_LAND = 'LAND' # AND Logical TT_LOR = 'LOR' # OR Logical TT_NEQ = 'NEQ' # NEQ Logical # DEFINE Functions TT_STARTDEF = 'STARTDEFINE' # ':' TT_ENDDEF = 'ENDDEFINE' # ';' # INTERNAL USE TT_KEYWORD = 'KEYWORD' TT_START = "START" TT_EOF = 'EOF' class Token: def __init__(self, _type, _value, line=None, col=None): self.type = _type self.value = _value self.col = col self.line = line def __str__(self): return f'''type: {self.type}, value: {self.value}, at line {self.line} in column {self.col}''' def __repr__(self): return self.__str__()
\ file: test.sol \ \ Expected Output \ ----------------------- \ [ '2', '3' ] \ [ '5' ] \ [ '5', '5' ] \ [ '10' ] \ [ '10', '6' ] \ 2 3 dump + dump 5 dump + dump 6 dump * dump
Until next time, Happy Coding!