Assignment 1

Due: Friday, Sep 13th by 11pm Baltimore time

Interpreter part 1: expression evaluation

In this assignment you will implement an interpreter which serves as a calculator, reading a sequence of statements containing expressions to be evaluated, and then analyzing and evaluating them.

Getting started

Download assign01.zip and unzip it:

wget https://jhucompilers.github.io/fall2024/assign/assign01.zip
unzip assign01.zip

These commands will create a directory called assign01 which contains the starter code.

Lexical analyzer changes

You will need to implement changes to the lexical analyzer (in lexer.cpp, as well as token.h) to add the following new tokens:

var
=
||
&&
<
<=
>
>=
==
!=

Note that the two-character tokens where the first character is also a valid token — for example, >= — will require special handling. You might want to add new helper functions to the lexer.

Parser changes

These are the grammar productions implemented in the starter code (which is adapted from the astdemo example code):

Unit → Stmt
Unit → Stmt Unit
Stmt → E ;
E    → T E'
E'   → + T E'
E'   → - T E'
E'   → ε
T    → F T'
T'   → * F T'
T'   → / F T'
T'   → ε
F    → number
F    → ident
F    → ( E )

You will add the following productions to the grammar (implement these in Parser2):

Stmt → var ident ;
A    → ident = A
A    → L
L    → R || R
L    → R && R
L    → R
R    → E < E
R    → E <= E
R    → E > E
R    → E >= E
R    → E == E
R    → E != E
R    → E

Note that the symbols beginning with an upper-case letter (e.g., Stmt, A, L, R, E, E', etc.) are nonterminals, and all other symbols (e.g., (, ), =, ==, <, etc.) are terminals.

You will also need to change the production

Stmt → E ;

Stmt → A ;

You will also need to change the production

F → ( E )

F → ( A )

You will need to add new AST node types to ast.h and ast.cpp to represent the new constructs added to the language. Specifically:

The assignment (=) operator
The logical and (&&) and or (||) operators
The comparison (<, <=, >, >=) operators
The equality and inequality (==, !=) operators

Parsing assignments

The A → ident = A production requires two tokens of lookahead. The reason is that if the parser is trying to expand an A nonterminal symbol, and the next token is an identifier, there is not enough information to know whether A will expand to an assignment or to a different kind of expression. However, if the identifier is immediately followed by an assignment operator (=), that indicates that A should expand to an assignment.

The Lexer class’s peek() member function takes an optional argument to specify how many tokens ahead to peek. For example:

Node *next_tok = m_lexer->peek(1);
Node *next_next_tok = m_lexer->peek(2);

Note that if input ends before the lexer can scan the requested number of tokens, peek() returns a null pointer.

Testing your lexer and parser changes

Once you have implemented the changed described above, you can run the program and have it print a textual representation of the AST constructed from the input. For example, let’s say that the input file t/example02.txt has the following contents:

var a;
var b;
a = 11;
b = 22;
var c;
c = (a * 4 < 55) && (b == 16 || b >= 9 + 13);
c;

Run the program (after compiling it) as follows:

./minilang -p t/example02.txt

This invocation could produce the following output:

UNIT
+--STATEMENT
|  +--VARDEF
|     +--VARREF[a]
+--STATEMENT
|  +--VARDEF
|     +--VARREF[b]
+--STATEMENT
|  +--ASSIGN
|     +--VARREF[a]
|     +--INT_LITERAL[11]
+--STATEMENT
|  +--ASSIGN
|     +--VARREF[b]
|     +--INT_LITERAL[22]
+--STATEMENT
|  +--VARDEF
|     +--VARREF[c]
+--STATEMENT
|  +--ASSIGN
|     +--VARREF[c]
|     +--LOGICAL_AND
|        +--LT
|        |  +--MULTIPLY
|        |  |  +--VARREF[a]
|        |  |  +--INT_LITERAL[4]
|        |  +--INT_LITERAL[55]
|        +--LOGICAL_OR
|           +--EQ
|           |  +--VARREF[b]
|           |  +--INT_LITERAL[16]
|           +--GTE
|              +--VARREF[b]
|              +--ADD
|                 +--INT_LITERAL[9]
|                 +--INT_LITERAL[13]
+--STATEMENT
   +--VARREF[c]

Note that there is no inherently correct or incorrect form for the AST corresponding to a particular input, other than representing the semantic properties of the input (such as order of evaluation) correctly. You should arrange the code in the parser to construct an AST that

Embodies the essential information about the input that wil be needed for interpretation, and
Is as simple as possible

Interpreting the AST

After implementing the changes to the lexer and parser described above, you will implement the analyze() and execute() member functions in the Interpreter class.

The idea is that the analyze() function will inspect the AST and verify that it does not have semantic errors that would prevent interpretation. The only required semantic check for this milestone is verifying that all variable references are preceeded by a variable definition. If the AST does not have any semantic errors, analyze() returns normally. If there are any semantic errors, it should use the SemanticError::raise function to throw a SemanticError exception.

The execute() function should iterate through the statements and evaluate each one. A variable definition statement should create the named variable and initialize it with the value 0. For statements containing an expression, the expression should be evaluated. If there are any assignments in the expression, the value of the variable on the left hand side of the assignment should be updated to store the computed value of the expression on the right hand side of the assignment.

The execute() function should return the value that results from evaluating the last statement. Note that as a special case, evaluating variable definition produces the value 0.

Evaluating operators

Expression evaluation is recursive. The base cases are integer literals, which “self-evaluate”, and variable references, where the value of the variable reference is found by looking it up in the environment. (The Environment class embodies an environment, which for this milestone is a map of variable names to their values.) To evaluate an operator, first recursively evaluate the subexpressions, then perform the operation on the resulting value.

All evaluations should use the Value type to represent values. This type (as implemented in the starter code) uses the int data type as the representation of numeric values.

Operators should be evaluated as follows:

Operator	How to evaluate
`+`	Compute sum
`-`	Compute difference
`*`	Compute product
`/`	Compute quotient: throw an `EvaluationError` if the divisor is 0
`&&`	1 if both operands a non-zero, 0 if either operand is zero
`\|\|`	1 if either operand is non-zero, 0 if both operands are zero
`>`	1 if left operand is greater than right operand, 0 otherwise
`>=`	1 if left operand is greater than or equal to right operand, 0 otherwise
`<`	1 if left operand is less than right operand, 0 otherwise
`<=`	1 if left operand is less than or equal to right operand, 0 otherwise
`==`	1 if operands are equal, 0 otherwise
`!=`	1 if operands are not equal, 0 otherwise

Testing

The fall2024-tests repository has some tests you can use to check your implementation. First, clone the test repository:

git clone https://github.com/jhucompilers/fall2024-tests.git compilers-fall2024-tests

We will refer to the directory name of your clone of the test repository as $TEST_REPO.

To run tests, first set the ASSIGN01_DIR environment variable to the directory containing your program (and in particular, the minilang executable):

export ASSIGN01_DIR=~/compilers/assign01

The above command assumes that ~/compilers/assign01 is the directory containing your work. Adjust this as necessary.

Next, change directory to assign01 subdirectory of the test repository:

cd $TEST_REPO/assign01

To run a single test, use the run_test.rb script. For example,

./run_test.rb example01

will execute the example01 test. To run all tests, use the command

./run_all.rb

We encourage you to add your own tests. The input directory contains the test programs to be evaluated by the minilang interpreter program. Test programs should end in the .in file extension. The expected_output directory contains the expected output for the test program. Expected output files should end in the .out file extension, and the stem of filename should match the stem of the corresponding test program. The expected_error directory contains the expected error output, also ending with the .out file extension and with the stem matching the test program. If no error output is expected, this file can be omitted.

Here is an example showing how to create a very simple test (these commands assume your shell is in the $TEST_REPO/assign01 directory):

echo "1 + 2;" > input/example01.in
echo "Result: 3" > expected_output/example01.out
./run_test.rb example01

Assuming that your minilang program can evaluate the expression 1 + 2 correctly, the run_test.rb script should produce the output

Test passed!

Non-functional requirements

We expect your code to be clean, readable, well-designed, and (reasonably) efficient. Please refer to the design, coding style, and efficency guidelines.

We also expect your code to be free of runtime errors such as uses of uninitialized variables, out of bounds array accesses, and memory leaks. You can and should use valgrind when testing your code to check for such errors. We also highly recommend using std::unique_ptr to manage pointers to dynamically-allocated objects (to ensure that they are deallocated when no longer needed.)

You should submit a README.txt file that briefly explains your implementation. A paragraph or two is sufficient. If you used any interesting implementation techniques, let us know about them. Also, if there are any limitations such as features that you weren’t able to get working, you can document them here.

Submitting

To submit, create a zipfile with your code, the Makefile, and the README.txt. Suggested command:

zip -9r solution.zip *.h *.cpp Makefile README.txt

Upload the zipfile (i.e., solution.zip) to Gradescope as Assignment 1.