Due: Monday, Dec 6th by 11pm (you may submit as late as Friday, Dec 10th for full credit)
Points: This assignment is worth 150 points
Overview
In this assignment, you will implement optimizations to improve the target code quality compared to the code generator you implemented in Assignment 4.
Grading criteria
- Optimizations implemented: 40%
- Report:
- General discussion of optimizations implemented: 25%
- Analysis and experimental evaluation: 25%
- Design and coding style: 10%
Getting started
Start with the code you wrote for Assignment 4. (I would recommend making a copy rather than directly modifying your original code, but it’s up to you.)
The following source files are provided to implement a control flow graph representation for high level and target code:
- cfg.h, cfg.cpp: expanded to provide
BasicBlock,Edge,ControlFlowGraph,ControlFlowGraphBuilder, andControlFlowGraphPrintertypes - highlevel.h, highlevel.cpp: expanded to provide concrete
HighLevelControlFlowGraphBuilderandHighLevelControlFlowGraphPrintertypes - x86_64.h, x86_64.cpp: expanded to provide
X86_64ControlFlowGraphBuilderandX86_64ControlFlowGraphPrintertypes
Your task
Adding an optimization flag
Your compiler should support a -o command line option. When present, it indicates that the compiler should perform code optimization before emitting target assembly language.
For example, the invocation
./compiler -o input/array20.in
would generate optimized code for the source program input/array20.in, while the invocation
./compiler input/array20.in
would generate unoptimized code.
Implementing optimizations
The primary metric you should be optimizing for is the run-time efficiency of the generated assembly code.
Optimizing for (smaller) code size is also valid, but should be considered a secondary concern.
Two benchmark programs are provided (in the assign05/input subdirectory of the test repository):
- array20.in multiplies 500x500 matrices represented as 250,000 element 1-D arrays
- multidim20.in multiplies 500x500 matrices represented as 500x500 2-D arrays (only if you have implemented multidimensional arrays)
You can benchmark your compiler as follows (user input in bold):
$ ./build.rb array20
Generated code successfully assembled, output exe is out/array20
$ time ./out/array20 < data/array20.in > actual_output/array20.out
real 0m3.076s
user 0m3.060s
sys 0m0.016s
$ diff expected_output/array20.out actual_output/array20.out
Your can substitute other test program names in place of array20.
If the diff command does not produce any output, then the program’s output matched the expected output.
The build.rb puts the generated assembly language code in the out directory, so in the above test case, the assembly code would be in out/array20.S.
The user time in the output of the time command represents the amount of time the test program spent using the CPU to execute code, and so is a good measure of how long it took the compiled program to do its computation. Your code optimizations should aim to reduce user time for the benchmark programs.
Analysis and experiments
As you work on your optimizations, do experiments to see how effective they are at improving the efficiency of the generated code.
You can use any test programs as a basis for these experiments. One approach is to start with some very simple test programs, and then work your way up to the benchmark programs (which are relatively complex.)
Report
One of the most important deliverables of this assignment is a written report that describes what optimizations you implemented and how effective they were.
For each optimization technique you implemented, your report should document
- how the quality of the generated code was improved (include representative snippets of code before and after the optimization, for relevant test programs)
- how the efficiency of the generated code improved (i.e., how much did performance of the benchmark programs improve)
Your report should also discuss what inefficiencies remain in the generated code, and what optimization techniques might be helpful for addressing them.
Hints and suggestions
In addition to the hints and suggestions included below, there is a detailed advice document which has some fairly detailed suggestions for how to approach this assignment.
Ideas for optmizations to implement
Some ideas for code optimizations:
- Use machine registers for (at least some) local variables, especially loop variables
- Allocate machine registers for virtual registers
- Peephole optimization of the high-level code to eliminate redundancies and reduce the number of virtual registers needed
- Peephole optimization of the generated x86-64 assembly code to eliminate redundancies
The recommended way to approach the implementation of optimizations is to look at the high-level and low-level code generated by your compiler, and look for specific inefficiencies that you can eliminate.
We highly recommend that you keep notes about your work, so that you have a written record you can refer to in preparing your report.
Intermediate representations
You can implement your optimization passes as transformations of InstructionSequence (linear IR) or ControlFlowGraph (control flow graph IR). We recommend that you implement transformations constructively: for example, if transforming a ControlFlowGraph, the result of the transformation should be a different ControlFlowGraph, rather than an in-place modification of the original ControlFlowGraph.
If you do transformations of an InstructionSequence, you will need to take control flow into account. Any instruction that either
- is a branch, or
- has an immediate successor that is a labeled control-flow target
should be considered the end of a basic block.
If you decide to use the ControlFlowGraph intermediate representation, you can create it from an InstructionSequence as follows. For high-level code:
InstructionSequence iseq = /* InstructionSequence containing high-level code... */
HighLevelControlFlowGraphBuilder cfg_builder(iseq);
ControlFlowGraph *cfg = cfg_builder.build();
For low-level (x86-64) code:
InstructionSequence *ll_iseq = /* InstructionSequence containing low-level code... */
X86_64ControlFlowGraphBuilder xcfg_builder(ll_iseq);
ControlFlowGraph *xcfg = xcfg_builder.build();
To convert a ControlFlowGraph back to an InstructionSequence:
ControlFlowGraph *cfg = /* a ControlFlowGraph */
InstructionSequence *result_iseq = cfg->create_instruction_sequence();
Note that the create_instruction_sequence() method is not guaranteed to work if the structure of the ControlFlowGraph was modified. I.e., we do not recommend implementing optimizations which change control flow. Local optimizations (within single basic blocks) are recommended.
Framework for optimizations
The cfg_transform.h and cfg_transform.cpp source files demonstrate how to transform a ControlFlowGraph by transforming each basic block. The idea is to override the transform_basic_block member function. In general, this class should be useful for implementing any local (basic block level) optimization.
Live variables analysis
The live_vregs.h and live_vregs.cpp source files implement global live variables analysis for virtual registers. You may find this useful for determining which instructions are safe to eliminate after local optimizations are applied. Note that you will need to add the following code to the InstructionSequence class:
typedef std::vector<Instruction *>::const_reverse_iterator const_reverse_iterator;
const_reverse_iterator crbegin() const { return m_instr_seq.crbegin(); }
const_reverse_iterator crend() const { return m_instr_seq.crend(); }
This code also assumes the existence of HighLevel::is_def and HighLevel::is_use functions, which determine (respectively)
- whether an instruction is a def (assignment to a virtual register), and
- whether an operand of an instruction is a use of a virtual register
If you would like to be able to print a control flow graph annotated with live virtual registers (after each instruction), you can use the following “improved” versions:
The LiveVregsControlFlowGraphPrinter class is a version of PrintHighLevelControlFlowGraph that prints the live virtual register sets.
Note that LiveVregsControlFlowGraphPrinter assumes that
HighLevelControlFlowGraphBuilderhas a virtualformat_instructionmember function which can be overridden, and that itsprint_basic_blockmember function callsformat_instruction
You will likely need to make some code changes to allow LiveVregsControlFlowGraphPrinter to work. Here are possible implementations of print_basic_block and format_instruction for the HighLevelControlFlowGraphBuilder class:
void HighLevelControlFlowGraphPrinter::print_basic_block(BasicBlock *bb) {
for (auto i = bb->cbegin(); i != bb->cend(); i++) {
Instruction *ins = *i;
std::string s = format_instruction(bb, ins);
printf("\t%s\n", s.c_str());
}
}
std::string HighLevelControlFlowGraphPrinter::format_instruction(BasicBlock *bb,
Instruction *ins) {
PrintHighLevelInstructionSequence p;
return p.format_instruction(ins);
}
Generated code examples
Here are some examples of generated code after optimizations are applied. The optimized code is better than the unoptimized, but many opportunities for improvements remain.
Here is one data point comparing the performance of the optimized code compared to unoptimized, for the array20.in test program, reporting the user time on the average of 3 test runs:
| Unoptimized time | Optimized time | Optimized time (no reg alloc) |
|---|---|---|
| 1.657s | 0.313s | 0.894s |
Times are on Linux, using a Core i7-4771 processor.
This is a nice speedup, but it’s worth noting that the quality of the unoptimized code is not great.
Interestingly, there is a reasonably substantial speedup from just removing redundant operations, without doing general-purpose register allocation for virtual registers. (The optimization to place scalar loop variables in callee-saved registers is applied.)
Submitting
To submit, create a zipfile containing
- all of the files needed to compile your program, and
- a PDF of your report in a file called
report.pdf
Suggested commands:
make clean
zip -9r solution.zip Makefile *.h *.c *.y *.l *.cpp *.rb report.pdf
The suggested command would create a zipfile called solution.zip. Note that if your solution uses C exclusively, you can omit *.cpp from the filename patterns. If you used the scan_grammar_symbols.rb script, then make sure you include the *.rb pattern as shown above.
Make sure that your report is a PDF file called report.pdf.
Upload your submission to Gradescope as Assignment 5.