Matrix Multiplication and Cache Friendly Code COMP 273 Winter 2021 – Assignment 4, Prof. Kry Available: 24 March – Due date: 12 April
In this assignment you will write code to multiply two square n × n matrices of single precision floating point numbers, and then optimize the code to exploit a memory cache. All the functions you write in this assignment must respect register conventions and work for different sizes of square matrices. Your code must also include useful comments to make it readable.
You will need to use two MARS tools in this assignment:
- Data Cache Simulator: This tool allows you to set different cache sizes and types, and measures the number of memory accesses, and cache misses.
- Instruction Counter: This tool counts the number of true MIPS assembly instructions that execute during your program.
Each tool needs to be connected to MARS, and you will want to use a combination of breakpoints and the reset button on each tool to make careful measurements of your code performance.
You will also likely want to try the Memory Reference Visualization tool (much like the Bitmap Display), as it lets you watch the memory reference patterns generated by your program. Like- wise, the bitmap display tool will also be useful for visualizing the results. Remember to set the base address to the heap (0x10040000), and choose the unit and display width to match the matrix size (N = display width divided by unit width). Running some tools, may noticeably slow down the execution of your program. If ever you notice MARS running much too slow, try restarting.
2 Assignment objectives (15 marks total)
Provided code will help you get started with this assignment. The code lets you run 3 different tests by changing TestNumber in the .data section at the top of the code.
- Test 0 will help you test the first objectives (matrix subtraction and Frobeneous norm).
- Test 1 will help you checking your matrix multiply-and-add procedure. It allocates mem- ory on the heap for 4 matrices (one being the solution) and loads test matrix data from file names specified in the data segment.
- Test 2 will hep you compare different matrix multiply-and-add procedures.
Remember: MARS loads data files from the directory in which you start it, and test 1 will fail if the data files are not found.
1. subtract (2 marks)
Implement a function that subtracts two square n × n matrices A and B, and stores the result in matrix C. That is,
Cij ←Aij −Bij.
Use the signature void subtract( float* A, float* B, float* C, int n ) for your function, and note that you do not need nested for loops. Instead compute n2 with mul and iterate over the three matrices by stepping each pointer by 4 bytes on each loop.
2. frobeneousNorm (2 marks)
Implement a function that computes the Frobeneous norm of a matrix,
That is, compute the sum of the squares of all the entries, and then take the square root
using the floating coprocessor sqrt.x instruction. Use the function signature float frobeneousNorm( float* A, int n )
and remember that $f0 is used as the return register for a float. Just as in the previous question, note that you can use a single for loop to visit all n2 matrix entries.
3. check (2 marks)
Implement a function that prints the Frobeneous norm of the difference of two matrices. That is, your function takes two square n × n matrices A and B, computes the difference A − B, and stores the answer in A by calling subtract( A, B, A, n ), and then fi- nally computes the Frobeneous norm by calling frobeneousNorm( A, n ). Print the resulting single precision floating point number with SYSCALL number 2. Use the func- tion signature
void check( float* A, float* B, int n )
and test your check function by comparing different matrices. That is, using test 0 you should compute approximately 32.38494 when you compare 64-by-64 matrices loaded from A64.bin and B64.bin. Try changing the test 0 code to also comparing a matrix with itself to see if 0.0 is printed to the Run I/O console.
Leave the test 0 code such that it compares A64.bin and B64.bin when you submit your final assignment.
4. MADD1 – Multipy and add version 1 (4 marks)
Write MIPS code to multiply two square n × n matrices A and B, and add the result to
matrix C. That is,
Cij ←Cij +AikBkj.
All matrix entries are single precision floating point numbers. Use the following func- tion signature and implement the naive matrix multiplication algorithm with three nested loops.