CS 3340 Computer Architecture – Summer 2017 – Mazidi

Homework 6: Memory Access

Objective: Gain insight into memory and cache optimization.

This homework involves running two programs from: http://courses.missouristate.edu/KenVollmar/mars/tutorial.htm

Both programs traverse a 16x16 array of words, one by columns, the other by rows.
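
To make the two traversal orders concrete, the sketch below lists the byte offsets each program writes, using the offset formula from the listings: offset = 4 * (row * #cols + col). It is a Python illustration, not part of the MARS tutorial; the function name `offsets` and its parameters are made up for this example, and offsets are measured from the start of the array rather than from a real data segment address.

```python
def offsets(order, rows=16, cols=16, word_bytes=4):
    """Return byte offsets into the array in the order they are written.

    order is "row" (inner loop over columns) or "col" (inner loop over rows).
    """
    if order == "row":
        pairs = [(r, c) for r in range(rows) for c in range(cols)]
    elif order == "col":
        pairs = [(r, c) for c in range(cols) for r in range(rows)]
    else:
        raise ValueError("order must be 'row' or 'col'")
    # Same formula as the MIPS comments: 4 * (row * #cols + col)
    return [word_bytes * (r * cols + c) for r, c in pairs]


row_major = offsets("row")
col_major = offsets("col")
print(row_major[:4])  # [0, 4, 8, 12]      consecutive words
print(col_major[:4])  # [0, 64, 128, 192]  one full row (16 words) apart
```

Both orders touch the same 256 words; only the sequence differs, which is what the Memory Reference Visualization should make visible.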

Run the col-major program and answer the following:

Run the Memory Reference Visualization in Tools, adjusting your run speed if necessary. Describe the pattern in which words are accessed.
Looking at the Data Segment Memory after the run, describe the pattern of placement in terms of memory addresses.
Run the Cache Simulator in Tools with the default settings: direct mapped, LRU, 8 blocks, 4 words per block. What is your hit rate? Explain why you got this hit rate.
Try fully associative cache. Did this change the hit rate? Why or why not?

Run the row-major program and answer the following:

Run the Memory Reference Visualization tool, adjusting your run speed if necessary. Describe the pattern in which words are accessed.
Looking at the data segment memory after the run, describe the pattern of placement in terms of memory addresses.
Run the cache simulator with the default settings: direct mapped, LRU, 8 blocks, 4 words per block. What is your hit rate? Explain why you got this hit rate.
Try 2-way set associative and fully associative cache. Did this change the hit rate? Why or why not?
Experiment with different settings for direct-mapped cache. Did you find an improved hit rate? With what settings? How do you explain this improvement?
Experiment with different settings for fully associative cache. Describe your settings and explain the hit rate.
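
If you want to check your reasoning about the simulator results, a small model of the cache can help. The Python sketch below is a simplified set-associative LRU cache, not the MARS Data Cache Simulator itself. It assumes the array starts at a block-aligned address, counts only the 256 stores to the array, and uses hypothetical names (`store_offsets`, `hit_rate`) chosen for this example. Setting `ways=1` models a direct-mapped cache and `ways=num_blocks` models a fully associative one.

```python
from collections import OrderedDict

WORD_BYTES = 4


def store_offsets(order, size=16):
    """Byte offsets written by the row-major ("row") or col-major ("col") program."""
    if order == "row":
        pairs = [(r, c) for r in range(size) for c in range(size)]
    else:
        pairs = [(r, c) for c in range(size) for r in range(size)]
    return [WORD_BYTES * (r * size + c) for r, c in pairs]


def hit_rate(addresses, num_blocks=8, words_per_block=4, ways=1):
    """Fraction of accesses that hit in a set-associative cache with LRU replacement."""
    block_bytes = WORD_BYTES * words_per_block
    num_sets = num_blocks // ways
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        block = addr // block_bytes          # which memory block holds this word
        lines = sets[block % num_sets]       # the set that block maps to
        if block in lines:
            hits += 1
            lines.move_to_end(block)         # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)    # evict the least recently used block
            lines[block] = True
    return hits / len(addresses)


for order in ("col", "row"):
    for ways in (1, 2, 8):
        print(order, "ways =", ways, "hit rate =", hit_rate(store_offsets(order), ways=ways))
```

Compare the printed rates with what MARS reports, then vary `num_blocks` and `words_per_block` the same way you vary the simulator settings. If the model and MARS disagree, one of the assumptions above is the first place to look.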

Col_Major:

################################################################

# Column-major order traversal of 16 x 16 array of words.

# Pete Sanderson

# 31 March 2007

# To easily observe the column-oriented order, run the Memory Reference

# Visualization tool with its default settings over this program.

# You may, at the same time or separately, run the Data Cache Simulator

# over this program to observe caching performance. Compare the results

# with those of the row-major order traversal algorithm.

# The C/C++/Java-like equivalent of this MIPS program is:

# int size = 16;

# int[size][size] data;

# int value = 0;

# for (int col = 0; col < size; col++) {

#     for (int row = 0; row < size; row++) {

#         data[row][col] = value;

#         value++;

#     }

# }

# Note: Program is hard-wired for 16 x 16 matrix. If you want to change this,

# three statements need to be changed.

# 1. The array storage size declaration at "data:" needs to be changed from

# 256 (which is 16 * 16) to #columns * #rows.

# 2. The "li" to initialize $t0 needs to be changed to the new #rows.

# 3. The "li" to initialize $t1 needs to be changed to the new #columns.

.data

data: .word 0 : 256 # 16x16 matrix of words

.text

li $t0, 16 # $t0 = number of rows

li $t1, 16 # $t1 = number of columns

move $s0, $zero # $s0 = row counter

move $s1, $zero # $s1 = column counter

move $t2, $zero # $t2 = the value to be stored

# Each loop iteration will store incremented $t2 value into next element of matrix.

# Offset is calculated at each iteration. offset = 4 * (row*#cols+col)

# Note: no attempt is made to optimize runtime performance!

loop: mult $s0, $t1 # $s2 = row * #cols (two-instruction sequence)

mflo $s2 # move multiply result from lo register to $s2

add $s2, $s2, $s1 # $s2 += col counter

sll $s2, $s2, 2 # $s2 *= 4 (shift left 2 bits) for byte offset

sw $t2, data($s2) # store the value in matrix element

addi $t2, $t2, 1 # increment value to be stored

# Loop control: If we increment past bottom of column, reset row and increment column

# If we increment past the last column, we're finished.

addi $s0, $s0, 1 # increment row counter

bne $s0, $t0, loop # not at bottom of column so loop back

move $s0, $zero # reset row counter

addi $s1, $s1, 1 # increment column counter

bne $s1, $t1, loop # loop back if not at end of matrix (past the last column)

# We're finished traversing the matrix.

li $v0, 10 # system service 10 is exit

syscall # we are outta here.

Row_Major:

#############################################################################

# Row-major order traversal of 16 x 16 array of words.

# Pete Sanderson

# 31 March 2007

# To easily observe the row-oriented order, run the Memory Reference

# Visualization tool with its default settings over this program.

# You may, at the same time or separately, run the Data Cache Simulator

# over this program to observe caching performance. Compare the results

# with those of the column-major order traversal algorithm.

# The C/C++/Java-like equivalent of this MIPS program is:

# int size = 16;

# int[size][size] data;

# int value = 0;

# for (int row = 0; row < size; row++) {

#     for (int col = 0; col < size; col++) {

#         data[row][col] = value;

#         value++;

#     }

# }

# Note: Program is hard-wired for 16 x 16 matrix. If you want to change this,

# three statements need to be changed.

# 1. The array storage size declaration at "data:" needs to be changed from

# 256 (which is 16 * 16) to #columns * #rows.

# 2. The "li" to initialize $t0 needs to be changed to new #rows.

# 3. The "li" to initialize $t1 needs to be changed to new #columns.

.data

data: .word 0 : 256 # storage for 16x16 matrix of words

.text

li $t0, 16 # $t0 = number of rows

li $t1, 16 # $t1 = number of columns

move $s0, $zero # $s0 = row counter

move $s1, $zero # $s1 = column counter

move $t2, $zero # $t2 = the value to be stored

# Each loop iteration will store incremented $t2 value into next element of matrix.

# Offset is calculated at each iteration. offset = 4 * (row*#cols+col)

# Note: no attempt is made to optimize runtime performance!

loop: mult $s0, $t1 # $s2 = row * #cols (two-instruction sequence)

mflo $s2 # move multiply result from lo register to $s2

add $s2, $s2, $s1 # $s2 += column counter

sll $s2, $s2, 2 # $s2 *= 4 (shift left 2 bits) for byte offset

sw $t2, data($s2) # store the value in matrix element

addi $t2, $t2, 1 # increment value to be stored

# Loop control: If we increment past last column, reset column counter and increment row counter

# If we increment past last row, we're finished.

addi $s1, $s1, 1 # increment column counter

bne $s1, $t1, loop # not at end of row so loop back

move $s1, $zero # reset column counter

addi $s0, $s0, 1 # increment row counter

bne $s0, $t0, loop # not at end of matrix so loop back

# We're finished traversing the matrix.

li $v0, 10 # system service 10 is exit

syscall # we are outta here.
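
The two listings share the same offset computation and differ only in which counter the inner loop increments. As a cross-check on what ends up in the data segment, the Python sketch below mirrors the MIPS loop one instruction group at a time. The variable names follow the registers, but the function `run` and its `order` parameter are made up for this example, and the list `data` stands in for the data segment as word indexes rather than byte addresses.

```python
def run(order, rows=16, cols=16):
    """Mirror the MIPS loop and return the array contents as a list of words.

    order is "col" for the col-major listing, anything else for row-major.
    """
    data = [0] * (rows * cols)   # data: .word 0 : 256
    t0, t1 = rows, cols          # li $t0, 16 / li $t1, 16
    s0 = s1 = t2 = 0             # row counter, column counter, value to store
    while True:
        s2 = s0 * t1 + s1        # mult, mflo, add: row * #cols + col
        data[s2] = t2            # sw (the sll by 2 only converts the word index to bytes)
        t2 += 1                  # addi $t2, $t2, 1
        if order == "col":
            s0 += 1              # next row in the same column
            if s0 != t0:
                continue
            s0 = 0               # bottom of column: reset row, advance column
            s1 += 1
            if s1 != t1:
                continue
        else:
            s1 += 1              # next column in the same row
            if s1 != t1:
                continue
            s1 = 0               # end of row: reset column, advance row
            s0 += 1
            if s0 != t0:
                continue
        return data              # fell through both branches: traversal finished


print(run("col")[:8])   # first eight words in memory after the col-major run
print(run("row")[:8])   # first eight words in memory after the row-major run
```

Compare the printed words with the first row of the Data Segment window in MARS after each run.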