VLIW:
Assume the MIPS VLIW architecture in the slides that uses an
instruction packet of 2 instructions. One instruction is an ALU op or
a branch; the other is a load or a store. Assume a single-cycle VLIW
processor: every instruction takes 1 clock cycle, and all memory
accesses hit in the cache, so lw and sw also take 1 clock cycle.
Assume no forwarding: the result of an instruction is not available
until the next clock cycle.
b = 4;
for( i = 1; i < 21; i++ )
{
a[i] = a[i] + a[i-1] * b;
}
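As a reference for checking any reordered or unrolled version, the loop can be written as a C function. This is a sketch: the name `run_loop`, the `len` parameter (21 for the original array a[0] .. a[20]) and the `int64_t` element type (wide enough that 20 multiplications by 4 cannot overflow) are assumptions, not part of the problem. Each iteration reads a[i-1] after the previous iteration has already overwritten it, so a dependence is carried from one iteration to the next.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference version of the loop:
   for (i = 1; i < len; i++) a[i] = a[i] + a[i-1] * b;
   With len = 21 this is exactly the original loop (i = 1 .. 20). */
void run_loop(int64_t a[], int len, int64_t b) {
    assert(len >= 1);                 /* a[0] must exist */
    for (int i = 1; i < len; i++) {
        /* a[i-1] here is the value written by the previous iteration */
        a[i] = a[i] + a[i - 1] * b;
    }
}
```

For example, with every element initially 1 and b = 4, a[1] becomes 5 and a[2] becomes 21, because the second iteration uses the new a[1] rather than the original 1.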
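$s0 is derived from the address of a[0]: assume it initially holds
that address plus 4, i.e. the address of a[1], so that the -4 and 0
offsets in the loop body reach a[i-1] and a[i] when i = 1.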
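Each element is a 4-byte word, so a[k] sits 4*k bytes past a[0], and an offset of -4 from a pointer to a[i] reaches a[i-1]. The sketch below models `lw $rt, off($s0)` in C; the function name `lw` and the `int32_t` element type are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "lw $rt, off($s0)": read the 4-byte word that
   starts off bytes away from the address held in s0. */
int32_t lw(const int32_t *s0, ptrdiff_t off) {
    int32_t value;
    assert(off % 4 == 0);   /* word loads must stay 4-byte aligned */
    memcpy(&value, (const char *)s0 + off, sizeof value);
    return value;
}
```

With `s0` pointing at a[1], `lw(s0, -4)` yields a[0] and `lw(s0, 0)` yields a[1], matching the first iteration; in C, `s0 + 1` advances the pointer by 4 bytes, just as `addi $s0, $s0, 4` does.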
addi $t0, $0, 4        # set b
addi $t1, $0, 1        # set i
addi $t5, $0, 21       # stopping value: loop while i < 21
LOOP:
lw   $t2, -4($s0)      # get a[i-1]
lw   $t3, 0($s0)       # get a[i]
mul  $t4, $t2, $t0     # a[i-1] * b
add  $t4, $t4, $t3     # a[i] + a[i-1] * b
sw   $t4, 0($s0)       # store the result to a[i]
addi $s0, $s0, 4       # advance to the address of the next element
addi $t1, $t1, 1       # increment i
bne  $t1, $t5, LOOP    # repeat until i reaches the stopping value
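The `addi`/`bne` pair at the bottom of the loop forms a do-while loop: the body always runs once, then `$t1` is incremented and compared with `$t5`. The C loop runs for i = 1 through 20, i.e. 20 iterations, so the stopping value in `$t5` has to produce the same count. A small sketch of that control flow (the name `trip_count` is hypothetical):

```c
#include <assert.h>

/* Mirrors the loop control above: start plays the role of $t1's initial
   value, stop the role of $t5. Returns how many times the body executes.
   Assumes stop > start, otherwise the bne would never fall through. */
int trip_count(int start, int stop) {
    assert(stop > start);
    int i = start;            /* $t1 */
    int iterations = 0;
    do {
        iterations++;         /* one pass through the loop body */
        i++;                  /* addi $t1, $t1, 1 */
    } while (i != stop);      /* bne $t1, $t5, LOOP */
    return iterations;
}
```

`trip_count(1, 21)` returns 20 and `trip_count(1, 20)` returns 19: the body runs `stop - start` times.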
1) Assemble the instructions into the fewest number of packets
possible. How many instruction packets are required for the entire
program?
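One way to check a hand-made packing is a greedy list scheduler. The sketch below assumes each packet has one ALU/branch slot and one load/store slot, that a true (read-after-write) dependence forces the consumer into a later packet (no forwarding), that all instructions in a packet read their registers at the start of the cycle (so an instruction may overwrite a register in the same packet as an earlier reader), and that a branch may not move ahead of earlier instructions. The types and the function `schedule` are hypothetical, and the greedy order is a heuristic: a different input order can give fewer packets, so treat its count as an upper bound.

```c
#include <assert.h>

typedef enum { SLOT_ALU = 0, SLOT_MEM = 1 } SlotKind;  /* ALU/branch slot, load/store slot */

#define MAX_INSTR 32
#define MAX_DEPS 3

typedef struct {
    const char *text;
    SlotKind kind;
    int raw[MAX_DEPS];  /* earlier producers this reads from (-1 = unused) */
    int war[MAX_DEPS];  /* earlier readers of the register this overwrites (-1 = unused) */
    int is_branch;      /* a branch is never placed before an earlier instruction */
} Instr;

/* Places instructions, in the given order, into the earliest packet that
   satisfies their constraints and has a free slot of the right kind.
   packet_of[i] receives instruction i's packet; returns the packet count. */
int schedule(const Instr *code, int n, int packet_of[]) {
    int used[MAX_INSTR][2] = {{0}};
    int packets = 0;
    assert(n <= MAX_INSTR);
    for (int i = 0; i < n; i++) {
        int earliest = 0;
        for (int d = 0; d < MAX_DEPS; d++) {
            int r = code[i].raw[d];
            int w = code[i].war[d];
            assert(r < i && w < i);          /* dependences point to earlier instructions */
            if (r >= 0 && packet_of[r] + 1 > earliest)
                earliest = packet_of[r] + 1; /* no forwarding: wait one packet */
            if (w >= 0 && packet_of[w] > earliest)
                earliest = packet_of[w];     /* may share the reader's packet */
        }
        if (code[i].is_branch)
            for (int j = 0; j < i; j++)
                if (packet_of[j] > earliest) earliest = packet_of[j];
        int slot = code[i].kind;
        int pk = earliest;
        while (used[pk][slot]) pk++;         /* first packet with a free slot */
        used[pk][slot] = 1;
        packet_of[i] = pk;
        if (pk + 1 > packets) packets = pk + 1;
    }
    return packets;
}
```

To use it on the loop body, list the eight instructions in program order, mark `mul` as reading the first `lw`, `add` as reading `mul` and the second `lw`, `sw` as reading `add`, `addi $s0` as overwriting a register read by both loads and the store, and `bne` as a branch reading `addi $t1`. The setup instructions and the iteration count then determine the total for the whole program.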
2) Unroll the loop once (2 loop bodies in a single iteration).
Assemble the instructions into as few packets as possible. How many
instruction packets are required for the entire program? (Remember
that you can reorder instructions, as long as the output is the same.)
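A C sketch of one possible unrolled version, checked against the original loop, shows which values really need to come from memory. The function names, the `len` parameter and the `int64_t` type are assumptions; the grouping of loads is only one legal reordering. The second body needs the a[i] just computed by the first, so it can take that value from a local (a register in the assembly) instead of storing and reloading it.

```c
#include <assert.h>
#include <stdint.h>

/* Original loop: a[i] = a[i] + a[i-1] * b for i = 1 .. len-1. */
void rolled(int64_t a[], int len, int64_t b) {
    for (int i = 1; i < len; i++)
        a[i] = a[i] + a[i - 1] * b;
}

/* Unrolled once: two loop bodies per iteration. Requires an even number
   of updates (len - 1 even, e.g. len = 21 gives 20 updates, 10 iterations)
   so that no leftover iteration is needed. */
void unrolled(int64_t a[], int len, int64_t b) {
    assert(len >= 1 && len % 2 == 1);
    for (int i = 1; i < len; i += 2) {
        int64_t prev = a[i - 1];   /* lw a[i-1] */
        int64_t cur  = a[i];       /* lw a[i] */
        int64_t next = a[i + 1];   /* lw a[i+1]: a different address, so it can load early */
        cur  = cur + prev * b;     /* first body */
        next = next + cur * b;     /* second body reuses cur from a register */
        a[i] = cur;                /* sw a[i] */
        a[i + 1] = next;           /* sw a[i+1] */
    }
}
```

Running both functions on copies of the same 21-element array should leave identical contents.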
3) Use a 1-bit branch predictor initialized to predict not taken. If
the branch outcomes follow the patterns below, how many mispredictions
are there? (T = taken, N = not taken)
series 1: T T N N T T T N N
series 2: T N T N T N T T T N N N
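A 1-bit predictor simply remembers the last outcome and predicts it again, so counting mispredictions is a short simulation. The function name and the string encoding of the series (T/N characters, spaces ignored) are assumptions for this sketch.

```c
#include <assert.h>

/* 1-bit predictor: the stored bit is the last outcome and is used as the
   next prediction. initial is the starting prediction, 'T' or 'N'. */
int mispredictions_1bit(const char *outcomes, char initial) {
    assert(initial == 'T' || initial == 'N');
    char prediction = initial;
    int misses = 0;
    for (const char *p = outcomes; *p != '\0'; p++) {
        if (*p != 'T' && *p != 'N') continue;  /* skip separators */
        if (*p != prediction) misses++;
        prediction = *p;                       /* remember the last outcome */
    }
    return misses;
}
```

For example, `mispredictions_1bit("T T N N T T T N N", 'N')` simulates series 1 with the predictor initialized to not taken.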
4) Repeat question 3 using a 2-bit predictor initialized to 'predict
strongly taken'. Use the state diagram in the slides (when the
prediction changes, the predictor moves to the 'strongly' state of the
new prediction).
series 1: T T N N T T T N N
series 2: T N T N T N T T T N N N
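The 2-bit predictor can be simulated the same way with four states. The transitions below follow only the rule stated in question 4 (a change of prediction lands in the 'strongly' state); they are an assumption about the slides' diagram, and they differ from a plain saturating counter, where weakly not taken moves to weakly taken on a taken branch. Names are hypothetical.

```c
#include <assert.h>

typedef enum { STRONG_NT, WEAK_NT, WEAK_T, STRONG_T } State;

static int predicts_taken(State s) { return s == WEAK_T || s == STRONG_T; }

/* Transition rule from question 4: a miss in a strong state weakens it;
   a miss in a weak state flips the prediction straight to the strong state. */
static State next_state(State s, int taken) {
    switch (s) {
    case STRONG_T:  return taken ? STRONG_T : WEAK_T;
    case WEAK_T:    return taken ? STRONG_T : STRONG_NT;  /* flip -> strongly not taken */
    case WEAK_NT:   return taken ? STRONG_T : STRONG_NT;  /* flip -> strongly taken */
    case STRONG_NT: return taken ? WEAK_NT  : STRONG_NT;
    }
    return s;
}

/* outcomes is a string of 'T'/'N' characters; other characters are skipped. */
int mispredictions_2bit(const char *outcomes, State initial) {
    assert(initial <= STRONG_T);
    State s = initial;
    int misses = 0;
    for (const char *p = outcomes; *p != '\0'; p++) {
        if (*p != 'T' && *p != 'N') continue;
        int taken = (*p == 'T');
        if (taken != predicts_taken(s)) misses++;
        s = next_state(s, taken);
    }
    return misses;
}
```

Call it with `STRONG_T` as the initial state to match 'predict strongly taken'.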
5) Assume the following mix of instructions:
thread 1:
1 - takes 2 clock cycles
2 - no restrictions
3 - uses same functional unit as 2
4 - takes 2 clock cycles
5 - depends on result generated by 4
6 - uses same functional unit as 4
7 - no restrictions
8 - no restrictions
thread 2:
A - no restrictions
B - uses same functional unit as A
C - takes 2 clock cycles
D - no restrictions
E - no restrictions
Show a schedule (like the last slide in 3b) that executes the 2
threads in the minimum number of clock cycles on each of the following
processors. Assume 2 instructions are fetched per clock cycle and that
up to as many instructions as there are functional units in the
processor can be issued per clock cycle:
1) a coarse-grained multithreaded processor with 1 core and 2
functional units.
2) a fine-grained multithreaded processor with 1 core containing 2
functional units that issues instructions from threads in a round
robin fashion (cycle 1 issues from thread 1, cycle 2 issues from
thread 2, cycle 3 issues from thread 1, etc). Note that instructions
from multiple threads may be in the functional units in a given clock
cycle, but only instructions from a single thread are ISSUED each
clock cycle.
3) a simultaneous multithreaded (SMT, hyperthreaded) processor with 1 core and
4 functional units. Instructions from multiple threads may be issued
in a single clock cycle.
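The three organizations differ only in which thread is allowed to issue in a given cycle, which can be pinned down as a small predicate before drawing the schedules. This is a sketch for 2 threads numbered 1 and 2, with cycles numbered from 1; the names are hypothetical, and for coarse-grained multithreading the caller decides which thread is running (textbook designs switch only on a costly stall, which the question leaves to the schedule).

```c
#include <assert.h>

typedef enum { COARSE, FINE, SMT } Policy;

/* Returns nonzero if `thread` may issue instructions in `cycle`.
   running_thread is only consulted for COARSE. */
int may_issue(Policy policy, int cycle, int thread, int running_thread) {
    assert(cycle >= 1 && (thread == 1 || thread == 2));
    switch (policy) {
    case COARSE: return thread == running_thread;            /* one thread until switched out */
    case FINE:   return thread == (cycle % 2 == 1 ? 1 : 2);  /* round robin: odd cycles thread 1 */
    case SMT:    return 1;                                    /* both threads may issue together */
    }
    return 0;
}
```

In the fine-grained case this only restricts issue: an instruction issued by thread 1 in cycle 1 that takes 2 clock cycles is still occupying its functional unit in cycle 2, when thread 2 issues.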