QUESTION

# Assume the MIPS VLIW architecture in the slides that uses an instruction packet of 2 instructions. One instruction is an ALU op or a branch. The...

VLIW:

Assume the MIPS VLIW architecture in the slides that uses an instruction packet of 2 instructions. One instruction is an ALU op or a branch. The other instruction is a load or a store. Assume a single cycle VLIW processor (all instructions will take 1 clock cycle, and we will have all cache hits, so LW and SW take 1 clock cycle). Assume no forwarding (the result of an instruction is not available until the next clock cycle).

b = 4;

for( i = 1; i < 21; i++ )

{

a[i] = a[i] + a[i-1] * b;

}

Assume \$s0 initially holds the address of a plus 4 (the address of a[1]), so that 0(\$s0) is a[i] and -4(\$s0) is a[i-1].

addi \$t0, \$0, 4 # set b

addi \$t1, \$0, 1 # set i

addi \$t5, \$0, 21 # set stopping condition for loop (exit once i reaches 21, matching i < 21)

LOOP:

lw \$t2, -4(\$s0) # get a[i-1]

lw \$t3, 0(\$s0) # get a[i]

mul \$t4, \$t2, \$t0 # do the mult

add \$t4, \$t4, \$t3 # do the add

sw \$t4, 0(\$s0) # store the result

addi \$s0, \$s0, 4 # increment a[] address

addi \$t1, \$t1, 1 # increment i

bne \$t1, \$t5, LOOP # loop again until i reaches the stopping value
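The packet rules above can be checked mechanically. The following is a minimal Python sketch, not the method from the slides: `Instr`, `check_schedule`, `LOOP_BODY`, and `candidate` are hypothetical names introduced here. It assumes that an instruction reading a register must sit in a strictly later packet than the instruction writing it (no forwarding), that an instruction overwriting a register may share a packet with an earlier reader of that register (both read the old value), and, conservatively, that loads and stores keep their program order. It does not check where the branch is placed.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str               # assembly text, used only in messages
    slot: str               # "alu" (ALU op or branch) or "mem" (load or store)
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


def check_schedule(program, packets):
    """Return a list of violations; an empty list means no rule was broken.

    program: list of Instr in original program order.
    packets: list of (alu_index, mem_index) pairs, where each entry is an
             index into program, or None for an empty slot.
    """
    errors = []
    where = {}  # instruction index -> packet number
    for p, (alu, mem) in enumerate(packets):
        for idx, slot in ((alu, "alu"), (mem, "mem")):
            if idx is None:
                continue
            if program[idx].slot != slot:
                errors.append(f"packet {p}: '{program[idx].text}' is in the wrong slot")
            if idx in where:
                errors.append(f"'{program[idx].text}' is scheduled more than once")
            where[idx] = p
    for idx, ins in enumerate(program):
        if idx not in where:
            errors.append(f"'{ins.text}' is never scheduled")
    if errors:
        return errors

    for j, later in enumerate(program):
        for i in range(j):
            earlier = program[i]
            raw = earlier.dest is not None and earlier.dest in later.srcs
            waw = earlier.dest is not None and earlier.dest == later.dest
            war = later.dest is not None and later.dest in earlier.srcs
            both_mem = earlier.slot == "mem" and later.slot == "mem"
            # Assumption: true dependences, repeated writes, and memory ops
            # need a strictly later packet; write-after-read may share one.
            if (raw or waw or both_mem) and where[j] <= where[i]:
                errors.append(f"'{later.text}' must come in a later packet than '{earlier.text}'")
            elif war and where[j] < where[i]:
                errors.append(f"'{later.text}' must not come before '{earlier.text}'")
    return errors


# The loop body from the question, in program order (indices 0..7).
LOOP_BODY = [
    Instr("lw $t2, -4($s0)", "mem", "$t2", ("$s0",)),
    Instr("lw $t3, 0($s0)", "mem", "$t3", ("$s0",)),
    Instr("mul $t4, $t2, $t0", "alu", "$t4", ("$t2", "$t0")),
    Instr("add $t4, $t4, $t3", "alu", "$t4", ("$t4", "$t3")),
    Instr("sw $t4, 0($s0)", "mem", None, ("$t4", "$s0")),
    Instr("addi $s0, $s0, 4", "alu", "$s0", ("$s0",)),
    Instr("addi $t1, $t1, 1", "alu", "$t1", ("$t1",)),
    Instr("bne $t1, $t5, LOOP", "alu", None, ("$t1", "$t5")),
]

# One candidate packing to test: each pair is (ALU/branch slot, load/store slot).
candidate = [(6, 0), (2, 1), (3, None), (5, 4), (7, None)]
print(check_schedule(LOOP_BODY, candidate))
```

A checker like this only confirms that a hand-built schedule respects the stated rules; it does not search for the shortest schedule, and a rewritten instruction (for example, a load with an adjusted offset) has to be entered as its own `Instr`.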

1) Assemble the instructions into the fewest number of packets possible. How many instruction packets are required for the entire program?

2) Unroll the loop once (2 loop bodies in a single iteration). Assemble the instructions into as few packets as possible. How many instruction packets are required for the entire program? (Remember that you can re-order instructions, as long as the output is the same.)
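To make "unroll the loop once" concrete, here is a small Python sketch of the loop from the question next to a version with two loop bodies per iteration. The function names `original` and `unrolled` are hypothetical, and the sketch models only the arithmetic, not the MIPS packets.

```python
def original(a, b=4):
    """The loop as written in the question: 20 iterations, i = 1..20."""
    a = list(a)
    for i in range(1, 21):
        a[i] = a[i] + a[i - 1] * b
    return a


def unrolled(a, b=4):
    """Unrolled once: 10 iterations, each doing the work of i and i + 1."""
    a = list(a)
    for i in range(1, 21, 2):
        a[i] = a[i] + a[i - 1] * b
        # The second body reads the a[i] that the first body just wrote.
        a[i + 1] = a[i + 1] + a[i] * b
    return a


data = list(range(21))  # a[0]..a[20]; the values are arbitrary test data
print(original(data) == unrolled(data))
```

Because each body reads the value produced by the previous one, the second body cannot simply run in parallel with the first; unrolling mainly saves the repeated pointer increment, counter increment, and branch.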

3) Use a 1-bit branch predictor initialized to predict not taken. How many mispredictions occur for each of the following branch patterns? (T = taken, N = not taken)

series 1: T T N N T T T N N

series 2: T N T N T N T T T N N N
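A 1-bit predictor simply predicts whatever the branch did the last time it executed. The Python sketch below counts mispredictions under that rule; the function name `one_bit_mispredictions` is a hypothetical helper, not something from the slides.

```python
def one_bit_mispredictions(outcomes, initial_prediction="N"):
    """Count mispredictions of a 1-bit predictor.

    outcomes: sequence of "T" (taken) or "N" (not taken).
    initial_prediction: what the predictor guesses before the first branch.
    """
    prediction = initial_prediction
    misses = 0
    for actual in outcomes:
        if actual != prediction:
            misses += 1
        prediction = actual  # the single bit remembers the latest outcome
    return misses


series_1 = "T T N N T T T N N".split()
series_2 = "T N T N T N T T T N N N".split()
print(one_bit_mispredictions(series_1))
print(one_bit_mispredictions(series_2))
```

Every change of direction in the pattern costs one misprediction, which is why an alternating pattern is the worst case for this predictor.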

4) Repeat question 3 using a 2-bit predictor initialized to 'predict strongly taken'. Use the state diagram in the slides (when it changes prediction, it moves into the 'strongly' state of the new prediction).

series 1: T T N N T T T N N

series 2: T N T N T N T T T N N N
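The slides' state diagram is not reproduced here, so the Python sketch below assumes a four-state machine consistent with the parenthetical note: one wrong guess in a 'strongly' state moves to the 'weakly' state of the same prediction, and a wrong guess in a 'weakly' state flips the prediction and lands in the opposite 'strongly' state. The transition table `NEXT_STATE` and the function name are assumptions; adjust the table if the slides differ.

```python
# States: strongly/weakly not taken, weakly/strongly taken.
SN, WN, WT, ST = "SN", "WN", "WT", "ST"

# Assumed transitions: (current state, actual outcome) -> next state.
NEXT_STATE = {
    (ST, "T"): ST, (ST, "N"): WT,
    (WT, "T"): ST, (WT, "N"): SN,   # prediction flips to strongly not taken
    (WN, "T"): ST, (WN, "N"): SN,   # prediction flips to strongly taken
    (SN, "T"): WN, (SN, "N"): SN,
}


def two_bit_mispredictions(outcomes, state=ST):
    """Count mispredictions of the assumed 2-bit predictor."""
    misses = 0
    for actual in outcomes:
        predicted = "T" if state in (WT, ST) else "N"
        if actual != predicted:
            misses += 1
        state = NEXT_STATE[(state, actual)]
    return misses


series_1 = "T T N N T T T N N".split()
series_2 = "T N T N T N T T T N N N".split()
print(two_bit_mispredictions(series_1))
print(two_bit_mispredictions(series_2))
```

Compared with the 1-bit scheme, a single stray outcome no longer flips the prediction, so an alternating pattern mispredicts about half as often.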

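5) Assume the following mix of instructions, where instructions 1 through 8 belong to one thread and instructions A through E belong to a second thread: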

1 - takes 2 clock cycles

2 - no restrictions

3 - uses same functional unit as 2

4 - takes 2 clock cycles

5 - depends on result generated by 4

6 - uses same functional unit as 4

7 - no restrictions

8 - no restrictions

A - no restrictions

B - uses same functional unit as A

C - takes 2 clock cycles

D - no restrictions

E - no restrictions

Show a schedule (like the last slide in 3b) that executes the 2 threads in the minimal number of clock cycles on each of the processors below. Assume 2 instructions are fetched in a clock cycle and a number of instructions equal to the number of functional units in the processor can be issued in a clock cycle:

1) A coarse-grained multithreaded processor with 1 core and 2 functional units.

2) A fine-grained multithreaded processor with 1 core containing 2 functional units that issues instructions from threads in a round-robin fashion (cycle 1 issues from thread 1, cycle 2 issues from thread 2, cycle 3 issues from thread 1, etc.). Note that instructions from multiple threads may be in the functional units in a given clock cycle, but only instructions from a single thread are ISSUED each clock cycle.

3) A simultaneous multithreaded (SMT, hyperthreaded) processor with 1 core and 4 functional units. Instructions from multiple threads may be issued in a single clock cycle.
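For the fine-grained case in option 2, the round-robin rule fixes which thread may issue in each cycle. A minimal Python sketch of that rule follows; `issuing_thread` is a hypothetical helper, and it assumes cycles and threads are both numbered from 1, as in the question.

```python
def issuing_thread(cycle, num_threads=2):
    """Return the 1-based thread allowed to issue in the given 1-based cycle."""
    return (cycle - 1) % num_threads + 1


# Cycle 1 -> thread 1, cycle 2 -> thread 2, cycle 3 -> thread 1, and so on.
for cycle in range(1, 7):
    print(cycle, issuing_thread(cycle))
```

This covers only the issue slot: an instruction that takes 2 clock cycles can still occupy a functional unit while the other thread issues in the following cycle.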