Consider a basic in-order pipeline with bypassing (one instruction in each pipeline stage in any cycle). The pipeline has been extended to handle FP add. Assume the following delays between dependent instructions:
- Load feeding any instruction: 3 stall cycles
- FP ALU feeding any instruction (except stores): 5 stall cycles
- FP ALU feeding store: 4 stall cycles
- Int add feeding a branch: 2 stall cycles
- Int add feeding any other instruction: 1 stall cycle
- A conditional branch has 1 delay slot (an instruction is fetched in the cycle after the branch without knowing the outcome of the branch and is executed to completion)
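The delay rules above can be written down as a small lookup, which may help when working out a schedule by hand. This is only a sketch: the names `stall_cycles` and `earliest_issue` and the category labels (`"load"`, `"fp_alu"`, `"int_add"`, `"store"`, `"branch"`, `"other"`) are made up for illustration, and it assumes the usual reading that "N stall cycles" means N empty issue slots between the producer and its dependent consumer.

```python
def stall_cycles(producer, consumer):
    """Stall cycles between a producer and a dependent consumer.

    producer: "load", "fp_alu" or "int_add" (hypothetical labels)
    consumer: "store", "branch" or "other"  (hypothetical labels)
    """
    if producer == "load":
        return 3                                  # load feeding any instruction
    if producer == "fp_alu":
        return 4 if consumer == "store" else 5    # FP ALU feeding store vs. anything else
    if producer == "int_add":
        return 2 if consumer == "branch" else 1   # int add feeding branch vs. anything else
    raise ValueError(f"unknown producer kind: {producer}")


def earliest_issue(producer_cycle, producer, consumer):
    """Earliest cycle the consumer can issue, assuming N stall cycles
    means N empty issue slots after the producer's issue cycle."""
    return producer_cycle + stall_cycles(producer, consumer) + 1


# A load issued in cycle 2 feeding an FP add: 3 stall cycles, so cycle 6 at the earliest.
print(earliest_issue(2, "load", "other"))
# An FP add issued in cycle 12 feeding a store: 4 stall cycles, so cycle 17 at the earliest.
print(earliest_issue(12, "fp_alu", "store"))
```

In an in-order pipeline the consumer also cannot issue before the instruction ahead of it, so the actual issue cycle is the larger of this value and the previous instruction's cycle plus one.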
Below is the source code and default assembly code for a loop.
Source Code: for (i=1000; i>0; i--) { w[i] = x[i] + y[i] + z[i]; }
Assembly Code:
Loop:
L.D F1, 0(R2) // Get x[i]
L.D F2, 0(R3) // Get y[i]
L.D F3, 0(R4) // Get z[i]
ADD.D F4, F2, F1 // Add two numbers
ADD.D F5, F3, F4 // Add the third number
S.D F5, 0(R5) // Store the result into w[i]
DADDUI R2, R2, #-8 // Decrement R2
DADDUI R3, R3, #-8 // Decrement R3
DADDUI R4, R4, #-8 // Decrement R4
DADDUI R5, R5, #-8 // Decrement R5
BNE R2, R1, Loop // Check if we've reached the end of the loop
NOP
A. Show the schedule (which instruction issues in which cycle) for the default code.
B. How should the compiler order instructions to minimize stalls (without unrolling)? Note that executing a NOP instruction is effectively a stall. Show the schedule. How many cycles can you save per iteration, compared to the default schedule?
C. How many times must the loop be unrolled to eliminate stall cycles? Show the schedule for the unrolled code.
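As a starting point for part A, the default code can be walked through mechanically. The sketch below is not an official solution. It assumes that "N stall cycles" means N empty issue slots between producer and consumer, that one instruction issues per cycle in program order, and that only dependences within a single iteration matter. The tuple layout and the names `stall_cycles`, `DEFAULT_LOOP` and `schedule` are hypothetical.

```python
def stall_cycles(producer, consumer):
    """Stall cycles from the problem statement, keyed by instruction kind."""
    if producer == "load":
        return 3
    if producer == "fp":
        return 4 if consumer == "store" else 5
    if producer == "int":
        return 2 if consumer == "branch" else 1
    return 0  # stores, branches and NOPs write no register


# (text, kind, destination register or None, source registers)
DEFAULT_LOOP = [
    ("L.D    F1, 0(R2)",    "load",   "F1", ["R2"]),
    ("L.D    F2, 0(R3)",    "load",   "F2", ["R3"]),
    ("L.D    F3, 0(R4)",    "load",   "F3", ["R4"]),
    ("ADD.D  F4, F2, F1",   "fp",     "F4", ["F2", "F1"]),
    ("ADD.D  F5, F3, F4",   "fp",     "F5", ["F3", "F4"]),
    ("S.D    F5, 0(R5)",    "store",  None, ["F5", "R5"]),
    ("DADDUI R2, R2, #-8",  "int",    "R2", ["R2"]),
    ("DADDUI R3, R3, #-8",  "int",    "R3", ["R3"]),
    ("DADDUI R4, R4, #-8",  "int",    "R4", ["R4"]),
    ("DADDUI R5, R5, #-8",  "int",    "R5", ["R5"]),
    ("BNE    R2, R1, Loop", "branch", None, ["R2", "R1"]),
    ("NOP",                 "nop",    None, []),
]


def schedule(program):
    """Return (issue_cycle, text) pairs for one iteration, issued in order."""
    last_write = {}   # register -> (issue cycle, kind) of its latest producer
    cycle = 0
    issued = []
    for text, kind, dest, sources in program:
        earliest = cycle + 1  # in-order: never before the previous instruction
        for reg in sources:
            if reg in last_write:
                prod_cycle, prod_kind = last_write[reg]
                earliest = max(earliest,
                               prod_cycle + stall_cycles(prod_kind, kind) + 1)
        cycle = earliest
        issued.append((cycle, text))
        if dest is not None:
            last_write[dest] = (cycle, kind)
    return issued


for cycle, text in schedule(DEFAULT_LOOP):
    print(f"cycle {cycle:2d}: {text}")
```

Under these assumptions the instructions issue in cycles 1, 2, 3, 6, 12, 17, 18, 19, 20, 21, 22 and 23, so one iteration of the default code takes 23 cycles. The same function can be given a reordered or unrolled instruction list to check candidate schedules for parts B and C.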