beq r2, r1, Label # Branch to Label if r2 == r1 (assume not equal)
add r4, r6, r2 # Add r6 and r2, store result in r4
slt r5, r8, r2 # Set r5 to 1 if r8 < r2, else 0
sw r14, 16(r3) # Store the word in r14 to memory at address r3 + 16
lw r12, 12(r3) # Load the word at memory address r3 + 12 into r12
Stage  Latency (ps)
IF     200
ID     120
EX     150
MEM    190
WB     100
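
As a starting point for the questions that follow, the arithmetic behind the table can be sketched in a few lines. This is a minimal sketch, not a full solution: the helper names `clock_cycle_ps` and `ideal_cycles` are made up for illustration, and the cycle count assumes a pipeline that never stalls, which the questions below then relax.

```python
# Stage latencies in picoseconds, copied from the table above.
STAGE_LATENCY_PS = {"IF": 200, "ID": 120, "EX": 150, "MEM": 190, "WB": 100}


def clock_cycle_ps(latencies):
    """The clock period must cover the slowest stage."""
    return max(latencies.values())


def ideal_cycles(num_instructions, num_stages):
    """Cycles needed when no instruction ever stalls (an assumption):
    the first instruction takes num_stages cycles, each later one adds 1."""
    return num_instructions + num_stages - 1


cycle = clock_cycle_ps(STAGE_LATENCY_PS)
cycles = ideal_cycles(5, len(STAGE_LATENCY_PS))
print(cycle, cycles, cycle * cycles)  # 200 9 1800
```

The IF stage sets the clock period here, so a change to another stage only affects the cycle time if it pushes that stage above 200 ps.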
1. What is the total execution time of this instruction sequence in a 5-stage pipeline that has only one memory (shared by instructions and data)? Can the resulting structural hazard be resolved by adding NOPs?
2. Change the load/store instructions (lw, sw) to use a register (without an offset) as the address, so that they no longer need the ALU and the EX and MEM stages can be combined into a single stage. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence compared to the original?
3. What speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?
4. Repeat the speedup calculation from question 2, but now take into account the (possible) change in clock cycle time when EX and MEM are done in a single stage.
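5. Repeat the speedup calculation from question 3, but now take into account the (possible) change in clock cycle time. Assume the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10 ps. What is the speedup achieved in this case?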
6. What are the new clock cycle time and execution time of this instruction sequence if the beq address computation is moved to the MEM stage? What is the speedup from this change, assuming the latency of the EX stage is reduced by 20 ps?
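
One way to check answers to these questions is a small cycle-counting model. The sketch below is illustrative only and rests on assumptions the questions do not state explicitly: stalls only ever delay instruction fetch, a data access wins any conflict over the single memory, branches are handled by stalling the next fetch (no prediction, no delay slots), and no data-hazard stalls occur (the sequence has no read-after-write dependences). The names `total_cycles` and `execution_time_ps` are hypothetical helpers, not part of any tool.

```python
MEMORY_OPS = {"lw", "sw"}
BRANCH_OPS = {"beq"}


def total_cycles(ops, num_stages=5, mem_stage=4, branch_stall=0,
                 single_memory=False):
    """Count cycles for an in-order pipeline in which stalls only delay fetch.

    mem_stage is the 1-based position of the stage that accesses data memory.
    branch_stall is how many cycles the fetch after a branch is delayed.
    """
    busy = set()      # cycles in which the shared memory serves a data access
    earliest = 1      # first cycle in which the next fetch may happen
    last_fetch = 0
    for op in ops:
        fetch = earliest
        while single_memory and fetch in busy:
            fetch += 1                        # fetch loses to the data access
        if op in MEMORY_OPS:
            busy.add(fetch + mem_stage - 1)   # cycle of this op's data access
        last_fetch = fetch
        earliest = fetch + 1 + (branch_stall if op in BRANCH_OPS else 0)
    return last_fetch + num_stages - 1


def execution_time_ps(cycles, latencies):
    """Total time = cycle count times the slowest stage latency."""
    return cycles * max(latencies.values())


SEQUENCE = ["beq", "add", "slt", "sw", "lw"]
BASE = {"IF": 200, "ID": 120, "EX": 150, "MEM": 190, "WB": 100}

# Question 1, assuming the branch causes no stall: one shared memory.
shared = total_cycles(SEQUENCE, single_memory=True)

# Question 2: four stages, data access in stage 3, clock unchanged.
four_stage = total_cycles(SEQUENCE, num_stages=4, mem_stage=3)

# Questions 3, 5, 6: branch resolved in ID, EX, or MEM (1, 2, or 3 stalls).
in_id = total_cycles(SEQUENCE, branch_stall=1)
in_ex = total_cycles(SEQUENCE, branch_stall=2)
in_mem = total_cycles(SEQUENCE, branch_stall=3)
print(shared, four_stage, in_id, in_ex, in_mem)  # 9 8 10 11 12

# Adjusted latencies from questions 5 and 6.
q5 = dict(BASE, ID=BASE["ID"] * 3 // 2, EX=BASE["EX"] - 10)
q6 = dict(BASE, EX=BASE["EX"] - 20)

t_ex = execution_time_ps(in_ex, BASE)
speedup_id = t_ex / execution_time_ps(in_id, q5)
speedup_mem = t_ex / execution_time_ps(in_mem, q6)
print(round(speedup_id, 3), round(speedup_mem, 3))
```

In this model a NOP would be fetched from the shared memory like any other instruction, which is worth keeping in mind for the second part of question 1. Reordering the list (for example `["sw", "lw", "beq", "add", "slt"]` with `single_memory=True`) shows fetch stalls appearing when a data access and a fetch land in the same cycle. Question 4 needs a latency for the merged EX/MEM stage, which is not given above, so it is left out of the sketch.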