Computer Architecture: please explain all the steps. I will give good ratings :)
Problem 4.8, reference text: Consider the loop Y[i] = a*X[i] + Y[i], a key step in Gaussian elimination. (This problem is similar to assignment #2).
Loop:
LD F0,0(R1)
MULTD F0,F0,F2
LD F4,0(R2)
ADDD F0,F0,F4
SD 0(R2),F0
SUBI R1,R1,#8
SUBI R2,R2,#8
BNEZ R1,Loop
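For readers who want to see what the assembly computes, here is a minimal Python sketch of the same loop. It is only an illustration: the function name `daxpy` and the use of Python lists are assumptions, and the sketch simply walks from the last element down to the first, mirroring the way R1 and R2 are decremented by 8 each iteration. It says nothing about pipeline timing.

```python
def daxpy(a, x, y):
    """Compute y[i] = a * x[i] + y[i] in place, from the last element down.

    Hypothetical helper for illustration; `a` plays the role of F2,
    `x` the array addressed through R1, `y` the array addressed through R2.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x) - 1, -1, -1):
        loaded_x = x[i]            # LD    F0,0(R1)
        product = loaded_x * a     # MULTD F0,F0,F2
        loaded_y = y[i]            # LD    F4,0(R2)
        y[i] = product + loaded_y  # ADDD  F0,F0,F4 ; SD 0(R2),F0
    return y
```

Each iteration performs one floating-point multiply and one floating-point add, which is the count the performance questions rely on.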
(1) Assume a 6-stage, single-issue, statically scheduled pipelined machine (F, D, EX, M1, M2, WB). Use these assumptions:
- The MULTD unit is fully pipelined with 6 stages.
- The ADDD unit is fully pipelined with 3 stages.
- Full forwarding (with 0 cc delay) for both INT and FP forwarding.
- A 1 cc delayed branch.
(a) Without unrolling the loop, complete a timing diagram.
(b) Assuming a 2 GHz clock rate, determine the floating-point performance of the loop. Is your answer exact or an approximation? Explain why.
(c) In a loop with 1000 iterations, when does the 100th MULTD operation start execution and end execution?
(d) Unroll the loop 2 times and schedule it for maximum performance. Complete the timing diagram.
(e) Unroll the loop as many times as necessary to schedule it without any stalls to achieve maximum performance.
(f) Assuming a 2 GHz clock rate, determine the floating-point performance of the loop. Is your answer exact or an approximation? Explain why.
(g) In a loop that originally had 1000 iterations, after unrolling and scheduling as in part (d), when does the 100th MULTD operation start execution and end execution?
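Once the timing diagrams are drawn, the arithmetic behind the performance and "100th MULTD" parts can be sketched in Python as below. This is not the answer to the problem. Both function names, all parameters, and the sample numbers in the usage note are hypothetical, and the cycle counts must be read off your own timing diagram. The sketch assumes that cycles are numbered from 1, that every loop body repeats with the same period (steady state), and that a MULTD "ends" in the last of its 6 pipelined execution stages.

```python
def mflops(clock_hz, cycles_per_body, flops_per_body):
    """Steady-state floating-point rate in MFLOPS.

    cycles_per_body: cycles between the starts of consecutive loop bodies,
                     stalls and branch delay slot included (from the diagram).
    flops_per_body:  2 for the original loop (one MULTD, one ADDD),
                     2 * u for a loop unrolled u times.
    """
    if cycles_per_body <= 0:
        raise ValueError("cycles_per_body must be positive")
    return flops_per_body * clock_hz / cycles_per_body / 1e6


def kth_multd_window(k, first_body_starts, cycles_per_body, multd_stages=6):
    """Return (start_cycle, end_cycle) of the k-th MULTD, counting from k = 1.

    first_body_starts: for the FIRST loop body, the cycle in which each MULTD
                       copy enters execution; one entry per unrolled copy,
                       so a single entry for the loop that is not unrolled.
    """
    unroll = len(first_body_starts)
    if k < 1 or unroll < 1:
        raise ValueError("k and the unroll factor must be at least 1")
    body = (k - 1) // unroll   # how many complete loop bodies come first
    slot = (k - 1) % unroll    # which MULTD copy inside that body
    start = first_body_starts[slot] + body * cycles_per_body
    end = start + multd_stages - 1
    return start, end
```

For example, with placeholder values of 10 cycles per body and a first MULTD entering execution in cycle 4, `kth_multd_window(100, [4], 10)` gives `(994, 999)` and `mflops(2e9, 10, 2)` gives `400.0`. Because the rate formula ignores the cycles spent filling and draining the pipeline around the first and last iterations, a figure computed this way is a steady-state estimate, which is worth keeping in mind when answering the "exact or approximation" questions.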