2. Consider the following piece of C code:
    for (j = 2; j <= 1000; j++)
        D[j] = D[j-1] + D[j-2];
The RISC-V code corresponding to the above fragment is:
          li     x5, 8000
          add    x12, x10, x5
          addi   x11, x10, 16
    LOOP: fld    f0, -16(x11)
          fld    f1, -8(x11)
          fadd.d f2, f0, f1
          fsd    f2, 0(x11)
          addi   x11, x11, 8
          ble    x11, x12, LOOP
The latency of an instruction is the number of cycles that must come between that instruction and an instruction using its result. Assume floating-point instructions have the following latencies (in cycles):
    Instruction   Latency (cycles)
    fadd.d        4
    fld           6
    fsd           1
2.1 How many cycles does it take to execute this code?
2.2 Re-order the code to reduce stalls. Now, how many cycles does it take to execute this code?
(Hint: You can remove additional stalls by changing the offset on the fsd instruction.)
2.3 When an instruction in a later iteration of a loop depends upon a data value produced in an earlier iteration of the same loop, we say that there is a loop-carried dependence between iterations of the loop. Identify the loop-carried dependences in the above code. Identify the dependent program variable and assembly-level registers. You can ignore the loop induction variable j.
2.4 Rewrite the code by using registers to carry the data between iterations of the loop (as opposed to storing and re-loading the data from main memory). Show where this code stalls and calculate the number of cycles required to execute. Note that for this problem you will need to use the assembler pseudo-instruction "fmv.d rd, rs1", which writes the value of floating-point register rs1 into floating-point register rd. Assume that fmv.d executes in a single cycle.