DESCRIPTION: In this assignment, you will create a threaded program that implements the cost-optimal
parallel prefix algorithm (covered in Lecture 4) for the following sequential loop:
for (i=1; i<n; i++) A[i] = A[i] + A[i-1];
The cost-optimal parallel prefix algorithm consists of three stages, as shown below.
1. Partition the n input values into chunks of size log2(n). Each processor computes the local prefix
sums of the values in one chunk, with all chunks processed in parallel (takes time O(log(n))).
2. Run the non-cost-optimal prefix sum algorithm on the n / log2(n) partial results, i.e. the last
local prefix sum of each chunk (takes time O(log(n / log(n)))).
3. Each processor adds the value computed in Stage 2 by its left neighbor to all values of its chunk
(takes time O(log(n))).
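The three stages above can be sketched sequentially as follows. This is only an illustrative outline, not the contents of the provided template: the names `chunk_size_for` and `three_stage_prefix` are hypothetical, the chunk size is assumed to be log2(n) rounded down (at least 1), and Stage 2 is written as a doubling scheme, which is one common form of a non-cost-optimal prefix sum.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: chunk size of about log2(n), never smaller than 1.
std::size_t chunk_size_for(std::size_t n) {
    if (n < 2) return 1;
    const double lg = std::log2(static_cast<double>(n));
    return std::max<std::size_t>(1, static_cast<std::size_t>(lg));
}

// In-place inclusive prefix sum of A, organized as the three stages.
// Every loop over chunks is sequential here; Stages 1 and 3 touch disjoint
// chunks, so they are the parts that can later be handed to threads.
void three_stage_prefix(std::vector<long long>& A) {
    const std::size_t n = A.size();
    if (n == 0) return;
    const std::size_t chunk = chunk_size_for(n);
    const std::size_t num_chunks = (n + chunk - 1) / chunk;

    // Stage 1: local prefix sums inside each chunk.
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const std::size_t begin = c * chunk;
        const std::size_t end = std::min(begin + chunk, n);
        for (std::size_t i = begin + 1; i < end; ++i) A[i] += A[i - 1];
    }

    // Stage 2: prefix sum over the chunk totals (last element of each chunk),
    // using doubling strides (assumed variant of the non-cost-optimal scan).
    std::vector<long long> totals(num_chunks);
    for (std::size_t c = 0; c < num_chunks; ++c) {
        totals[c] = A[std::min((c + 1) * chunk, n) - 1];
    }
    for (std::size_t stride = 1; stride < num_chunks; stride *= 2) {
        std::vector<long long> next = totals;
        for (std::size_t c = stride; c < num_chunks; ++c) {
            next[c] = totals[c] + totals[c - stride];
        }
        totals = std::move(next);
    }

    // Stage 3: every chunk except the first adds its left neighbor's
    // Stage 2 value to all of its elements.
    for (std::size_t c = 1; c < num_chunks; ++c) {
        const std::size_t begin = c * chunk;
        const std::size_t end = std::min(begin + chunk, n);
        for (std::size_t i = begin; i < end; ++i) A[i] += totals[c - 1];
    }
}
```

For example, with the ten values 1 through 10 the assumed chunk size is 3, Stage 1 produces the chunk totals 6, 15, 24 and 10, Stage 2 turns them into 6, 21, 45 and 55, and Stage 3 yields the same result as the sequential loop.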
For this assignment, a non-threaded example source file ("assignment5_template.cpp") is provided.
Starting from this source code, you must satisfy the following two requirements.
\begin{itemize}
\item Convert each stage to a lambda expression.
\item Stages 1 and 3 must be designed to be run in parallel (using multi-threading).
\end{itemize}
For multi-threading, you must use the C++ standard library threading facilities (either std::thread or std::async).
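As a rough illustration of the two requirements, the sketch below expresses Stage 1 as a lambda and runs it over the chunks with std::thread. It is not the provided template: `for_each_chunk_parallel` and `stage1_parallel` are hypothetical names, and the round-robin assignment of chunks to threads is just one possible choice. Because each chunk is a disjoint range of A, the workers never write to the same element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: calls per_chunk(c) for every chunk index c in
// [0, num_chunks), spreading the chunks round-robin over std::thread workers.
template <typename PerChunk>
void for_each_chunk_parallel(std::size_t num_chunks, std::size_t num_threads,
                             PerChunk per_chunk) {
    num_threads = std::max<std::size_t>(1, std::min(num_threads, num_chunks));
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([=, &per_chunk] {
            for (std::size_t c = t; c < num_chunks; c += num_threads) {
                per_chunk(c);
            }
        });
    }
    // Wait for all workers before the next stage reads the results.
    for (auto& w : workers) w.join();
}

// Stage 1 as a lambda: local prefix sums inside chunk c, modifying A in place.
void stage1_parallel(std::vector<long long>& A, std::size_t chunk,
                     std::size_t num_threads) {
    assert(chunk > 0);
    const std::size_t n = A.size();
    const std::size_t num_chunks = (n + chunk - 1) / chunk;
    auto stage1 = [&A, chunk, n](std::size_t c) {
        const std::size_t begin = c * chunk;
        const std::size_t end = std::min(begin + chunk, n);
        for (std::size_t i = begin + 1; i < end; ++i) A[i] += A[i - 1];
    };
    for_each_chunk_parallel(num_chunks, num_threads, stage1);
}
```

Stage 3 could reuse the same helper with a different lambda. The same pattern can be written with std::async by collecting the returned std::future objects and calling get() on each instead of join(). Depending on the toolchain, linking may require a flag such as -pthread.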
Run both the non-threaded and threaded programs five times for each number of elements listed below,
and measure the elapsed time of each run to complete the following table.
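\begin{tabular}{|r|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
 & \multicolumn{7}{c|}{Threaded} & \multicolumn{7}{c|}{Non-threaded} \\
\hline
\# of elements & 1st run & 2nd run & 3rd run & 4th run & 5th run & Average & Standard Deviation & 1st run & 2nd run & 3rd run & 4th run & 5th run & Average & Standard Deviation \\
\hline
16 & & & & & & & & & & & & & & \\
\hline
32 & & & & & & & & & & & & & & \\
\hline
64 & & & & & & & & & & & & & & \\
\hline
1000 & & & & & & & & & & & & & & \\
\hline
3000 & & & & & & & & & & & & & & \\
\hline
5000 & & & & & & & & & & & & & & \\
\hline
8000 & & & & & & & & & & & & & & \\
\hline
10000 & & & & & & & & & & & & & & \\
\hline
\end{tabular}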
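One possible way to collect the five timings for a row of the table and derive its Average and Standard Deviation columns is sketched below. The names `time_runs`, `summarize` and `RunStats` are hypothetical, the times are reported in microseconds, and the standard deviation is assumed to be the sample standard deviation (dividing by N - 1, as Excel's STDEV.S does); use N instead if the population form is expected.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the two summary columns of one table row.
struct RunStats {
    double mean;
    double stddev;
};

// Calls work() `runs` times and returns the elapsed time of each call
// in microseconds, measured with a monotonic clock.
template <typename F>
std::vector<double> time_runs(F&& work, std::size_t runs) {
    std::vector<double> times_us;
    times_us.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        times_us.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return times_us;
}

// Mean and sample standard deviation (N - 1 in the denominator, an assumption).
RunStats summarize(const std::vector<double>& samples) {
    assert(samples.size() >= 2);
    double sum = 0.0;
    for (double x : samples) sum += x;
    const double mean = sum / static_cast<double>(samples.size());
    double squared_dev = 0.0;
    for (double x : samples) squared_dev += (x - mean) * (x - mean);
    const double variance =
        squared_dev / static_cast<double>(samples.size() - 1);
    return {mean, std::sqrt(variance)};
}
```

Since the prefix sum modifies the array in place, the callable passed to `time_runs` should operate on a fresh copy of the input for every run; otherwise later runs would process already-summed data.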
Create this table in an MS Excel file. Using the averages and standard deviations, make a line graph
(or bar graph) with error bars. To create the line graph, you need to use the average