An international cosmetics company, XYZ, wants to conduct a
worldwide market analysis of one of its new flagship products. As
XYZ has decentralized its IT operations, it allows its worldwide
branches to use different kinds of computer systems for their daily
operations. As a result, data sources are numerous and in different
formats. Now, XYZ intends to use big data analytics to analyze the
highest actual selling price of this product in each branch. After
receiving the consultants’ advice, XYZ has decided to adopt Hadoop and
MapReduce as its software platform. Before deployment, XYZ wants
to test the application's algorithm. Assume the following
streams of test transaction data have already been converted into US
dollars and are grouped into a data set with the format (branch
code, list price, actual selling price), with each data item
separated by a space:
{HKG, 650, 600}{TPG, 680, 650}{KSH, 710, 650}{TKO, 700,
550}{SHG, 810, 800}{SIN, 780, 700}{BEJ, 730, 700}{KLU, 750,
750}{HKG, 660, 650}{KSH, 800, 600}{TPG, 660, 650}{KLU, 660,
650}{HKG, 690, 650}{KLU, 670, 650}{SIN, 760, 750}{HKG, 990,
800}{BEJ, 910, 800}{BEJ, 880, 850}{SHG, 810, 800}{TKO, 910,
900}{KSH, 600, 600}{TPG, 660, 650}{SIN, 800, 550}{SIN, 620,
600}{SHG, 990, 900}
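As a minimal sketch of how the test stream above could be read into structured records before any MapReduce step, the following Python snippet parses each {branch code, list price, actual selling price} group with a regular expression. The names RAW and parse_records are hypothetical and chosen only for illustration; they are not part of Hadoop or of the original question.

```python
import re

# The test transaction stream, copied from the question text.
RAW = """
{HKG, 650, 600}{TPG, 680, 650}{KSH, 710, 650}{TKO, 700, 550}
{SHG, 810, 800}{SIN, 780, 700}{BEJ, 730, 700}{KLU, 750, 750}
{HKG, 660, 650}{KSH, 800, 600}{TPG, 660, 650}{KLU, 660, 650}
{HKG, 690, 650}{KLU, 670, 650}{SIN, 760, 750}{HKG, 990, 800}
{BEJ, 910, 800}{BEJ, 880, 850}{SHG, 810, 800}{TKO, 910, 900}
{KSH, 600, 600}{TPG, 660, 650}{SIN, 800, 550}{SIN, 620, 600}
{SHG, 990, 900}
"""

# Assumption: every record is {3-letter code, integer, integer}.
RECORD_PATTERN = re.compile(r"\{\s*([A-Z]{3})\s*,\s*(\d+)\s*,\s*(\d+)\s*\}")


def parse_records(raw):
    """Return a list of (branch_code, list_price, selling_price) tuples."""
    return [
        (branch, int(list_price), int(selling_price))
        for branch, list_price, selling_price in RECORD_PATTERN.findall(raw)
    ]


records = parse_records(RAW)
print(len(records), "records parsed")
```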
You are required to answer the following:
a) Assume there are 2 data nodes to be used inside HDFS. Based on
the Apache MapReduce technique, perform the splitting function and
list your answers using appropriate diagrams.
b) Based on the above splitting results, perform the mapping function
manually and list your answers using appropriate diagrams.
c) Based on the above mapping results, perform the shuffling function
manually and list your answers using appropriate diagrams.
d) Based on the above shuffling results, perform the reducing function
manually and list your answers using appropriate diagrams.
e) Based on the above reducing results, combine the results manually
and list your answers using an appropriate diagram.
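The five steps can be checked against a small in-memory simulation. The sketch below is plain Python, not Hadoop code, and it rests on stated assumptions: the 25 records are split in stream order into two contiguous chunks (13 and 12 records) to stand in for the 2 data nodes, whereas real HDFS splits by block size; the mapper emits (branch code, actual selling price) pairs; and the reducer takes the maximum per branch. All function names are hypothetical.

```python
from collections import defaultdict

# (branch_code, list_price, selling_price), in the order given in the question.
RECORDS = [
    ("HKG", 650, 600), ("TPG", 680, 650), ("KSH", 710, 650), ("TKO", 700, 550),
    ("SHG", 810, 800), ("SIN", 780, 700), ("BEJ", 730, 700), ("KLU", 750, 750),
    ("HKG", 660, 650), ("KSH", 800, 600), ("TPG", 660, 650), ("KLU", 660, 650),
    ("HKG", 690, 650), ("KLU", 670, 650), ("SIN", 760, 750), ("HKG", 990, 800),
    ("BEJ", 910, 800), ("BEJ", 880, 850), ("SHG", 810, 800), ("TKO", 910, 900),
    ("KSH", 600, 600), ("TPG", 660, 650), ("SIN", 800, 550), ("SIN", 620, 600),
    ("SHG", 990, 900),
]


def split(records, n_nodes=2):
    """a) Split the input into contiguous chunks, one per data node (assumed)."""
    size = -(-len(records) // n_nodes)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_phase(chunk):
    """b) Emit (key, value) = (branch_code, selling_price) for each record."""
    return [(branch, selling) for branch, _list_price, selling in chunk]


def shuffle(mapped_chunks):
    """c) Group all values by key across the mapper outputs."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """d) Reduce each key's values to the highest selling price."""
    return {key: max(values) for key, values in groups.items()}


splits = split(RECORDS)
mapped = [map_phase(chunk) for chunk in splits]
grouped = shuffle(mapped)
result = reduce_phase(grouped)

# e) Combine the reducer outputs into one final listing, sorted by branch code.
for branch in sorted(result):
    print(branch, result[branch])
```

With this split, the final listing gives BEJ 850, HKG 800, KLU 750, KSH 650, SHG 900, SIN 750, TKO 900 and TPG 650. A different split assignment would change the intermediate diagrams but not these per-branch maxima.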