00:01
To find the top 10 most visited URLs common to 3 users' records using MapReduce, you need to follow several steps.
00:11
So, the first step is this.
00:19
Initialization.
00:25
First is input data.
00:31
You have three 60 GB files containing URL visit records for 3 users.
00:38
Second is the MapReduce framework.
00:50
Set up a MapReduce environment such as Apache Hadoop to distribute and process the data.
01:00
Step 2.
01:05
Mapper function.
01:10
First is read data.
01:16
For each user record file, you read and process the data in blocks.
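As a rough illustration of reading data in blocks, here is a minimal Python sketch. In Hadoop the framework itself splits the input files, so this only shows the idea; the name `read_in_blocks` and the `block_size` parameter are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_blocks(lines: Iterable[str], block_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive lists of at most block_size lines (hypothetical helper)."""
    it = iter(lines)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return  # input exhausted
        yield block
```

Passing an open file object as `lines` would keep only one block in memory at a time.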
01:24
Second is map function.
01:32
In the mapper function, you parse each line of the input record to extract the URL.
01:44
Then emit a key-value pair.
01:54
Emit a key-value pair where the extracted URL is the key.
01:57
And the value is 1.
02:00
This is done for each record.
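The mapper described above could be sketched in Python roughly as follows. The record layout (a whitespace-separated line whose last field is the URL) and the name `map_record` are assumptions for illustration; the recording does not specify the file format.

```python
from typing import Iterator, Tuple


def map_record(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (url, 1) for one visit record.

    Assumed (hypothetical) record layout: "<timestamp> <url>".
    """
    fields = line.split()
    if not fields:
        return  # skip blank lines
    url = fields[-1]  # assumption: the URL is the last field
    yield url, 1
```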
02:02
Step 3, shuffle and sort.
02:20
The MapReduce framework will automatically shuffle and sort the emitted key-value pairs by URL.
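The framework performs this step itself, so you would not normally write it. As a single-process sketch of what it amounts to, the hypothetical `shuffle_and_sort` below groups the emitted values by URL and orders the groups by key.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group emitted (url, 1) pairs by URL and return the groups sorted by URL."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for url, value in pairs:
        grouped[url].append(value)
    return sorted(grouped.items())  # keys are unique, so this sorts by URL
```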
02:30
Step 4.
02:34
Reducer function.
02:39
First is reduce function.
02:49
In the reducer function, you will receive a set of key-value pairs where the key is the URL and the values are counts,
02:56
one for each visit. You will calculate the total count of visits for each URL by summing the values.
03:10
Then emit the result.
03:18
Emit a key-value pair where the key is the URL and the value is the total count of visits.
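A minimal Python sketch of the reducer described above. The name `reduce_url` is hypothetical; it assumes the framework hands over one URL together with all of its emitted counts.

```python
from typing import Iterable, Tuple


def reduce_url(url: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum the per-visit counts for one URL and emit (url, total_visits)."""
    return url, sum(counts)
```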
03:25
Step 5.
03:30
Handling data from multiple users.
03:47
Repeat the above steps for each user record file.
03:52
User 1, user 2, user 3...
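Repeating the steps per user could look roughly like the sketch below. It runs in a single Python process on a tiny in-memory stand-in for the three files, not on Hadoop; the record layout (URL as the last whitespace-separated field), the sample data, and the names `count_visits` and `count_per_user` are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def count_visits(lines: Iterable[str]) -> Counter:
    """Map and reduce one user's records into URL -> total visits."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            # Same effect as emitting (url, 1) and summing in the reducer.
            counts[fields[-1]] += 1
    return counts


def count_per_user(records_by_user: Dict[str, Iterable[str]]) -> Dict[str, Counter]:
    """Repeat the counting steps independently for each user's records."""
    return {user: count_visits(lines) for user, lines in records_by_user.items()}


# Hypothetical sample data standing in for the three user record files.
records = {
    "user1": ["t1 http://a.example", "t2 http://b.example", "t3 http://a.example"],
    "user2": ["t1 http://a.example"],
    "user3": ["t1 http://b.example", "t2 http://a.example"],
}
per_user = count_per_user(records)
```

Each user ends up with their own URL-to-count mapping, which later steps can compare across users.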