In Hadoop MapReduce, the Name Node is responsible for:
a. Scheduling jobs across different clusters
b. Managing and storing the metadata of files
c. Running computation tasks
d. Replicating data across nodes
Added by Robert P.
Step 1
The Name Node is a critical component of the Hadoop Distributed File System (HDFS) that manages the file system's namespace.
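To make the idea of "managing the namespace" concrete, here is a minimal toy sketch in Python. It is not the Hadoop API: the class `ToyNameNode`, its methods, the 128 MB block size, and the round-robin replica placement are all assumptions made for illustration. The point it tries to show is that the Name Node keeps only metadata (which blocks make up a file, and which DataNodes hold each block), which corresponds to option b, while the file contents themselves live on the DataNodes.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (illustrative only)


@dataclass
class ToyNameNode:
    """Hypothetical toy model: stores metadata only, never file contents."""
    replication: int = 3
    namespace: dict = field(default_factory=dict)        # path -> list of block ids
    block_locations: dict = field(default_factory=dict)  # block id -> DataNode names
    _next_block_id: int = 0

    def add_file(self, path, size_bytes, datanodes):
        """Record the blocks of a file and where their replicas are placed."""
        n_blocks = max(1, -(-size_bytes // BLOCK_SIZE))  # ceiling division
        block_ids = []
        for _ in range(n_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Simple round-robin placement; real HDFS uses a rack-aware policy.
            # Replicas are distinct only if len(datanodes) >= replication.
            replicas = [datanodes[(block_id + r) % len(datanodes)]
                        for r in range(self.replication)]
            self.block_locations[block_id] = replicas
            block_ids.append(block_id)
        self.namespace[path] = block_ids
        return block_ids

    def locate(self, path):
        """Answer a client's question: which blocks, on which DataNodes?"""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


nn = ToyNameNode()
nn.add_file("/logs/a.txt", 300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for block_id, replicas in nn.locate("/logs/a.txt"):
    print(block_id, replicas)
```

In this sketch a client would ask the toy Name Node where a file's blocks are and then read the data from the listed DataNodes directly; no file bytes ever pass through the metadata structures.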
Recommended Videos
Assume a MapReduce cluster that consists of 32 nodes, each with 4 map slots and 2 reduce slots. The replication factor is set to 3 (there are three replicas of each block). Given a simple word count job with an input size of 320 GB, answer the following questions.
a. Suppose the word count job processes the input data at a constant rate of 4 MB/s. Determine a proper block size for the distributed file system, considering the efficiency of map tasks as well as their recovery cost. Justify your choice. Given the configuration of the MapReduce cluster, determine how many map tasks this job will have and how many rounds (waves) are needed to finish the map phase on this cluster.
b. How many reduce tasks should this job have? Justify your configuration.
Akash M.
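The arithmetic behind part a of the word count question can be sketched as follows. This is a hedged illustration, not the worked solution: the function name `map_plan` is hypothetical, and it assumes 1 GB = 1024 MB, one map task per block, and that the 4 MB/s rate applies to each map task individually. The candidate block sizes (64, 128, 256 MB) are examples chosen here, not values given in the question.

```python
import math


def map_plan(input_gb, block_mb, nodes, map_slots_per_node, rate_mb_per_s):
    """Return (map tasks, total map slots, waves, seconds per map task)."""
    input_mb = input_gb * 1024                      # assumes 1 GB = 1024 MB
    map_tasks = math.ceil(input_mb / block_mb)      # assumes one map task per block
    total_slots = nodes * map_slots_per_node        # tasks that can run at once
    waves = math.ceil(map_tasks / total_slots)      # rounds to drain all tasks
    seconds_per_task = block_mb / rate_mb_per_s     # time to process one block
    return map_tasks, total_slots, waves, seconds_per_task


# Compare a few candidate block sizes for the 32-node, 4-slot cluster.
for block_mb in (64, 128, 256):
    print(block_mb, map_plan(320, block_mb, 32, 4, 4))
```

Under these assumptions, larger blocks mean fewer tasks and fewer waves, but each task runs longer, so more work is lost when a task fails and must be rerun. That is the trade-off the question asks you to justify. For part b, the same cluster description gives 32 × 2 = 64 reduce slots in total.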
HW 3: HDFS (Lecture 5)
Consider a small cluster with 20 machines: 19 DataNodes and 1 NameNode. Each node in the cluster has a total of 2 terabytes of hard disk space and 2 gigabytes of main memory available. The cluster uses a block size of 64 megabytes (MB) and a replication factor of 3. The master maintains 100 bytes of metadata for each 64 MB block.
(a) Let's upload the file wiki_dump.xml (with a size of 600 megabytes) to HDFS. Explain what effect this upload has on the number of occupied HDFS blocks.
(b) Figure 1 shows an excerpt of wiki_dump.xml's structure. Explain the relationship between an HDFS block, an InputSplit, and a record based on this excerpt.

    <dump time="1483027930">
      <page id="EN3234"> ... ... ... </page>    } 80.2 MB
      <page id="DE5434"> ... ... ... </page>    } 0.6 MB
      ...
    </dump>

Figure 1: Excerpt of wiki_dump.xml. Each Wikipedia page is stored within a <page> element. The element with id EN3234 contains 80.2 megabytes of textual content.
(c) You are the only user of the cluster and write a Hadoop job to extract information from wiki_dump.xml. You want to speed up the job by testing different block size configurations: besides the existing 64 MB configuration, you also consider 32 MB and 128 MB block sizes. Which configuration do you think will lead to the fastest job execution? Explain why.
(d) Let us assume no files are currently stored on HDFS. You are given 100 million files, each one with a size of 100 kilobytes. How many of those can you upload successfully to the cluster, considering the storage restrictions (memory/disk) on the NameNode and the DataNodes? Explain your answer.
Madhur L.
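Parts (a) and (d) of the HDFS homework question are mostly arithmetic, which can be sketched as below. This is an illustration under stated assumptions rather than the official solution: it uses binary units (1 KB = 1024 bytes), assumes each small file occupies exactly one block entry costing 100 bytes of NameNode memory, counts that metadata per logical block rather than per replica, assumes all 2 GB of NameNode memory is available for metadata, and assumes a small file consumes only its actual size on disk. All variable names are hypothetical.

```python
import math

KB, MB, GB, TB = 1024, 1024**2, 1024**3, 1024**4  # assumed binary units

BLOCK_BYTES = 64 * MB
REPLICATION = 3
METADATA_BYTES_PER_BLOCK = 100

# (a) Blocks occupied by a 600 MB file.
logical_blocks = math.ceil(600 * MB / BLOCK_BYTES)   # the last block is partly filled
stored_replicas = logical_blocks * REPLICATION       # block copies across DataNodes

# (d) How many 100 KB files fit?
files_offered = 100_000_000
file_bytes = 100 * KB

# NameNode memory limit: one block entry per file (assumption).
namenode_limit = (2 * GB) // METADATA_BYTES_PER_BLOCK

# DataNode disk limit: 19 DataNodes, each file stored REPLICATION times.
disk_limit = (19 * 2 * TB) // (file_bytes * REPLICATION)

uploadable = min(files_offered, namenode_limit, disk_limit)

print("blocks for 600 MB file:", logical_blocks, "replicas:", stored_replicas)
print("NameNode limit:", namenode_limit)
print("disk limit:", disk_limit)
print("uploadable files:", uploadable)
```

With these assumptions the NameNode's memory, not the DataNodes' disk space, is the binding constraint for many small files. A different reading of the question (for example, counting metadata per replica) would change the numbers but not the shape of the calculation.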