



This document gives the solution to an assignment question in the Big Data Fundamentals course (INT 312) on calculating the time required to retrieve a large data set from an HDD with given features and transferring it to HDFS. The solution includes the time calculations and the Hadoop commands for moving and replicating files.
Typology: Assignments




Cabinet Kumar Shah Prof. Mamoon Rashid 11812420 20574 A35 KOM59 School of CSE dept.
Question 1. For an organization X, there is 10 PB of data, and the available HDD has an access rate of 100 MB/s. The HDD has four channels for data storage and retrieval. As a Big Data Engineer, you are required to calculate the time required to retrieve this data with the given features. Also, provide an idea of how organization X can retrieve this data in minimum time, and state the requirements for implementing your idea. (10)
(Hint: Data will be read from four channels in parallel. As a result you need to calculate time of half data only as half data will be read in parallel at the same time via another channel).

Solution:
For organization X,
1 PB = 1000 TB
10 PB = 10 * 1000 TB = 10,000 TB
1 TB = 1,048,576 MB
Data = 10 PB = 10,000 TB = (10,000 * 1,048,576) MB = 10,485,760,000 MB
Average data access rate = 100 MB/sec
If 100 MB are read in 1 sec, then 10,485,760,000 MB (10 PB) are read in X sec:
X = 10,485,760,000 / 100 sec
  = 104,857,600 sec
  = 1,747,626.67 minutes
  = 29,127.11 hours
Hence, at 100 MB/sec the data takes about 29,127 hours and 7 minutes to read (0.11 hours is roughly 6.7 minutes).
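The arithmetic can be checked with a short Python sketch. It uses the same unit conversions as the solution (1 PB = 1000 TB, 1 TB = 1,048,576 MB). The `channels` parameter is an assumption made for illustration: it supposes each of the four channels sustains the full 100 MB/s on its own share of the data, which the question does not state explicitly. The function names are hypothetical.

```python
# Unit conversions taken from the worked solution.
TB_PER_PB = 1000
MB_PER_TB = 1_048_576
RATE_MB_PER_SEC = 100  # access rate given in the question


def read_time_seconds(petabytes, channels=1):
    """Seconds needed to read the data.

    Assumption: every channel reads its share at the full
    100 MB/s, so the channels simply multiply the throughput.
    """
    total_mb = petabytes * TB_PER_PB * MB_PER_TB
    return total_mb / (RATE_MB_PER_SEC * channels)


def as_hours_minutes(seconds):
    """Convert seconds to (hours, minutes), rounded to the nearest minute."""
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)


single = read_time_seconds(10)                # one stream at 100 MB/s
parallel = read_time_seconds(10, channels=4)  # four channels in parallel

print("single stream:", single, "sec =", as_hours_minutes(single))
print("four channels:", parallel, "sec =", as_hours_minutes(parallel))
```

Under that assumption, four parallel channels cut the time to a quarter, about 7,281 hours and 47 minutes. Reading only half the data per channel, as the hint words it, would correspond to `channels=2`.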
Step 2: hdfs dfs -moveFromLocal /home/cabinetshah/ /KOM59/
Step 3: hdfs dfs -setrep -R 4 /KOM59/
Part II
Create a text file named test.txt and save it on HDFS in one directory. Write a hadoop command to move it in another directory of Hadoop and then display its contents. Support your answer with screenshot of CLI fetching text file.
Step 1: hdfs dfs -mkdir /KOM
Step 2: hdfs dfs -mkdir /K18PV
Step 3: hdfs dfs -touchz /K18PV/test.txt
Step 4: hdfs dfs -ls /K18PV/
Step 5: hdfs dfs -mv /K18PV/test.txt /KOM59/
Step 6: hdfs dfs -ls /KOM59/
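The command sequence for Part II can also be written down as a small Python sketch that only builds and prints the commands, so they can be reviewed before being run on a cluster. The helper name `move_and_show_commands` is hypothetical, and the sketch assumes the destination directory (here `/KOM59`) already exists. It ends with `hdfs dfs -cat`, one way to display the file's contents as the question asks. Because `-touchz` creates a zero-length file, `-cat` prints nothing unless content is written to the file first.

```python
import shlex


def move_and_show_commands(src_dir, dst_dir, filename):
    """Build the HDFS CLI calls as argument lists (nothing is executed).

    Assumption: dst_dir already exists on HDFS.
    """
    src_dir = src_dir.rstrip("/")
    dst_dir = dst_dir.rstrip("/")
    src_file = f"{src_dir}/{filename}"
    dst_file = f"{dst_dir}/{filename}"
    return [
        ["hdfs", "dfs", "-mkdir", src_dir],           # create the source directory
        ["hdfs", "dfs", "-touchz", src_file],         # create an empty file there
        ["hdfs", "dfs", "-mv", src_file, dst_dir],    # move it to the other directory
        ["hdfs", "dfs", "-ls", dst_dir],              # confirm the file arrived
        ["hdfs", "dfs", "-cat", dst_file],            # display its contents
    ]


commands = move_and_show_commands("/K18PV", "/KOM59", "test.txt")
for argv in commands:
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell on a machine where the `hdfs` client is configured.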