Big Data Fundamentals Assignment Solution: Retrieving Large Data Sets from HDD to HDFS, Assignments of Data Warehousing

This document presents the solution to an assignment question from the Big Data Fundamentals course (INT 312): calculating the time required to retrieve a large data set from an HDD with given characteristics and transferring it to HDFS. The solution includes the time calculation and the Hadoop commands for moving and replicating files.

Typology: Assignments

2019/2020

Uploaded on 09/19/2020

ASSIGNMENT
ON
INT 312
(BIG DATA FUNDAMENTALS)
SET B
SUBMITTED BY: Cabinet Kumar Shah, 11812420, A35, KOM59
SUBMITTED TO: Prof. Mamoon Rashid, 20574, School of CSE dept.

Question 1. For an organization X, there is 10 PB of data, and the available HDD has an access rate of 100 MB/s. The HDD has four channels for data storage and retrieval. As a Big Data Engineer, you are required to calculate the time required to retrieve this data with the given features. Also, suggest how organization X can retrieve this data in minimum time, and state the requirements for implementing your idea. (10)

(Hint: Data will be read from four channels in parallel. As a result you need to calculate time of half data only as half data will be read in parallel at the same time via another channel.)

Solution:

1 PB = 1000 TB, so 10 PB = 10 * 1000 TB = 10,000 TB
1 TB = 1,048,576 MB, so Data = 10,000 * 1,048,576 MB = 10,485,760,000 MB
Average data access rate = 100 MB/s

If 100 MB is read in 1 second, then 10,485,760,000 MB (10 PB) is read in X seconds, where

X = 10,485,760,000 / 100 s
  = 104,857,600 s
  = 1,747,626.67 minutes
  = 29,127.11 hours

Hence, at 100 MB/s the read takes about 29,127 hours and 7 minutes (29,127 h 6 min 40 s; the fractional 0.11 hour is about 6.7 minutes, not 12 minutes).
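The arithmetic can be checked with a short shell sketch. It uses the conversions stated in the solution (1 PB = 1000 TB, 1 TB = 1,048,576 MB) and integer arithmetic only. The variable names, the `fmt` helper, and the final division by the channel count are illustrative assumptions: the division presumes that all four channels sustain 100 MB/s simultaneously and that the data is split evenly between them, which the assignment text does not confirm.

```shell
#!/usr/bin/env bash
# Conversions exactly as stated in the solution (mixed decimal/binary units).
TB_PER_PB=1000
MB_PER_TB=1048576

DATA_PB=10           # size of the data set
RATE_MB_PER_S=100    # HDD access rate
CHANNELS=4           # assumption: every channel sustains the full rate in parallel

total_mb=$(( DATA_PB * TB_PER_PB * MB_PER_TB ))
serial_s=$(( total_mb / RATE_MB_PER_S ))     # one channel reading everything
parallel_s=$(( serial_s / CHANNELS ))        # data split evenly across channels

# Print a duration given in seconds as hours, minutes, and seconds.
fmt() {
  local s=$1
  printf '%d h %d min %d s\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

echo "Total data:   ${total_mb} MB"
echo "One channel:  $(fmt "$serial_s")"
echo "${CHANNELS} channels:   $(fmt "$parallel_s")"
```

Under these assumptions the script reports 29127 h 6 min 40 s for one channel and 7281 h 46 min 40 s for four. Setting `CHANNELS=2` gives the "half the data" reading of the hint instead.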

Step 2: hdfs dfs -moveFromLocal /home/cabinetshah/ /KOM59/
Step 3: hdfs dfs -setrep -R 4 /KOM59/

Part II

Create a text file named test.txt and save it on HDFS in one directory. Write a Hadoop command to move it to another directory of Hadoop and then display its contents. Support your answer with a screenshot of the CLI fetching the text file.

Step 1: hdfs dfs -mkdir /KOM
Step 2: hdfs dfs -mkdir /K18PV
Step 3: hdfs dfs -touchz /K18PV/test.txt
Step 4: hdfs dfs -ls /K18PV/
Step 5: hdfs dfs -mv /K18PV/test.txt /KOM59/
Step 6: hdfs dfs -ls /KOM59/
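The listed steps stop at `-ls` and do not show the file's contents. Since `-touchz` creates a zero-length file, printing it would show nothing anyway. The following is a minimal sketch of one way to cover the "display its contents" part, assuming a local file with some text is uploaded instead. The `run` helper, the `DRY_RUN` switch, the temporary local directory, and the sample text are hypothetical and not part of the original answer. By default the script only prints the commands; set `DRY_RUN=0` on a machine with a working Hadoop client to execute them.

```shell
#!/usr/bin/env bash
# Dry-run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

SRC_DIR="/K18PV"     # directory the file is first saved in (from the steps above)
DST_DIR="/KOM59"     # directory the file is moved to (from the steps above)
FILE="test.txt"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create a local file with some content (hypothetical sample text).
LOCAL_DIR="$(mktemp -d)"
printf 'sample line for %s\n' "$FILE" > "$LOCAL_DIR/$FILE"

run hdfs dfs -mkdir -p "$SRC_DIR" "$DST_DIR"      # -p: no error if a directory exists
run hdfs dfs -put -f "$LOCAL_DIR/$FILE" "$SRC_DIR/"   # -f: overwrite an existing copy
run hdfs dfs -mv "$SRC_DIR/$FILE" "$DST_DIR/"
run hdfs dfs -cat "$DST_DIR/$FILE"                # display the file contents
```

Using `-mkdir -p` keeps the script repeatable, because a plain `-mkdir` fails when the directory already exists.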