Assignment on Big Data, Assignments of Data Mining

2019/2020

Uploaded on 09/19/2020

ASSIGNMENT
ON
INT 312
(BIG DATA FUNDAMENTALS)
SET SB
SUBMITTED BY: SUBMITTED TO:
Cabinet Kumar Shah Prof. Mamoon Rashid
11812420 20574
A35 KOM59 School of CSE dept.

Question 1

For an organization X, there is 10 PB of data, and the available HDD has an access rate of 100 MB/s. The HDD has four channels for data storage and retrieval. As a Big Data Engineer, you are required to calculate the time required to retrieve this data with the given features. Also, suggest how organization X can retrieve this data in minimum time, and state the requirements for implementing your idea. (10)

(Hint: Data will be read from the four channels in parallel. As a result, you need to calculate the time for only half of the data, as the other half will be read in parallel at the same time via another channel.)

Solution:

For organization X, using binary units throughout (1 PB = 1024 TB and 1 TB = 1024 * 1024 MB = 1,048,576 MB):

10 PB = 10 * 1024 TB = 10,240 TB

Data = 10 PB

= 10,240 TB

= (10,240 * 1,048,576) MB

= 10,737,418,240 MB

Average data access rate = 100 MB/s

Now, for a single channel, 100 MB is read in 1 s, so reading 10,737,418,240 MB (10 PB) takes X s:

X = 10,737,418,240 / 100 s

= 107,374,182.4 s

≈ 1,789,569.71 minutes (about 29,826 hours, or roughly 1,243 days)
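The arithmetic can be checked with a small shell sketch, which also extends it to the four channels. It assumes binary units (1 PB = 1024 * 1024 * 1024 MB), that every channel sustains the full 100 MB/s, that the data splits evenly across channels, and that there is no seek or controller overhead. The function names are hypothetical. Under these assumptions, four channels cut the single-channel time to a quarter. Passing 2 as the channel count matches the hint's reading, where only half of the data has to be timed.

```shell
# Sketch: retrieval time for PB petabytes read through N parallel channels.
# Assumptions (not stated in the assignment): binary units, every channel
# sustains the full rate, the data splits evenly, and there is no seek or
# controller overhead. Function names are hypothetical.

MB_PER_PB=$((1024 * 1024 * 1024))   # 1 PB = 1024 TB = 1024 * 1024 * 1024 MB

# retrieval_seconds PB RATE_MB_PER_S CHANNELS -> seconds, two decimals
retrieval_seconds() {
  local total_mb=$(( $1 * MB_PER_PB ))
  # awk does the division in floating point; shell arithmetic is integer-only
  awk -v mb="$total_mb" -v rate="$2" -v ch="$3" \
    'BEGIN { printf "%.2f\n", mb / (rate * ch) }'
}

# seconds_to_days SECONDS -> days, two decimals
seconds_to_days() {
  awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 86400 }'
}

# Example usage:
#   retrieval_seconds 10 100 1   # one channel:   107374182.40
#   retrieval_seconds 10 100 4   # four channels: 26843545.60
#   retrieval_seconds 10 100 2   # the hint's "half the data" reading
```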

Question 2

Hadoop installation: Show the steps and commands used to install Apache Hadoop as a single-node cluster on your machine. Each step must be supported with screenshots of your machine showing your name on the terminal. Explain the functionality of each file used in the configuration of Apache Hadoop.

Step 1: If Java is not installed, install it with the following command:

$ sudo apt-get install openjdk-8-jdk

Step 2: Check the Java version by running this command:

$ java -version

Step 3: Run the following command, then copy its output (the path of the Java installation):

$ readlink -f /usr/bin/java | sed "s:bin/java::"
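Steps 2 and 3 can be wrapped in small shell functions so the check is repeatable. This is a minimal sketch with hypothetical function names. `readlink -f` assumes GNU coreutils (the Ubuntu default). The Java 8 check reflects that the Hadoop 3.0 to 3.2 line supports Java 8 only. `${1%bin/java}` removes a trailing `bin/java`, which for the usual symlink targets gives the same result as the `sed` expression in Step 3.

```shell
# Sketch: Steps 2-3 as reusable shell functions (hypothetical helper names).

# java_major VERSION -> major Java release ("1.8.0_292" -> 8, "11.0.2" -> 11)
java_major() {
  case $1 in
    1.*) v=${1#1.}; echo "${v%%.*}" ;;   # old scheme: 1.<major>.<minor>_<update>
    *)   echo "${1%%.*}" ;;              # new scheme: <major>.<minor>.<patch>
  esac
}

# installed_java_version -> the quoted version string printed by `java -version`
installed_java_version() {
  # java -version writes to stderr, e.g.: openjdk version "1.8.0_292"
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# strip_bin_java PATH -> PATH without its trailing "bin/java"
strip_bin_java() {
  printf '%s\n' "${1%bin/java}"
}

# detect_java_home -> Step 3: follow the symlink chain behind java, strip bin/java
detect_java_home() {
  java_bin=$(command -v java) || { echo "java not found on PATH" >&2; return 1; }
  strip_bin_java "$(readlink -f "$java_bin")"
}
```

Usage: `[ "$(java_major "$(installed_java_version)")" = 8 ] || echo "Hadoop 3.1 expects Java 8"`, then `detect_java_home` prints the path to copy.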

Step 6: Run the command:

$ ssh-keygen -t rsa -P ""

Step 7: Run the command:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
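Running Steps 6 and 7 twice appends the same key again, and a re-run of `ssh-keygen` prompts before overwriting the key. A minimal sketch of an idempotent version follows. The function name is hypothetical, and it assumes the key file is `id_rsa` inside the given SSH directory (pass `"$HOME/.ssh"` in practice).

```shell
# Sketch: Steps 6-7 made safe to re-run (setup_passwordless_ssh is hypothetical).

# setup_passwordless_ssh [SSH_DIR] -> ensures an RSA key exists and is authorized
setup_passwordless_ssh() {
  local ssh_dir=${1:-$HOME/.ssh}
  local key="$ssh_dir/id_rsa"
  local auth="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

  # Step 6: generate a key with an empty passphrase, only if none exists yet.
  if [ ! -f "$key" ]; then
    ssh-keygen -t rsa -P "" -f "$key" -q || return 1
  fi

  # Step 7: append the public key unless that exact line is already present.
  touch "$auth"
  grep -qxF "$(cat "$key.pub")" "$auth" || cat "$key.pub" >> "$auth"

  # sshd's default StrictModes rejects authorized_keys writable by others.
  chmod 600 "$auth"
}
```

Step 8 then tests the result; `ssh -o BatchMode=yes localhost true` fails instead of prompting for a password if the key is not accepted.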

Step 8: Run the command:

$ ssh localhost

Step 9: Open the .bashrc file with the command below, then paste the following lines into it:

$ gedit ~/.bashrc

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd
export HADOOP_INSTALL=/home/cabinetshah/hadoop-3.1.
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
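The variables only take effect after `source ~/.bashrc` or in a new terminal. A minimal sketch for checking them is shown below. `check_hadoop_env` and its helpers are hypothetical. The sketch only verifies that JAVA_HOME and HADOOP_INSTALL name existing directories and that both Hadoop `bin` and `sbin` are on PATH, so a mistyped path shows up before Hadoop is started.

```shell
# Sketch: sanity-check the Step 9 variables after "source ~/.bashrc".
# check_hadoop_env and its helpers are hypothetical, not part of Hadoop.

# require_dir LABEL VALUE -> fails (with a message) unless VALUE is a directory
require_dir() {
  if [ -z "$2" ]; then
    echo "missing: $1"
    return 1
  fi
  [ -d "$2" ] || { echo "not a directory: $1=$2"; return 1; }
}

# path_has DIR -> succeeds if DIR is one of the entries in PATH
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
  esac
  return 1
}

check_hadoop_env() {
  local status=0 sub
  require_dir JAVA_HOME "${JAVA_HOME:-}" || status=1
  require_dir HADOOP_INSTALL "${HADOOP_INSTALL:-}" || status=1
  for sub in bin sbin; do
    if ! path_has "${HADOOP_INSTALL:-}/$sub"; then
      echo "PATH lacks \$HADOOP_INSTALL/$sub"
      status=1
    fi
  done
  return "$status"
}
```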

Step 12: Run these two commands:

$ sudo mkdir -p /home/cabinetshah/hadoop-3.1.3/tmp
$ sudo chown cabinetshah:cabinetshah /home/cabinetshah/hadoop-3.1.3/tmp

Step 13: Open hadoop-3.1.3/etc/hadoop/core-site.xml in a text editor and paste the following lines between its <configuration> and </configuration> tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/cabinetshah/hadoop-3.1.3/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

Save and quit. (fs.default.name is the older, deprecated name of the fs.defaultFS property; Hadoop 3 still accepts it.)
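Instead of pasting by hand, the same core-site.xml can be generated. This is a minimal sketch: `hadoop_property` and `write_core_site` are hypothetical helpers, and `write_core_site` overwrites the whole file. That assumes core-site.xml still holds only the empty <configuration> element shipped with the release. The optional <description> elements are left out.

```shell
# Sketch: generate the Step 13 core-site.xml (hypothetical helpers).
# Values are inserted verbatim, so they must not contain XML special
# characters such as & or <.

# hadoop_property NAME VALUE -> prints one <property> element
hadoop_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# write_core_site FILE TMP_DIR DEFAULT_FS -> overwrites FILE with both properties
write_core_site() {
  {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<configuration>'
    hadoop_property hadoop.tmp.dir "$2"
    hadoop_property fs.default.name "$3"
    echo '</configuration>'
  } > "$1"
}
```

Usage: `write_core_site hadoop-3.1.3/etc/hadoop/core-site.xml /home/cabinetshah/hadoop-3.1.3/tmp hdfs://localhost:54310`. Afterwards, `hdfs getconf -confKey hadoop.tmp.dir` should print the configured directory.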

Step 16: Open hadoop-3.1.3/etc/hadoop/hdfs-site.xml in a text editor and paste the following lines between its <configuration> and </configuration> tags:

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/cabinetshah/hadoop-3.1.3/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/cabinetshah/hadoop-3.1.3/hadoop_store/hdfs/datanode</value>
</property>

Save and quit.

Step 17: Run the following command to format the NameNode:

$ hadoop namenode -format

(Hadoop 3 prints a deprecation warning for this form; the current equivalent is $ hdfs namenode -format.)
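Once the NameNode is formatted and the daemons are started (for example with start-dfs.sh and start-yarn.sh from Hadoop's sbin directory), `jps`, which ships with the JDK, lists the running Java processes. The sketch below reports which expected daemons are missing. It is a minimal sketch that assumes sbin is on PATH and that a single node runs both the HDFS and YARN daemons. The function names and the expected-daemon list are assumptions, not part of Hadoop.

```shell
# Sketch: start the single-node daemons and report any that did not come up.
# Function names are hypothetical; the expected list assumes HDFS + YARN.

EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# missing_daemons -> reads `jps` output ("PID Name" per line) on stdin and
# prints each expected daemon that does not appear in it, one per line.
missing_daemons() {
  running=$(awk '{ print $2 }')
  for daemon in $EXPECTED_DAEMONS; do
    printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
  done
}

# start_and_check -> runs the stock start scripts, then inspects jps.
start_and_check() {
  start-dfs.sh && start-yarn.sh || return 1
  missing=$(jps | missing_daemons)
  if [ -n "$missing" ]; then
    echo "not running:" $missing
    return 1
  fi
  echo "all daemons running"
}
```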

Remarks: If the NameNode does not appear after starting the daemons, delete the hadoop_store folder and the tmp folder, then run all the commands again. Also check all four configuration files inside the etc/hadoop folder of hadoop-3.1.3, as well as the .bashrc file.

Question 3

Part I

Create a text file named temp.txt and save it in the local file system. Write a Hadoop command to move this file into HDFS, and then change the replication factor of this file to 4 on HDFS. Support your answer with a screenshot of the CLI fetching the text file on HDFS.

Step 1: Create a file named temp.txt on the local file system.

Step 2: Move the file from the local file system to HDFS (assuming temp.txt was saved in /home/cabinetshah):

hdfs dfs -moveFromLocal /home/cabinetshah/temp.txt /KOM59/

Step 3: Change the replication factor of the file to 4:

hdfs dfs -setrep 4 /KOM59/temp.txt
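Part I can be scripted as one function. This is a minimal sketch that assumes the HDFS daemons are running. The function name and its default arguments are hypothetical. It creates the target directory first so that -moveFromLocal has an existing directory to move into. It then uses -stat %r to print the replication factor and -cat to fetch the file. `-setrep` is run without `-w`. On a single-node cluster with one DataNode, a replication factor of 4 can never be reached, so waiting would never finish. HDFS records the factor and reports the blocks as under-replicated.

```shell
# Sketch of Part I as one function. Assumes the HDFS daemons are running;
# the function name and default arguments are hypothetical.

# move_and_replicate [LOCAL_FILE] [HDFS_DIR] [REPLICAS]
move_and_replicate() {
  local src=${1:-temp.txt} dir=${2:-/KOM59} reps=${3:-4}
  local name
  name=$(basename "$src")

  hdfs dfs -mkdir -p "$dir" || return 1               # target directory must exist
  hdfs dfs -moveFromLocal "$src" "$dir/" || return 1  # the local copy is deleted
  hdfs dfs -setrep "$reps" "$dir/$name" || return 1   # no -w: one DataNode never reaches 4
  hdfs dfs -stat %r "$dir/$name"                      # prints the replication factor
  hdfs dfs -cat "$dir/$name"                          # fetches the file contents
}
```

Usage: `move_and_replicate /home/cabinetshah/temp.txt /KOM59 4`.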

Part II

Create a text file named test.txt and save it in a directory on HDFS. Write a Hadoop command to move it into another HDFS directory and then display its contents. Support your answer with a screenshot of the CLI fetching the text file.

Command to create a directory named ‘KOM59’ on HDFS:

Step 1: hdfs dfs -mkdir /KOM

Command to create a directory named ‘K18PV’ on HDFS: