Data analysis and exploration, Study notes of Data Analysis & Statistical Methods

These notes cover the data analysis and exploration chapter. The MCQs are very important.

Typology: Study notes

2019/2020

Uploaded on 10/27/2020

ajinkya-barbade
Dr.G.R.Damodaran College of Science
(Autonomous, affiliated to the Bharathiar University, recognized by the UGC)
Re-accredited at the 'A' Grade Level by the NAAC and ISO 9001:2008 Certified
CRISL rated 'A' (TN) for MBA and MIB Programmes
II MCA [2018-2021 Batch]
Semester IV
Elective II: Big Data Analytics - 454U8
Multiple Choice Questions.
1. Facebook Tackles Big Data With _______ based on Hadoop
A. Project Prism
B. Prism
C. Project Data
D. Project Bid
ANSWER: A
2. What are the 3v's of Big Data?
A. Volume
B. Variety
C. Velocity
D. all the above
ANSWER: D
3. What license is Hadoop distributed under ?
A. Apache License 2.0
B. Mozilla
C. Shareware
D. Middleware
ANSWER: A
4. Sun also has the Hadoop Live CD ________ project, which allows running a fully functional Hadoop
cluster using a live CD
A. OpenOffice.org
B. OpenSolaris
C. OpenSolaris
D. Linux
ANSWER: C
5. Which of the following genres does Hadoop produce ?
A. Distributed file system
B. JAX-RS
C. Java Message Service
D. JSP
ANSWER: A
6. What was Hadoop written in ?

A. C
B. C++
C. Java
D. JSP
ANSWER: C

  1. Which of the following platforms does Hadoop run on? A. Bare metal B. Debian C. Cross-platform D. Unix-Like ANSWER: C
  2. Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not require ________ storage on hosts. A. RAID B. ZFS C. Operating System D. DFS ANSWER: A
  3. Above the file systems comes the ________ engine, which consists of one Job Tracker, to which client applications submit MapReduce jobs. A. MapReduce B. Google C. Functional Programming D. Facebook ANSWER: A
  4. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations. A. Machine learning B. Pattern recognition C. Statistical classification D. Artificial intelligence ANSWER: A
  5. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. A. Pig Latin B. Oozie C. Pig D. Hive ANSWER: C
  6. Point out the correct statement A. Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data B. Hive is a relational database with SQL support C. Pig is a relational database with SQL support D. All of the mentioned
  1. ______ is a framework for performing remote procedure calls and data serialization. A. Mapreduce B. Dril C. Avro D. Chuckro ANSWER: C
  2. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including A. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including B. Improved extract, transform and load features for data integration C. Improved data warehousing functionality D. Improved security, workload management and SQL support ANSWER: D
  3. Point out the correct statement A. Hadoop does need specialized hardware to process the data B. Hadoop 2.0 allows live stream processing of real-time data C. In the Hadoop programming framework, output files are divided into lines or records D. None of the mentioned ANSWER: B
  4. According to analysts, for what can traditional IT systems provide a foundation when they are integrated with big data technologies like Hadoop? A. Big data management and data mining B. Data warehousing and business intelligence C. Management of Hadoop clusters D. Collecting and storing unstructured data ANSWER: A
  5. Hadoop is a framework that works with a variety of related tools. Common cohorts include A. MapReduce, MySQL and Google Apps B. MapReduce, Hive and HBase C. MapReduce, Hummer and Iguana D. MapReduce, Heron and Trumpet ANSWER: B
  6. Which of the following is not an input format in Hadoop? A. TextInputFormat B. ByteInputFormat C. SequenceFileInputFormat D. KepInputFormat ANSWER: B
  7. What was Hadoop named after? A. Creator Doug Cutting's favorite circus act B. Cutting's high school rock band C. The toy elephant of Cutting's son D. A sound Cutting's laptop made during Hadoop development ANSWER: C

  1. All of the following accurately describe Hadoop, EXCEPT A. Open source B. Real-time C. Java-based D. Distributed computing approach ANSWER: B
  2. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. A. MapReduce B. Mahout C. Oozie D. All of the mentioned ANSWER: A
  3. __________ has the world's largest Hadoop cluster. A. Apple B. Datamatics C. Facebook D. None of the mentioned ANSWER: C
  4. Facebook Tackles Big Data With _______ based on Hadoop. A. Prism B. Project Prism C. Project Big D. Project Data ANSWER: B
  5. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. A. MapReduce B. Mapper C. TaskTracker D. JobTracker ANSWER: C
  6. Point out the correct statement A. Map Task in MapReduce is performed using the Mapper() function B. Reduce Task in MapReduce is performed using the Map() function C. All of the mentioned D. MapReduce tries to place the data and the compute as close as possible ANSWER: D
  7. ___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. A. Maptask B. Mapper C. Task execution

A. HashPar B. Partitioner C. HashPartitioner D. None of the mentioned ANSWER: C
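The HashPartitioner named in the options above decides which reduce task receives each intermediate key by hashing the key and taking the result modulo the number of reduce tasks. The following is a minimal Python sketch of that idea, not Hadoop's own code: the function name `hash_partition` is hypothetical, and CRC32 stands in for the Java `hashCode()` that Hadoop actually uses, only because it gives the same value on every run.

```python
import zlib


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Return the index of the reduce task for a key (hypothetical helper)."""
    if num_reduce_tasks <= 0:
        raise ValueError("num_reduce_tasks must be positive")
    # Assumption: CRC32 replaces Java's hashCode() so the result is
    # stable across Python runs; the modulo step is the essential part.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


# Every occurrence of the same key is routed to the same reduce task.
keys = ["apple", "banana", "apple", "cherry"]
assignments = [(key, hash_partition(key, 3)) for key in keys]
```

Because the partition depends only on the key, all values for one key meet at a single reducer, whichever mapper emitted them.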

  1. Mapper implementations are passed the JobConf for the job via the ________ method A. JobConfigure.configure B. JobConfigurable.configure C. JobConfigurable.configureable D. None of the mentioned ANSWER: B
  2. Point out the correct statement A. Applications can use the Reporter to report progress B. The HadoopMapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job C. The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format D. All of the mentioned ANSWER: D
  3. Input to the _______ is the sorted output of the mappers. A. Reducer B. Mapper C. Shuffle D. All of the mentioned ANSWER: A
  4. The right number of reduces seems to be : A. 0. B. 0. C. 0. D. 0. ANSWER: C
  5. Point out the wrong statement A. Reducer has 2 primary phases B. Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures C. It is legal to set the number of reduce-tasks to zero if no reduction is desired D. The framework groups Reducer inputs by keys (since different mappers may have output the same key) in sort stage ANSWER: A
  6. The output of the _______ is not sorted in the Mapreduce framework for Hadoop. A. Mapper B. Cascader C. Scalding D. None of the mentioned ANSWER: D
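Several of the questions above turn on the order of the MapReduce phases: mappers emit key/value pairs, the framework shuffles and sorts them by key, and each reducer receives one key with all of its values. A small word-count sketch in plain Python can make that flow concrete. It is an illustration under simplifying assumptions (a single process, in-memory lists, hypothetical function names), not the Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit one (word, 1) pair for every word in the input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    # Framework step: group values by key, then order the groups by key
    # so the reducer sees sorted input.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    # Reducer: collapse all values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(intermediate)]


result = word_count(["big data big cluster", "data node"])
# result == [("big", 2), ("cluster", 1), ("data", 2), ("node", 1)]
```

In a real cluster the shuffle copies map output across the network while sorting merges it, which is why those two phases are described as occurring together.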
  1. Which of the following phases occur simultaneously? A. Reduce and Sort B. Shuffle and Sort C. Shuffle and Map D. All of the mentioned ANSWER: B
  2. Mapper and Reducer implementations can use the ________ to report progress or just indicate that they are alive. A. Partitioner B. OutputCollector C. Reporter D. All of the mentioned ANSWER: C
  3. __________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer A. Partitioner B. OutputCollector C. Reporter D. All of the mentioned ANSWER: B
  4. _________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution. A. Map Parameters B. JobConf C. MemoryConf D. All of the mentioned ANSWER: B
  5. A ________ serves as the master and there is only one NameNode per cluster A. Data Node B. NameNode C. Data block D. Replication ANSWER: B
  6. Point out the correct statement A. DataNode is the slave/worker node and holds the user data in the form of Data Blocks B. Each incoming file is broken into 32 MB by default C. Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance D. None of the mentioned ANSWER: A
  7. HDFS works in a __________ fashion A. master-worker B. master-slave C. worker/slave. D. All of the mentioned
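The block and replication ideas in the questions above can be sketched in a few lines of Python. The sketch assumes a 128 MB block size and a replication factor of 3 (common defaults in Hadoop 2; both are configurable), and it places replicas round-robin purely for illustration. Real HDFS uses a rack-aware placement policy, and the function and node names here are hypothetical.

```python
import math


def split_into_blocks(file_size_mb, block_size_mb=128):
    # Number of blocks needed to hold the file; the last block may be partial.
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, data_nodes, replication=3):
    # Assumption: simple round-robin placement, not HDFS's rack-aware policy.
    if replication > len(data_nodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            data_nodes[(block + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
blocks = split_into_blocks(300)       # a 300 MB file needs 3 blocks
layout = place_replicas(blocks, nodes)
```

Since every block lives on three different DataNodes, losing one node still leaves two copies of each block it held, which is the reason Hadoop does not depend on RAID storage on the hosts.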
  1. ___________ is the world's most complete, tested, and popular distribution of Apache Hadoop and related projects. A. MDH B. CDH C. ADH D. BDH ANSWER: B
  2. Point out the correct statement A. Cloudera is also a sponsor of the Apache Software Foundation B. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, and interactive search, and role-based access controls C. More enterprises have downloaded CDH than all other such distributions combined D. All of the mentioned ANSWER: D
  3. Cloudera ___________ includes CDH and an annual subscription license (per node) to Cloudera Manager and technical support. A. Enterprise B. Express C. Standard D. All the above ANSWER: A
  4. Cloudera Express includes CDH and a version of Cloudera ___________ lacking enterprise features such as rolling upgrades and backup/disaster recovery A. Enterprise B. Express C. Standard D. Manager ANSWER: D
  5. Point out the wrong statement A. CDH contains the main, core elements of Hadoop B. In October 2012, Cloudera announced the Cloudera Impala project C. CDH may be downloaded from Cloudera's website at no charge D. None of the mentioned ANSWER: D
  6. Cloudera Enterprise comes in ___________ edition. A. One B. Two C. Three D. Four ANSWER: C
  7. __________ is an online NoSQL database developed by Cloudera. A. HCatalog B. HBase C. Imphala D. Oozie ANSWER: B

  1. _______ is an open source set of libraries, tools, examples, and documentation engineered. A. Kite B. Kize C. Ookie D. All of the mentioned ANSWER: A
  2. To configure short-circuit local reads, you will need to enable ____________ on local Hadoop. A. librayhadoop B. libhadoop C. libhad D. hadoop ANSWER: B
  3. CDH processes and controls sensitive data and facilitates A. multi-tenancy B. flexibility C. scalability D. reusability ANSWER: A
  4. _______ can change the maximum number of cells of a column family A. set B. reset C. alter D. connect ANSWER: C
  5. Point out the correct statement A. You can add a column family to a table using the method addColumn() B. Using alter, you can also create a column family C. Using disable-all, you can truncate a column family D. None of the mentioned ANSWER: A
  6. Which of the following is not a table scope operator? A. MEMSTORE_FLUSH B. MEMSTORE_FLUSHSIZE C. MAX_FILESIZE D. All of the mentioned ANSWER: A
  7. You can delete a column family from a table using the method _________ of the HBaseAdmin class. A. delColumn() B. removeColumn() C. deleteColumn() D. All of the mentioned ANSWER: C

A. They are made for formal presentations. B. They are typically made very quickly. C. Axes, legends, and other details are clean and exactly detailed. D. They are used in place of formal modeling. ANSWER: B

  1. Which of the following is true about the base plotting system? A. Margins and spacings are adjusted automatically depending on the type of plot and the data B. Plots are typically created with a single function call C. Plots are created and annotated with separate functions D. The system is most useful for conditioning plots ANSWER: C
  2. Which of the following is an example of a valid graphics device in R? A. A socket connection B. A Microsoft Word document C. A PDF file D. A file folder ANSWER: C
  3. Which of the following is an example of a vector graphics device in R? A. JPEG B. GIF C. PNG D. SVG ANSWER: D
  4. Bitmapped file formats can be most useful for A. Plots that may need to be resized B. Plots that require animation or interactivity C. Plots that are not scaled to a specific resolution D. Scatterplots with many many points ANSWER: D
  5. Which of the following functions is typically used to add elements to a plot in the base graphics system? A. lines() B. hist() C. plot() D. boxplot() ANSWER: A
  6. Which function opens the screen graphics device for the Mac? A. bitmap() B. quartz() C. pdf() D. png() ANSWER: B
  7. What does the 'pch' option to par() control? A. the size of the plotting symbol in a scatterplot B. the line width in the base graphics system C. the orientation of the axis labels on the plot D. the plotting symbol/character in the base graphics system ANSWER: D

  1. MapReduce was devised by______________ A. Apple B. Google C. Facebook D. Samsung ANSWER: B
  2. _____ programming language is a dialect of S. A. B B. C C. R D. K ANSWER: C
  3. Point out the WRONG statement A. Early versions of the S language contain functions for statistical modeling B. The book Programming with Data by John Chambers documents S version of the language C. In 1993 Bell Labs gave StatSci (later Insightful Corp.) an exclusive license to develop and sell the S language D. All of the mentioned ANSWER: A
  4. In 2004, ________ purchased the S language from Lucent for $2 million A. Insightful B. Amazon C. IBM D. All the above ANSWER: A
  5. In 1991, R was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of _________. A. Johns Hopkins B. California C. Harvard D. Auckland ANSWER: D
  6. Point out the wrong statement A. R is a language for data analysis and graphics B. K is language for statistical modelling and graphics C. One key limitation of the S language was that it was only available in a commercial package, S-PLUS D. None of the mentioned ANSWER: B
  7. Finally, in _________ R version 1.0.0 was released to the public. A. 2000 B. 2005 ANSWER: A

  1. The _________ R system contains, among other things, the base package which is required to run R A. root B. child C. base D. none of the above ANSWER: C
  2. Point out the wrong statement A. One nice feature that R shares with many popular open source projects is frequent releases B. R has sophisticated graphics capabilities C. S's base graphics system allows for very fine control over essentially every aspect of a plot or graph. D. All the above ANSWER: C
  3. Which of the following is a base package for R language? A. util B. lang C. tools D. all the above ANSWER: C
  4. Which of the following is "Recommended" package in R? A. util B. lang C. stats D. spatial ANSWER: D
  5. How many packages exist in R language for statistics? A. 2000 B. 3000 C. 4000 D. all the above ANSWER: D
  6. Advanced users can write ___ code to manipulate R objects directly. A. C B. C++ C. Java D. None of the mentioned ANSWER: A
  7. Which of the following is used for Statistical analysis in R language? A. RStudio B. Studio C. Heck D. None of the mentioned ANSWER: A
  1. The most convenient way to use R is at a graphics workstation running a ________ system. A. windowing B. running C. interfacing D. All of the mentioned ANSWER: A
  2. Point out the wrong statement A. Setting up a workstation to take full advantage of the customizable features of R is a straightforward thing B. q() is used to quit the R program C. R has an inbuilt help facility similar to the man facility of UNIX D. None of the mentioned ANSWER: D
  3. Which of the following is the default prompt for the UNIX environment? A. > B. << C. << D. < ANSWER: A
  114. Point out the wrong statement A. Windows versions of R also have other optional help systems B. The help.search command (alternatively ??) allows searching for help in various ways C. R is case insensitive as are most UNIX-based packages, so A and a are different symbols and would refer to different variables D. All of the mentioned ANSWER: C
  4. Which of the following statements is an alternative to ?solve A. help(solve) B. man(solve) C. hel(solve) D. All of the mentioned ANSWER: A
  5. Elementary commands in R consist of either _______ or assignments. A. utilstats B. language C. expressions D. None of the mentioned ANSWER: C
  6. If a command is not complete at the end of a line, R will give a different prompt, by default it is : A. * B. - C. + D. All the above ANSWER: C
  1. If commands are stored in an external file, say commands.R in the working directory work, they may be executed at any time in an R session with the command : A. source("commands.R") B. exec("commands.R") C. execute("commands.R") D. All of the mentioned ANSWER: A
  2. _______ will divert all subsequent output from the console to an external file. A. sink B. div C. dip D. exp ANSWER: A
  3. The entities that R creates and manipulates are known as ________ A. task B. objects C. function D. expression ANSWER: B
  4. Which of the following can be used to display the names of (most of) the objects which are currently stored within R? A. object() B. objects() C. list() D. none of the above ANSWER: B
  5. Collection of objects currently stored in R is called as : A. package B. workspace C. list D. array ANSWER: B
  6. What will be the output of following code snippet? > paste("a", "b", se = ":") A. a+b B. a-b C. ab D. none ANSWER: D
  7. Point out the correct statement A. In R, a function is an object which has the mode function B. R interpreter is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions that are desired C. Functions are also often written when code must be shared with others or the public D. All of the mentioned ANSWER: D
  1. The __________ function returns a list of all the formal arguments of a function A. formals() B. funct() C. formal() D. All of the mentioned ANSWER: A
  2. Point out the wrong statement A. A formal argument can be a symbol, a statement of the form 'symbol = expression', or the special formal argument B. The first component of the function declaration is the keyword function C. The value returned by the call to function is not a function D. None of the mentioned ANSWER: A
  3. You can check to see whether an R object is NULL with the _________ function. A. is.nullobj() B. null() C. is.null() D. obj.null() ANSWER: C
  4. Which of the following code will print NULL? A. > args(pastebin) B. > args(paste) C. > arg(paste) D. > argc(paste) ANSWER: B
  5. What are the main components of Big Data? A. MapReduce B. HDFS C. YARN D. all the above ANSWER: D
  6. What are the different features of Big Data Analytics? A. Open Source B. Data Recovery C. Scalability D. all of the above ANSWER: D
  7. For YARN, the ___________ Manager UI provides host and port information. A. Data Node B. NameNode C. Resource D. Replication ANSWER: C