Understanding MapReduce: Google's Algorithm for Processing Large Data Sets, Study notes of Data Analysis & Statistical Methods

AMET University Data Analysis & Statistical Methods

Explains why Google developed MapReduce to address the limitations of traditional enterprise systems in handling large volumes of data. MapReduce is a programming model for processing large datasets distributed across a large cluster using the divide-and-conquer approach. It consists of two methods: map() and reduce(). The notes give an overview of how MapReduce works, including its architecture, phases, and example use cases.

Typology: Study notes

2019/2020

Uploaded on 08/01/2020

bagga-dhruv 🇮🇳


CH-1

INTRODUCTION TO BIG DATA

BY: PROF. AJAYSINH RATHOD


Why MapReduce?


Traditional enterprise systems struggle to process very large volumes of data on a centralized server. Google solved this bottleneck using an algorithm called MapReduce.

MapReduce divides a task into small parts and assigns them to many computers.

Later, the results are collected in one place and integrated to form the result dataset.
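A minimal sketch of this divide-and-conquer idea, assuming Python and using a thread pool on one machine to stand in for the many computers of a real cluster. The helper names `split_into_parts`, `process_part`, and `divide_and_conquer` are hypothetical, and the per-part work (a partial sum) is chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data, n_parts):
    """Divide the input into roughly equal, independent parts (hypothetical helper)."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_part(part):
    """Work done independently on one part; here, just a partial sum."""
    return sum(part)


def divide_and_conquer(data, n_workers=4):
    parts = split_into_parts(data, n_workers)
    # Each part goes to a separate worker, standing in for a separate computer.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_part, parts))
    # Collect the partial results in one place and integrate them.
    return sum(partial_results)


if __name__ == "__main__":
    print(divide_and_conquer(list(range(1, 101))))  # 5050
```

Threads keep the sketch self-contained; in a real cluster each part would be shipped to a different machine, but the split, process, and integrate structure is the same.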

Algorithms Using MapReduce

How MapReduce Works

The MapReduce algorithm contains two important tasks, namely Map and Reduce.

- The Map task takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key-value pairs).
- The Reduce task takes the output from the Map as an input and combines those data tuples (key-value pairs) into a smaller set of tuples.
- The Reduce task is always performed after the Map job.
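To make the two tasks concrete, here is a minimal single-machine sketch, assuming Python; it is not Hadoop's API. `run_map_reduce`, `parity_mapper`, and `sum_reducer` are hypothetical names, and the even/odd example is only an illustration. All map calls finish before any reduce call starts, matching the rule that Reduce always follows Map.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Minimal single-machine MapReduce: every map runs before any reduce."""
    # Map task: turn each input record into zero or more (key, value) tuples.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))

    # Group the tuples by key so the reducer sees all values for one key together.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # Reduce task: combine each key's values into a smaller set of tuples.
    return [reducer(key, values) for key, values in grouped.items()]


def parity_mapper(number):
    """Hypothetical mapper: key each number by whether it is even or odd."""
    yield ("even" if number % 2 == 0 else "odd", number)


def sum_reducer(key, values):
    """Hypothetical reducer: one output tuple per key (a simplification)."""
    return (key, sum(values))


if __name__ == "__main__":
    print(run_map_reduce([3, 8, 5, 2], parity_mapper, sum_reducer))
    # [('odd', 8), ('even', 10)]
```

Four input records become four key-value tuples after Map and only two tuples after Reduce, which is the "smaller set of tuples" described above.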

MapReduce Algorithms

MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.

For example, Twitter data was processed on different servers on a per-month basis.
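A sketch of how tweets might be split by month so that each server handles whole months on its own. The `(month, text)` record layout, the helper names, and the hash-based assignment of months to servers are all assumptions for illustration; servers are simulated by list positions.

```python
from collections import Counter
import zlib


def partition_by_month(tweets, n_servers):
    """Assign every tweet to a server by its month (hypothetical layout: (month, text))."""
    partitions = [[] for _ in range(n_servers)]
    for month, text in tweets:
        # A stable hash keeps all tweets of one month on the same server.
        server = zlib.crc32(month.encode("utf-8")) % n_servers
        partitions[server].append((month, text))
    return partitions


def count_per_month(partition):
    """Work done independently on one server: count its tweets per month."""
    return Counter(month for month, _ in partition)


def tweets_per_month(tweets, n_servers=3):
    totals = Counter()
    for partition in partition_by_month(tweets, n_servers):
        # Each month lives on exactly one server, so each per-month count is
        # already complete; merging just gathers them in one place.
        totals.update(count_per_month(partition))
    return dict(totals)
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized between interpreter runs, which would send the same month to different servers on different runs.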

- Hadoop is an open-source implementation of MapReduce.
- A Hadoop MapReduce program is mainly a combination of two Java classes, Mapper and Reducer, whose map() and reduce() methods the programmer implements.
- Example: checking the popularity of words in a text by counting how often each word occurs.

Big Data and Its Sources

The Mapper function processes each input split and provides its output as input to the Reducer:

Mapper(filename, file-contents):
    for each word in file-contents:
        emit(word, 1)

The Reducer function combines the values provided by the Mapper for each word and produces the output:

Reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
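A runnable Python translation of the Mapper and Reducer pseudocode above, as a sketch that runs on one machine rather than on Hadoop. `word_count` is a hypothetical driver that plays the framework's role of grouping the emitted pairs by word before calling the reducer; words are split on whitespace only, with no lowercasing or punctuation handling.

```python
from collections import defaultdict


def mapper(filename, file_contents):
    """Emit (word, 1) for every word in the file, as in the Mapper pseudocode."""
    for word in file_contents.split():
        yield (word, 1)


def reducer(word, values):
    """Add up all the 1s emitted for one word, as in the Reducer pseudocode."""
    total = 0
    for value in values:
        total = total + value
    return (word, total)


def word_count(files):
    """Run the mapper over every (filename, contents) pair, group by word, then reduce."""
    grouped = defaultdict(list)
    for filename, contents in files.items():
        for word, one in mapper(filename, contents):
            grouped[word].append(one)
    return dict(reducer(word, values) for word, values in grouped.items())


if __name__ == "__main__":
    files = {"a.txt": "big data big ideas", "b.txt": "data is big"}
    print(word_count(files))
    # {'big': 3, 'data': 2, 'ideas': 1, 'is': 1}
```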


How MapReduce Works

- Shuffle and Sort: The Reducer task starts with the Shuffle and Sort step. It downloads the grouped key-value pairs onto the local machine where the Reducer is running. The individual key-value pairs are sorted by key into a larger data list. The data list groups equivalent keys together so that their values can be iterated easily in the Reducer task.
- Reducer: The Reducer takes the grouped key-value data as input and runs a Reducer function on each group. Here, the data can be aggregated, filtered, and combined in a number of ways, which can require a wide range of processing. Once execution is over, it passes zero or more key-value pairs to the final step.
- Output Phase: In the output phase, an output formatter translates the final key-value pairs from the Reducer function and writes them to a file using a record writer.
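The three steps above can be sketched in Python on a single machine. This is a simplification, not Hadoop's implementation: the shuffle here is an in-memory `sorted` plus `itertools.groupby` instead of a network transfer, and the tab-separated output format, the `sum_if_frequent` reducer, and all function names are assumptions chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter
import io


def shuffle_and_sort(pairs):
    """Sort intermediate (key, value) pairs by key and group equal keys together."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped, reducer):
    """Run the reducer on each key group; each call may emit zero or more pairs."""
    for key, values in grouped:
        yield from reducer(key, values)


def write_output(pairs, stream):
    """Output phase: format each final pair as one 'key<TAB>value' line (record writer)."""
    for key, value in pairs:
        stream.write(f"{key}\t{value}\n")


def sum_if_frequent(key, values):
    """Hypothetical reducer that aggregates and filters: keeps keys seen at least twice."""
    total = sum(values)
    if total >= 2:
        yield (key, total)


if __name__ == "__main__":
    mapped = [("data", 1), ("big", 1), ("data", 1), ("big", 1), ("ideas", 1)]
    out = io.StringIO()
    write_output(reduce_phase(shuffle_and_sort(mapped), sum_if_frequent), out)
    print(out.getvalue())
```

Because `sum_if_frequent` drops "ideas", the example also shows a reducer emitting zero pairs for some keys.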

How MapReduce Works

As shown in the illustration, the MapReduce algorithm performs the following actions on tweets:

- Tokenize: Tokenizes the tweets into maps of tokens and writes them as key-value pairs.
- Filter: Filters unwanted words from the maps of tokens and writes the filtered maps as key-value pairs.
- Count: Generates a token counter per word.
- Aggregate Counters: Prepares an aggregate of similar counter values into small, manageable units.
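A minimal sketch of the four tweet-processing steps, assuming Python. The notes do not say which words count as "unwanted", so `STOP_WORDS` is a hypothetical list, and the tokenizing regular expression is an assumption. "Aggregate Counters" is interpreted here as merging the per-tweet counters into one summary.

```python
from collections import Counter
import re

# Hypothetical list of unwanted words; the notes do not specify one.
STOP_WORDS = {"a", "an", "the", "is", "and", "to", "of"}


def tokenize(tweet):
    """Tokenize: break a tweet into lowercase tokens, emitted as (token, 1) pairs."""
    return [(token, 1) for token in re.findall(r"[a-z0-9#@']+", tweet.lower())]


def filter_tokens(pairs):
    """Filter: drop unwanted words, keeping the remaining (token, 1) pairs."""
    return [(token, one) for token, one in pairs if token not in STOP_WORDS]


def count(pairs):
    """Count: build a token counter per word for one tweet."""
    counter = Counter()
    for token, one in pairs:
        counter[token] += one
    return counter


def aggregate_counters(counters):
    """Aggregate Counters: merge the per-tweet counters into one summary."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return total


def process_tweets(tweets):
    """Run Tokenize, Filter, and Count per tweet, then aggregate across tweets."""
    return aggregate_counters(count(filter_tokens(tokenize(t))) for t in tweets)
```

For example, `process_tweets(["The data is big", "Big data and #hadoop"])` counts "data" and "big" twice each and "#hadoop" once, after "the", "is", and "and" are filtered out.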
