Understanding the Core of Cloud Computing: MapReduce and Datastores, Study notes of Computer Science

Alagappa University Computer Science

Explore the core concepts of cloud computing, including MapReduce, serial and parallel databases, and distributed data storage. Learn how to operate on large amounts of data, design considerations for the core, and the importance of MapReduce in cloud computing.

Typology: Study notes

2012/2013

Uploaded on 04/23/2013

aslesha 🇮🇳



The core
Tuesday, February 23, 2010 5:50 PM

So far, we've studied the "edge" of the cloud:
- Service switching
- Serial computations
- Horizontal scaling

Now, we turn our attention to the "core" of the cloud:
- Distributed objects
- Parallel computations
- Core scaling

Design considerations for the core

The core should:
- Operate on really large amounts of data.
- Use redundant data storage for robustness.
- Compute queries quickly regardless of data size.
- Continue functioning even if there are multiple points of failure.
- Provide effective programming abstractions for manipulating data without having to know the details of where or how the data is stored.

Databases and Datastores

(Serial) databases    (Cloud) datastores
Tables and rows       Key/value pairs
SQL queries           MapReduce and Pig
Serial execution      Parallel execution

NoSQL: a protest against how hard it is to parallelize SQL. If you do everything with <k,v> pairs, it is much easier to put the data into the cloud.
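This minimal Python sketch shows why <k,v> pairs are easy to spread across machines. Each pair is placed on, and later fetched from, a single node chosen from its key alone, with no joins across nodes. The placement rule (a SHA-256 hash of the key modulo the node count) and the helper names `node_for`, `partition`, and `lookup` are assumptions for illustration; these notes do not specify them.

```python
import hashlib
from typing import Any, Dict, Iterable, List, Optional, Tuple


def node_for(key: str, num_nodes: int) -> int:
    """Choose the node that owns a key (assumption: stable hash modulo node count)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(pairs: Iterable[Tuple[str, Any]], num_nodes: int) -> List[Dict[str, Any]]:
    """Spread <key,value> pairs over independent per-node stores."""
    stores: List[Dict[str, Any]] = [{} for _ in range(num_nodes)]
    for key, value in pairs:
        stores[node_for(key, num_nodes)][key] = value
    return stores


def lookup(stores: List[Dict[str, Any]], key: str) -> Optional[Any]:
    """Retrieve a value by asking only the node that owns the key."""
    return stores[node_for(key, len(stores))].get(key)


stores = partition([("alice", 30), ("bob", 25), ("carol", 41)], num_nodes=3)
print(lookup(stores, "bob"))  # 25
```

Python's built-in `hash()` is randomized per process for strings. The sketch therefore derives placement from a digest that every machine computes identically.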

Some key concepts

Any datastore can be represented by a collection of sets of <key,value> pairs, where:
- The key is a unique identifier for data.
- The value might be data associated with the key, or alternatively, a key to other data.

(This might be considered the fundamental theorem of database theory; the resulting representation of data is called 4th normal form.) Thus distributed datastores can concentrate on being able to store and retrieve key/value pairs rather than tables and rows.
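To make the decomposition concrete, here is a hypothetical sketch that turns two small tables into sets of <key,value> pairs, one set per non-key column. The table contents and the function `table_to_kv` are invented for illustration. The `employee.dept` values are themselves keys into another set, which is the case where a value is a key to other data.

```python
from typing import Any, Dict, List


def table_to_kv(table: str, rows: List[Dict[str, Any]], key_col: str) -> Dict[str, Dict[Any, Any]]:
    """Decompose a table into one set of <key,value> pairs per non-key column.

    Each set is stored as a dict, since keys are unique identifiers.
    """
    kv_sets: Dict[str, Dict[Any, Any]] = {}
    for row in rows:
        key = row[key_col]
        for col, value in row.items():
            if col != key_col:
                kv_sets.setdefault(f"{table}.{col}", {})[key] = value
    return kv_sets


employees = [
    {"id": "e1", "name": "Ada", "dept": "d1"},
    {"id": "e2", "name": "Alan", "dept": "d2"},
]
departments = [
    {"id": "d1", "title": "Research"},
    {"id": "d2", "title": "Operations"},
]

store = {**table_to_kv("employee", employees, "id"),
         **table_to_kv("department", departments, "id")}

# "employee.dept" holds values that are keys into "department.title".
dept_key = store["employee.dept"]["e2"]
print(store["department.title"][dept_key])  # Operations
```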

Why MapReduce is important

- The mapping phase is "embarrassingly parallel": parallel time = (total serial time)/(# of processors). Parallelism is ideal; we can't do any better than that!
- Time for the reduce phase is proportional to the log of the number of processors utilized, because partial results are combined pairwise, level by level, in a tree. Not quite perfect runtime, but as good as it gets.
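Here is a small Python sketch of the reduce-phase argument. It assumes partial results are combined pairwise in a binary tree, which is one plausible reading of the funnel shape in these notes. Each round halves the number of partial results, so p processors need ceil(log2 p) rounds. The function name `tree_reduce` is hypothetical.

```python
import math
import operator
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(partials: List[T], combine: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine partial results pairwise, one tree level per round.

    Returns the final result and the number of rounds used. The combines
    inside one round are independent, so each round could run in parallel.
    """
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    rounds = 0
    while len(level) > 1:
        paired = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # the odd result out waits for the next round
        level = paired
        rounds += 1
    return level[0], rounds


# One partial result per processor, e.g. 1000 mappers each reporting a number.
total, rounds = tree_reduce(list(range(1000)), operator.add)
print(total, rounds, math.ceil(math.log2(1000)))  # 499500 10 10
```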

Theory of operation of MapReduce

Three kinds of nodes:
- Trackers: initiate a MapReduce and gather results.
- Mappers: perform the map part of an operation; contain data.
- Reducers: perform the reduce part of an operation; don't contain data.

How a MapReduce is implemented:
- Select a tracker by flowless switching; send it the query.
- The tracker contacts its mappers (mechanisms differ: Google uses UDP broadcast; Hadoop uses tree propagation).
- The answer flows back from mappers to reducers to tracker to client, in a tree ("funnel") shape.
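The three node roles can be sketched in Python as plain in-process objects. This is a simplification under stated assumptions. Direct method calls stand in for the network, and the mapper-to-reducer assignment (round-robin) is invented. The class names mirror the roles in these notes, not any real framework's API.

```python
from functools import reduce
from typing import Any, Callable, List


class Mapper:
    """Holds a shard of the data and performs the map part of an operation."""

    def __init__(self, shard: List[str]) -> None:
        self.shard = shard

    def run_map(self, map_fn: Callable[[List[str]], Any]) -> Any:
        return map_fn(self.shard)


class Reducer:
    """Holds no data; combines the results of the mappers that feed it."""

    def run_reduce(self, results: List[Any], combine: Callable[[Any, Any], Any]) -> Any:
        return reduce(combine, results)


class Tracker:
    """Initiates a MapReduce and gathers the results."""

    def __init__(self, mappers: List[Mapper], reducers: List[Reducer]) -> None:
        self.mappers = mappers
        self.reducers = reducers

    def run(self, map_fn: Callable[[List[str]], Any], combine: Callable[[Any, Any], Any]) -> Any:
        # 1. Contact every mapper (a plain loop stands in for UDP broadcast or tree propagation).
        mapped = [m.run_map(map_fn) for m in self.mappers]
        # 2. Funnel mapper results into reducers; mapper i feeds reducer i % len(reducers).
        buckets: List[List[Any]] = [[] for _ in self.reducers]
        for i, result in enumerate(mapped):
            buckets[i % len(self.reducers)].append(result)
        partials = [r.run_reduce(b, combine) for r, b in zip(self.reducers, buckets) if b]
        # 3. The tracker combines the reducers' answers and hands the result to the client.
        return reduce(combine, partials)


mappers = [Mapper(["the cloud", "edge"]), Mapper(["cloud core", "cloud"]),
           Mapper(["disk"]), Mapper(["cloud"])]
tracker = Tracker(mappers, [Reducer(), Reducer()])
matches = tracker.run(lambda shard: sum("cloud" in line for line in shard),
                      lambda a, b: a + b)
print(matches)  # 4
```

The example query counts lines containing "cloud". Each mapper works only on its own shard, and the answers funnel through two reducers before the tracker combines them.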

[Figure: Picture of a MapReduce: making request (Google). S = switch, T = tracker, R = reducer, M = mapper.]

[Figure: Picture of a MapReduce: making request (Hadoop). S = switch, T = tracker, R = reducer, M = mapper.]

Edge service versus MapReduce

Edge service                     MapReduce
Can't operate on local data      Operates solely on local data
Queries a datastore              Is a datastore
Scales horizontally              Scales vertically
Adds serial instances            Adds parallel instances
Switches between edge servers    Switches between tracker nodes

Example: book index

Input: numbered lines of text.
Output: an index of the line numbers in which each word appears, sorted by word.

Part 2: Mapping

For every line in the datastore, label each word with the line in which it appears.

E.g., the lines
1 When that Aprillis with his showers swoot,
5 When Zephyrus eke with his swoote breath
become, after mapping:
when: 1 5
that: 1
Aprillis: 1
with: 1 5
his: 1 5
showers: 1
swoot: 1
Zephyrus: 5
eke: 5
swoote: 5
breath: 5
(depicting them in order of discovery). This is the result of one node's mapping.
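Here is a sketch of one node's map phase in Python. It assumes each node receives its lines as a dict from line number to text. Words are lowercased and punctuation is dropped; both are assumptions, since the listing in these notes keeps the proper nouns Aprillis and Zephyrus capitalized. The name `map_lines` is hypothetical.

```python
import re
from typing import Dict, List

WORD = re.compile(r"[a-z']+")


def map_lines(lines: Dict[int, str]) -> Dict[str, List[int]]:
    """Map phase for one node: label each word with the line numbers it appears in.

    Words are lowercased (an assumption), and the dict keeps words in order of discovery.
    """
    index: Dict[str, List[int]] = {}
    for line_no, text in lines.items():
        for word in WORD.findall(text.lower()):
            occurrences = index.setdefault(word, [])
            # Record each line at most once, even if the word repeats within it.
            if not occurrences or occurrences[-1] != line_no:
                occurrences.append(line_no)
    return index


node_lines = {
    1: "When that Aprillis with his showers swoot,",
    5: "When Zephyrus eke with his swoote breath",
}
result = map_lines(node_lines)
for word, numbers in result.items():
    print(f"{word}: {' '.join(map(str, numbers))}")
```

Python dicts keep insertion order, so the words come out in order of discovery, as in the listing above.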

Part 3: Reduce

Create a global view of the data from the local views by combining work from several nodes.

E.g., if
Node01 handles lines 1 and 5,
Node02 handles lines 2 and 6,
Node03 handles lines 3 and 7,
Node04 handles lines 4 and 8,
then the reduce depicts results for lines 1-8. (It is easy enough to produce this in sorted order.)
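The reduce step can be sketched as merging the per-node indexes and then sorting, which gives the sorted output the notes call easy enough to produce. The function name `reduce_indexes` and the two local indexes in the usage lines are hypothetical. Node02's words in particular are made up, since the notes show only lines 1 and 5.

```python
from typing import Dict, Iterable, List, Set


def reduce_indexes(local_indexes: Iterable[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Reduce phase: merge per-node indexes into one global index sorted by word."""
    merged: Dict[str, Set[int]] = {}
    for local in local_indexes:
        for word, line_numbers in local.items():
            merged.setdefault(word, set()).update(line_numbers)
    # Sort the words, and each word's line numbers, to produce the final book index.
    return {word: sorted(merged[word]) for word in sorted(merged)}


node01 = {"when": [1, 5], "with": [1, 5], "breath": [5]}  # from lines 1 and 5
node02 = {"when": [6], "sweet": [2]}                      # hypothetical lines 2 and 6
out = reduce_indexes([node01, node02])
print(out)  # {'breath': [5], 'sweet': [2], 'when': [1, 5, 6], 'with': [1, 5]}
```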

Common MapReduce Patterns

- CRUD: create/retrieve/update/delete for datastores.
- MergeSort: sort data into any desired order.
- Index: produce an index for existing data.
- Search: produce data only if there's a match.
- Count: count instances of a search term.
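Two of these patterns, Search and Count, fit in a few lines once a generic map-then-fold helper exists. In this Python sketch, several details are assumptions for illustration: the helper `map_reduce`, the shard layout (a list of text records per node), and counting whole whitespace-separated words.

```python
from functools import reduce
from typing import Any, Callable, List


def map_reduce(shards: List[List[str]], map_fn: Callable[[List[str]], Any],
               reduce_fn: Callable[[Any, Any], Any], initial: Any) -> Any:
    """Apply map_fn to every shard (in parallel, conceptually), then fold the results."""
    return reduce(reduce_fn, (map_fn(shard) for shard in shards), initial)


def search(shards: List[List[str]], term: str) -> List[str]:
    """Search: each mapper returns only matching records; reduce concatenates them."""
    return map_reduce(shards, lambda s: [r for r in s if term in r], lambda a, b: a + b, [])


def count(shards: List[List[str]], term: str) -> int:
    """Count: each mapper counts whole-word matches locally; reduce adds the counts."""
    return map_reduce(shards, lambda s: sum(r.split().count(term) for r in s),
                      lambda a, b: a + b, 0)


shards = [["map then reduce", "edge server"], ["reduce reduce", "map"]]
print(search(shards, "map"))    # ['map then reduce', 'map']
print(count(shards, "reduce"))  # 3
```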

CRUD and MapReduce

CRUD:
- Create a <key,value> pair.
- Retrieve a <key,value> pair.
- Update a value for a key.
- Delete a <key,value> pair.

MapReduce can implement distributed CRUD:
- Create. Map: choose one (or more) elements to store <key,value>. Reduce: count the number of created pairs.
- Retrieve. Map: return the value for the key, nothing if there is no match. Reduce: ignore empty returns.
- Update. Map: put the new value everywhere the old value lives. Reduce: count the changes made.
- Delete. Map: delete all instances of the given key. Reduce: count the deletions done.
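Here is a minimal Python sketch of the four CRUD operations above. Each shard is modeled as an in-memory dict that stands in for a mapper's local data. The placement rule for Create (CRC32 of the key, then the next shard for a second replica) and the default of two replicas are hypothetical choices; the notes say only "one (or more) elements".

```python
import zlib
from typing import Any, Callable, Dict, List, Optional

Shard = Dict[str, Any]


def map_all(shards: List[Shard], task: Callable[[int, Shard], Any]) -> List[Any]:
    """Run a map task on every shard, as if broadcast to every mapper."""
    return [task(i, shard) for i, shard in enumerate(shards)]


def create(shards: List[Shard], key: str, value: Any, replicas: int = 2) -> int:
    """Map: the chosen shards store <key,value>. Reduce: count the created pairs."""
    start = zlib.crc32(key.encode("utf-8")) % len(shards)  # hypothetical placement rule
    chosen = {(start + k) % len(shards) for k in range(replicas)}

    def store(i: int, shard: Shard) -> int:
        if i not in chosen:
            return 0
        shard[key] = value
        return 1

    return sum(map_all(shards, store))


def retrieve(shards: List[Shard], key: str) -> Optional[Any]:
    """Map: return the value for key, nothing if no match. Reduce: ignore empty returns."""
    results = map_all(shards, lambda i, shard: [shard[key]] if key in shard else [])
    non_empty = [r[0] for r in results if r]
    return non_empty[0] if non_empty else None


def update(shards: List[Shard], key: str, value: Any) -> int:
    """Map: put the new value everywhere the old value lives. Reduce: count changes."""
    def change(i: int, shard: Shard) -> int:
        if key not in shard:
            return 0
        shard[key] = value
        return 1

    return sum(map_all(shards, change))


def delete(shards: List[Shard], key: str) -> int:
    """Map: delete all instances of key. Reduce: count the deletions."""
    def remove(i: int, shard: Shard) -> int:
        if key not in shard:
            return 0
        del shard[key]
        return 1

    return sum(map_all(shards, remove))


shards: List[Shard] = [{} for _ in range(4)]
print(create(shards, "alice", 30), retrieve(shards, "alice"))  # 2 30
print(update(shards, "alice", 31), retrieve(shards, "alice"))  # 2 31
print(delete(shards, "alice"), retrieve(shards, "alice"))      # 2 None
```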
