CS347 Lecture 1: The Two-Generals Problem and Eventual Commit Protocol (slides, Distributed Database Management Systems)
CS 347 Lecture 1 7
Rules:
- The blue and red armies must attack at the same time
- The blue and red generals synchronize through messengers
- Messengers can be lost
How Many Messages Do We Need?
(assume blue starts)
BG -> RG: attack at 9am
RG -> BG: ack (red goes at 9am)
BG -> RG: got ack
Is this enough??
Stated problem is Impossible!
- Theorem: There is no protocol that uses a finite number of messages that solves the two-generals problem (as stated here)
Alternatives??
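A small illustration (not a proof) of why a fixed handshake fails: whoever sends the last message cannot know whether it arrived. The sketch assumes a decision rule that is not in the slides: a general attacks only if it received every message the protocol says it should. The function name `run_handshake` and its parameters are hypothetical.

```python
def run_handshake(k, lost=None):
    """Simulate a k-message alternating handshake started by blue.

    Messages 1..k alternate blue -> red (odd) and red -> blue (even).
    `lost` is the number of the single dropped message (None = no loss);
    each message is a reply to the previous one, so nothing is sent
    after a loss. Assumed rule: a general attacks only if it received
    every message the protocol expects it to receive.
    Returns (blue_attacks, red_attacks).
    """
    delivered = k if lost is None else lost - 1
    red_received = (delivered + 1) // 2   # odd-numbered messages reach red
    blue_received = delivered // 2        # even-numbered messages reach blue
    red_expected = (k + 1) // 2
    blue_expected = k // 2
    return blue_received == blue_expected, red_received == red_expected


for k in range(1, 6):
    # Without loss both attack; losing only the last message splits them.
    print(k, run_handshake(k), run_handshake(k, lost=k))
```

Under this rule, adding one more acknowledgment only moves the problem to the new last message, whatever k is.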
Probabilistic Approach?
- Send as many messages as possible, hope one gets through...
(assume blue starts)
BG -> RG: attack at 9am, attack at 9am, attack at 9am, attack at 9am (repeated copies)
Eventual Commit
- Eventually both sides attack...
(assume blue starts)
BG -> RG: attack ASAP (retransmits)
RG -> BG: on my way! (retransmits)
- One message sent every time unit
- Probability that a single message gets through is p
- What is the probability that red commits by time t? Call it C(t).
- C(1) = p
- C(2) = p + (1-p)p
- C(3) = p + (1-p)p + (1-p)^2 p
- C(4) = p + (1-p)p + (1-p)^2 p + (1-p)^3 p
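The pattern above is a geometric sum: the term (1-p)^i p is the probability that the first i messages are lost and the next one arrives, so C(t) = 1 - (1-p)^t. This assumes message losses are independent, which the slides leave implicit. A minimal sketch that checks the series against the closed form (function names are hypothetical):

```python
def commit_probability_series(p, t):
    """C(t) as written on the slide: p + (1-p)p + ... + (1-p)^(t-1) p."""
    return sum((1 - p) ** i * p for i in range(t))


def commit_probability_closed(p, t):
    """Closed form of the same geometric sum: 1 - (1-p)^t."""
    return 1 - (1 - p) ** t


for t in range(1, 5):
    print(t, commit_probability_series(0.3, t), commit_probability_closed(0.3, t))
```

For any p > 0 the closed form tends to 1 as t grows, which is the sense in which red commits "eventually".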
[Figure: plot of C(t) versus t, with p marked]
- Renewed Interest in Distributed/Parallel Data Processing!
- Massive web data, manage with many computers
- How to crawl and search the web?
- Peer-to-peer systems manage huge amounts of data
- Data from many sources (e.g., comparison shopping): how to integrate?
- Sensor networks: data generated at many sensors/devices, need to analyze
- Multi-player games (e.g., Second Life): tons of distributed data
It’s the Economy, Stupid!
- Example: Multi-player games
[Figure: players (P) and the game's data/state]
Logistics
- LECTURES: Mondays and Wednesdays 12:50pm to 2:05pm, Gates B
- INSTRUCTOR: Hector Garcia-Molina; Office: Gates Hall 434; Email: [email protected]; Office Hours: Mondays, Wednesdays 11am to 12 noon.
- TEACHING ASSISTANT: Kushal Tayal; Email: [email protected]; News Group: su.class.cs347; Office Hours: TBD
- SECRETARY: Marianne Siroker; Office: Gates Hall 436; Email: [email protected]; Phone: (650) 723-
- TEXTBOOK: No required textbook. Some material for the lectures will be drawn from the following book:
  M. Tamer Ozsu and Patrick Valduriez, "Principles of Distributed Database Systems," Second Edition, Prentice Hall, 1999.
- CLASS WEB PAGE: http://www.stanford.edu/class/cs Will contain homework assignments, course news, etc. Be sure to check it periodically.
- ASSIGNMENTS: about 5 homeworks
- GRADING: Homeworks: 20%, Midterm: 30%, Final: 50%.
Tentative Syllabus 2012 (Part I)
DATE TOPIC
- Monday April 2 Introduction [N01]
- Wednesday April 4 Data Fragmentation [N02]
- Monday April 9 Query processing [N03]
- Wednesday April 11 Query processing & Optimization [N04]
- Monday April 16 Concurrency Control, Failures [N05]
- Wednesday April 18 Reliable Data Management [N06]
- Monday April 23 Reliable Data Management [N06]
- Wednesday April 25 Replicated Data Management [N07]
- Monday April 30 Partitions, Entity Resolution [N11]
- Wednesday May 2 Midterm
Tentative Syllabus 2012 (Part II)
DATE TOPIC
- Monday May 7 Peer to Peer Systems [N08]
- Wednesday May 9 Peer to Peer Systems [N08]
- Monday May 14 Map-Reduce [N09]
- Wednesday May 16 Map-Reduce [N09]
- Monday May 21 Distributed IR [N10]
- Wednesday May 23 Publish Subscribe Systems [N14]
- Wednesday May 30 Time [N12]
- Monday June 4 Heterogeneous Systems [N13]
- Wednesday June 6 Extra Topic
- Friday June 8 8:30 am!!! FINAL EXAM
Interesting New Systems
- Storm (from Twitter)
- S4 (from Yahoo)
- Cassandra (key-value store)
- Hive (SQL over Hadoop)
- Pregel (graph execution)
- Kestrel (queues?)
- ZooKeeper (replicated data)
- Sparkl or Spark (Berkeley?)
- HBase
- Hyracks (UC Irvine)
- Memcached
- PNUTS
- Dynamo (Amazon)
- Megastore (Google)
- Paxos
- G-Store (UC Santa Barbara)
- ElasTraS (UC Santa Barbara)
- TAO (Facebook)
Concepts you should be familiar with:
- CS245: query plan, cost estimation, join algorithms, recovery, logging,…
- Interconnection networks (bus, mesh, hypercube,…)
- Computer networks (LAN, WAN,…)
Introductory topics
- Database architectures
- Client-server systems
- Distributed vs. parallel DB systems
- Cloud Computing
DB architectures
(1) Shared memory
[Figure: processors P, P, ..., P sharing a single memory M]
(2) Shared disk
[Figure: processors P, each with its own memory M, attached to shared disks]
(5) Unusual — processor per track or processor per disk
[Figure: a processor P with memory M, plus several “small” processors P’]
(6) Unusual — sensor networks
[Figure: a data collection node and several sensors; labels show P’, processor P, memory M, and battery B]
Issues for selecting architecture
- Reliability
- Scalability
- Geographic distribution of data
- Data “clusters”
- Performance
- Cost
Client-Server Systems
(or how to partition software)
Application | Front End | Query Processor | Transaction Processing | File Access
[Figure: the layers above are divided between client and server]
Transaction Servers
- Clients ship transactions consisting of 1 or more SQL commands
E.g., Open DataBase Connectivity (ODBC) (standard API)
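A minimal sketch of "shipping a transaction": several SQL commands sent as one unit that either all take effect or none do. It uses Python's built-in `sqlite3` module as a stand-in for an ODBC-style call-level API; that substitution, the helper `run_transaction`, and the `rooms`/`bookings` tables are assumptions for illustration, and a real transaction server would sit across a network.

```python
import sqlite3


def run_transaction(conn, statements):
    """Run (sql, params) pairs as one unit: all take effect or none do."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            for sql, params in statements:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (id INTEGER PRIMARY KEY, free INTEGER CHECK (free >= 0))")
conn.execute("CREATE TABLE bookings (room_id INTEGER, guest TEXT)")
conn.execute("INSERT INTO rooms VALUES (1, 1)")
conn.commit()


def reserve(guest):
    """Hypothetical two-statement transaction: record booking, take a room."""
    return run_transaction(conn, [
        ("INSERT INTO bookings VALUES (?, ?)", (1, guest)),
        ("UPDATE rooms SET free = free - 1 WHERE id = ?", (1,)),
    ])


first_ok = reserve("alice")   # one room is free, so this commits
second_ok = reserve("bob")    # CHECK fails, so bob's booking row is rolled back
n_bookings = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
```

The client sends only the statements and parameters; all data access happens on the database side.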
Data Servers
- Client requests pages or records
- Popular for OODB systems
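A minimal sketch of the data-server style, assuming a server that only hands out whole pages and a client that caches them and does record lookup itself. The classes `PageServer` and `CachingClient` are hypothetical, and the sketch deliberately ignores cache consistency and locking.

```python
class PageServer:
    """Hypothetical data server: returns whole pages by page id."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.requests = 0  # number of page requests served

    def get_page(self, page_id):
        self.requests += 1
        return self.pages[page_id]


class CachingClient:
    """Client that caches pages and looks up records locally."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read_record(self, page_id, slot):
        if page_id not in self.cache:  # fetch only on a cache miss
            self.cache[page_id] = self.server.get_page(page_id)
        return self.cache[page_id][slot]


server = PageServer({0: ["a", "b"], 1: ["c", "d"]})
client = CachingClient(server)
values = [client.read_record(0, 0), client.read_record(0, 1), client.read_record(1, 0)]
```

Three record reads cost two page requests here, because the second read hits the client's cache.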
Issues
- Object granularity
- Where is data cached?
- Where is locking done?
Basic Tradeoff
- Offloading work to clients
- Data transmitted
[Figure: two client/server (C/S) pairs; one request is “Get pages”, the other is “Reserve hotel room”]
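A rough back-of-the-envelope sketch of the tradeoff, counting only bytes on the wire. The page size, message sizes, and function names are all assumed for illustration; they are not from the slides.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def get_pages_bytes(pages_touched, request_bytes=16, page_size=PAGE_SIZE):
    """"Get pages" style: client fetches each page and does the work itself."""
    return pages_touched * (request_bytes + page_size)


def reserve_room_bytes(request_bytes=64, reply_bytes=32):
    """"Reserve hotel room" style: one high-level request, server does the work."""
    return request_bytes + reply_bytes


print(get_pages_bytes(3), reserve_room_bytes())
```

Fetching pages moves more data but offloads processing to the client; the high-level request moves little data but leaves all the work on the server.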
Note: Similar issues arise when we partition software/functionality within a server
[Figure: a “Reserve hotel room” request handled by several processor/memory (P, M) pairs]
- Where is data cached?
- Where is locking done?
Parallel or distributed DB system?
- More similarities than differences!
Next
- How to describe distributed data
- Query processing in parallel DBs
- Query processing in distributed DBs
Query processing in parallel DBs:
- Typically: we can distribute, partition, sort, etc. the data to make certain DB operations (e.g., join) fast
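One common instance of this idea is a hash-partitioned join: both relations are split on the join key so that matching tuples land in the same partition, and each partition pair can then be joined independently. The sketch below runs the partitions in a plain loop on one machine; the function names and the dictionary row format are assumptions for illustration.

```python
from collections import defaultdict


def hash_partition(rows, key, n):
    """Split rows into n buckets by hash of the join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_hash_join(r_part, s_part, key):
    """Join one pair of partitions with an in-memory hash table on r_part."""
    index = defaultdict(list)
    for r in r_part:
        index[r[key]].append(r)
    return [{**r, **s} for s in s_part for r in index.get(s[key], [])]


def partitioned_join(r_rows, s_rows, key, n=3):
    """Equi-join R and S; each partition pair could run on its own node."""
    r_parts = hash_partition(r_rows, key, n)
    s_parts = hash_partition(s_rows, key, n)
    out = []
    for r_part, s_part in zip(r_parts, s_parts):
        out.extend(local_hash_join(r_part, s_part, key))
    return out


R = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
S = [{"id": 1, "city": "x"}, {"id": 3, "city": "y"}, {"id": 3, "city": "z"}, {"id": 4, "city": "w"}]
joined = partitioned_join(R, S, "id")
```

Because equal keys hash to the same bucket, no partition ever needs tuples from another partition to complete its part of the join.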
Query processing in distributed DBs:
- Typically: we are given the data distribution; we need to find a query processing strategy that minimizes cost (e.g., communication cost)
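A minimal sketch of choosing a strategy by communication cost, assuming relation R is stored at site "A", relation S at site "B", and the only cost counted is bytes shipped between sites. The cost model, site names, and function name are assumptions for illustration; real optimizers consider many more strategies and costs.

```python
def cheapest_join_strategy(r_bytes, s_bytes, result_bytes, query_site):
    """Pick where to join R (at site "A") with S (at site "B").

    Cost = bytes shipped between sites; local processing is ignored.
    The result must end up at query_site ("A" or "B").
    """
    costs = {
        "ship R to B, join at B": r_bytes + (result_bytes if query_site != "B" else 0),
        "ship S to A, join at A": s_bytes + (result_bytes if query_site != "A" else 0),
    }
    best = min(costs, key=costs.get)
    return best, costs


best_small_s, _ = cheapest_join_strategy(1000, 50, 200, query_site="A")
best_small_r, _ = cheapest_join_strategy(100, 5000, 10, query_site="A")
print(best_small_s)
print(best_small_r)
```

With the data placement fixed, the choice depends on the relative sizes of the relations and of the result, which is why size estimates matter so much in distributed query optimization.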