Database Systems: An Introduction to Relational Database Management Systems, Slides of Introduction to Database Management Systems

An introduction to database systems, focusing on relational database management systems (rdbms). The goals and roadmap of the course, explaining the concept of a database system and its importance. It also discusses query processing and the history of dbms, from early efforts like codasyl to the relational revolution. The document also includes information about modern dbms features, major dbms providers, and people working with databases.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 5

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Introduction
CPS 116
Introduction to Database Systems
2
Course goals
Random things you might do (for fun or profit) after
taking this course
Develop your own database-driven Web sites (like
Amazon, eBay, etc.)
Be a “power user” of commercial database systems
Upgrade your Web sites with XML
Explain to friends why MySQL is not a “real” database
system without InnoDB or Berkeley DB support
… …
3
Course roadmap
Relational databases
Relational algebra, database design, SQL, application
programming
XML
Data model and query languages, application programming,
interplay between XML and relational databases
Database internals
Storage, indexing, query processing and optimization, concurrency
control and recovery
Research topics
Data warehousing and mining, stream data processing, etc.
4
What is a database system?
From Oxford Dictionary:
Database: an organized body of related information
Database system, DataBase Management System
(DBMS): a software system that facilitates the
creation and maintenance and use of an electronic
database
5
What do you want from a DBMS?
Keep data around (persistent)
Answer queries (questions) about data
Update data
Example: a traditional banking application
Each account belongs to a branch, has a number, an owner, a
balance, …
Each branch has a location, a manager, …
Persistency: Homer will be pretty upset if his balance disappears
after a power outage
Query: What’s the balance in Homer Simpson’s account?
Modification: Homer withdraws $100
6
Sounds simple!
ASCII file
Accounts/branches separated by newlines
Fields separated by #’s
1001#Springfield#Mr. Morgan
... ...
00987-00654#Ned Flanders#2500.00
00123-00456#Homer Simpson#400.00
00142-00857#Montgomery Burns#1000000000.00
... ...
pf3
pf4
pf5

Partial preview of the text

Download Database Systems: An Introduction to Relational Database Management Systems and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Introduction

CPS 116

Introduction to Database Systems

Course goals

™ Random things you might do (for fun or profit) after

taking this course

ƒ Develop your own database-driven Web sites (like

Amazon, eBay, etc.)

ƒ Be a “power user” of commercial database systems

ƒ Upgrade your Web sites with XML

ƒ Explain to friends why MySQL is not a “real” database

system without InnoDB or Berkeley DB support

3

Course roadmap

™ Relational databases

ƒ Relational algebra, database design, SQL, application programming

™ XML

ƒ Data model and query languages, application programming, interplay between XML and relational databases

™ Database internals

ƒ Storage, indexing, query processing and optimization, concurrency control and recovery

™ Research topics

ƒ Data warehousing and mining, stream data processing, etc.

4

What is a database system?

From Oxford Dictionary:

™ Database: an organized body of related information

™ Database system, DataBase Management System

(DBMS): a software system that facilitates the

creation and maintenance and use of an electronic

database

5

What do you want from a DBMS?

™ Keep data around (persistent)

™ Answer queries (questions) about data

™ Update data

™ Example: a traditional banking application

ƒ Each account belongs to a branch, has a number, an owner, a balance, … ƒ Each branch has a location, a manager, … ƒ Persistency: Homer will be pretty upset if his balance disappears after a power outage ƒ Query: What’s the balance in Homer Simpson’s account? ƒ Modification: Homer withdraws $

6

Sounds simple!

™ ASCII file

™ Accounts/branches separated by newlines

™ Fields separated by #’s

1001#Springfield#Mr. Morgan ... ... 00987-00654#Ned Flanders#2500. 00123-00456#Homer Simpson#400. 00142-00857#Montgomery Burns#1000000000. ... ...

Query

™ What’s the balance in Homer Simpson’s account?

™ A simple script

ƒ Scan through the accounts file

ƒ Look for the line containing “Homer Simpson”

ƒ Print out the balance

1001#Springfield#Mr. Morgan ... ... 00987-00654#Ned Flanders#2500. 00123-00456#Homer Simpson#400. 00142-00857#Montgomery Burns#1000000000. ... ...

Query processing tricks

™ Tens of thousands of accounts are not Homer’s

)Cluster accounts by owner’s initial: those owned by “A...” go into file A; those owned by “B...” go into file B; etc. → decide which file to search using the initial )Keep accounts sorted by owner name → binary search )Hash accounts using owner name → compute file offset directly )Index accounts by owner name: index entries have the form h owner_name , file_offset i → search index to get file offset )And the list goes on…

™ What happens when the query changes to: What’s the

balance in accounts 00142-00857?

9

Observations

™ Tons of tricks (not only in query processing, but also

in storage, concurrency control, recovery, etc.)

™ Different tricks may work better in different usage

scenarios (example?)

™ Same tricks get used over and over again in different

applications

) We need a library, or better yet, a server (to support

sharing, backup, etc.)

10

The birth of DBMS – 1

(Pretty drawing stolen from Hans-J. Schek’s VLDB 2000 slides)

11

The birth of DBMS – 2

(Pretty drawing stolen from Hans-J. Schek’s VLDB 2000 slides)

12

The birth of DBMS – 3

(Pretty drawing stolen from Hans-J. Schek’s VLDB 2000 slides)

DBMS is multi-user

™ Example

get account balance from database; if balance > amount of withdrawal then balance = balance - amount of withdrawal; dispense cash; store new balance into database;

™ Homer at ATM1 withdraws $

™ Marge at ATM2 withdraws $

™ Initial balance = $400, final balance =?

ƒ Should be $250 no matter who goes first

Final balance = $

read balance; $

if balance > amount then balance = balance - amount; $ write balance; $

read balance; $ if balance > amount then balance = balance - amount; $ write balance; $

Homer withdraws $100: Marge withdraws $50:

21

Final balance = $

read balance;

if balance > amount then balance = balance - amount; write balance;

read balance;

if balance > amount then balance = balance - amount; write balance;

Homer withdraws $100: Marge withdraws $50:

$

$ $

$

$ $

22

Concurrency control in DBMS

™ Appears similar to concurrent programming

problems?

ƒ But data not main-memory variables

™ Appears similar to file system concurrent access?

ƒ Approach taken by MySQL in the old days

(fun reading: http://openacs.org/philosophy/why-not-mysql.html)

ƒ But want to control at much finer granularity

  • Or else one withdrawal would lock up all accounts!

23

Recovery in DBMS

™ Example: balance transfer

decrement the balance of account X by $100; increment the balance of account Y by $100;

™ Scenario 1: Power goes out after the first instruction

™ Scenario 2: DBMS buffers and updates data in

memory (for efficiency); before they are written back

to disk, power goes out

™ Log updates; undo/redo during recovery

24

Summary of modern DBMS features

™ Persistent storage of data

™ Logical data model; declarative queries and updates

→ physical data independence

™ Multi-user concurrent access

™ Safety from system failures

™ Performance, performance, performance

ƒ Massive amounts of data (terabytes ~ petabytes)

ƒ High throughput (thousands ~ millions transactions per

minute)

ƒ High availability (≥ 99.999% uptime)

Major DBMS today

™ Oracle

™ IBM DB2 (from System R, System R*, Starburst)

™ Microsoft SQL Server

™ NCR Teradata

™ Sybase

™ Informix (acquired by IBM)

™ PostgreSQL (from UC Berkeley’s Ingres, Postgres)

™ Tandem NonStop (acquired by Compaq, now HP)

? MySQL and Microsoft Access

relational

inside

Modern DBMS architecture

™ OS layer is bypassed for performance and safety

™ Many details will be filled in the DBMS box

DBMS

Disk(s)

Applications

OS

27

People working with databases

™ End users: query/update databases through application user

interfaces (e.g., Amazon.com, 1-800-DISCOVER, etc.)

™ Database designers: design database “schema” to model

aspects of the real world

™ Database application developers: build applications that

interface with databases

™ Database administrators (a.k.a. DBA’s): load, back up, and

restore data, fine-tune databases for performance

™ DBMS implementors: develop the DBMS or specialized

data management software, implement new techniques for

query processing and optimization

28

Course information

™ Book

ƒ Database Systems: The Complete Book , by H. Garcia-Molina, J. D. Ullman, and J. Widom ƒ Get the value-pack edition with free access to Gradiance when it comes out (check Web site for updates)

™ Web site

ƒ http://www.cs.duke.edu/courses/fall04/cps116/ ƒ Course information; tentative syllabus and reference sections in GMUW; lecture slides, assignments, programming notes

™ Blackboard: for grades only

™ Newsgroup: duke.cs.cps

ƒ Preferred news server: news.cs.duke.edu (news.duke.edu also works but with a lag)

29

Course load

™ Four homework assignments (35%)

ƒ Include written and programming problems as well as online exercises on Gradiance

™ Course project (25%)

ƒ Details to be given in the third week of class

™ Optional presentation (8.75%; replaces lowest homework

grade)

ƒ Sign-up sheet to be given in the third week of class

™ Midterm and final (20% each)

ƒ Open book, open notes ƒ Final is comprehensive, but emphasizes the second half of the course