Review of Database Systems: Relational Model, XML, Data Organization, Query & Transaction, Slides of Introduction to Database Management Systems

A review of various topics in database systems, including the relational model and algebra, entity-relationship design, sql, null and three-value logic, bag versus set semantics, sfw queries, modifications, constraints, triggers, views, indexes, storage hierarchy, disk geometry, data layout, access paths, primary versus secondary indexes, tree-based indexes, query processing, optimization, and transaction processing. The document also includes announcements about homework grades, upcoming assignments, and the final exam.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 1

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
2
Announcements
Homework #3 grades recorded
Homework #4 due today (December 2)
Will be graded this weekend
Sample solution will be available at your demo
Remember to sign up for course project demo (see email)
Final exam next Friday (December 10) 7–10pm
Comprehensive (everything up to today’s lecture, with emphasis
on the second half of the course, and materials exercised in
homework assignments)
Open book, open notes
Same format as sample final exam
Sample solution will be available at your demo
No time pressure (I promise!)
3
Review: relational basics
Relational model/algebra physical data independence
Entity-relationship design
Design theory (FD’s, MVD’s, 3NF, BCNF, 4NF) help eliminate
redundancy
SQL
NULL and three-value logic nifty feature, big mess
Bag versus set semantics
SFW (or SPJ) queries, subqueries, grouping and aggregation
Modifications
Constraints the more you know the better you can do
Triggers (ECA) “active” data
Views logical data independence
Indexes reintroduce redundancy to improve performance
Transactions and isolation levels
4
Review: XML
Data model: well-formed vs. valid (DTD schema)
Query languages
XPath: (branching) path expressions (with conditions)
XQuery: FLWR, subqueries in return (restructuring), quantified
expressions, aggregation, ordering
XSLT: structural recursion with templates
Programming: SAX (one pass) vs. DOM (in memory)
Relational vs. XML
Tables vs. hierarchies (or graphs in general)
Storing XML as relations
Schema-oblivious: node/edge based, interval based, path based, etc.
Schema-aware
Joins vs. path traversals
5
Review: physical data organization
Storage hierarchy (DC vs. Pluto) count I/O’s
Disk geometry: three components of access cost; random vs.
sequential I/O
Data layout
Record layout (handling variable-length fields, NULL’s)
Block layout (NSM, PAX) inter-/intra-record locality
Access paths
Primary versus secondary indexes
Tree-based indexes: ISAM, B+-tree
Text indexes: inverted lists, signature files, tries
Again, reintroduce redundancy to improve performance
Fundamental trade-off: query versus update cost
6
Review: query processing, optimization
Processing
Scan-based algorithms
Sort- and hash-based algorithms (and their duality)
Index-based algorithms
Pipelined execution with iterators
Optimization (or “goodification”?)
Heuristics: push selections down; smaller joins first
Reduce the size of intermediate results
Cost-based
Query rewrite: merge blocks to get a bigger search space
Cost estimation: result size estimation; use statistics
Search algorithm: dynamic programming (+ interesting orders)
7
Review: transaction processing
ACID properties
Concurrency control
Serial and conflict-serializable schedules
Locking-based: 2PL, strict 2PL
Recovery with logging
Steal: requires undo logging
No force: requires redo logging
WAL (log holds the truth)
Fuzzy checkpointing

Partial preview of the text

Download Review of Database Systems: Relational Model, XML, Data Organization, Query & Transaction and more Slides Introduction to Database Management Systems in PDF only on Docsity!

2

Announcements

™ Homework #3 grades recorded

™ Homework #4 due today (December 2)

ƒ Will be graded this weekend ƒ Sample solution will be available at your demo

™ Remember to sign up for course project demo (see email)

™ Final exam next Friday (December 10) 7–10pm

ƒ Comprehensive (everything up to today’s lecture, with emphasis on the second half of the course, and materials exercised in homework assignments) ƒ Open book, open notes ƒ Same format as sample final exam

  • Sample solution will be available at your demo ƒ No time pressure (I promise!)

3

Review: relational basics

™ Relational model/algebra → physical data independence ™ Entity-relationship design ™ Design theory (FD’s, MVD’s, 3NF, BCNF, 4NF) → help eliminate redundancy ™ SQL ƒ NULL and three-value logic → nifty feature, big mess ƒ Bag versus set semantics ƒ SFW (or SPJ) queries, subqueries, grouping and aggregation ƒ Modifications ƒ Constraints → the more you know the better you can do ƒ Triggers (ECA) → “active” data ƒ Views → logical data independence ƒ Indexes → reintroduce redundancy to improve performance ƒ Transactions and isolation levels

4

Review: XML

™ Data model: well-formed vs. valid (DTD ≈ schema)

™ Query languages

ƒ XPath: (branching) path expressions (with conditions) ƒ XQuery: FLWR, subqueries in return (restructuring), quantified expressions, aggregation, ordering ƒ XSLT: structural recursion with templates

™ Programming: SAX (one pass) vs. DOM (in memory)

™ Relational vs. XML

ƒ Tables vs. hierarchies (or graphs in general) ƒ Storing XML as relations

  • Schema-oblivious: node/edge based, interval based, path based, etc.
  • Schema-aware → Joins vs. path traversals

5

Review: physical data organization

™ Storage hierarchy (DC vs. Pluto) → count I/O’s

™ Disk geometry: three components of access cost; random vs.

sequential I/O

™ Data layout

ƒ Record layout (handling variable-length fields, NULL’s) ƒ Block layout (NSM, PAX) → inter-/intra-record locality

™ Access paths

ƒ Primary versus secondary indexes ƒ Tree-based indexes: ISAM, B+^ -tree ƒ Text indexes: inverted lists, signature files, tries → Again, reintroduce redundancy to improve performance → Fundamental trade-off: query versus update cost

6

Review: query processing, optimization

™ Processing

ƒ Scan-based algorithms ƒ Sort- and hash-based algorithms (and their duality) ƒ Index-based algorithms ƒ Pipelined execution with iterators

™ Optimization (or “goodification”?)

ƒ Heuristics: push selections down; smaller joins first → Reduce the size of intermediate results ƒ Cost-based

  • Query rewrite: merge blocks to get a bigger search space
  • Cost estimation: result size estimation; use statistics
  • Search algorithm: dynamic programming (+ interesting orders)

7

Review: transaction processing

™ ACID properties

™ Concurrency control

ƒ Serial and conflict-serializable schedules

ƒ Locking-based: 2PL, strict 2PL

™ Recovery with logging

ƒ Steal: requires undo logging

ƒ No force: requires redo logging

ƒ WAL (log holds the truth)

ƒ Fuzzy checkpointing