Advanced Databases, Study notes of Algebra

The attica database system. ▻ Home-grown RDBMS, written in Java. ▻ Visit inf.ed.ac.uk/teaching/courses/adbs/attica to download the system and the API ...

Typology: Study notes

2021/2022

Uploaded on 09/12/2022

tintoretto
tintoretto 🇬🇧

4.1

(8)

214 documents

1 / 721

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Advanced Databases
Stratis D. Viglas
University of Edinburgh
Stratis D. Viglas (University of Edinburgh) Advanced Databases 1/1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c
pf2d
pf2e
pf2f
pf30
pf31
pf32
pf33
pf34
pf35
pf36
pf37
pf38
pf39
pf3a
pf3b
pf3c
pf3d
pf3e
pf3f
pf40
pf41
pf42
pf43
pf44
pf45
pf46
pf47
pf48
pf49
pf4a
pf4b
pf4c
pf4d
pf4e
pf4f
pf50
pf51
pf52
pf53
pf54
pf55
pf56
pf57
pf58
pf59
pf5a
pf5b
pf5c
pf5d
pf5e
pf5f
pf60
pf61
pf62
pf63
pf64

Partial preview of the text

Download Advanced Databases and more Study notes Algebra in PDF only on Docsity!

Advanced Databases

Stratis D. Viglas

University of Edinburgh

Introduction Overview

Outline

Introduction Overview

Syllabus

Introduction

Relational databases overview

I (^) Data model, evaluation model

Storage

I (^) Indexes, multidimensional data

Query evaluation

I (^) Join evaluation algorithms, execution models

Query optimisation

I (^) Cost models, search space exploration, randomised optimisation

Concurrency control and recovery

I (^) Locking and transaction processing

Parallel databases

Introduction Overview

Assignments and software

Programming assignments

The attica database system

I (^) Home-grown RDBMS, written in Java I (^) Visit inf.ed.ac.uk/teaching/courses/adbs/attica to download the system and the API documentation I (^) All programming assignments will be using the attica front-end and code-base

Plagiarism policy: You cheat, you’re caught, you fail

I (^) No discussion

Introduction Relational databases overview

Outline

Introduction Relational databases overview

Three basic building blocks

Attribute

I (^) A (name, value) pair

Tuple

I (^) A set of attributes

Relation

I (^) A set of tuples with the same schema

SID 123-ABC

SID 123-ABC

Name Mary Jones

... ...

Year 4

SID 123-ABC

Name Mary Jones

... ...

Year 4 456-DEF John Smith ... 3 ... ... ... ... 999-XYZ Jack Black ... 4

Introduction Relational databases overview

Data storage

Page

Platter

Track

Cylinder }Drive

Disk drives are organised in records of 512 bytes The DB (and the OS) I/O unit is a disk page (typically, 4,096 bytes long) Pages (and records) are stored on tracks Tracks make up a platter (or a disk) Platters make up a drive The same tracks across all platters make up a cylinder The disk head (arm) reads the same block of all tracks on all platters

Introduction Relational databases overview

A bit of perspective

The dimensions of the head are impressive^1. With a width of less than

a hundred nanometers and a thickness of about ten, it flies above the

platter at a speed of up to 15,000 RPM, at a height that is the

equivalent of 40 atoms. If you start multiplying these infinitesimally

small numbers, you begin to get an idea of their significance.

Consider this little comparison: if the read/write head were a Boeing

747 , and the hard-disk platter were the surface of the Earth

I (^) The head would fly at Mach 800 I (^) At less than one centimeter from the ground I (^) And count every blade of grass I (^) Making fewer than 10 unrecoverable counting errors in an area equivalent to all of Ireland

(^1) Source: Matthieu Lamelot, Tom’s Hardware.

Introduction Relational databases overview

Storing tuples

Every disk block contains

I (^) A header I (^) Data (i.e., tuples) I (^) Padding (maybe)

Two ways of storing tuples

I (^) Either interleave tuples of multiple relations, or I (^) Keep the tuples of the same relation clustered

Header Relation 1 Relation 2 Relation 3 Relation 2 Relation 3 Relation 1 Relation 2 Relation 3 Padding

Interleaved tuples

Header Relation 1

Relation 1 Padding

Relation 1 Relation 1 Relation 1 Relation 1 Relation 1 Relation 1 Relation 1

Clustered tuples

Introduction Relational databases overview

Advantages of clustering

Scan a relation of X tuples, Y

tuples per block

I (^) If unclustered, worst case scenario: read X blocks I (^) Clustered: read X /Y blocks

How about clustering disk

blocks?

I (^) Reduces unnecessary arm movement

Unclustered storage

Clustered storage

Introduction Relational databases overview

What does the buffer manager do?

When a page is requested it:

I (^) Checks to see if the page is in the buffer pool; if so it returns it I (^) If not, it checks whether there is room in the buffer pool; if so it reads it in and places it in the available room I (^) If not, it picks a page for replacement; if the page has been “touched” it writes the page to disk and replaces it I (^) In all three cases, it updates the reference count for the requested page I (^) If necessary, it pins the new page I (^) It returns a handle to the new page

Introduction Relational databases overview

Page replacement

Least recently used (LRU): check the number of references for each

page; replace a page from the group with the lowest count (usually

implemented with a priority queue)

I (^) Variant: clock replacement

First In First Out (FIFO)

Most recently used (MRU): the inverse of LRU

Random!

Storage and indexing Overview

Outline

Storage and indexing Overview

Indexing and sorting

Can be summarised as:

I (^) Forget whatever you’ve learned about indexing, searching and sorting in main memory (well, almost.. .)

Remember, we are operating over disk files

I (^) The main idea is to minimise disk I/O and not number of comparisons (i.e., complexity) I (^) Just an idea: comparing two values in memory costs 4. 91 · 10 −^8 seconds; Comparing two values on disk costs 18. 2 · 10 −^5 seconds ( orders of magnitude more expensive.)