Monitoring Streams - Advanced Database System - Lecture Slides, Slides of Database Management Systems (DBMS)

Some concepts of Advanced Database Systems are: Types Supported, Simple Data Model, Concurrency Control Two, Continuously Adaptive, Cost-Based Optimization, Data Access From Disks, Data Warehousing. Main points of this lecture are: Monitoring Streams, First Aurora, Borealis, Practical System, Designed for Scalability, Stream Storage Management, Reliability/Fault Tolerance, Distribution and Adaptivity, First Stream Startup

Typology: Slides

2012/2013

Uploaded on 04/27/2013

dhanapati

Monitoring Streams -- A New Class of
Data Management Applications

Background

  • MIT/Brown/Brandeis team
  • First Aurora, then Borealis
    • Practical system
    • Designed for scalability: 10^6 stream inputs, queries

  • QoS-Driven Resource Management
  • Stream Storage Management
  • Reliability/Fault Tolerance
  • Distribution and Adaptivity
  • First stream startup: StreamBase
  • Financial applications

Not Your Average DBMS

  1. External, Autonomous Data Sources
  2. Querying Time-Series
  3. Triggers-in-the-large
  4. Real-time response requirements
  5. Noisy Data, Approximate Query Results

Outline

  1. Aurora Overview/Query Model
  2. Runtime Operation
  3. Adaptivity

Aurora from 100 Feet

[Figure: Aurora system overview; three applications (App), each with an attached QoS specification]

Queries = Workflow (Boxes and Arcs)

  • Workflow Diagram = “Aurora Network”
  • Boxes = Query Operators
  • Arcs = Streams

[Figure: example Aurora network containing Slide and Tumble boxes]

Streams (Arcs)

  • stream: tuple sequence from a common source (e.g., sensor)

  • tuples timestamped on arrival (Internal use: QoS)

Query Operators (Boxes)

  • Simple: FILTER, MAP, RESTREAM
  • Binary: UNION, JOIN, RESAMPLE
  • Windowed: TUMBLE, SLIDE, XSECTION, WSORT
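As a rough illustration of how such boxes compose, the sketch below models FILTER, MAP, TUMBLE, and SLIDE as Python generators over a stream. The function names and the count-based window definition are assumptions made for this example; the slides do not specify window semantics, and Aurora's actual operators take more parameters.

```python
from collections import deque

def filter_box(pred, stream):
    """FILTER: pass through only the tuples that satisfy pred."""
    for t in stream:
        if pred(t):
            yield t

def map_box(fn, stream):
    """MAP: apply fn to every tuple."""
    for t in stream:
        yield fn(t)

def tumble_box(agg, size, stream):
    """TUMBLE (simplified): aggregate consecutive, non-overlapping
    windows of `size` tuples. Count-based windows are an assumption."""
    window = []
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield agg(window)
            window = []

def slide_box(agg, size, stream):
    """SLIDE (simplified): aggregate a window of the last `size`
    tuples, advancing one tuple at a time."""
    window = deque(maxlen=size)
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield agg(list(window))

# Hypothetical sensor readings; arcs are just Python iterables here.
readings = [3, 8, 1, 9, 7, 2, 6]
hot = filter_box(lambda x: x > 2, readings)
scaled = map_box(lambda x: x * 10, hot)
tumbled = list(tumble_box(sum, 2, scaled))
slid = list(slide_box(max, 3, readings))
print(tumbled, slid)
```

Chaining generators mirrors the boxes-and-arcs picture: each box consumes one arc and produces another. In this sketch an incomplete final tumble window is simply dropped.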

Aurora in Action

[Figure: Aurora network in operation, with boxes labeled σ, μ, Slide, and Tumble feeding three applications (App), each with a QoS specification]

“Box-at-a-time” Scheduling

Arcs → Tuple Queues

Outputs Monitored for QoS

Quality-of-Service (QoS)

Specifies “Utility” Of Imperfect Query Results

Delay-Based (specify utility of late results)

Delivery-Based, Value-Based (specify utility of partial results)

QoS Influences…

Scheduling, Storage Management, Load Shedding

[Figure: three QoS graphs, each plotting QoS from 0 to 1: against output value, against % tuples delivered (100 down to 0), and against delay (with a marked "good zone")]
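A QoS graph of this kind can be represented as a piecewise-linear function from a measured quantity (delay, or % tuples delivered) to a utility between 0 and 1. The sketch below is one possible encoding; the breakpoints are hypothetical values chosen for illustration and do not come from the slides.

```python
from bisect import bisect_right

def piecewise_linear(points):
    """Build a utility function from (x, utility) breakpoints.
    Assumes x values are strictly increasing."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def utility(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)          # xs[i-1] <= x < xs[i]
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return utility

# Delay-based QoS: full utility up to 2 sec (the "good zone"),
# falling linearly to zero at 6 sec. Breakpoints are hypothetical.
delay_qos = piecewise_linear([(0, 1.0), (2, 1.0), (6, 0.0)])

# Delivery-based QoS: utility as a function of % tuples delivered.
delivery_qos = piecewise_linear([(0, 0.0), (50, 0.8), (100, 1.0)])

print(delay_qos(1), delay_qos(4), delay_qos(10))
print(delivery_qos(75))
```

The same function shape serves the scheduler (how much utility is lost by delaying an output) and the load shedder (how much utility is lost by dropping tuples).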

Talk Outline

  1. Introduction
  2. Aurora Overview
  3. Runtime Operation
  4. Adaptivity
  5. Related Work and Conclusions

Runtime Operation

Scheduling: Maximize Overall QoS

Choice 1:

A: Cost: 1 sec (…, age: 1 sec), Delay = 2 sec, Utility = 0.…

B: Cost: 2 sec (…, age: 3 sec), Delay = 5 sec, Utility = 0.…

Schedule Box A now rather than later

Ideal: Maximize Overall Utility

Presently exploring scalable heuristics (e.g., feedback-based)
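The "ideal" scheduler can be sketched as an exhaustive search over box orderings, scoring each ordering by the summed utility of the resulting output delays. The costs and ages below are the A and B values from the slide; the delay-based QoS graph (full utility up to 2 sec, zero at 6 sec) is a hypothetical stand-in, since the slide's utility values are not recoverable.

```python
from itertools import permutations

def delay_utility(delay, good_until=2.0, zero_at=6.0):
    """Hypothetical delay-based QoS graph (piecewise linear)."""
    if delay <= good_until:
        return 1.0
    if delay >= zero_at:
        return 0.0
    return (zero_at - delay) / (zero_at - good_until)

def total_utility(order, boxes):
    """Sum of output utilities if boxes run back to back in `order`.
    Each box is (cost_sec, age_sec of its oldest queued tuple)."""
    clock = 0.0
    total = 0.0
    for name in order:
        cost, age = boxes[name]
        clock += cost                    # time at which this box finishes
        total += delay_utility(age + clock)
    return total

boxes = {"A": (1.0, 1.0), "B": (2.0, 3.0)}
best = max(permutations(boxes), key=lambda o: total_utility(o, boxes))
print(best, total_utility(best, boxes))
```

Under these assumed QoS parameters, running A first yields delays of 2 and 6 sec, while running B first yields 5 and 4 sec. Trying every permutation grows factorially with the number of boxes, which is why scalable heuristics are needed in practice.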

Choice 2:

Runtime Operation

Scheduling: Minimizing Per Tuple Processing Overhead

Train Scheduling (boxes A then B, input tuples … z y x; the slide marks each context switch):

  • Default Operation: A(z), A(y), A(x), B(A(z)), B(A(y)), B(A(x))
  • Box Trains (A and B scheduled as one unit, AB): B(A(z)), B(A(y)), B(A(x))
  • Tuple Trains: A(z, y, x), B(A(z), A(y), A(x))
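The difference between default operation, box trains, and tuple trains can be made concrete by counting how often the scheduler has to dispatch work. The sketch below is a simplified model, assuming each dispatch costs one context switch; the function names are made up for this example.

```python
def default_operation(boxes, tuples):
    """One dispatch per (box, tuple) pair."""
    dispatches = 0
    for box in boxes:
        out = []
        for t in tuples:
            dispatches += 1
            out.append(box(t))
        tuples = out
    return tuples, dispatches

def box_train(boxes, tuples):
    """One dispatch per tuple: each tuple is pushed through all boxes."""
    dispatches = 0
    out = []
    for t in tuples:
        dispatches += 1
        for box in boxes:
            t = box(t)
        out.append(t)
    return out, dispatches

def tuple_train(boxes, tuples):
    """One dispatch per box: each box processes the whole batch."""
    dispatches = 0
    for box in boxes:
        dispatches += 1
        tuples = [box(t) for t in tuples]
    return tuples, dispatches

A = lambda t: t + 1   # stand-in for box A
B = lambda t: t * 2   # stand-in for box B

for mode in (default_operation, box_train, tuple_train):
    print(mode.__name__, mode([A, B], [1, 2, 3]))
```

All three modes produce the same outputs; only the number of dispatches changes (6, 3, and 2 for two boxes and three tuples in this model).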

Talk Outline

  1. Introduction
  2. Aurora Overview
  3. Runtime Operation
  4. Adaptivity
  5. Related Work and Conclusions

Stream Query Optimization

  • Differences with Traditional Query Optimization?

Query Optimization

Compile-time, Global Optimization Infeasible

Too Many Boxes

Too Much Volatility in Network, Data

Dynamic, Local Optimization

Threshold for deciding when to optimize
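One way to picture dynamic, local optimization with a threshold is a chain of filters that is reordered only when the observed statistics make the current order noticeably worse than the best one. The sketch below assumes the filters commute and have independent selectivities; the 20% threshold and the tuple layout are illustrative choices, not Aurora's actual policy.

```python
def chain_cost(filters):
    """Expected per-tuple cost of running filters in the given order.
    Each filter is (name, cost, selectivity)."""
    total, pass_rate = 0.0, 1.0
    for _, cost, sel in filters:
        total += pass_rate * cost   # only surviving tuples reach this filter
        pass_rate *= sel
    return total

def maybe_reorder(filters, threshold=0.2):
    """Reorder only if the current plan costs more than
    (1 + threshold) times the best local ordering."""
    # For independent, commuting filters, ordering by (sel - 1) / cost
    # ascending minimizes the expected cost of the chain.
    best = sorted(filters, key=lambda f: (f[2] - 1.0) / f[1])
    if chain_cost(filters) > chain_cost(best) * (1 + threshold):
        return best, True
    return filters, False

# Observed statistics (hypothetical): f1 now passes 90% of tuples.
plan = [("f1", 1.0, 0.9), ("f2", 1.0, 0.1)]
new_plan, changed = maybe_reorder(plan)
print([f[0] for f in new_plan], changed)

# Small differences stay below the threshold, so the plan is left alone.
stable = [("f1", 1.0, 0.5), ("f2", 1.0, 0.45)]
print(maybe_reorder(stable)[1])
```

The threshold keeps the optimizer from thrashing when statistics fluctuate slightly, which matters when stream rates and distributions change during execution.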

Motivation of ‘Query Migration’

  • Continuous query over streams
    • Statistics unknown before start
    • Statistics changing during execution
      • Stream rates, arrival patterns, value distributions, etc.
  • Need for dynamic adaptation
    • Plan re-optimization
      • Change the shape of query plan tree