
Physical Privacy Management in Database Systems for Sensor Networks, Slides of Introduction to Database Management Systems

Duke University Introduction to Database Management Systems

Slides covering the challenges of implementing privacy management in database systems for sensor networks. They introduce smartcards and their limitations, and propose reexamining each component of a DBMS to address the problem. The slides also cover a sensor network overview, regular databases vs. sensornets, constraints, opportunities, the model-driven approach, query processing, optimization, and experimental results.

Typology: Slides

2011/2012

Uploaded on 01/29/2012 by arold


DBMS on Small Scale Devices

Based on the papers:

“PicoDBMS: Scaling down database techniques for the smartcard” by Philippe Pucheral, Luc Bouganim, Patrick Valduriez, Christophe Bobineau

“Smart card embedded information systems: a methodology for privacy oriented architectural design” by C. Bolchini, F.A. Schreiber

By Amy Nathanson

Agenda

• Small device (smartcard) overview

• Problems with DBMS on smartcards

• Solutions

– PicoDBMS: a new database architecture

– A methodology for Privacy Management

• Summary of DBMS on smartcards

Overview of small devices

• Portable computing device

– Secure

– Widely used

• Banking

• Healthcare

• Insurance

• Smartcards (example: credit card)

– Single, issuer-dependent application

• Moving to multi-application

• Merge many cards to one

Smartcards and DBMS

• Volume of data growing

• Complexity of queries increasing

• Privacy issues

• ACID

• Separate management code from data code

• High security and availability

Problems of scaling down DBMS for small devices

• Small size and low cost

– 96 kB ROM → stores OS, fixed data, standard routines

– 4 kB RAM → for the stack and calculations

– 128 kB EEPROM → persistent data

• Very slow EEPROM write time (> 1 ms/word)

Design Requirements for DBMS

• Minimize data structure size

• Minimize RAM usage

• Minimize write operations

• Maximize fast read and direct access capability of stable memory

• Don’t externalize private data, and minimize algorithm complexity for security

• Enforce ACID

Storage model problem: scale down data structures

• FS (flat storage) – tuples stored sequentially with attributes embedded

– Space consuming and inefficient

• DS (domain storage) – factoring out values into a domain table for data compactness

– Factor out values and put them in the domain

– Only use if:

• data size > pointer size

• and duplicates exist
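A minimal Python sketch of the DS rule and layout, assuming a "pointer" is modeled as a small integer index into a per-attribute domain table. `POINTER_SIZE`, `should_factor`, and `domain_encode` are hypothetical names, and the dictionary used while encoding is a load-time convenience, not something a card with 4 kB of RAM would hold.

```python
from typing import Dict, List, Tuple

POINTER_SIZE = 2  # hypothetical pointer size in bytes on the card


def should_factor(values: List[str], pointer_size: int = POINTER_SIZE) -> bool:
    """Slide rule for DS: factor out only if data size > pointer size and duplicates exist."""
    if not values:
        return False
    avg_size = sum(len(v.encode("utf-8")) for v in values) / len(values)
    has_duplicates = len(set(values)) < len(values)
    return avg_size > pointer_size and has_duplicates


def domain_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once in a domain table; tuples keep only pointers."""
    domain: List[str] = []
    position: Dict[str, int] = {}
    pointers: List[int] = []
    for v in values:
        if v not in position:
            position[v] = len(domain)
            domain.append(v)
        pointers.append(position[v])
    return domain, pointers


# Hypothetical column of drug types:
types = ["antibiotic", "analgesic", "antibiotic", "antibiotic"]
if should_factor(types):
    domain, pointers = domain_encode(types)
    print(domain, pointers)  # ['antibiotic', 'analgesic'] [0, 1, 0, 0]
```

Short values (at most a pointer's size) or columns without repeats stay in FS form, since factoring them out would not save space.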

Storage model problem cont…

• RS (ring storage) – index compactness

[Figure: a ring index on a regular attribute (Relation S, index on S.a, Domain holding Value 1 … Value n) and a ring index on a foreign-key attribute (Relation R with R.a, Relation S with S.b)]

SOLUTION: Use a combination of FS, DS and RS
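The ring idea can be sketched in Python as follows, assuming the layout where each row stores a pointer to the next row sharing its value, the last row in the ring points back to the domain entry, and the domain entry points to the first row. `DomainEntry`, `Row`, `build_ring`, `value_of`, and `rows_with` are hypothetical names. The sketch shows why RS is compact (no separate index structure) and why reading a tuple's value is slow (the ring must be walked).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass(eq=False)
class DomainEntry:
    value: str
    first: Optional[Row] = field(default=None, repr=False)  # head of this value's ring


@dataclass(eq=False)
class Row:
    rid: int
    # Instead of the attribute value, each row stores a pointer to the next row
    # sharing that value; the last row of the ring points back to the domain entry.
    next: Union[Row, DomainEntry, None] = field(default=None, repr=False)


def build_ring(values: List[str]) -> Tuple[Dict[str, DomainEntry], List[Row]]:
    domain: Dict[str, DomainEntry] = {}
    rows = [Row(rid=i) for i in range(len(values))]
    tail: Dict[str, Row] = {}
    for row, v in zip(rows, values):
        entry = domain.setdefault(v, DomainEntry(v))
        if entry.first is None:
            entry.first = row
        else:
            tail[v].next = row
        tail[v] = row
    for v, entry in domain.items():
        tail[v].next = entry  # close each ring back to its domain value
    return domain, rows


def value_of(row: Row) -> str:
    """Walk the ring until the domain entry: cheap to store, slow to read."""
    node = row.next
    while isinstance(node, Row):
        node = node.next
    assert isinstance(node, DomainEntry)
    return node.value


def rows_with(domain: Dict[str, DomainEntry], value: str) -> List[int]:
    """Index use: start at the domain value and follow its ring."""
    entry = domain.get(value)
    result: List[int] = []
    node = entry.first if entry else None
    while isinstance(node, Row):
        result.append(node.rid)
        node = node.next
    return result


domain, rows = build_ring(["a", "b", "a", "a"])  # hypothetical column
print(rows_with(domain, "a"), value_of(rows[0]))  # [0, 2, 3] a
```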

Query Processing problem: use no RAM, no writes, and simple algorithms

• Select/Project/Join queries

– The Query Execution Plan (QEP) should be an extreme one-sided tree, so that no RAM is used

– Implement pipelining using the Iterator Model, so there is no need for materialization

– Project operators are pushed up the tree (no materialization)

– Retrieving attribute values through a ring index is time-consuming, so projection is done at the end
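A rough Python sketch of the Iterator Model on a one-sided plan, using generators so each operator pulls one tuple at a time from its child. The operator names `scan`, `select`, `nl_join`, and `project` and the data are hypothetical; the join re-scans its inner input rather than materializing it, and projection sits at the top of the plan.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def scan(rows: List[Row]) -> Callable[[], Iterator[Row]]:
    """Return a function that opens a fresh scan (re-reading stable memory, not copying)."""
    return lambda: iter(rows)


def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for t in child:
        if pred(t):
            yield t


def nl_join(outer: Iterator[Row], inner: Callable[[], Iterator[Row]],
            on: Callable[[Row, Row], bool]) -> Iterator[Row]:
    for o in outer:
        for i in inner():  # re-scan the inner input instead of materializing it
            if on(o, i):
                yield {**o, **i}


def project(child: Iterator[Row], cols: List[str]) -> Iterator[Row]:
    for t in child:  # projection at the top of the plan
        yield {c: t[c] for c in cols}


# Hypothetical data in the shape of the Drug/Prescription example:
drugs = [{"DrugId": 1, "name": "A", "type": "antibiotic"},
         {"DrugId": 2, "name": "B", "type": "analgesic"}]
prescriptions = [{"VisId": 10, "DrugId": 1, "qty": 2},
                 {"VisId": 11, "DrugId": 2, "qty": 1},
                 {"VisId": 12, "DrugId": 1, "qty": 5}]

plan = project(
    nl_join(select(scan(prescriptions)(), lambda p: p["qty"] > 1),
            scan(drugs),
            lambda p, d: p["DrugId"] == d["DrugId"]),
    ["VisId", "name"])
print(list(plan))  # [{'VisId': 10, 'name': 'A'}, {'VisId': 12, 'name': 'A'}]
```

Only the tuple currently flowing through the pipeline is held at any time; the price is repeated inner scans, which trades reads (fast on a card) for RAM and writes (scarce and slow).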

Query Processing problem cont…

• Aggregate/Sort/Duplicate Removal queries

– Group incoming tuples by distinct values

– Begin with the group-by attribute and join it with its domain table

– Aggregate and duplicate-removal queries are pipelined

– Order is preserved in pipelined operators because they handle tuples in arrival order

• Solution: Use pipelining and enforce order at tree leaves
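Why enforcing order at the leaves matters can be sketched with Python's `itertools.groupby`, which only compares each item with the current run's key. On ordered input this gives duplicate removal and per-group counts with constant state; `distinct_ordered` and `count_ordered` are hypothetical names for illustration.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")


def distinct_ordered(stream: Iterable[K]) -> Iterator[K]:
    # groupby compares each item only with the current run's key, so on input
    # that is already ordered it removes duplicates without buffering the stream.
    for key, _run in groupby(stream):
        yield key


def count_ordered(stream: Iterable[K]) -> Iterator[Tuple[K, int]]:
    # Same idea for a pipelined COUNT per group: emit a group as soon as it ends.
    for key, run in groupby(stream):
        yield key, sum(1 for _ in run)


print(list(distinct_ordered([1, 1, 2, 3, 3])))  # [1, 2, 3]
print(list(count_ordered("aab")))               # [('a', 2), ('b', 1)]
print(list(distinct_ordered([1, 2, 1])))        # [1, 2, 1]: unordered input breaks it
```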

Query Processing Example

Query = Number of prescriptions per type of drug

[Figure: query execution plan with count on top of a join between prescription and drug, grouped on drug.type]

Schema: Drug(DrugId, name, type, …), Prescription(VisId, DrugId, qty, …)
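The example query can be sketched in Python by driving the plan from the domain of `drug.type`, so tuples reach the count operator already grouped and a single counter suffices. For simplicity this sketch uses nested scans where PicoDBMS would follow ring indexes; `prescriptions_per_type` and all data values are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def prescriptions_per_type(type_domain: List[str], drugs: List[Row],
                           prescriptions: List[Row]) -> Iterator[Tuple[str, int]]:
    """COUNT(*) ... GROUP BY drug.type, driven by the domain of drug.type.

    Each group is finished before the next one starts, so only one counter
    is kept in RAM and results come out grouped, in domain order.
    """
    for drug_type in type_domain:          # leaf: distinct group-by values
        count = 0
        for d in drugs:                    # drugs of this type
            if d["type"] != drug_type:
                continue
            for p in prescriptions:        # their prescriptions (join on DrugId)
                if p["DrugId"] == d["DrugId"]:
                    count += 1
        if count:                          # like a join, omit empty groups
            yield drug_type, count


# Hypothetical contents:
types = ["antibiotic", "analgesic", "antiviral"]
drugs = [{"DrugId": 1, "type": "antibiotic"}, {"DrugId": 2, "type": "analgesic"},
         {"DrugId": 3, "type": "antibiotic"}]
prescriptions = [{"VisId": 10, "DrugId": 1}, {"VisId": 11, "DrugId": 1},
                 {"VisId": 12, "DrugId": 2}, {"VisId": 13, "DrugId": 3}]
print(list(prescriptions_per_type(types, drugs, prescriptions)))
# [('antibiotic', 3), ('analgesic', 1)]
```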

Transaction Management problem: enforce ACID

• Atomicity: updates are either committed persistently or rolled back

– Local: enforced by write-ahead logging (WAL)

• Problem: the cost is higher in a small DBMS, where every log write is a slow EEPROM write

– Global: enforced by an atomic commitment protocol (ACP)

• Consistency: the traditional form is used; all integrity constraints must be satisfied

• Isolation: not an issue, because the card is single-user

• Durability: committed updates are never lost
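To see why local atomicity is costly on a card, here is a toy Python sketch of an undo write-ahead log: the old value is logged before each in-place update, so every update costs two stable-memory writes. `UndoLogStore` is hypothetical and is not claimed to be PicoDBMS's actual logging scheme.

```python
from typing import Dict, List, Tuple


class UndoLogStore:
    """Toy stable memory protected by an undo write-ahead log (hypothetical)."""

    def __init__(self, data: Dict[int, int]) -> None:
        self.data = dict(data)                 # stands in for EEPROM words
        self.log: List[Tuple[int, int]] = []   # (address, old value) records
        self.writes = 0                        # stable-memory writes performed

    def update(self, addr: int, value: int) -> None:
        self.log.append((addr, self.data[addr]))  # 1) log the old value first...
        self.writes += 1
        self.data[addr] = value                   # 2) ...then update in place
        self.writes += 1

    def commit(self) -> None:
        self.log.clear()  # updates are already in place; discard undo records

    def rollback(self) -> None:
        while self.log:
            addr, old = self.log.pop()  # undo in reverse order
            self.data[addr] = old
            self.writes += 1


store = UndoLogStore({0: 10, 1: 20})
store.update(0, 11)
store.update(1, 21)
print(store.writes)  # 4: every update costs two slow EEPROM writes
store.rollback()
print(store.data)    # {0: 10, 1: 20}
```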

2. Physical Privacy Management

• Each view of the database is determined by the user, to enforce data protection

• Common-format system table for a smartcard

– Object description table: tables and views stored

– User description table: users and their access levels

– Privilege description table: database privileges and views for users

• The SCQL dictionary is a view on the system table used to access information

– Provide a distinct SCQL dictionary for each user

– View definitions are stored in data folders
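A per-user dictionary can be sketched in Python as a filter over the three system tables, so each user only sees the objects they hold a privilege on. The record types, field names, and data are hypothetical and do not reproduce the SCQL (ISO/IEC 7816-7) table layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ObjectDesc:        # object description table: tables and views stored
    name: str
    kind: str            # "table" or "view"


@dataclass(frozen=True)
class UserDesc:          # user description table: users and access levels
    user: str
    level: int


@dataclass(frozen=True)
class PrivilegeDesc:     # privilege description table
    user: str
    obj: str
    privilege: str       # e.g. "SELECT"


def dictionary_for(user: str, users: List[UserDesc], objects: List[ObjectDesc],
                   privileges: List[PrivilegeDesc]) -> List[ObjectDesc]:
    """Per-user dictionary: a view over the system tables showing only granted objects."""
    if not any(u.user == user for u in users):
        return []
    granted = {p.obj for p in privileges if p.user == user}
    return [o for o in objects if o.name in granted]


# Hypothetical system-table contents:
users = [UserDesc("alice", 1), UserDesc("bob", 2)]
objects = [ObjectDesc("Prescription", "table"), ObjectDesc("Drug", "table"),
           ObjectDesc("DoctorView", "view")]
privs = [PrivilegeDesc("alice", "DoctorView", "SELECT"),
         PrivilegeDesc("bob", "Drug", "SELECT"),
         PrivilegeDesc("bob", "Prescription", "INSERT")]
print([o.name for o in dictionary_for("bob", users, objects, privs)])
# ['Prescription', 'Drug']
```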

Summary

• Smartcards are emerging as a multipurpose technology

• The need for a DBMS that fits on a small, flexible card is increasing

• Card limitations destroy the foundations DBMSs have been built on

• Each component of a DBMS must be reexamined to solve the problem

Database Systems for Sensor Networks

Cem Goncu

December 2, 2004

Outline

• Sensor network overview

• Regular databases vs. sensornets

• Constraints

• Opportunities

• Model-based approach

• Comparative systems

• Snapshot queries

• Conclusion and suggestions for future work

Sensor Network Overview

• Tiny devices embedded in the physical world

• Battery-powered microprocessors

• Combine sensing, computation and communication

• Monitor the environment for interesting events

• Acquire and transmit data at specified intervals

Sensornets

• Distributed data acquisition with multiple sensors (nodes)

• Could consist of any number of nodes (N)

• For large N, there is no concern for the reliability of a single sensor

• Wireless communication between nodes

• Requires position detection, fault tolerance, aggregation, etc.

Sensornet Applications

• Habitat/environmental monitoring

– Temperature, light, humidity

– Voltage, radiation

• Military surveillance & reconnaissance

• Traffic

– Movement, velocity, acceleration

– Vehicle tracking

Regular databases vs. Sensornets

• Regular DBMSs process information about a stored (complete) collection of data

• Sensornets work with real-time information about the environment

– The set of relevant data is continuous in both time and space (infinite)!

– It is impossible to gather all relevant data

– Acquire samples of physical phenomena at discrete points in time and space

– Provide approximate answers with a degree of uncertainty

Definition of Cost

• Let $O = \{o_1, o_2, \dots, o_n\}$ be a set of $n$ observations

• $C(O) = \sum_{i=1}^{n} C(o_i)$

• The system cost of an observation is the sum of acquisition and transmission costs:

$C(o_i) = C_a(o_i) + C_t(o_i)$

$C(O) = C_a(O) + C_t(O)$

Data acquisition cost $C_a$

• Sum of energy required to observe the attributes in $O$: $C_a(O) = \sum_{i \in O} C_a(i)$

• Observations of different variables require different amounts of energy per sample:

Sensor                        Energy per sample (@3V), in mJ
Voltage                       0.…
Humidity and temperature      0.…
Barometric pressure           0.…
Solar radiation               …

Data transmission cost $C_t$

• $C_t(O) = \sum_i C_t(o_i)$: the communication cost required to download the data

• Expect the transmission cost to be proportional to the number of nodes used: $C_t = kN$

• Depends on the data collection mechanism used to collect observations from the network (TinyDB, approximate caching)

• Depends on the network topology

– If the topology is unknown or changing, the cost function is essentially random

– Therefore, assume networks with known topologies
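The cost model can be sketched in Python as below. Observations are `(attribute, node)` pairs like those in the observation plans later in the deck; reading $N$ in $C_t = kN$ as the number of distinct nodes a plan touches is an assumption, and the per-sample energies are placeholder numbers, not the measured values from the table above.

```python
from typing import Dict, Iterable, List, Tuple

Observation = Tuple[str, int]  # (attribute, node id), e.g. ("voltage", 1)


def acquisition_cost(obs: Iterable[Observation], energy_mj: Dict[str, float]) -> float:
    """C_a(O): sum of per-sample sensing energy for every observed attribute."""
    return sum(energy_mj[attr] for attr, _node in obs)


def transmission_cost(obs: Iterable[Observation], k_mj: float) -> float:
    """C_t = kN, taking N as the number of distinct nodes the plan touches (assumption)."""
    return k_mj * len({node for _attr, node in obs})


def total_cost(obs: List[Observation], energy_mj: Dict[str, float], k_mj: float) -> float:
    """C(O) = C_a(O) + C_t(O)."""
    return acquisition_cost(obs, energy_mj) + transmission_cost(obs, k_mj)


# Placeholder energies (NOT the measured values from the slide's table):
energy = {"voltage": 0.25, "temp": 0.5}
plan = [("voltage", 1), ("voltage", 2), ("temp", 4)]
print(total_cost(plan, energy, k_mj=1.0))  # 4.0
```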

Definition of Benefit

• Let $O = \{o_1, o_2, \dots, o_n\}$ be a set of $n$ observations

• $R_i(\mathbf{o})$: benefit to the accuracy of a reading $X_i$ given the set of observation values $\mathbf{o}$

• For value and average queries ($X_i = x_i$):

$R_i(\mathbf{o}) = P(X_i \in [x_i - \epsilon, x_i + \epsilon] \mid \mathbf{o})$

• For range queries ($X_i \in [a_i, b_i]$):

$R_i(\mathbf{o}) = \max\left[P(X_i \in [a_i, b_i] \mid \mathbf{o}),\ 1 - P(X_i \in [a_i, b_i] \mid \mathbf{o})\right]$
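Both benefit formulas can be evaluated in closed form once a posterior for $X_i$ is fixed. This Python sketch assumes the posterior of $X_i$ given $\mathbf{o}$ is a normal distribution $N(\mu, \sigma^2)$ and that a value query reports $x_i = \mu$; the function names and numbers are hypothetical.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def value_benefit(mu: float, sigma: float, eps: float) -> float:
    """R_i(o) for a value query, reporting x_i = posterior mean mu (assumed Gaussian)."""
    return normal_cdf(mu + eps, mu, sigma) - normal_cdf(mu - eps, mu, sigma)


def range_benefit(mu: float, sigma: float, a: float, b: float) -> float:
    """R_i(o) for a range query X_i in [a, b]: confidence in the yes/no answer."""
    p = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
    return max(p, 1.0 - p)


print(round(value_benefit(22.0, 0.05, 0.1), 4))        # 0.9545 (eps = 2 sigma)
print(round(range_benefit(22.0, 1.0, 21.0, 23.0), 4))  # 0.6827
```

The `max` in the range case reflects that the system may answer either "in range" or "out of range", whichever is more likely.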

Expected benefit

• The specific value $\mathbf{o}$ of $O$ is not known a priori

• Must compute the expected benefit $R_i(O) = \int p(\mathbf{o})\, R_i(\mathbf{o})\, d\mathbf{o}$

• For a set of queried readings $Q$, define the average benefit as $R(\mathbf{o}) = \frac{1}{|Q|} \sum_{i \in Q} R_i(\mathbf{o})$

• Use the average benefit to decide when to stop observing new attributes
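The integral $\int p(\mathbf{o})\, R_i(\mathbf{o})\, d\mathbf{o}$ can be approximated by sampling. This Python sketch assumes a bivariate Gaussian over one observed reading $X_1$ and one queried reading $X_2$ with correlation `rho` (|rho| < 1), and a range query on $X_2$; it draws $\mathbf{o}$ from $p(\mathbf{o})$, conditions, and averages the benefit. All names and parameters are hypothetical.

```python
import math
import random


def _cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_range_benefit(mu1: float, s1: float, mu2: float, s2: float, rho: float,
                           a: float, b: float, samples: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of R_2(O) = E_o[R_2(o)] when O = {observe X1}."""
    rng = random.Random(seed)
    cond_sd = s2 * math.sqrt(1.0 - rho * rho)        # sd of X2 given X1 (|rho| < 1)
    total = 0.0
    for _ in range(samples):
        x1 = rng.gauss(mu1, s1)                      # draw o ~ p(o)
        cond_mu = mu2 + rho * s2 / s1 * (x1 - mu1)   # E[X2 | X1 = x1]
        p = _cdf(b, cond_mu, cond_sd) - _cdf(a, cond_mu, cond_sd)
        total += max(p, 1.0 - p)                     # R_2(o) for the range query
    return total / samples


# Uncorrelated observation: no gain over the prior.
print(round(expected_range_benefit(22.0, 1.0, 22.0, 1.0, rho=0.0, a=21.0, b=23.0), 4))  # 0.6827
# Strongly correlated observation: higher expected benefit.
print(expected_range_benefit(22.0, 1.0, 22.0, 1.0, rho=0.95, a=21.0, b=23.0))
```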

Choosing an observation plan

Problem: Given an error bound $\epsilon$ and confidence level $1 - \delta$, pick the set of observations $O_s \subseteq O$ to

Minimize $C(O_s)$ such that $R(O_s) \ge 1 - \delta$

Solutions:

Option 1 - exhaustive search

Option 2 - greedy algorithm

Exhaustive search

• Exhaustively search over all possible subsets of possible observations, $O$

• Finds the optimal subset $O_s$ with minimum cost $C(O_s)$

• Exponential running time
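A minimal Python sketch of the exhaustive search over all $2^n$ subsets. The benefit model here is a hypothetical toy stand-in for the probabilistic model (each observation independently "resolves" the query with probability `q[o]`, costs are additive), chosen only so the search has something to optimize; `make_toy_model` and `exhaustive_plan` are illustrative names.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Optional, Sequence, Tuple

Plan = FrozenSet[str]


def make_toy_model(q: Dict[str, float], c: Dict[str, float]
                   ) -> Tuple[Callable[[Plan], float], Callable[[Plan], float]]:
    """Hypothetical toy model: additive costs, independent 'resolve' probabilities."""
    def cost(s: Plan) -> float:
        return sum(c[o] for o in s)

    def benefit(s: Plan) -> float:
        miss = 1.0
        for o in s:
            miss *= 1.0 - q[o]
        return 1.0 - miss

    return cost, benefit


def exhaustive_plan(candidates: Sequence[str], cost: Callable[[Plan], float],
                    benefit: Callable[[Plan], float], target: float) -> Optional[Plan]:
    """Try all 2^n subsets; keep the cheapest one with benefit >= target (1 - delta)."""
    best: Optional[Plan] = None
    best_cost = math.inf
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            s = frozenset(subset)
            if benefit(s) >= target and cost(s) < best_cost:
                best, best_cost = s, cost(s)
    return best


cost, benefit = make_toy_model(q={"v1": 0.5, "v2": 0.5, "t4": 0.875},
                               c={"v1": 1.0, "v2": 1.0, "t4": 5.0})
print(sorted(exhaustive_plan(["v1", "v2", "t4"], cost, benefit, target=0.9375)))
# ['t4', 'v1']
```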

Greedy algorithm

• Start with an empty set of observations, $O = \emptyset$

• For each observation $o_i$ not yet in $O$, compute the new expected benefit $R(O \cup \{o_i\})$ and expected cost $C(O \cup \{o_i\})$

• If a subset of observations $G$ reaches the desired confidence, i.e. $R(O \cup \{o_g\}) \ge 1 - \delta$ for every $o_g \in G$, pick the $o_g$ with the lowest cost $C(o_g)$ and terminate the search

• Else, if $G = \emptyset$, add the $o_i$ with the highest benefit-to-cost ratio to $O$ and repeat, until $R(O) \ge 1 - \delta$
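The greedy steps above can be sketched in Python with the same hypothetical toy model as a stand-in for the real probabilistic model. Reading "benefit over cost ratio" as $R(O \cup \{o\}) / C(O \cup \{o\})$ is an assumption, and costs are assumed positive.

```python
from typing import Callable, Dict, FrozenSet, Optional, Sequence, Tuple

Plan = FrozenSet[str]


def make_toy_model(q: Dict[str, float], c: Dict[str, float]
                   ) -> Tuple[Callable[[Plan], float], Callable[[Plan], float]]:
    """Hypothetical toy model: additive costs, independent 'resolve' probabilities."""
    def cost(s: Plan) -> float:
        return sum(c[o] for o in s)

    def benefit(s: Plan) -> float:
        miss = 1.0
        for o in s:
            miss *= 1.0 - q[o]
        return 1.0 - miss

    return cost, benefit


def greedy_plan(candidates: Sequence[str], cost: Callable[[Plan], float],
                benefit: Callable[[Plan], float], target: float) -> Optional[Plan]:
    chosen: Plan = frozenset()
    while benefit(chosen) < target:
        remaining = [o for o in candidates if o not in chosen]
        if not remaining:
            return None  # even observing everything misses 1 - delta
        # G: observations that reach the target when added to the current set
        g = [o for o in remaining if benefit(chosen | {o}) >= target]
        if g:
            # pick the o_g with the lowest cost C(o_g) and terminate
            return chosen | {min(g, key=lambda o: cost(frozenset({o})))}
        # G is empty: add the observation with the best benefit/cost ratio
        best = max(remaining, key=lambda o: benefit(chosen | {o}) / cost(chosen | {o}))
        chosen = chosen | {best}
    return chosen


cost, benefit = make_toy_model(q={"v1": 0.5, "v2": 0.5, "t4": 0.875},
                               c={"v1": 1.0, "v2": 1.0, "t4": 5.0})
print(sorted(greedy_plan(["v1", "v2", "t4"], cost, benefit, target=0.9375)))
# ['t4', 'v1']
```

Each round evaluates at most $n$ candidate additions, so the search is polynomial rather than exponential, at the price of no optimality guarantee.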

A simple example

Query: SELECT nodeId, temp ± .1°C, conf(.95) WHERE nodeID in (1..8)

Observation plan: {[Voltage,1], [Voltage, 2], [temp, 4]}

Data: {[V1=2.73], [V2=2.65], [T4=22.1]}

Results: {[22.5, 97%], [25.6, 99%], [24.4, 98%], [22.1, 100%], ….}

Review of alternative approaches

1. TinyDB-style Querying

2. Approximate Caching

TinyDB

• Query disseminated into the sensor network using a tree structure

• At each mote, the sensor reading is observed

• Results reported back along the same tree to the base station

• Results combined on the way back to minimize communication costs
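Combining results on the way back can be sketched in Python for an AVG query: each subtree returns a `(sum, count)` partial state, so each non-root node sends one message instead of forwarding every raw reading. The tree, readings, and function names are hypothetical, and this is an illustration in the spirit of TinyDB rather than its actual code.

```python
from typing import Dict, List, Tuple

Tree = Dict[int, List[int]]  # parent -> children in the routing tree


def aggregate_up(tree: Tree, readings: Dict[int, float], node: int) -> Tuple[float, int]:
    """Partial state (sum, count) for the subtree rooted at `node`.

    Each child contributes one (sum, count) message instead of forwarding
    every raw reading, which is how AVG can be combined on the way back.
    """
    total, count = readings[node], 1
    for child in tree.get(node, []):
        s, c = aggregate_up(tree, readings, child)
        total += s
        count += c
    return total, count


def messages_sent(tree: Tree) -> int:
    """One partial-state message per non-root node."""
    return sum(len(children) for children in tree.values())


tree = {0: [1, 2], 1: [3, 4]}  # hypothetical topology; 0 is the root mote
readings = {0: 20.0, 1: 22.0, 2: 24.0, 3: 21.0, 4: 23.0}
s, n = aggregate_up(tree, readings, 0)
print(s / n, messages_sent(tree))  # 22.0 4
```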

BBQ - Performance

• Query: requires the system to report temperatures at all motes to within a specified error bound

• Confidence 95%, with varying ε

• Different values of ε lead to varying observation cost C(O)

Results

• Varied ε between 0 and 1 °C

• The cost of BBQ falls rapidly as ε increases

• The percentage of errors stays well below the 5% allowed by the 95% confidence level

Comparison

TinyDB:

• Makes no mistakes

• Cost remains constant for all ε

Approximate Caching:

• Always reports values to within ε

• Makes no mistakes

• Average observation error close to that of BBQ
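Why approximate caching always stays within ε can be seen in a small Python simulation of one node, assuming the usual scheme where the node transmits only when its reading drifts more than ε from the value last sent and the base station answers from its cache. `approximate_caching` and the trace are hypothetical.

```python
from typing import List, Optional, Tuple


def approximate_caching(readings: List[float], eps: float) -> Tuple[List[float], int]:
    """Simulate one node under approximate caching.

    The node transmits only when its reading drifts more than eps from the
    value last sent; the base station answers every query from its cache.
    """
    cached: Optional[float] = None
    view: List[float] = []
    messages = 0
    for x in readings:
        if cached is None or abs(x - cached) > eps:
            cached = x          # node sends a fresh value
            messages += 1
        view.append(cached)     # base station's answer for this step
    return view, messages


readings = [20.0, 20.125, 20.25, 20.5, 20.625]  # hypothetical trace
view, messages = approximate_caching(readings, eps=0.25)
print(view, messages)  # [20.0, 20.0, 20.0, 20.5, 20.5] 2
```

The node still has to sense at every step to know whether to transmit, which is one reason its cost behaves differently from BBQ's.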

Comparison (cont.)

BBQ:

• Succeeds in reporting observations within the given error bound at least 95% of the time

• For reasonable values of ε, uses significantly less communication

• More efficient use of time and energy

BBQ - Cost Efficiency

[Figure: percentage of sensors that BBQ observes, by hour, for varying ε]

• As ε gets small (< 0.1), BBQ must observe all nodes on every query

– Variance between nodes is high enough that it cannot infer the value of one sensor from another's with any accuracy

• As ε gets large (> 1), few observations are needed

– Changes in one sensor predict the values of others

• For intermediate ε, more observations are needed, especially when readings change drastically

Lowering the required confidence or increasing ε reduces energy per query

• Confidence 95%, errors 0.…

• Expected energy cost reduced from 5.4 J to 150 mJ per query

• Roughly a factor of 40 reduction (5.4 J / 150 mJ = 36)

Another approach

Snapshot Queries (Kotidis)

• Data-driven approach in which a node can represent another node in a query when their collected measurements are similar

• Algorithm for nodes to elect a local representative

• Determine a threshold value T such that d(actual, estimate) ≤ T

• Idea: expect many correlations among the collected measurements of neighboring nodes

• Goal: use only a subset of nodes (a representative from each neighborhood) to create a “snapshot” of the whole system

• Answer certain queries (snapshot queries) without using the other nodes, to save time and energy

• Reduction of up to 90% in the number of nodes that need to participate in a snapshot query

• Local algorithm for picking representatives: $N_i$ can represent $N_j$ if $d(x_j, x_{ij}) < T$, where $x_j$ is the actual reading of node $j$ and $x_{ij}$ is $N_i$'s estimate of $x_j$

• The “snapshot” is not static, but changes over time:

– $N_i$ may fail ($N_j$ then requests a new representative)

– Due to the dynamic nature of the environment, $d(x_j, x_{ij})$ might grow larger than the threshold value

– Ideally, a rotating set of representatives drains energy resources uniformly (longer lifespan for an average node)
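A rough Python sketch of the representation test $d(x_j, x_{ij}) < T$ plus a simple way to pick representatives. Two parts are assumptions, not Kotidis's published protocol: $N_i$ estimates a neighbor with a least-squares line fitted to their past readings, and representatives are chosen with a greedy set-cover heuristic. Function names, topology, and readings are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares y ~ a*x + b (assumed estimator, not necessarily Kotidis's)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = 0.0 if var == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def elect(neighbors: Dict[int, List[int]], history: Dict[int, List[float]],
          current: Dict[int, float], T: float) -> Set[int]:
    # can[i]: nodes that N_i can represent (always includes itself)
    can: Dict[int, Set[int]] = {}
    for i, nbrs in neighbors.items():
        can[i] = {i}
        for j in nbrs:
            a, b = fit_line(history[i], history[j])
            if abs(current[j] - (a * current[i] + b)) < T:  # d(x_j, x_ij) < T
                can[i].add(j)
    # Greedy set cover (hypothetical heuristic): repeatedly pick the node
    # that represents the most still-unrepresented nodes.
    reps: Set[int] = set()
    uncovered = set(neighbors)
    while uncovered:
        i = max(sorted(can), key=lambda n: len(can[n] & uncovered))
        reps.add(i)
        uncovered -= can[i]
    return reps


neighbors = {0: [1], 1: [0, 2], 2: [1]}          # hypothetical line topology
history = {0: [20.0, 21.0, 22.0, 23.0],
           1: [20.5, 21.5, 22.5, 23.5],          # tracks node 0 closely
           2: [30.0, 25.0, 35.0, 28.0]}          # uncorrelated
current = {0: 24.0, 1: 24.5, 2: 31.0}
print(sorted(elect(neighbors, history, current, T=0.25)))  # [0, 2]
```

Every node can always represent itself, so the loop terminates; a node whose estimates drift past T simply ends up electing itself or another neighbor.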


• BBQ: a global model to capture …

• Snapshot: capture localized correlations in …

• Snapshot more successful in networks …

• General idea: tolerate a certain amount of …

• Exploit spatiotemporal correlations among …

• Use only a subset of nodes to gather …