Physical Privacy Management in Database Systems for Sensor Networks, Slides of Introduction to Database Management Systems

Discusses the challenges of implementing privacy management in database systems for sensor networks. It introduces smartcards and their limitations, and proposes reexamining each component of the DBMS to address the problem. It also covers a sensor network overview, regular databases vs. sensornets, constraints, opportunities, the model-driven approach, query processing, optimization, and experimental results.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold

DBMS on Small Scale Devices

Based on the papers:

“PicoDBMS: Scaling down database techniques for the smartcard” by Philippe Pucheral, Luc Bouganim, Patrick Valduriez, Christophe Bobineau

“Smart card embedded information systems: a methodology for privacy oriented architectural design” by C. Bolchini, F.A. Schreiber

By Amy Nathanson

Agenda

- Small device (smartcard) overview
- Problems with DBMS on smartcards
- Solutions
  - PicoDBMS: a new database architecture
  - A methodology for Privacy Management
- Summary of DBMS on smartcards

Overview of small devices

- Portable computing device
  - Secure
  - Widely used
    - Banking
    - Healthcare
    - Insurance
- Smartcards (example: credit card)
  - Single, issuer-dependent application
  - Moving to multi-application
  - Merge many cards into one

Smartcards and DBMS

- Volume of data growing
- Complexity of queries increasing
- Privacy issues
- ACID
- Separate management code from data code
- High security and availability

Problems of scaling down DBMS for small devices

- Small size and low cost
  - 96 kB ROM → stores OS, fixed data, standard routines
  - 4 kB RAM → for the stack and calculations
  - 128 kB EEPROM → persistent data
  - VERY slow write time (> 1 ms/word)

Design Requirements for DBMS

- Minimize data structure size
- Minimize RAM usage
- Minimize write operations
- Maximize fast read and direct access capability of stable memory
- Don’t externalize private data, and minimize algorithm complexity for security
- Enforce ACID

Storage model problem: scale down data structures

- FS (flat storage): tuples stored sequentially with attributes embedded
  - Space consuming and inefficient
- DS (domain storage): factoring out values into a domain table for data compactness
  - Factor out values and put them in a domain
  - Only use if:
    1. data size > pointer size, and
    2. duplicates exist
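To make domain storage concrete, here is a minimal Python sketch under stated assumptions: Python lists and dicts stand in for EEPROM, and a tuple's "pointer" is simply a position in the domain table. `DomainTable`, `store_relation` and `read_value` are hypothetical names for illustration, not part of PicoDBMS.

```python
class DomainTable:
    """Stores each distinct attribute value exactly once (hypothetical helper)."""

    def __init__(self):
        self.values = []      # position -> value
        self._positions = {}  # value -> position, used only while loading

    def pointer_for(self, value):
        # Reuse the existing entry if the value is already in the domain.
        if value not in self._positions:
            self._positions[value] = len(self.values)
            self.values.append(value)
        return self._positions[value]


def store_relation(rows, domain_columns):
    """Flat storage for most columns, domain storage for the listed ones."""
    domains = {col: DomainTable() for col in domain_columns}
    stored = []
    for row in rows:
        stored.append({
            col: domains[col].pointer_for(val) if col in domains else val
            for col, val in row.items()
        })
    return stored, domains


def read_value(stored_row, col, domains):
    # Dereference the pointer for domain-stored columns; flat columns are inline.
    val = stored_row[col]
    return domains[col].values[val] if col in domains else val
```

The two conditions on the slide are exactly when this pays off: a pointer must be smaller than the value it replaces, and the value must repeat across tuples.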

Storage model problem (cont.)

- RS (ring storage): index compactness

[Figure: ring index on a regular attribute (domain values Value 1 … Value n, index on S.a, relation S) and ring index on a foreign-key attribute (relation R, R.a; relation S, S.b)]

SOLUTION: Use a combination of FS, DS and RS
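The following Python sketch illustrates ring storage on a single regular attribute, assuming list indexes play the role of EEPROM pointers and a tagged pair distinguishes a pointer to the next tuple from the pointer that closes the ring at the domain entry. The function names are hypothetical. It also shows why a tuple's value is expensive to retrieve: `value_of` has to walk the ring until it reaches the domain.

```python
def build_ring(column_values):
    """Build a ring index for one attribute (sketch).

    Returns (domain, first, next_ptr):
      domain[d]   -- the d-th distinct value
      first[d]    -- index of the first tuple holding domain[d]
      next_ptr[t] -- ("tuple", k): next tuple with the same value, or
                     ("domain", d): end of the ring, back to the value itself
    Tuples store no value for the attribute, only next_ptr[t].
    """
    domain, first, next_ptr = [], [], [None] * len(column_values)
    last_seen = {}  # value -> (domain position, last tuple index); build-time only
    for t, v in enumerate(column_values):
        if v not in last_seen:
            domain.append(v)
            first.append(t)
            last_seen[v] = (len(domain) - 1, t)
        else:
            d, prev = last_seen[v]
            next_ptr[prev] = ("tuple", t)
            last_seen[v] = (d, t)
    # Close each ring: the last tuple of each value points back to its domain entry.
    for d, last in last_seen.values():
        next_ptr[last] = ("domain", d)
    return domain, first, next_ptr


def tuples_with_value(d, first, next_ptr):
    """Index lookup: follow the ring from the domain entry."""
    t = first[d]
    while True:
        yield t
        kind, target = next_ptr[t]
        if kind == "domain":
            return
        t = target


def value_of(t, domain, next_ptr):
    """Projection: walk the ring forward until it reaches the domain entry."""
    kind, target = next_ptr[t]
    while kind == "tuple":
        kind, target = next_ptr[target]
    return domain[target]
```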

Query Processing problem: use no RAM, no writes, and simple algorithms

- Select/Project/Join queries
  - The Query Execution Plan (QEP) should be an extreme one-sided tree so that no RAM is needed
  - Implement pipelining using the Iterator Model, so no materialization is needed
  - Project operators are pushed up (no materialization)
  - The ring index makes it time consuming to find attribute values, so this is done at the end
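The pipelined Select/Project/Join strategy can be sketched with Python generators standing in for iterator-model operators (open/next/close collapsed into `for` loops). The operator names and the sample Drug and Prescription rows are assumptions for illustration. The nested-loop join deliberately rescans the inner base relation instead of buffering it, trading extra reads for zero RAM buffers and zero writes.

```python
def scan(relation):
    # Leaf operator: yields stored tuples one at a time.
    for row in relation:
        yield row


def select(child, predicate):
    for row in child:
        if predicate(row):
            yield row


def nested_loop_join(outer, inner_relation, outer_key, inner_key):
    # Rescans the inner base relation for every outer tuple instead of
    # buffering it: more reads, but no materialization and no writes.
    for o in outer:
        for i in scan(inner_relation):
            if o[outer_key] == i[inner_key]:
                yield {**o, **i}


def project(child, columns):
    # Applied last (pushed up), so only the final output is shaped.
    for row in child:
        yield {c: row[c] for c in columns}


# Hypothetical rows following Drug(DrugId, name, type, …) and Prescription(VisId, DrugId, qty, …).
DRUGS = [{"DrugId": 1, "type": "antibiotic"}, {"DrugId": 2, "type": "analgesic"}]
PRESCRIPTIONS = [
    {"VisId": 10, "DrugId": 1, "qty": 2},
    {"VisId": 11, "DrugId": 2, "qty": 1},
    {"VisId": 12, "DrugId": 1, "qty": 5},
]

# Each tuple flows through select -> join -> project before the next one is read.
result = list(project(
    nested_loop_join(select(scan(PRESCRIPTIONS), lambda r: r["qty"] > 1),
                     DRUGS, "DrugId", "DrugId"),
    ["VisId", "type"],
))
```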

Query Processing problem (cont.)

- Aggregate/Sort/Duplicate Removal queries
  - Group incoming tuples by distinct values
  - Begin with the group-by attribute and join with the domain table
  - Pipeline aggregate/duplicate removal queries
  - Order is preserved by pipelined operators because they handle tuples in arrival order
- Solution: use pipelining and enforce order at the tree leaves
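Pipelined aggregation only works if tuples arrive grouped, which is why order is enforced at the leaves. A minimal Python sketch, assuming the input is already ordered by the grouping key (the hypothetical functions below do not sort anything):

```python
def pipelined_group_count(rows, key):
    """Count tuples per group in one pass with O(1) state.

    Assumes rows arrive already grouped by `key` (for example because the
    leaf of the plan iterates over the key's domain).
    """
    current, count, started = None, 0, False
    for row in rows:
        k = row[key]
        if started and k != current:
            yield current, count  # group finished: emit it immediately
            count = 0
        current, started = k, True
        count += 1
    if started:
        yield current, count


def pipelined_distinct(rows, key):
    """Duplicate removal under the same ordering assumption."""
    for k, _count in pipelined_group_count(rows, key):
        yield k
```

If the input is not grouped (e.g. a, b, a), the operator would emit a twice; ordering at the leaves is what makes a single counter enough.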

Query Processing Example

Query = Number of prescriptions per type of drug.

[Figure: QEP combining count, Prescription, Drug and drug.type]

Schema: Drug(DrugId, name, type, …), Prescription(VisId, DrugId, qty, …)
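The example query can be evaluated without materialization by letting the Drug.type domain drive the plan. The sketch below is an illustration only: plain Python lists replace the ring indexes PicoDBMS would use to reach the drugs of a given type, so it scans where the real system would follow pointers, and the function name and sample rows are hypothetical.

```python
def prescriptions_per_drug_type(drug_types, drugs, prescriptions):
    """Number of prescriptions per type of drug, fully pipelined (sketch).

    drug_types    -- the domain of Drug.type (each distinct type once)
    drugs         -- Drug tuples: dicts with at least "DrugId" and "type"
    prescriptions -- Prescription tuples: dicts with at least "DrugId"
    """
    for drug_type in drug_types:      # leaf: the Drug.type domain, in order
        count = 0                     # the only state kept: one counter
        for drug in drugs:
            if drug["type"] != drug_type:
                continue
            for p in prescriptions:   # join Prescription on DrugId
                if p["DrugId"] == drug["DrugId"]:
                    count += 1
        if count:                     # like GROUP BY over a join: skip empty groups
            yield drug_type, count
```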

Transaction Management problem: enforce ACID

- Atomicity: commit or rollback, persistently
  - Local: enforced by write-ahead logging (WAL)
    - Problem: the cost is higher in a small DBMS
  - Global: enforced by an atomic commitment protocol (ACP)
- Consistency: the traditional form is used: all integrity constraints must be satisfied
- Isolation: not an issue because the card is single-user
- Durability: committed updates are never lost
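For comparison, here is a generic before-image (undo) logging scheme in Python, not the protocol PicoDBMS actually uses; the class name and the write counter are hypothetical. It makes the cost problem visible: every logical update costs at least two persistent writes (log record, then data), which hurts when a single EEPROM word write takes over 1 ms.

```python
_ABSENT = object()  # marks keys that did not exist before an update


class UndoLoggedStore:
    """Generic before-image (undo) logging over a key-value 'EEPROM' (sketch)."""

    def __init__(self, data):
        self.data = dict(data)  # stands in for persistent storage
        self.log = []           # persistent undo log: (key, before_image)
        self.writes = 0         # counts persistent writes, the expensive part

    def update(self, key, value):
        # Write-ahead: the before-image reaches the log before the data changes.
        self.log.append((key, self.data.get(key, _ABSENT)))
        self.writes += 1
        self.data[key] = value
        self.writes += 1

    def commit(self):
        self.log.clear()  # the updates are now permanent
        self.writes += 1  # resetting the log costs one more write

    def rollback(self):
        # Undo in reverse order so the oldest before-image wins.
        for key, before in reversed(self.log):
            if before is _ABSENT:
                self.data.pop(key, None)
            else:
                self.data[key] = before
            self.writes += 1
        self.log.clear()
        self.writes += 1
```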

2. Physical Privacy Management

- Each view of the database is determined by the user to enforce data protection
- Common-format system tables on a smartcard
  - Object description table: tables and views stored
  - User description table: users and their access levels
  - Privilege description table: database privileges and views for users
- The SCQL dictionary is a view on the system tables used to access information
  - Provide a distinct SCQL dictionary for each user
  - View definitions are stored in data folders
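A small Python sketch of per-user dictionaries over common-format system tables. The table layouts (`Obj`, `Privilege`, a user-to-access-level map) and `user_dictionary` are hypothetical simplifications, not the SCQL standard's actual system-table definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:          # row of the object description table
    name: str
    kind: str       # "table" or "view"
    owner: str


@dataclass(frozen=True)
class Privilege:    # row of the privilege description table
    user: str
    obj: str
    action: str     # e.g. "SELECT", "INSERT"


def user_dictionary(user, users, objects, privileges):
    """Per-user dictionary: only the objects this user owns or holds a privilege on.

    users -- user description table, modeled as user -> access level
    """
    if user not in users:
        raise PermissionError(f"unknown user {user!r}")
    granted = {p.obj for p in privileges if p.user == user}
    return sorted(o.name for o in objects if o.owner == user or o.name in granted)
```

A user who was granted access only to a view sees just that view, never the base tables behind it.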

Summary

- Smartcards are emerging as a multipurpose technology
- The need for a DBMS that fits on a small, flexible card is increasing
- The card's limitations destroy the foundations DBMSs have been built on
- Each component of the DBMS must be reexamined to solve the problem

Database Systems for Sensor Networks

Cem Goncu

December 2, 2004

Outline

- Sensor network overview
- Regular databases vs. sensornets
- Constraints
- Opportunities
- Model-based approach
- Comparative systems
- Snapshot queries
- Conclusion and suggestions for future work

Sensor Network Overview

- Tiny devices embedded in the physical world
- Battery-powered microprocessors
- Combine sensing, computation, and communication
- Monitor the environment for interesting events
- Acquire and transmit data at specified intervals

Sensornets

- Distributed data acquisition with multiple sensors (nodes)
- Could consist of any number of nodes (N)
- For large N, the reliability of a single sensor is not a concern
- Wireless communication between nodes
- Requires position detection, fault tolerance, aggregation, etc.

Sensornet Applications

- Habitat/environmental monitoring
  - Temperature, light, humidity
  - Voltage, radiation
- Military surveillance & reconnaissance
- Traffic
  - Movement, velocity, acceleration
  - Vehicle tracking

Regular databases vs. Sensornets

- Regular DBMSs process information about a stored (complete) collection of data
- Sensornets work with real-time information about the environment
  - The set of relevant data is continuous in both time and space (infinite)!
  - It is impossible to gather all relevant data
  - Acquire samples of physical phenomena at discrete points in time and space
  - Provide approximate answers with a degree of uncertainty

Definition of Cost

- Let $O = \{o_1, o_2, \ldots, o_n\}$ be a set of $n$ observations
- $C(O) = \sum_{i=1}^{n} C(o_i)$
- The system cost of an observation is the sum of acquisition and transmission costs:
  - $C(o_i) = C_a(o_i) + C_t(o_i)$
  - $C(O) = C_a(O) + C_t(O)$

Data acquisition cost $C_a$

- Sum of the energy required to observe the attributes in $O$
  - $C_a(O) = \sum_{o_i \in O} C_a(o_i)$
- Observations of different variables require different amounts of energy per sample:

| Sensor | Energy per sample (@3V), in mJ |
|---|---|
| Voltage | 0. |
| Humidity and temperature | 0. |
| Barometric pressure | 0. |
| Solar radiation | |

Data transmission cost $C_t$

- $C_t(O) = \sum_{o_i \in O} C_t(o_i)$
- Communication cost required to download the data
  - Expect the transmission cost to be proportional to the number of nodes used: $C_t = kN$
  - Depends on the data collection mechanism used to collect observations from the network (TinyDB, approximate caching)
  - Depends on the network topology
    - If the topology is unknown or changing, the cost function is essentially random
    - Therefore, assume networks with known topologies
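The cost model can be written down directly. In this Python sketch an observation is an (attribute, node) pair as in the example plan later in the deck; the per-sample energies and the per-node constant `k` are placeholders to be supplied by the caller, since the values in the table above are truncated, and counting each contacted node once for transmission is an assumption.

```python
def acquisition_cost(plan, energy_per_sample):
    # C_a(O): sum of the per-sample sensing energy of every observation in the plan.
    return sum(energy_per_sample[attr] for attr, _node in plan)


def transmission_cost(plan, k):
    # C_t = k * N, where N is the number of distinct nodes contacted
    # (assumption: a node sampled for several attributes is counted once).
    return k * len({node for _attr, node in plan})


def plan_cost(plan, energy_per_sample, k):
    # C(O) = C_a(O) + C_t(O)
    return acquisition_cost(plan, energy_per_sample) + transmission_cost(plan, k)
```

For a plan like {[Voltage, 1], [Voltage, 2], [temp, 4]}, three nodes are contacted, so $C_t = 3k$ regardless of which attributes are sampled.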

Definition of Benefit

- Let $O = \{o_1, o_2, \ldots, o_n\}$ be a set of $n$ observations
- $R_i(\mathbf{o})$: benefit to the accuracy of a reading $X_i$ given the set of observation values $\mathbf{o}$
- For value and average queries: $X_i = x_i$
  - $R_i(\mathbf{o}) = P(X_i \in [x_i - \epsilon, x_i + \epsilon] \mid \mathbf{o})$
- For range queries: $X_i \in [a_i, b_i]$
  - $R_i(\mathbf{o}) = \max\left[P(X_i \in [a_i, b_i] \mid \mathbf{o}),\ 1 - P(X_i \in [a_i, b_i] \mid \mathbf{o})\right]$
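The benefit formulas become concrete once a distribution for $X_i$ given $\mathbf{o}$ is fixed. The Python sketch below assumes that posterior is Gaussian with known mean and standard deviation (an assumption made here for illustration), and that the answer reported for a value query is the posterior mean; the function names are hypothetical.

```python
import math


def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def value_benefit(post_sd, eps):
    """R_i(o) for a value query, assuming X_i | o is Gaussian with sd post_sd.

    Reporting the posterior mean x_i, the chance that the true reading lies in
    [x_i - eps, x_i + eps] depends only on the posterior spread.
    """
    return normal_cdf(eps / post_sd) - normal_cdf(-eps / post_sd)


def range_benefit(post_mean, post_sd, a, b):
    """R_i(o) for a range query X_i in [a, b] under the same Gaussian assumption."""
    p = normal_cdf((b - post_mean) / post_sd) - normal_cdf((a - post_mean) / post_sd)
    return max(p, 1.0 - p)  # answer whichever of "in" / "out" is more likely
```

The range benefit is lowest when the posterior mean sits on a boundary of $[a_i, b_i]$, where "in" and "out" are nearly equally likely.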

Expected benefit

- A specific value $\mathbf{o}$ of $O$ is not known a priori
- Must compute the expected benefit $R_i(O)$:
  - $R_i(O) = \int p(\mathbf{o})\, R_i(\mathbf{o})\, d\mathbf{o}$
- For a set of queried readings $Q$, define the average benefit as
  - $R(\mathbf{o}) = \frac{1}{|Q|} \sum_{i \in Q} R_i(\mathbf{o})$
- Use the average benefit to decide when to stop observing new attributes
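When $p(\mathbf{o})$ is hard to integrate analytically, the expected benefit can be estimated by sampling. This Python sketch assumes the queried reading $X_i$ and one observed sensor $X_j$ are jointly Gaussian with correlation `rho` (|rho| < 1); the function names and all parameter values are made up for illustration.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def range_benefit(mean, sd, a, b):
    p = normal_cdf((b - mean) / sd) - normal_cdf((a - mean) / sd)
    return max(p, 1.0 - p)


def expected_range_benefit(mu_i, sd_i, mu_j, sd_j, rho, a, b, samples=20000, seed=0):
    """Monte Carlo estimate of R_i(O) = integral of p(o) R_i(o) do, where O
    observes a single sensor X_j correlated with the queried reading X_i.

    Assumes (X_i, X_j) is bivariate Gaussian with correlation rho, |rho| < 1.
    """
    rng = random.Random(seed)
    post_sd = sd_i * math.sqrt(1.0 - rho ** 2)  # spread of X_i once X_j is known
    total = 0.0
    for _ in range(samples):
        o = rng.gauss(mu_j, sd_j)  # a value the observation might return
        post_mean = mu_i + rho * (sd_i / sd_j) * (o - mu_j)
        total += range_benefit(post_mean, post_sd, a, b)
    return total / samples


def average_benefit(benefits):
    # R(o) = (1/|Q|) * sum of R_i(o) over the queried readings Q.
    return sum(benefits) / len(benefits)
```

For a range query [19, 21] on a reading with mean 20 and sd 2, observing an uncorrelated sensor leaves the benefit at its prior value, while a strongly correlated one (rho = 0.9) raises the expected benefit noticeably.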

Choosing an observation plan

- Problem: given an error bound ε and confidence level 1-δ, pick the set of observations $O_s \subseteq O$ to
  - minimize $C(O_s)$ such that $R(O_s) \geq 1-\delta$
- Solutions:
  - Option 1: exhaustive search
  - Option 2: greedy algorithm

Exhaustive search

- Exhaustively search over all possible subsets of the possible observations $O$
- Finds the optimal subset $O_s$ with minimum cost $C(O_s)$
- Exponential running time
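A brute-force version of Option 1 in Python. `cost`, `benefit` and `target` (= 1-δ) are supplied by the caller; the toy benefit function and numbers below are hypothetical and only stand in for the model-based $R(O)$.

```python
import math
from itertools import combinations


def exhaustive_plan(observations, cost, benefit, target):
    """Return (O_s, C(O_s)) for the cheapest subset with benefit >= target.

    Tries all 2^n subsets, hence exponential running time.
    Returns (None, inf) if no subset reaches the target.
    """
    best, best_cost = None, math.inf
    for k in range(len(observations) + 1):
        for subset in combinations(observations, k):
            c = sum(cost[o] for o in subset)
            if c < best_cost and benefit(frozenset(subset)) >= target:
                best, best_cost = frozenset(subset), c
    return best, best_cost


# Toy numbers for illustration only; they are not from the slides.
INFO = {"V1": 0.5, "V2": 0.6, "T4": 0.9}
COST = {"V1": 2.0, "V2": 1.0, "T4": 5.0}


def toy_benefit(subset):
    # Hypothetical: each observation independently "resolves" the query.
    return 1.0 - math.prod(1.0 - INFO[o] for o in subset)
```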

Greedy algorithm

- Start with an empty set of observations, $O = \emptyset$
- For each observation $o_i$ that is not in $O$:
  - Compute the new expected benefit $R(O \cup \{o_i\})$ and expected cost $C(O \cup \{o_i\})$
- If a subset of observations $G$ reaches the desired confidence, i.e. $R(O \cup \{o_g\}) \geq 1-\delta$ for every $o_g \in G$:
  - Pick the $o_g$ with the lowest cost $C(o_g)$ and terminate the search
- Else, if $G = \emptyset$, keep adding the $o_i$ with the highest benefit-to-cost ratio to the existing set $O$ until $R(O) \geq 1-\delta$
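The greedy steps can be sketched in Python as follows. `greedy_plan` is an illustrative name, the toy benefit and costs are hypothetical stand-ins for the model-based $R$ and $C$, and costs are assumed positive so the benefit-to-cost ratio is defined.

```python
import math


def greedy_plan(observations, cost, benefit, target):
    """Greedy observation planning (sketch). Returns None if the target is unreachable."""

    def total_cost(s):
        return sum(cost[o] for o in s)

    chosen = frozenset()
    while benefit(chosen) < target:
        remaining = [o for o in observations if o not in chosen]
        if not remaining:
            return None  # even observing everything misses the target
        # G: single additions that already reach the desired confidence.
        good = [o for o in remaining if benefit(chosen | {o}) >= target]
        if good:
            return chosen | {min(good, key=lambda o: cost[o])}
        # Otherwise add the observation with the best benefit-to-cost ratio.
        best = max(remaining, key=lambda o: benefit(chosen | {o}) / total_cost(chosen | {o}))
        chosen = chosen | {best}
    return chosen


# Toy numbers for illustration only; they are not from the slides.
INFO = {"V1": 0.5, "V2": 0.6, "T4": 0.9}
COST = {"V1": 2.0, "V2": 1.0, "T4": 5.0}


def toy_benefit(subset):
    # Hypothetical: each observation independently "resolves" the query.
    return 1.0 - math.prod(1.0 - INFO[o] for o in subset)
```

With these toy numbers the greedy plan first adds the cheap, fairly informative V2, then finds that adding T4 reaches the target. Greedy selection is fast but, unlike exhaustive search, is not guaranteed to find the cheapest plan in general.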

A simple example

- Query: SELECT nodeId, temp ± 0.1°C, conf(.95) WHERE nodeId in (1..8)
- Observation plan: {[Voltage, 1], [Voltage, 2], [temp, 4]}
- Data: {[V1=2.73], [V2=2.65], [T4=22.1]}
- Results: {[22.5, 97%], [25.6, 99%], [24.4, 98%], [22.1, 100%], …}

Review of alternative approaches

1. TinyDB-style Querying

2. Approximate Caching

TinyDB

- Query disseminated into the sensor network using a tree structure
- At each mote, a sensor reading is observed
- Results reported back along the same tree to the base station
- Results combined on the way back to minimize communication costs
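TinyDB-style in-network aggregation can be illustrated with a Python sketch that computes AVG by merging (sum, count) partial results up the routing tree. The tree encoding and function name are assumptions; real TinyDB adds epochs, message scheduling and more aggregate types.

```python
def in_network_average(parent, readings, root):
    """AVG over a routing tree with partial aggregation (sketch).

    parent   -- node -> parent node (the root has no entry)
    readings -- node -> sensor reading
    Each non-root node sends exactly one (sum, count) record to its parent,
    so the root receives merged records instead of every raw reading.
    Returns (average, number of messages sent).
    """
    children = {n: [] for n in readings}
    for child, par in parent.items():
        children[par].append(child)

    messages = 0

    def partial(node):
        nonlocal messages
        s, c = readings[node], 1
        for ch in children[node]:
            cs, cc = partial(ch)
            messages += 1  # one message from ch up to node
            s, c = s + cs, c + cc
        return s, c

    total, count = partial(root)
    return total / count, messages
```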

BBQ - Performance

- Query: requires the system to report temperatures at all motes to within a specified error bound
- Confidence 95%, with varying ε
  - Different values of ε lead to varying cost of observation $C(O)$

Results

- Varied ε between 0 and 1 °C
- The cost of BBQ falls rapidly as ε increases
- The percentage of errors stays well below the 5% allowed by the 95% confidence level

Comparison

- TinyDB:
  - Makes no mistakes
  - Cost remains constant for all ε
- Approximate caching:
  - Always reports values to within ε
  - Makes no mistakes
  - Average observation error close to that of BBQ

Comparison (cont.)

- BBQ:
  - Reports observations within the given error bound at least 95% of the time
  - For reasonable values of ε, uses significantly less communication
  - More efficient use of time and energy
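A minimal Python sketch of the value-based form of approximate caching, assuming a single node and a base station that reuses the last reported value; the function name is hypothetical. It shows why caching never violates ε, while its cost, unlike TinyDB's, depends on ε.

```python
def approximate_caching(readings, eps):
    """Value-based approximate caching at one node (sketch).

    The base station keeps the last reported value; the node transmits only
    when its new reading drifts more than eps from that cached value, so
    every cached answer stays within eps of the true reading.
    Returns (values seen by the base station, number of transmissions).
    """
    cached, sent, seen = None, 0, []
    for r in readings:
        if cached is None or abs(r - cached) > eps:
            cached = r
            sent += 1
        seen.append(cached)
    return seen, sent
```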

BBQ - Cost Efficiency

- Percentage of sensors that BBQ observes, by hour
  - Varying ε
- As ε gets small (< 0.1), BBQ must observe all nodes on every query
  - Variance between nodes is high enough that it cannot infer the value of one sensor from another's with any accuracy
- As ε gets large (> 1), few observations are needed
  - Changes in one sensor predict the values of others
- Intermediate ε
  - More observations are needed, especially when readings change drastically
- Lowering the required confidence or increasing ε reduces the energy per query
  - Confidence 95%
  - Errors 0.
  - Expected energy cost reduced from 5.4 J to 150 mJ per query
  - Roughly a factor of 40 reduction (5.4 J / 150 mJ = 36)

Another approach

- Snapshot queries (Kotidis)
  - Data-driven approach in which a node can represent another node in a query when their collected measurements are similar
  - Algorithm for nodes to elect a local representative
  - Determine a threshold value T such that
    - d(actual, estimate) ≤ T

- Idea: expect many correlations among the collected measurements of neighboring nodes
- Goal: use only a subset of nodes (a representative from each neighborhood) to create a "snapshot" of the whole system
  - Answer certain queries (snapshot queries) without using the other nodes, to save time and energy
  - Reduction of up to 90% in the number of nodes that need to participate in a snapshot query

- Local algorithm for picking representatives:
  - $N_i$ can represent $N_j$ if $d(x_j, x_{ij}) < T$
    - where $x_j$ is the actual reading of node $j$, and
    - $x_{ij}$ is $N_i$'s estimate of $x_j$
- The "snapshot" is not static, but changes over time:
  - $N_i$ may fail ($N_j$ then requests a new representative)
  - Due to the dynamic nature of the environment, $d(x_j, x_{ij})$ might exceed the threshold value
  - Ideally, a rotating set of representatives drains energy resources uniformly (longer lifespan for an average node)
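The local election rule can be sketched in Python, assuming (for illustration only) that $N_i$ estimates $x_j$ with a least-squares line fitted on recent paired readings and that $d$ is the absolute difference. The slides do not specify the estimator, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares y ~ a + b*x; falls back to the mean of y if x never varies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return my - b * mx, b


def pick_representative(own_history, neighbor_histories, threshold):
    """Choose a neighbor N_i that can represent this node N_j (sketch).

    N_i's estimate x_ij of x_j is a line fitted on past (x_i, x_j) pairs; N_i
    qualifies if every past error |x_j - x_ij| is below the threshold T.
    Among qualifying neighbors, the one with the smallest worst-case error wins.
    Returns None when no neighbor qualifies (the node answers for itself).
    """
    best, best_err = None, None
    for name, xs in neighbor_histories.items():
        a, b = fit_line(xs, own_history)
        err = max(abs(xj - (a + b * xi)) for xi, xj in zip(xs, own_history))
        if err < threshold and (best_err is None or err < best_err):
            best, best_err = name, err
    return best
```

Re-running the election as new readings arrive is one way to let the snapshot change over time, as the slide requires.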

Snapshot vs. BBQ

- BBQ: a global model to capture dependencies, assuming a relatively stable network topology
- Snapshot: captures localized correlations in highly dynamic networks
- Snapshot is more successful in networks consisting of a large number of nodes (N > 1000?)

Conclusions

- General idea: tolerate a certain amount of uncertainty in return for crucial time and energy savings
- Exploit spatiotemporal correlations among individual nodes to enable better estimates
- Use only a subset of nodes to gather information about the whole system