













Based on the papers:
“PicoDBMS: Scaling down database techniques for the smartcard” by Philippe Pucheral, Luc Bouganim, Patrick Valduriez, Christophe Bobineau
“Smart card embedded information systems: a methodology for privacy oriented architectural design” by C. Bolchini, F.A. Schreiber
By Amy Nathanson
- Small device (smartcard) overview
  - Portable computing device
  - Secure
  - Widely used: banking, healthcare, insurance
- Volume of data growing
- High security and availability
- Small size and low cost:
  - 96 kB ROM → stores OS, fixed data, standard routines
  - 4 kB RAM → for the stack and calculations
  - 128 kB EEPROM → persistent data
  - Very slow write time (> 1 ms/word)
- Minimize data structure size
- Don't externalize private data
- Enforce ACID
- FS (flat storage): tuples stored sequentially with attribute values embedded
[Figure: ring storage. Left: ring index on a regular attribute (Relation S, index on S.a, domain values Value 1 … Value n). Right: ring index on a foreign-key attribute (Relation R with R.a, Relation S with S.b).]
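SOLUTION: Use a combination of FS, DS (domain storage: each distinct value is stored once in a domain and tuples hold pointers to it) and RS (ring storage: tuples sharing a value are chained in a ring of pointers anchored at the domain value, which doubles as an index)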
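The ring-storage idea can be sketched in Python as below. This is a minimal illustration under our own assumptions, not PicoDBMS code: the class `RingIndexedAttribute` and its methods are hypothetical, rows are plain integer ids, and ring pointers are modeled as tagged tuples. Each distinct value is stored once in a domain; rows sharing a value are chained in a ring that closes at the domain entry. Selection by value therefore walks the ring like an index, while recovering one row's value requires walking to the end of its ring.

```python
from typing import List, Optional, Tuple

# A ring pointer is either ("row", next_row_id) or ("dom", domain_index),
# where "dom" closes the ring back at the shared domain value.
RingPtr = Tuple[str, int]


class RingIndexedAttribute:
    """Hypothetical sketch: one attribute stored with domain + ring storage."""

    def __init__(self) -> None:
        self.domain: List[str] = []                 # each distinct value stored once
        self.domain_head: List[Optional[int]] = []  # first row of each value's ring
        self.next_ptr: List[RingPtr] = []           # per-row ring pointer (no value copy)

    def insert(self, value: str) -> int:
        """Append a row holding `value`; return its row id."""
        if value in self.domain:
            d = self.domain.index(value)
        else:
            d = len(self.domain)
            self.domain.append(value)
            self.domain_head.append(None)
        row = len(self.next_ptr)
        head = self.domain_head[d]
        # The new row becomes the ring head and points at the old head,
        # or straight back to the domain value if the ring was empty.
        self.next_ptr.append(("row", head) if head is not None else ("dom", d))
        self.domain_head[d] = row
        return row

    def rows_with(self, value: str) -> List[int]:
        """Selection: start at the domain value and walk its ring (index-like)."""
        if value not in self.domain:
            return []
        rows: List[int] = []
        r = self.domain_head[self.domain.index(value)]
        while r is not None:
            rows.append(r)
            kind, target = self.next_ptr[r]
            r = target if kind == "row" else None
        return rows

    def value_of(self, row: int) -> str:
        """Projection: follow the ring until it closes at the domain value (can be slow)."""
        kind, target = self.next_ptr[row]
        while kind == "row":
            kind, target = self.next_ptr[target]
        return self.domain[target]


drug_type = RingIndexedAttribute()
for t in ["antibiotic", "analgesic", "antibiotic"]:
    drug_type.insert(t)
print(drug_type.rows_with("antibiotic"))  # [2, 0]
print(drug_type.value_of(0))              # antibiotic
```

Because `value_of` may traverse a whole ring, a plan that extracts attribute values early pays that walk for every intermediate tuple. This is consistent with the slides' advice to perform projection at the end.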
- Select/Project/Join queries
  - The Query Execution Plan (QEP) should be an extreme one-sided tree, so that no RAM is needed for intermediate results
  - Implement pipelining using the Iterator Model, so no materialization is needed
  - Project operators are pushed up to the top of the plan (no materialization)
  - With a ring index, finding an attribute's value means walking the ring to its domain value, which is time-consuming, so projection is done at the end
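A minimal sketch of the iterator (pipelined) model using Python generators follows. It assumes tables are in-memory lists of dicts standing in for EEPROM-resident relations. The operator names (`scan`, `select`, `nested_loop_join`, `project`) and the toy data are hypothetical. Each operator pulls one tuple at a time from its child, so no intermediate result is materialized. The join re-scans the inner base table for every outer tuple instead of buffering it in RAM, and projection sits at the top of the plan.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def scan(table: List[Row]) -> Iterator[Row]:
    """Leaf operator: yield stored tuples one by one."""
    for row in table:
        yield row


def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass through only the tuples that satisfy the predicate."""
    for row in child:
        if pred(row):
            yield row


def nested_loop_join(outer: Iterator[Row], inner: List[Row], key: str) -> Iterator[Row]:
    """Join by re-scanning the inner base table per outer tuple (no RAM buffer)."""
    for o in outer:
        for i in scan(inner):
            if o[key] == i[key]:
                yield {**o, **i}


def project(child: Iterator[Row], attrs: List[str]) -> Iterator[Row]:
    """Top-of-plan projection: extract attribute values only for result tuples."""
    for row in child:
        yield {a: row[a] for a in attrs}


drugs = [
    {"DrugId": 1, "name": "A", "type": "antibiotic"},
    {"DrugId": 2, "name": "B", "type": "analgesic"},
]
prescriptions = [
    {"VisId": 10, "DrugId": 1, "qty": 3},
    {"VisId": 11, "DrugId": 2, "qty": 1},
    {"VisId": 12, "DrugId": 1, "qty": 2},
]
plan = project(
    nested_loop_join(
        select(scan(drugs), lambda r: r["type"] == "antibiotic"), prescriptions, "DrugId"
    ),
    ["VisId", "name"],
)
print(list(plan))  # [{'VisId': 10, 'name': 'A'}, {'VisId': 12, 'name': 'A'}]
```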
- Aggregate/Sort/Duplicate Removal queries
  - Group incoming tuples by distinct values
  - Begin with the group-by attribute and join with its domain table
  - Aggregation and duplicate removal are evaluated in a pipeline
  - Order is preserved in pipelined operators because they handle tuples in arrival order
[Figure: example QEP with the group-by attribute drug.type at the tree leaves, joined with drug and prescription under a count operator. Schema: Drug(DrugId, name, type, …), Prescription(VisId, DrugId, qty, …)]
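The grouping strategy above (start from the group-by attribute's domain, then join) can be sketched as follows for the figure's example, counting prescriptions per drug type. This is a hypothetical illustration on in-memory lists. `count_per_type` and the toy data are our own and are not taken from PicoDBMS. Groups are produced one domain value at a time, so only a single counter is held in RAM, and results come out in domain order.

```python
from typing import Dict, Iterator, List, Tuple


def count_per_type(
    type_domain: List[str], drugs: List[Dict], prescriptions: List[Dict]
) -> Iterator[Tuple[str, int]]:
    """Pipelined GROUP BY drug.type with COUNT(*) over Drug JOIN Prescription."""
    for t in type_domain:          # one distinct group-by value at a time
        count = 0                  # the only aggregation state kept in RAM
        for d in drugs:
            if d["type"] != t:
                continue
            for p in prescriptions:
                if p["DrugId"] == d["DrugId"]:
                    count += 1
        if count > 0:              # as in SQL, groups with no joined tuples are omitted
            yield (t, count)


drugs = [
    {"DrugId": 1, "name": "A", "type": "antibiotic"},
    {"DrugId": 2, "name": "B", "type": "analgesic"},
]
prescriptions = [
    {"VisId": 10, "DrugId": 1, "qty": 3},
    {"VisId": 11, "DrugId": 2, "qty": 1},
    {"VisId": 12, "DrugId": 1, "qty": 2},
]
domain = ["analgesic", "antibiotic", "antiviral"]
print(list(count_per_type(domain, drugs, prescriptions)))
# [('analgesic', 1), ('antibiotic', 2)]
```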
- Local: enforced by write-ahead logging (WAL)
  - Problem: its cost is higher in a small DBMS, where EEPROM writes are very slow
- All integrity constraints must be satisfied
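A minimal sketch of write-ahead logging follows. It assumes a key/value store of integers and an undo log of before-images. `LoggedStore` and the constraint callback are hypothetical and do not represent PicoDBMS's actual logging scheme. The old value is logged before each in-place update, so a transaction that would violate an integrity constraint can be rolled back. In this sketch every update costs two writes (log entry plus data), which is what makes logging expensive on slow-write EEPROM.

```python
from typing import Callable, Dict, List, Tuple


class LoggedStore:
    """Hypothetical sketch of local atomicity via write-ahead logging of before-images."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.data = dict(data)                # stands in for persistent (EEPROM) data
        self.log: List[Tuple[str, int]] = []  # before-images of updated keys

    def write(self, key: str, value: int) -> None:
        self.log.append((key, self.data[key]))  # 1) log the old value first (WAL rule)
        self.data[key] = value                   # 2) only then update in place

    def abort(self) -> None:
        """Undo all updates in reverse order using the logged before-images."""
        for key, old in reversed(self.log):
            self.data[key] = old
        self.log.clear()

    def commit(self, constraint: Callable[[Dict[str, int]], bool]) -> bool:
        """Commit only if the integrity constraint holds; otherwise roll back."""
        if not constraint(self.data):
            self.abort()
            return False
        self.log.clear()
        return True


store = LoggedStore({"balance": 100, "points": 5})
store.write("balance", 40)
store.write("balance", -10)
print(store.commit(lambda d: d["balance"] >= 0))  # False
print(store.data)  # {'balance': 100, 'points': 5}
```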
- Each view of the database is determined by the user to enforce data protection
- Smartcards are emerging as a multipurpose technology
- Sensor network overview
- Regular databases vs. sensornets
- Constraints
- Opportunities
- Model-based approach
- Comparative systems
- Snapshot queries
- Conclusion and suggestions for future work
Temperature, light, humidity
Voltage, radiation
Movement, velocity, acceleration
Vehicle tracking
- Regular DBMSs process information about a stored (complete) collection of data
- Sensornets work with real-time information about the environment
  - The set of relevant data is continuous in both time and space (infinite)!
  - It is impossible to gather all relevant data: acquire samples of physical phenomena at discrete points in time and space
  - Provide approximate answers with a degree of uncertainty
$C(o_i) = C_a(o_i) + C_t(o_i)$ (acquisition cost plus transmission cost)
Acquisition cost, the sum of the energy required to observe the attributes in $O$: $C_a(O) = \sum_{i \in O} C_a(o_i)$
Observations of different variables require different amounts of energy per sample:
| Sensor | Energy per sample (@3V), in mJ |
|---|---|
| Voltage | 0.… |
| Humidity and temperature | 0.… |
| Barometric pressure | 0.… |
| Solar radiation | … |
Transmission cost: $C_t(O) = \sum_{i \in O} C_t(o_i)$, the communication cost required to download the data
- Expect transmission cost to be proportional to the number of nodes used: $C_t = kN$
- Depends on the data collection mechanism used to collect observations from the network (TinyDB, approximate caching)
- Depends on the network topology
  - If the topology is unknown or changing, the cost function is essentially random
  - Therefore, assume networks with known topologies
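The cost model $C(O) = C_a(O) + C_t(O)$ can be sketched as follows. The sketch assumes that an observation is an `(attribute, node)` pair, that per-sample energies come from a lookup table, and that transmission follows the $C_t = kN$ approximation with $N$ the number of distinct nodes sampled. The energy figures and $k$ below are made-up placeholders, not the values from the table, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Observation = Tuple[str, int]  # (attribute, node id)


def acquisition_cost(obs: List[Observation], energy: Dict[str, float]) -> float:
    """C_a(O): sum of per-sample sensing energy over all observations."""
    return sum(energy[attr] for attr, _ in obs)


def transmission_cost(obs: List[Observation], k: float) -> float:
    """C_t = k * N, with N the number of distinct nodes that must report."""
    return k * len({node for _, node in obs})


def total_cost(obs: Iterable[Observation], energy: Dict[str, float], k: float) -> float:
    """C(O) = C_a(O) + C_t(O)."""
    obs_list = list(obs)
    return acquisition_cost(obs_list, energy) + transmission_cost(obs_list, k)


# Placeholder energies (mJ per sample) and per-node cost; NOT measured values.
energy = {"voltage": 0.01, "temp": 0.5}
plan = [("voltage", 1), ("voltage", 2), ("temp", 4)]
print(total_cost(plan, energy, k=1.0))  # approximately 3.52
```

Counting distinct nodes means two attributes read at the same node share one transmission, which is one reading of "proportional to the number of nodes used".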
Let $O = \{o_1, o_2, \ldots, o_n\}$ be a set of $n$ observations
$R_i(\mathbf{o})$: benefit to the accuracy of a reading $X_i$ given the set of observation values $\mathbf{o}$
For value and average queries: $X_i = x_i$
For range queries $[a_i, b_i]$: $R_i(\mathbf{o}) = \max\left[P(X_i \in [a_i, b_i] \mid \mathbf{o}),\ 1 - P(X_i \in [a_i, b_i] \mid \mathbf{o})\right]$
The specific value $\mathbf{o}$ of $O$ is not known a priori, so we must compute the expected benefit:
$R_i(O) = \int p(\mathbf{o})\, R_i(\mathbf{o})\, d\mathbf{o}$
For a set of queried readings $Q$, define the average benefit as
$R(\mathbf{o}) = \frac{1}{|Q|} \sum_{i \in Q} R_i(\mathbf{o})$
Use the average benefit to decide when to stop observing new attributes
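A sketch of the expected-benefit computation for a range query follows. It rests on simplifying assumptions of our own: one queried reading $X_i$ and one candidate observation $O$ that are jointly Gaussian (a modeling choice made for this sketch), with the integral over $\mathbf{o}$ approximated by Monte Carlo sampling. All function names and numbers are hypothetical. With no correlation, observing $O$ leaves the benefit at its prior value. With strong correlation, the expected benefit rises because the posterior of $X_i$ is sharper.

```python
import math
import random
from typing import List, Tuple


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def posterior(mu_x: float, var_x: float, mu_o: float, var_o: float,
              cov: float, o: float) -> Tuple[float, float]:
    """Mean and variance of X given O = o when (X, O) is bivariate Gaussian."""
    mean = mu_x + cov / var_o * (o - mu_o)
    var = var_x - cov * cov / var_o
    return mean, var


def range_benefit(mean: float, var: float, a: float, b: float) -> float:
    """R_i(o) = max[P(X in [a, b] | o), 1 - P(X in [a, b] | o)]."""
    sd = math.sqrt(var)
    p = normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd)
    return max(p, 1.0 - p)


def expected_range_benefit(mu_x: float, var_x: float, mu_o: float, var_o: float,
                           cov: float, a: float, b: float,
                           samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of R_i(O) = integral of p(o) R_i(o) do."""
    assert cov * cov < var_x * var_o, "needs |correlation| < 1 (positive posterior variance)"
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        o = rng.gauss(mu_o, math.sqrt(var_o))  # draw a possible observation value
        mean, var = posterior(mu_x, var_x, mu_o, var_o, cov, o)
        total += range_benefit(mean, var, a, b)
    return total / samples


def average_benefit(benefits: List[float]) -> float:
    """R = (1/|Q|) * sum of R_i over the queried readings Q."""
    return sum(benefits) / len(benefits)


# X: queried temperature ~ N(20, 1); O: a neighboring sensor's reading ~ N(20, 1).
no_info = expected_range_benefit(20.0, 1.0, 20.0, 1.0, cov=0.0, a=19.0, b=21.0)
correlated = expected_range_benefit(20.0, 1.0, 20.0, 1.0, cov=0.9, a=19.0, b=21.0)
print(round(no_info, 3), correlated > no_info)  # 0.683 True
```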
Minimize $C(O_s)$ such that $R(O_s) \ge 1 - \delta$
Exhaustive search: finds the optimal subset $O_s$ with minimum cost $C(O_s)$, but has exponential running time
Greedy incremental search:
- Start with an empty set of observations, $O = \emptyset$
- For each observation $o_i$ not in $O$, compute the new expected benefit $R(O \cup \{o_i\})$ and expected cost $C(O \cup \{o_i\})$
- If a subset of observations $G$ reaches the desired confidence, i.e. $R(O \cup \{o_g\}) \ge 1 - \delta$ for every $o_g \in G$, pick the $o_g$ with the lowest cost $C(o_g)$ and terminate the search
- Else, if $G = \emptyset$, add the $o_i$ with the highest benefit-to-cost ratio to $O$ and repeat until $R(O) \ge 1 - \delta$
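The exhaustive and greedy searches can be sketched in Python as below. Assumptions: `benefit` and `cost` are caller-supplied functions on sets of observations (standing in for $R$ and $C$), costs are positive, and the additive toy benefit model is purely illustrative. The greedy function follows the greedy steps listed above. All names are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, List, Optional

Obs = Hashable
SetFn = Callable[[FrozenSet[Obs]], float]  # R(.) or C(.) evaluated on a set of observations


def greedy_plan(candidates: Iterable[Obs], benefit: SetFn, cost: SetFn,
                delta: float) -> Optional[FrozenSet[Obs]]:
    """Greedy incremental search for a cheap O with R(O) >= 1 - delta."""
    target = 1.0 - delta
    chosen: FrozenSet[Obs] = frozenset()
    remaining: List[Obs] = list(candidates)
    while benefit(chosen) < target:
        if not remaining:
            return None  # confidence target unreachable with these candidates
        # G: single additions that already reach the target confidence.
        reaching = [o for o in remaining if benefit(chosen | {o}) >= target]
        if reaching:
            best = min(reaching, key=lambda o: cost(chosen | {o}))
            return chosen | {best}
        # G is empty: add the observation with the best benefit/cost ratio.
        best = max(remaining, key=lambda o: benefit(chosen | {o}) / cost(chosen | {o}))
        chosen = chosen | {best}
        remaining.remove(best)
    return chosen


def exhaustive_plan(candidates: Iterable[Obs], benefit: SetFn, cost: SetFn,
                    delta: float) -> Optional[FrozenSet[Obs]]:
    """Optimal search over all 2^n subsets (exponential running time)."""
    items = list(candidates)
    best: Optional[FrozenSet[Obs]] = None
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            s = frozenset(subset)
            if benefit(s) >= 1.0 - delta and (best is None or cost(s) < cost(best)):
                best = s
    return best


# Toy model (hypothetical numbers): each observation adds a fixed benefit gain.
costs = {"v1": 1.0, "v2": 1.0, "t4": 5.0}
gains = {"v1": 0.3, "v2": 0.3, "t4": 0.45}


def benefit(s: FrozenSet[Obs]) -> float:
    return min(1.0, 0.5 + sum(gains[o] for o in s))


def cost(s: FrozenSet[Obs]) -> float:
    return sum(costs[o] for o in s)


print(sorted(greedy_plan(costs, benefit, cost, delta=0.1)))      # ['t4']
print(sorted(exhaustive_plan(costs, benefit, cost, delta=0.1)))  # ['v1', 'v2']
```

On this toy input the greedy search stops at the first single observation that reaches the target and returns `t4` (cost 5). The exhaustive search finds the cheaper pair `v1`, `v2` (cost 2). The greedy variant trades optimality for polynomial running time.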
Query: SELECT nodeId, temp ± 0.1°C, conf(.95) WHERE nodeId in (1..8)
Observation plan: {[Voltage,1], [Voltage, 2], [temp, 4]}
Data: {[V1=2.73], [V2=2.65], [T4=22.1]}
Results: {[22.5, 97%], [25.6, 99%], [24.4, 98%], [22.1, 100%], ….}
Combine results on the way back to minimize communication costs
Confidence 95%, with varying ε: different values of ε lead to different observation costs C(O)
Varied ε between 0 and 1 °C
- The cost of BBQ falls rapidly as ε increases
- The percentage of errors stays well below the 5% threshold implied by the 95% confidence
TinyDB: makes no mistakes; cost remains constant for all ε
Approximate caching: always reports values to within ε; makes no mistakes; average observation error close to that of BBQ
BBQ: succeeds in reporting observations within the given error bound at least 95% of the time; for reasonable values of ε, it uses significantly less communication, making more efficient use of time and energy
Varying ε
- As ε gets small (< 0.1), all nodes must be observed on every query
  - Variance between nodes is high enough that one sensor's value cannot be inferred from another's with any accuracy
- As ε gets large (> 1), few observations are needed
  - Changes in one sensor predict the values of others
- Intermediate ε: more observations are needed, especially when readings change drastically
- Confidence 95%, errors 0.…
- Reduces expected energy cost from 5.4 J to 150 mJ per query (5.4 J / 0.15 J = 36, roughly a factor of 40 reduction)
d(actual, estimate) ≤ T
Idea: expect strong correlations among the measurements collected by neighboring nodes
Goal: use only a subset of nodes (a representative from each neighborhood) to create a "snapshot" of the whole system
Answer certain queries (snapshot queries) without using the other nodes to save time and energy
Reduction of up to 90% in the number of nodes that need to participate in a snapshot query
Local algorithm for picking representatives:
- $N_i$ can represent $N_j$ if $d(x_j, x_{ij}) < T$, where $x_j$ is the actual reading of node $j$ and $x_{ij}$ is $N_i$'s estimate of $x_j$
- The "snapshot" is not static but changes over time:
  - $N_i$ may fail ($N_j$ then requests a new representative)
  - Due to the dynamic nature of the environment, $d(x_j, x_{ij})$ may grow beyond the threshold
- Ideally, the set of representatives rotates so that energy resources are drained uniformly (longer lifespan for an average node)
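A sketch of the local representative-selection rule follows, under assumptions of our own. $d$ is taken to be the absolute difference. Each node $i$ estimates $x_j$ with a hypothetical per-pair linear model $x_{ij} = a\,x_i + b$, and node failures are modeled by an `alive` map. Function names are illustrative. Rotating representatives for uniform energy drain is not modeled.

```python
from typing import Dict, Iterable, Optional, Tuple

# (i, j) -> (a, b): hypothetical linear model, node i estimates x_j as a * x_i + b.
Model = Dict[Tuple[int, int], Tuple[float, float]]


def estimate(i: int, j: int, readings: Dict[int, float], model: Model) -> float:
    """x_ij: node i's estimate of node j's reading."""
    a, b = model[(i, j)]
    return a * readings[i] + b


def pick_representative(j: int, neighbors: Iterable[int], readings: Dict[int, float],
                        model: Model, T: float, alive: Dict[int, bool]) -> Optional[int]:
    """Return the live neighbor whose estimate of x_j is closest, if within T."""
    best: Optional[int] = None
    best_err = T
    for i in neighbors:
        if not alive.get(i, False) or (i, j) not in model:
            continue
        err = abs(readings[j] - estimate(i, j, readings, model))  # d(x_j, x_ij)
        if err < best_err:
            best, best_err = i, err
    return best  # None: no neighbor qualifies, so j answers for itself


def needs_new_representative(j: int, rep: Optional[int], readings: Dict[int, float],
                             model: Model, T: float, alive: Dict[int, bool]) -> bool:
    """Re-elect when the representative failed or its estimate drifted past T."""
    if rep is None or not alive.get(rep, False):
        return True
    return abs(readings[j] - estimate(rep, j, readings, model)) >= T


readings = {1: 20.0, 2: 21.0, 3: 30.0}
model = {(1, 2): (1.0, 1.0), (3, 2): (1.0, -8.0)}
alive = {1: True, 3: True}
rep = pick_representative(2, [1, 3], readings, model, T=0.5, alive=alive)
print(rep)  # 1
alive[1] = False  # the representative fails
print(needs_new_representative(2, rep, readings, model, 0.5, alive))  # True
```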