

















This document covers the concept of punctuations in data stream processing and their impact on query processing: why punctuations are important, where they come from, and how they can unblock group-by operators and reduce the state held by join operators. It also explores punctuated streams in Haskell and the behavior of stream iterators.
10/10/2007 Data Streams: Lecture 6 1
with thanks to
Example
Person(p_id, name, email, city, state)
Auction(a_id, expires, seller, category)
Bid(a_id, bidder, hour, minute, second, price)
[Figure: a stream of Bid tuples]
Example Query
[Figure: query plan with selection (σ) operators]
Generalizing End of Input (EOI)
EOI tells an operator that no more data will arrive from a given input
Blocking operators can output results
Stateful operators can purge state
Idea: items in the stream denoting the end of data subsets might improve blocking and stateful operators
Punctuations
A punctuation describes a subset of data in a stream
A data item d is said to match a punctuation p if d belongs to the subset described by p
A punctuation in a stream indicates that no more data items matching it will occur
A punctuated stream is grammatical if, for each punctuation p, no following data item matches p
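A minimal Haskell sketch of these definitions, assuming a punctuation can be modeled as the predicate that describes its subset, and using a flat (unsliced) stream for simplicity. `Punc`, `matches`, `Elem`, and `isGrammatical` are hypothetical names, not taken from the lecture's code.

```haskell
-- Hypothetical model: a punctuation is the predicate describing its subset.
newtype Punc a = Punc (a -> Bool)

-- A data item d matches punctuation p if d belongs to the subset p describes.
matches :: a -> Punc a -> Bool
matches d (Punc p) = p d

-- A stream element is either a data item (Left) or a punctuation (Right).
type Elem a = Either a (Punc a)

-- Grammatical: for each punctuation p, no following data item matches p.
-- We remember every punctuation seen so far and test each later item.
isGrammatical :: [Elem a] -> Bool
isGrammatical = go []
  where
    go _ [] = True
    go seen (Right p : rest) = go (p : seen) rest
    go seen (Left d : rest) = not (any (matches d) seen) && go seen rest
```

For example, `[Left 3, Right (Punc (< 5)), Left 7]` is grammatical, while `[Right (Punc (< 5)), Left 2]` is not, because 2 arrives after a punctuation that promised no more items below 5.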
Example Query, part 3
End-of-auction punctuations allow us to reduce the size of the join state
End-of-auction punctuations unblock the group-by
SELECT A.a_id, MAX(price) FROM Auction A, Bid B WHERE A.a_id = B.a_id AND A.category = 10 GROUP BY A.a_id
[Figure: query plan joining Auction and Bid, with selection (σ)]
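The unblocking effect on the group-by can be sketched as follows. This assumes a simplified element type in which each bid carries only an auction id and a price, and each end-of-auction punctuation carries an auction id; `Elem`, `Bid`, `EndOfAuction`, and `groupMax` are hypothetical names, and the join and selection are omitted.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stream element: a bid (a_id, price) or an
-- end-of-auction punctuation for one a_id.
data Elem = Bid Int Double | EndOfAuction Int

-- GROUP BY a_id computing MAX(price), unblocked by punctuations.
groupMax :: [Elem] -> [(Int, Double)]
groupMax = go Map.empty
  where
    -- End of input (EOI): every remaining group is complete.
    go st [] = Map.toList st
    -- Fold the bid into its group's running maximum.
    go st (Bid a p : rest) = go (Map.insertWith max a p st) rest
    -- Punctuation: no more bids for auction a, so emit its result
    -- and purge its state.
    go st (EndOfAuction a : rest) = case Map.lookup a st of
      Just m -> (a, m) : go (Map.delete a st) rest
      Nothing -> go st rest
```

Because each result is produced lazily as soon as its punctuation arrives, `take 1 (groupMax (Bid 1 10 : EndOfAuction 1 : repeat (Bid 2 1)))` yields the answer for auction 1 even though the input never ends. Without the punctuation, the operator would have to wait for EOI (here, the end of the list).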
Consequences of Punctuations
Blocking operators may produce some output before the end of the stream
Stateful operators may keep less state
Any operator should output punctuations whenever possible
Sources of Punctuations
Source or sensor intelligence
Knowledge of access order
Knowledge of stream or application semantics
Auxiliary information
Stream operator behavior
Initial Testing
Conducted an ad hoc test of our ideas
Promising results
Test Implementation
Implemented only the operators necessary to make our test query work
Initial Test Results
[Figure, two panels. "Query Performance": time (sec) to first output and to last output vs. punctuations per hour (0, 1, 5, 10, 30). "State Size for Union Operator": tuples in state vs. number of tuples arrived, with and without punctuation.]
Representation of Streams
Stream represented as a "sliced list": S = [[1, 2], [3], [], [4, 5], [6], …]
Easier to model a finite stream
Variability in input arrival rate vs. operator processing rate
Interleavings of multiple inputs
Notation
S[i]: first i slices of S, concatenated
S@i: i-th slice of S
S[2] = [1, 2, 3]    S[4] = [1, 2, 3, 4, 5]
S@2 = [3]    S@4 = [4, 5]
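The two notations can be written as ordinary list functions. `prefixS` and `sliceAt` are hypothetical names (Haskell reserves `@`, so it cannot serve as an operator name), and slices are numbered from 1, matching the examples S[2], S[4], S@2, and S@4.

```haskell
-- S[i]: the first i slices of S, concatenated into one list.
prefixS :: Int -> [[a]] -> [a]
prefixS i s = concat (take i s)

-- S@i: the i-th slice of S, counting from 1.
sliceAt :: [[a]] -> Int -> [a]
sliceAt s i = s !! (i - 1)

-- The example sliced list from the notes (trailing slices omitted).
exampleS :: [[Int]]
exampleS = [[1, 2], [3], [], [4, 5], [6]]
```

Note that the empty third slice contributes nothing to S[4]; it models a moment at which no input was available.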
Stream Iterator
Not all stream-to-stream functions are suitable, e.g., sort on positive numbers (a sort cannot emit any output until it has seen its entire, possibly unbounded, input)
A stream iterator is a function that accesses its input incrementally:
$$f(S) = q(S@1, st_0) \mathbin{++} q(S@2, st_1) \mathbin{++} \cdots \mathbin{++} q(S@i, st_{i-1}) \mathbin{++} \cdots$$
where $st_j = r(S@j, st_{j-1})$
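A direct transcription of this definition, as a sketch: `streamIter` takes the initial state st_0, the output function q, and the state-update function r (hypothetical names for the roles in the formula) and threads the state through the slices. Duplicate elimination fits this shape because each slice's output can be produced as soon as that slice arrives, which is exactly what a sort cannot do.

```haskell
import Data.List (nub, union, (\\))

-- f(S) = q(S@1, st_0) ++ q(S@2, st_1) ++ ...,  st_j = r(S@j, st_(j-1)).
-- Each slice is consumed once; its output depends only on the slice
-- and the state accumulated from earlier slices.
streamIter :: s -> ([a] -> s -> [b]) -> ([a] -> s -> s) -> [[a]] -> [b]
streamIter _ _ _ [] = []
streamIter st q r (x : xs) = q x st ++ streamIter (r x st) q r xs

-- Duplicate elimination as a stream iterator: q emits the items of a
-- slice not seen before; r remembers every item seen so far.
dupelim :: Eq a => [[a]] -> [a]
dupelim = streamIter [] q r
  where
    q xs st = nub xs \\ st
    r xs st = st `union` xs
```

Since output is produced slice by slice, the iterator also works on an unbounded input: `take 3 (dupelim (map (: []) [1 ..]))` returns after reading only three slices.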
Punctuated Streams in Haskell
Each tuple may be Either a data item or a punctuation
Constructors Left, Right
New class Pattern
New class Punc: a tuple of Patterns
[[Left 1, Left 5], [Left 3], [Right (Range (0,4))], [Left 5, Left 6, Left 7], … ]
type Stream a b = [[Either a b]]
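One way to realize patterns and tuple punctuations, sketched with a plain data type rather than the type classes the lecture mentions. `Pattern`, `Wildcard`, `Literal`, `matchP`, `Punc2`, and `matchPunc2` are hypothetical, and `Range` is assumed to denote a closed interval. Under these assumptions, a punctuation `(Literal 42, Wildcard)` over `(a_id, price)` pairs would mean "no more bids for auction 42".

```haskell
-- Hypothetical single-attribute pattern language.
data Pattern a = Wildcard | Literal a | Range (a, a)
  deriving (Eq, Show)

-- Does one attribute value match a pattern?
matchP :: Ord a => a -> Pattern a -> Bool
matchP _ Wildcard = True
matchP x (Literal v) = x == v
matchP x (Range (lo, hi)) = lo <= x && x <= hi

-- A punctuation over pairs: one pattern per attribute. A pair matches
-- only if every attribute matches its pattern.
type Punc2 a b = (Pattern a, Pattern b)

matchPunc2 :: (Ord a, Ord b) => (a, b) -> Punc2 a b -> Bool
matchPunc2 (x, y) (px, py) = matchP x px && matchP y py

type Stream a b = [[Either a b]]

-- The stream from the notes, with Range as a single-attribute
-- punctuation (trailing slices omitted).
example :: Stream Int (Pattern Int)
example = [[Left 1, Left 5], [Left 3], [Right (Range (0, 4))], [Left 5, Left 6, Left 7]]
```

In `example`, every data item after `Right (Range (0, 4))` lies outside the range, so the stream is grammatical.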
Representing Stream Iterators
(No Punctuation Case)
A stream iterator is a 3-tuple: (initial state, step function, final function)
The step and final functions are encapsulated in a single data type:
data Basic state input output = B
    ([input] -> state -> ([output], state))  -- step
    (state -> ([output], state))             -- final
-- duplicate elimination: ([], step, final); uses nub, union, (\\) from Data.List
dupelim :: Eq a => ([a], Basic [a] a a)
dupelim = ([], B step final)
  where step xs st = (nub xs \\ st, union st xs)
        final st = ([], [])
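To show how the two functions cooperate, here is a hypothetical driver `runBasic` (not from the lecture) that threads the state through a finite sliced list and calls the final function at end of input. It runs duplicate elimination as defined above and, for contrast, a hypothetical blocking `sortB` whose step emits nothing, so all of its output comes from the final function.

```haskell
import Data.List (nub, sort, union, (\\))

data Basic state input output = B
  ([input] -> state -> ([output], state))  -- step: process one slice
  (state -> ([output], state))             -- final: called at end of input

-- Hypothetical driver: apply step to each slice, threading the state,
-- then call final once the finite stream is exhausted.
runBasic :: state -> Basic state input output -> [[input]] -> [output]
runBasic st (B _ final) [] = fst (final st)
runBasic st b@(B step _) (xs : rest) = out ++ runBasic st' b rest
  where
    (out, st') = step xs st

-- Duplicate elimination (initial state []): nothing left to emit at the end.
dupelimB :: Eq a => Basic [a] a a
dupelimB = B step final
  where
    step xs st = (nub xs \\ st, union st xs)
    final _ = ([], [])

-- A blocking sort: step only accumulates, so all output waits for final.
sortB :: Ord a => Basic [a] a a
sortB = B step final
  where
    step xs st = ([], xs ++ st)
    final st = (sort st, [])
```

On an unbounded stream `sortB` would never reach its final function and so would never produce output, which is why the notes single out sort as unsuitable.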
Stream Iterators for Punctuated Streams
A punctuated stream iterator has 5 parts (initial state, step, pass, prop, keep):
data Basic state input inputp output outputp = B
    ([input] -> state -> ([output], state))  -- Step
    ([inputp] -> state -> [output])          -- Pass
    ([inputp] -> state -> [outputp])         -- Prop
    ([inputp] -> state -> state)             -- Keep
Common Behavior Function for Punctuated Streams
Read an input slice and separate data items from punctuations
Output the appropriate result data items and punctuations
Manage state
unary :: s -> Basic s it ip ot op -> Stream it ip -> Stream ot op
unary _ _ [] = []
unary st basic@(B step pass prop keep) (xs:rest) =
    [map norm tsOut ++ map norm tsExtra ++ map punct psOut] ++ unary stNew' basic rest
  where (ts, ps) = splitPunc xs ([], [])
        (tsOut, stNew) = step ts st
        tsExtra = pass ps stNew
        psOut = prop ps stNew
        stNew' = keep ps stNew
Example Stream Iterators
-- select iterator
selectS :: (a -> Bool) -> Stream a b -> Stream a b
selectS pred = unary [] (B step passT prop keepT)
  where step ts st = (filter pred ts, [])
        prop ps st = ps
-- dupelim iterator
dupelimS :: Eq a => Stream a b -> Stream a b
dupelimS = unary [] (B step passT prop keep)
  where step ts st = (nub ts \\ st, union st ts)
        prop ps st = ps
        keep ps st = setNomatchTs st ps  -- state reduced
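The fragments above rely on `splitPunc`, `norm`, `punct`, `passT`, `keepT`, and `setNomatchTs`, which these notes do not define. The self-contained sketch below fills them in with assumptions: punctuations are integer `Range`s (closed intervals), `partitionEithers` stands in for `splitPunc`, `Left`/`Right` stand in for `norm`/`punct`, `passT` emits nothing, and `keepT` leaves the state unchanged. Item types are fixed to `Int` to keep it short.

```haskell
import Data.Either (partitionEithers)
import Data.List (nub, union, (\\))

type Stream a b = [[Either a b]]

-- Hypothetical punctuation pattern: a closed integer range.
newtype Range = Range (Int, Int) deriving (Eq, Show)

matchR :: Int -> Range -> Bool
matchR x (Range (lo, hi)) = lo <= x && x <= hi

data Basic state input inputp output outputp = B
  ([input] -> state -> ([output], state))  -- step
  ([inputp] -> state -> [output])          -- pass
  ([inputp] -> state -> [outputp])         -- prop
  ([inputp] -> state -> state)             -- keep

-- One output slice per input slice.
unary :: s -> Basic s it ip ot op -> Stream it ip -> Stream ot op
unary _ _ [] = []
unary st basic@(B step pass prop keep) (xs : rest) =
  (map Left (tsOut ++ tsExtra) ++ map Right psOut) : unary st' basic rest
  where
    (ts, ps) = partitionEithers xs  -- data items, punctuations
    (tsOut, stNew) = step ts st
    tsExtra = pass ps stNew
    psOut = prop ps stNew
    st' = keep ps stNew

-- Assumed defaults: no extra output; state unchanged.
passT :: [p] -> s -> [o]
passT _ _ = []

keepT :: [p] -> s -> s
keepT _ st = st

-- The items of the state that no punctuation matches.
setNomatchTs :: [Int] -> [Range] -> [Int]
setNomatchTs st ps = [t | t <- st, not (any (matchR t) ps)]

selectS :: (Int -> Bool) -> Stream Int Range -> Stream Int Range
selectS keepIf = unary () (B step passT prop keepT)
  where
    step ts st = (filter keepIf ts, st)
    prop ps _ = ps  -- punctuations pass through unchanged

dupelimS :: Stream Int Range -> Stream Int Range
dupelimS = unary [] (B step passT prop keep)
  where
    step ts st = (nub ts \\ st, union st ts)
    prop ps _ = ps
    -- No later item can match these punctuations, so drop matches from state.
    keep ps st = setNomatchTs st ps
```

With the input `[[Left 1, Left 5], [Left 3, Left 1], [Right (Range (0, 4))], [Left 5, Left 6]]`, `keep` shrinks the duplicate-elimination state from [1, 5, 3] to [5] when `Range (0, 4)` arrives. Grammaticality guarantees 1 and 3 cannot reappear, so forgetting them is safe.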
Question 2: “How do we know our iterators are behaving reasonably?”
We have implementations of iterators on punctuated streams
What does it mean to behave reasonably?
Data items output
Punctuations emitted
Punctuation Invariants
Punctuation invariants define cumulative behavior
Based on the arrival of some prefix of the input, they define what should be done
Pass Invariants
Pass invariants take the form
$$\mathit{cpass}(T_1, P_1, \ldots, T_n, P_n) = T_{out}$$
Note the 'c' to denote cumulative behavior
Examples
difference: $\mathit{cpass}(T_1, P_1, T_2, P_2) = \{\, t \mid t \in T_1 \land t \notin T_2 \land \mathit{setMatch}(t, P_2) \,\}$
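As a concrete check of the difference invariant, here is a sketch that computes cpass directly from the cumulative inputs, assuming punctuations are integer ranges (`Range`, `setMatch`, and `cpassDiff` are hypothetical names). An item of T1 is released only once it is absent from T2 and covered by a punctuation in P2, since only then can no later item on input 2 cancel it.

```haskell
-- Hypothetical punctuation: a closed integer range.
newtype Range = Range (Int, Int)

-- setMatch(t, P): some punctuation in P matches t.
setMatch :: Int -> [Range] -> Bool
setMatch t = any (\(Range (lo, hi)) -> lo <= t && t <= hi)

-- cpass for difference (input 1 minus input 2), from the cumulative
-- inputs T1, P1, T2, P2. P1 does not constrain what may be output.
cpassDiff :: [Int] -> [Range] -> [Int] -> [Range] -> [Int]
cpassDiff t1 _p1 t2 p2 = [t | t <- t1, t `notElem` t2, setMatch t p2]
```

With T1 = [1, 2, 5], T2 = [2], and P2 = [Range (0, 3)], only 1 may be output: 2 is cancelled by T2, and 5 is not yet covered by any punctuation on input 2. With no punctuations on input 2, nothing may be output.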
Keep Invariants
$$\mathit{ckeep}_j(T_1, P_1, \ldots, T_n, P_n) = \check{T}_j$$
select: $\mathit{ckeep}_1(T_1, P_1) = [\,]$
dupelim: $\mathit{ckeep}_1(T_1, P_1) = \mathit{setNomatchTs}(T_1, P_1)$
difference: $\mathit{ckeep}_1(T_1, P_1, T_2, P_2) = [\, t \mid t \in T_1 \land t \notin T_2 \land \mathit{setNomatch}(t, P_2) \,]$ and $\mathit{ckeep}_2(T_1, P_1, T_2, P_2) = \mathit{setNomatchTs}(T_2, P_1)$
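The keep invariants can be sketched the same way, again assuming integer-range punctuations; every name below is hypothetical. For difference, input 1 keeps candidates that are neither cancelled nor yet released, while input 2 keeps only items that could still cancel a future T1 item, i.e., those not yet covered by a punctuation on input 1.

```haskell
-- Hypothetical punctuation: a closed integer range.
newtype Range = Range (Int, Int)

matchR :: Int -> Range -> Bool
matchR t (Range (lo, hi)) = lo <= t && t <= hi

-- setNomatch(t, P): no punctuation in P matches t.
setNomatch :: Int -> [Range] -> Bool
setNomatch t ps = not (any (matchR t) ps)

-- setNomatchTs(T, P): the items of T that no punctuation in P matches.
setNomatchTs :: [Int] -> [Range] -> [Int]
setNomatchTs ts ps = [t | t <- ts, setNomatch t ps]

-- select keeps no state.
ckeepSelect :: [Int] -> [Range] -> [Int]
ckeepSelect _ _ = []

-- dupelim: remember items until a punctuation covers them.
ckeepDupelim :: [Int] -> [Range] -> [Int]
ckeepDupelim = setNomatchTs

-- difference, input 1: not cancelled by T2 and not yet released by P2.
ckeepDiff1 :: [Int] -> [Range] -> [Int] -> [Range] -> [Int]
ckeepDiff1 t1 _ t2 p2 = [t | t <- t1, t `notElem` t2, setNomatch t p2]

-- difference, input 2: items not yet covered by punctuations on input 1.
ckeepDiff2 :: [Int] -> [Range] -> [Int] -> [Range] -> [Int]
ckeepDiff2 _ p1 t2 _ = setNomatchTs t2 p1
```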
Proof Strategy for Faithfulness
Prove an iterator implementation is faithful and proper to its corresponding table operator
Two-stage proof
Step 1: Prove invariants imply faithfulness and propriety
Step 2: Prove a particular iterator implementation conforms to invariants
Performance Scenario
Online auction scenario discussed earlier
Niagara query engine
Generally two query plans for each query
Punctuations on: Nothing, a_id, hour, 15-minute period, 1-minute period, 30-second period, 15-second period
Performance Query 1
No punctuations required
(Optional) Describe on no attributes: filters out all punctuations
[Figure: query plan over the bid stream]
Indicates query overhead when punctuations are not required
Performance Query 3
Query requires help from punctuations
[Figure: query plan over the bid stream]
“Build up” on the minute attribute if needed
Performance Query 4
[Figure: query plan joining auction and bid on a_id = a_id (⋈), with selection σ c IN {92, 136, 208, 294}]