









These notes cover the challenges and methods for indexing and searching high-dimensional data, including similarity measures such as Euclidean distance and dynamic time warping, and wavelet-based dimensionality reduction. They also cover index structures such as R-trees, R*-trees, and M-trees, and touch on the curse of dimensionality and its impact on high-dimensional data.
Typology: Study notes










[Figure: tree index examples with internal nodes A and B and entries c, d, e, f, g, x. Annotation: "We descend both branches to search for ..."]
[Figure: hypervolume illustrated for d = 1 and higher dimensions]
Generally: exponential growth of the hypervolume as a function of dimension
Other manifestations: number of samples required to maintain the
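The exponential growth of the hypervolume can be made concrete with a small sketch. The two helper functions below (`grid_cells` and `shell_fraction`) are hypothetical names chosen for illustration, not from the notes: the first counts how many cells a uniform grid needs as the dimension grows, and the second computes how much of the unit hypercube lies within a thin shell near its boundary.

```python
def grid_cells(bins_per_axis: int, dim: int) -> int:
    """Number of cells in a uniform grid with `bins_per_axis` bins on each of `dim` axes."""
    return bins_per_axis ** dim


def shell_fraction(eps: float, dim: int) -> float:
    """Fraction of the unit hypercube's volume lying within `eps` of its boundary.

    The inner cube has side (1 - 2*eps), so its volume is (1 - 2*eps)**dim.
    """
    return 1.0 - (1.0 - 2.0 * eps) ** dim


if __name__ == "__main__":
    for d in (1, 2, 3, 10, 50):
        print(d, grid_cells(10, d), round(shell_fraction(0.05, d), 4))
```

With 10 bins per axis the cell count grows as 10^d, and for large d almost all of the volume sits in the thin outer shell, which is one reason space-partitioning indexes degrade in high dimensions.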
Sound
Music (“Query by humming”)
Images
Video
DNA sequence matching
Medical imagery
Define a function s : V × V → Real
What properties should s have?
Reflexive: s(x,x) = 0 (or infinity)
Symmetric: s(x,y) = s(y,x)
Triangle inequality: s(x,y) + s(y,z) >= s(x,z)
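The three properties can be checked empirically on a small sample of points. The sketch below is illustrative only: `check_properties` is a hypothetical helper, and passing the check on a sample does not prove a function is a metric. It uses Euclidean distance as s, and squared Euclidean distance as a contrasting function that is reflexive and symmetric but violates the triangle inequality.

```python
import itertools
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    """Squared Euclidean distance; not a metric (triangle inequality can fail)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def check_properties(s, points, tol=1e-9):
    """Test reflexivity (s(x,x) = 0), symmetry, and the triangle inequality on `points`."""
    reflexive = all(abs(s(p, p)) <= tol for p in points)
    symmetric = all(
        abs(s(p, q) - s(q, p)) <= tol
        for p, q in itertools.combinations(points, 2)
    )
    triangle = all(
        s(x, y) + s(y, z) >= s(x, z) - tol
        for x, y, z in itertools.permutations(points, 3)
    )
    return {"reflexive": reflexive, "symmetric": symmetric, "triangle": triangle}


if __name__ == "__main__":
    sample = [(0, 0), (3, 4), (6, 8), (1, 1)]
    print(check_properties(euclidean, sample))
    print(check_properties(squared_euclidean, sample))
```

The triangle inequality matters for indexing because metric index structures rely on it to prune parts of the search space without computing every distance.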
[Figure: objects A, B, C, D and a query Q]
Chen, Ozsu, Oria 2005
Drawbacks: Sensitive to noise
$s_i + s_{i+1}, \quad s_i - s_{i+1}$
Hierarchical decomposition allows fine-tuning
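A hierarchical decomposition built from pairwise sums and differences can be sketched as a Haar-style transform. This is a minimal illustration under stated assumptions: the division by 2 (averaging) is one common normalization and is not confirmed by the notes, the input length is assumed to be a power of two, and the function names are hypothetical.

```python
def haar_step(signal):
    """One level: pairwise averages (coarse signal) and pairwise half-differences (details)."""
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details


def haar_decompose(signal):
    """Full decomposition: [overall average, coarsest detail, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    coeffs = []
    current = list(signal)
    while len(current) > 1:
        current, details = haar_step(current)
        coeffs = details + coeffs  # coarser details go in front of finer ones
    return current + coeffs


if __name__ == "__main__":
    print(haar_decompose([9, 7, 3, 5]))
```

Because the coefficients are ordered from coarse to fine, keeping only a prefix of the output gives a lower-resolution approximation, which is where the fine-tuning comes from.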
After one horizontal filtering
After two filterings (vertical and horizontal)
Wavelets can reduce dimensionality, like Principal Component Analysis (PCA)
Indexing in the reduced feature space
False positives are OK, false negatives aren’t
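The reason false positives are tolerable while false negatives are not can be seen in a filter-and-refine range query. The sketch below is an assumption-laden illustration, not the method from the notes: the reduction simply keeps the first k coordinates (a stand-in for wavelet or PCA coefficients), which guarantees that the reduced distance never exceeds the true Euclidean distance. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_dims(vec, k):
    """Toy reduction: keep the first k coordinates (lower-bounds the full distance)."""
    return vec[:k]


def range_query(data, query, radius, k):
    """Filter in the reduced space, then refine candidates with the full distance."""
    q_red = reduce_dims(query, k)
    # Filter step: may admit false positives, but never drops a true answer,
    # because the reduced distance is <= the full distance.
    candidates = [v for v in data if euclidean(reduce_dims(v, k), q_red) <= radius]
    # Refine step: discard the false positives using the original vectors.
    hits = [v for v in candidates if euclidean(v, query) <= radius]
    return candidates, hits


if __name__ == "__main__":
    data = [(0, 0, 0, 0), (1, 0, 0, 5), (3, 3, 3, 3), (0.5, 0.5, 0, 0)]
    candidates, hits = range_query(data, (0, 0, 0, 0), radius=1.5, k=2)
    print(candidates)
    print(hits)
```

Here (1, 0, 0, 5) passes the filter but is removed during refinement, so it only costs an extra distance computation. A false negative, by contrast, would be lost for good, since the refine step only sees the candidates.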