Similarity processing in multi-observation data
Description
11 years ago
Many real-world application domains, such as sensor-monitoring
systems for environmental research or medical diagnostic systems,
deal with data that is represented by multiple observations.
In contrast to single-observation data, where each object is
assigned to exactly one occurrence, multi-observation data is based
on several occurrences that are subject to two key properties:
temporal variability and uncertainty. When defining similarity
between data objects, these properties play a significant role. In
general, methods designed for single-observation data are hardly
applicable to multi-observation data: either the data models do not
support them, or they do not yield sufficiently efficient or effective
solutions. Prominent directions incorporating the key properties
are the fields of time series, where data is created by temporally
successive observations, and uncertain data, where observations are
mutually exclusive. This thesis provides research contributions for
similarity processing - similarity search and data mining - on time
series and uncertain data. The first part of this thesis focuses on
similarity processing in time series databases. A variety of
similarity measures have recently been proposed that support
similarity processing w.r.t. various aspects. In particular, this
part deals with time series that consist of periodic occurrences of
patterns. Examining an application scenario from the medical
domain, a solution for activity recognition is presented. Finally,
the extraction of feature vectors allows the application of spatial
index structures, which accelerate search and mining tasks and
yield a significant efficiency gain. As feature vectors are
potentially of high dimensionality, this part introduces indexing
approaches for high-dimensional spaces, covering the
full-dimensional case as well as arbitrary subspaces. The
second part of this thesis focuses on similarity processing in
probabilistic databases. The presence of uncertainty is inherent in
many applications dealing with data collected by sensing devices.
Often, the collected information is noisy or incomplete due to
measurement or transmission errors. Furthermore, data may be
deliberately rendered uncertain to preserve privacy when
confidential information is present. This creates a number of
challenges in terms of effectively and efficiently querying and
mining uncertain data. Existing work in this field either neglects
the presence of dependencies or provides only approximate results
while applying methods designed for certain data. Other approaches
dealing with uncertain data are not able to provide efficient
solutions. This part presents query processing approaches that
outperform existing solutions for probabilistic similarity ranking.
Finally, the introduced techniques are applied to data mining
tasks, such as the prominent problem of probabilistic frequent
itemset mining.
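
To make the uncertain data model mentioned above more concrete, the following minimal sketch ranks uncertain objects by their expected rank with respect to a query. It is not the method of the thesis. It is a brute-force illustration under stated assumptions: each object is a list of mutually exclusive one-dimensional observations with probabilities summing to 1, objects are independent of each other, and the names `prob_closer` and `expected_ranks` are hypothetical.

```python
from itertools import product


def prob_closer(a, b, q):
    """Probability that uncertain object a is strictly closer to q than b.

    a and b are lists of (value, probability) alternatives. Alternatives
    of one object are mutually exclusive; a and b are ASSUMED independent,
    so the probability of a pair of alternatives is the product pa * pb.
    """
    return sum(
        pa * pb
        for (xa, pa), (xb, pb) in product(a, b)
        if abs(xa - q) < abs(xb - q)
    )


def expected_ranks(objects, q):
    """Expected rank of each object (1 = closest to q).

    The rank of object i is 1 plus the number of objects strictly closer
    to q, so by linearity of expectation it is 1 plus the sum of the
    pairwise probabilities that another object is closer.
    """
    return {
        name: 1.0 + sum(
            prob_closer(other, obs, q)
            for other_name, other in objects.items()
            if other_name != name
        )
        for name, obs in objects.items()
    }


# Hypothetical toy data: sensor "a" has two alternative readings,
# sensor "b" has a single certain reading.
objects = {
    "a": [(1.0, 0.8), (5.0, 0.2)],
    "b": [(2.0, 1.0)],
}
ranks = expected_ranks(objects, q=0.0)
```

The pairwise enumeration is quadratic in the number of alternatives per object pair and is meant only to show the semantics; it ignores dependencies between objects, which the abstract names as a limitation of existing work.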