Description

17 years ago
Due to the rapid development of computer technology and new methods
for the extraction of data in the last few years, more and more
applications of databases have emerged, for which an efficient and
effective similarity search is of great importance. Application
areas of similarity search include multimedia, computer-aided
engineering, marketing, image processing and many more. Of particular
interest is the task of finding similar objects in large
amounts of data having complex representations. Examples of such
complex representations include set-valued objects as well as
tree- or graph-structured objects. The grouping of similar
objects, known as clustering, is a fundamental analysis
technique that makes it possible to explore extensive data sets. The
goal of this dissertation is to develop new efficient and effective
methods for similarity search in large quantities of complex
objects. Furthermore, the efficiency of existing density-based
clustering algorithms is to be improved when applied to complex
objects. The first part of this work motivates the use of vector
sets for similarity modeling. For this purpose, a metric distance
function is defined that is suitable for various application
areas but time-consuming to compute. Therefore, a
filter-refinement technique is proposed to efficiently process range
queries and k-nearest-neighbor queries, two basic query types
within the field of similarity search. Several filter distances are
presented, which approximate the exact object distance and can be
computed efficiently. Moreover, a multi-step query processing
approach is described, which can be directly integrated into the
well-known density-based clustering algorithms DBSCAN and OPTICS.
In the second part of this work, new application areas for
density-based hierarchical clustering using OPTICS are discussed. A
prototype is introduced, which has been developed for these new
application areas and is based on the aforementioned similarity
models and accelerated clustering algorithms for complex objects.
This prototype facilitates interactive semi-automatic cluster
analysis and allows visual search for similar objects in multimedia
databases. Another prototype extends these concepts and enables the
user to analyze multi-represented and multi-instance data. Finally,
the problem of music genre classification is addressed as another
application supporting multi-represented and multi-instance data
objects. An extensive experimental evaluation examines efficiency
and effectiveness of the presented techniques using real-world data
and points out advantages in comparison to conventional approaches.
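
The filter-refinement idea for range queries on vector sets can be
illustrated with a minimal sketch. The abstract does not name the
exact distance or the filters, so everything below is an assumption
made for illustration: the exact distance is taken to be a minimal
matching distance between vector sets of equal size (computed by
brute force), and the filter is a centroid-based lower bound. The
function names (matching_distance, centroid_filter, range_query) and
the toy data are hypothetical and not taken from the dissertation.

```python
from itertools import permutations
from math import dist  # Euclidean distance, Python 3.8+


def matching_distance(a, b):
    """Exact but expensive distance (assumed for this sketch): cost of the
    cheapest one-to-one matching between two equally sized vector sets.
    Brute force over all permutations, so only usable for tiny sets."""
    assert len(a) == len(b), "sketch assumes equal cardinality"
    return min(
        sum(dist(x, y) for x, y in zip(a, perm))
        for perm in permutations(b)
    )


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def centroid_filter(a, b):
    """Cheap filter distance. By the triangle inequality,
    k * ||centroid(a) - centroid(b)|| never exceeds the matching
    distance of two sets with k vectors each, so it is a lower bound."""
    return len(a) * dist(centroid(a), centroid(b))


def range_query(db, query, eps):
    """Return the keys of all objects within eps of the query, plus the
    number of exact distance computations that were actually needed."""
    results, refinements = [], 0
    for key, obj in db.items():
        # Filter step: if even the lower bound exceeds eps,
        # the exact distance must exceed eps as well.
        if centroid_filter(query, obj) > eps:
            continue
        # Refinement step: compute the exact distance for the candidate.
        refinements += 1
        if matching_distance(query, obj) <= eps:
            results.append(key)
    return results, refinements


# Toy database of vector sets (hypothetical data).
db = {
    "a": [(0, 0), (1, 0)],
    "b": [(0, 1), (1, 1)],
    "c": [(10, 10), (11, 10)],
}
query = [(0, 0), (1, 0)]
results, refinements = range_query(db, query, eps=2.5)
print(results, refinements)
```

In this toy run, object "c" is discarded by the filter alone, so only
two of the three exact distances have to be computed. Because the
filter never overestimates the exact distance, the filter step cannot
cause false dismissals; it only reduces the number of refinements.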
