Biclustering: Methods, Software and Application ~ Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU

Over the past 10 years, biclustering has become popular not only in
the field of biological data analysis but also in other
applications with high-dimensional two-way datasets. This technique
clusters both rows and columns simultaneously, as opposed to
clustering only rows or only columns. Biclustering retrieves
subgroups of objects that are similar in one subgroup of variables
and different in the remaining variables. This dissertation focuses
on improving and advancing biclustering methods. Since most
existing methods are extremely sensitive to variations in
parameters and data, we developed an ensemble method to overcome
these limitations. It is possible to retrieve more stable and
reliable biclusters in two ways: either by running algorithms with
different parameter settings or by running them on subsamples or
bootstrap samples of the data, and then combining the results. To this
end, we designed a software package containing a collection of
bicluster algorithms for different clustering tasks and data
scales, developed several new ways of visualizing bicluster
solutions, and adapted traditional cluster validation indices (e.g.
the Jaccard index) for validating bicluster results. Finally, we
applied biclustering to marketing data. Well-established algorithms
were adjusted to slightly different data situations, and a new
method specially adapted to ordinal data was developed. In order to
test this method on artificial data, we generated correlated
ordinal random values. This dissertation introduces two methods
for generating such values given a probability vector and a
correlation structure. All the methods outlined in this
dissertation are freely available in the R packages biclust and
orddata. Numerous examples in this work illustrate how to use the
methods and software.
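
As a rough illustration of how a traditional validation index such as the Jaccard index can be carried over to biclusters, the following minimal Python sketch compares two biclusters by the matrix cells they cover. It is not the implementation in the biclust package: the representation of a bicluster as a pair of row and column index sets, and the names `cells` and `jaccard`, are assumptions made only for this example.

```python
from typing import Set, Tuple

# Assumed representation: a bicluster is (row indices, column indices).
Bicluster = Tuple[Set[int], Set[int]]


def cells(bicluster: Bicluster) -> Set[Tuple[int, int]]:
    """Return every (row, column) cell of the data matrix the bicluster covers."""
    rows, cols = bicluster
    return {(r, c) for r in rows for c in cols}


def jaccard(a: Bicluster, b: Bicluster) -> float:
    """Jaccard index on covered cells: |A and B| / |A or B|.

    Returns 0.0 when both biclusters are empty, to avoid dividing by zero.
    """
    cells_a, cells_b = cells(a), cells(b)
    union = cells_a | cells_b
    if not union:
        return 0.0
    return len(cells_a & cells_b) / len(union)


# Two overlapping biclusters, each covering 6 cells.
a = ({0, 1, 2}, {0, 1})
b = ({1, 2, 3}, {1, 2})

# They share rows {1, 2} and column {1}: 2 common cells out of 10 in total.
print(jaccard(a, b))  # 0.2
```

A value of 1 means both biclusters cover exactly the same cells, and 0 means they share none. An ensemble approach could plausibly use such a score to decide which biclusters from different runs should be combined, although the sketch makes no claim about how the dissertation does this.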

