Biclustering: Methods, Software and Application
Description
13 years ago
Over the past 10 years, biclustering has become popular not only in
the field of biological data analysis but also in other
applications with high-dimensional two-way datasets. This technique
clusters rows and columns simultaneously, as opposed to
clustering only rows or only columns. Biclustering retrieves
subgroups of objects that are similar in a subgroup of variables
and different in the remaining variables. This dissertation focuses
on improving and advancing biclustering methods. Because most
existing methods are extremely sensitive to variations in
parameters and data, we developed an ensemble method to overcome
these limitations. More stable and reliable biclusters can be
retrieved in two ways: either by running algorithms with
different parameter settings, or by running them on subsamples or
bootstrap samples of the data, and then combining the results. To this
end, we designed a software package containing a collection of
bicluster algorithms for different clustering tasks and data
scales, developed several new ways of visualizing bicluster
solutions, and adapted traditional cluster validation indices (e.g.,
the Jaccard index) to the bicluster framework. Finally, we
applied biclustering to marketing data. Well-established algorithms
were adjusted to slightly different data situations, and a new
method specially adapted to ordinal data was developed. To
test this method on artificial data, we generated correlated
ordinal random values. This dissertation introduces two methods
for generating such values given a probability vector and a
correlation structure. All the methods outlined in this
dissertation are freely available in the R packages biclust and
orddata. Numerous examples in this work illustrate how to use the
methods and software.
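The abstract mentions adapting the Jaccard index to biclusters. One common way to do this, sketched below in Python as an illustration rather than the biclust implementation, treats a bicluster as the set of matrix cells covered by its rows and columns, so two biclusters are compared by shared cells over total cells. The function names and the result-level `match_score` (for each bicluster in one result, take its best Jaccard match in the other and average) are hypothetical; other aggregation conventions exist.

```python
def jaccard_bicluster(b1, b2):
    """Cell-based Jaccard index of two biclusters.

    A bicluster is a (rows, cols) pair and covers the cells rows x cols.
    """
    rows1, cols1 = set(b1[0]), set(b1[1])
    rows2, cols2 = set(b2[0]), set(b2[1])
    # Shared cells are exactly (shared rows) x (shared columns).
    inter = len(rows1 & rows2) * len(cols1 & cols2)
    # |A union B| = |A| + |B| - |A intersect B|
    union = len(rows1) * len(cols1) + len(rows2) * len(cols2) - inter
    # Convention (assumption): two empty biclusters score 0.0.
    return inter / union if union else 0.0


def match_score(result_a, result_b):
    """Hypothetical result-level score: mean best-match Jaccard of A against B."""
    if not result_a or not result_b:
        return 0.0  # assumption: an empty result matches nothing
    best = [max(jaccard_bicluster(a, b) for b in result_b) for a in result_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    b1 = ({0, 1, 2}, {0, 1})  # rows 0-2, columns 0-1: 6 cells
    b2 = ({1, 2, 3}, {1, 2})  # rows 1-3, columns 1-2: 6 cells, 2 shared
    print(jaccard_bicluster(b1, b2))     # 2 / (6 + 6 - 2) = 0.2
    print(match_score([b1, b2], [b1]))   # mean(1.0, 0.2) = 0.6
```

Note that the score is asymmetric: `match_score(A, B)` asks how well B recovers A, which is why, for example, comparing bootstrap results against a reference solution usually fixes the reference as the first argument.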
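The task of generating correlated ordinal values from per-variable category probabilities and a correlation structure can be illustrated with simple latent-normal thresholding. This Python sketch is not either of the two methods introduced in the dissertation: here the correlation matrix applies to the latent normal variables, and cutting them into categories typically shrinks the correlation of the resulting ordinal values, which is the gap that dedicated generators aim to close. The function names are hypothetical, and all category probabilities are assumed to be strictly positive.

```python
import numpy as np
from statistics import NormalDist


def ordinal_thresholds(probs):
    """Normal-quantile cut points for one ordinal variable.

    probs: strictly positive category probabilities summing to 1,
    e.g. [0.2, 0.5, 0.3]. K categories need K - 1 cut points.
    """
    cum = np.cumsum(probs)[:-1]  # drop the final cumulative value (1.0)
    return np.array([NormalDist().inv_cdf(p) for p in cum])


def correlated_ordinal(n, prob_list, latent_corr, seed=None):
    """Draw n rows of ordinal data (categories 1..K) by thresholding
    a multivariate normal with correlation matrix latent_corr."""
    rng = np.random.default_rng(seed)
    d = len(prob_list)
    z = rng.multivariate_normal(np.zeros(d), latent_corr, size=n)
    out = np.empty((n, d), dtype=int)
    for j, probs in enumerate(prob_list):
        cuts = ordinal_thresholds(probs)
        # Category index = number of cut points at or below z (0 .. K-1),
        # so P(category k) equals probs[k] by construction.
        out[:, j] = np.searchsorted(cuts, z[:, j], side="right")
    return out + 1  # report categories as 1..K


if __name__ == "__main__":
    x = correlated_ordinal(
        10000,
        [[0.2, 0.5, 0.3], [0.5, 0.5]],
        [[1.0, 0.6], [0.6, 1.0]],
        seed=1,
    )
    print(np.bincount(x[:, 0])[1:] / len(x))  # close to [0.2, 0.5, 0.3]
    print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])  # below the latent 0.6
```

The marginal probabilities are matched exactly in expectation, but the observed ordinal correlation comes out lower than the latent value, so a target correlation for the ordinal variables themselves cannot simply be plugged in as `latent_corr`.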