CMA - a comprehensive Bioconductor package for supervised classification with high dimensional data
Description
16 years ago
Background: For the last eight years, microarray-based
classification has been a major topic in statistics, bioinformatics
and biomedical research. Traditional methods often yield
unsatisfactory results or may even be inapplicable in the so-called
"p >> n" setting, where the number of predictors p far
exceeds the number of observations n, hence the term
"ill-posed problem". Careful model selection and evaluation
satisfying accepted good-practice standards is a very complex task
for statisticians without experience in this area or for scientists
with limited statistical background. The multiplicity of available
methods for class prediction based on high-dimensional data is an
additional practical challenge for inexperienced researchers.
Results: In this article, we introduce a new Bioconductor package
called CMA (standing for "Classification for MicroArrays") for
automatically performing variable selection, parameter tuning,
classifier construction, and unbiased evaluation of the constructed
classifiers using a large number of commonly used methods. With little
time and effort, users obtain an overview of the
unbiased accuracy of most top-performing classifiers. Furthermore,
the standardized evaluation framework underlying CMA can also be
beneficial in statistical research for comparison purposes, for
instance if a new classifier has to be compared to existing
approaches. Conclusion: CMA is a user-friendly, comprehensive
package for classifier construction and evaluation that implements
most commonly used approaches. It is freely available from the Bioconductor
website at http://bioconductor.org/packages/2.3/bioc/html/CMA.html.
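
To make the workflow named in the abstract more concrete (variable selection, classifier construction, and unbiased evaluation), the following is a minimal sketch in plain Python. It is not CMA's API and does not reproduce any CMA function. The function names, the t-like ranking score, the nearest-centroid classifier, and the simulated data are all assumptions chosen for illustration. The sketch shows one common reading of "unbiased evaluation" in the p >> n setting: variables are re-selected inside every cross-validation fold, using only the training rows, so the held-out samples never influence the selection.

```python
import random
from statistics import mean, pvariance


def make_folds(n, k, rng):
    """Split indices 0..n-1 into k disjoint, shuffled folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_features(X, y, train, n_keep):
    """Rank features by a t-like score computed on the training rows only."""
    a = [i for i in train if y[i] == 0]
    b = [i for i in train if y[i] == 1]
    scores = []
    for j in range(len(X[0])):
        xa = [X[i][j] for i in a]
        xb = [X[i][j] for i in b]
        # Small constant avoids division by zero for constant features.
        spread = (pvariance(xa) + pvariance(xb)) ** 0.5 + 1e-12
        scores.append((abs(mean(xa) - mean(xb)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def nearest_centroid_predict(X, y, train, test, feats):
    """Assign each test row to the class with the closest training centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [i for i in train if y[i] == label]
        centroids[label] = [mean(X[i][j] for i in rows) for j in feats]
    preds = []
    for i in test:
        dist = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(feats, centroids[label]))
            for label in (0, 1)
        }
        preds.append(min(dist, key=dist.get))
    return preds


def cv_error(X, y, k=5, n_keep=10, seed=0):
    """k-fold misclassification rate with selection repeated in every fold."""
    folds = make_folds(len(X), k, random.Random(seed))
    wrong = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        feats = select_features(X, y, train, n_keep)  # training rows only
        preds = nearest_centroid_predict(X, y, train, test, feats)
        wrong += sum(p != y[i] for p, i in zip(preds, test))
    return wrong / len(X)


def simulate(n=40, p=200, n_informative=5, shift=2.0, seed=1):
    """Hypothetical p >> n data: only the first few features carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [
        [rng.gauss(shift * y[i] if j < n_informative else 0.0, 1.0) for j in range(p)]
        for i in range(n)
    ]
    return X, y


X, y = simulate()
print("cross-validated error:", cv_error(X, y))
```

Selecting the variables once on the full data set and cross-validating only the classifier afterwards would let the held-out samples leak into the selection step, which typically makes the estimated error too optimistic when p is much larger than n.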