Dimension reduction and Classification with High-Dimensional Microarray Data
Description
19 years ago
Typical microarray data sets include only a handful of observations,
but several thousand predictor variables. Transforming the
high-dimensional predictor space to make classification (for
instance cancer diagnosis) possible is a major challenge. This
thesis deals with various dimension reduction approaches which can
handle such data. Chapter 2 gives an introduction to
classification with microarray data as well as an overview of a few
specific problems such as variable selection and comparison of
classification methods. In Chapter 3, I discuss a particular class
of interaction structures in the classification framework:
"emerging patterns". I propose a new and more general definition
referring to underlying probabilities and present a new simple
method which is based on the CART algorithm to find the
corresponding empirical patterns in concrete data sets. In
addition, the detected patterns can be used to define new variables
for classification. Thus, I propose a simple scheme to use the
patterns to improve the performance of classification procedures. I
implemented the search algorithm as well as the classification
procedure in the language R. Some of these programs are publicly
available from my homepage. Chapter 4 deals with classical linear
dimension reduction methods. In the context of binary
classification with continuous predictors, I prove two properties
concerning the connections between Partial Least Squares (PLS)
dimension reduction and between-group PCA, and between linear
discriminant analysis and between-group PCA. PLS dimension
reduction for classification is examined thoroughly in Chapter 5.
The classification procedure consisting of PLS dimension reduction
and linear discriminant analysis on the new components compares
favorably with some of the best state-of-the-art classification
methods using nine real microarray cancer data sets. Moreover, I
apply a boosting algorithm to this classification method, which is
a novel approach. In addition, I suggest a simple procedure to
choose the number of PLS components. Finally, I examine the
connection between PLS dimension reduction and variable selection
and prove a property concerning the equivalence between a common
univariate selection criterion and a variable selection approach
based on the first PLS component.
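
As a small illustration of the kind of connection mentioned above, the following sketch checks a standard fact numerically: with a binary response coded 0/1 and centered predictors, the unnormalised first PLS weight vector equals the difference of the two class mean vectors times n0*n1/n. Ranking variables by the absolute weight is therefore the same as ranking them by absolute mean difference. This is not the author's R code and does not reproduce the proofs in the thesis; the simulated data and the function names (`first_pls_weight`, `class_mean_difference`, `simulate`) are assumptions made purely for illustration.

```python
import random


def column_means(rows):
    """Mean of every column of a list-of-lists matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def first_pls_weight(X, y):
    """Unnormalised first PLS weight vector: centered X transposed times centered y."""
    x_bar = column_means(X)
    y_bar = sum(y) / len(y)
    w = [0.0] * len(x_bar)
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            w[j] += (value - x_bar[j]) * (label - y_bar)
    return w


def class_mean_difference(X, y):
    """Mean vector of class 1 minus mean vector of class 0."""
    m1 = column_means([row for row, label in zip(X, y) if label == 1])
    m0 = column_means([row for row, label in zip(X, y) if label == 0])
    return [a - b for a, b in zip(m1, m0)]


def simulate(n_per_class=10, p=200, n_informative=5, shift=2.0, seed=0):
    """Toy 'few observations, many variables' data (hypothetical, not real microarray data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if label == 1:
                # Shift only the first few variables in class 1.
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


X, y = simulate()
w = first_pls_weight(X, y)
d = class_mean_difference(X, y)

# Proportionality constant n0 * n1 / n from the algebra above.
n0, n1 = y.count(0), y.count(1)
scale = n0 * n1 / len(y)
max_gap = max(abs(wj - scale * dj) for wj, dj in zip(w, d))

# Variables ranked by absolute weight; identical to ranking by |mean difference|.
ranking_by_weight = sorted(range(len(w)), key=lambda j: -abs(w[j]))
ranking_by_difference = sorted(range(len(d)), key=lambda j: -abs(d[j]))
print("largest deviation from proportionality:", max_gap)
print("top variables by |weight|:", ranking_by_weight[:5])
```

The sketch uses only the standard library so that the algebra stays visible; a real analysis would additionally standardise the variables and estimate the error rate by cross-validation.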