Categorical variables with many categories are preferentially selected in model selection procedures for multivariable regression models on bootstrap samples

Description

10 years ago
To perform model selection in the context of multivariable
regression, automated variable selection procedures such as
backward elimination are commonly employed. However, these
procedures are known to be highly unstable. Their stability can be
investigated using bootstrap-based procedures: the idea is to
perform model selection on a large number of bootstrap samples
in turn and to examine the resulting models, for instance in
terms of the inclusion of specific predictor variables. However,
such bootstrap-based procedures are known from the literature to
yield misleading results in some cases. In this paper we aim to
thoroughly investigate a particularly important facet of these
problems. More precisely, we assess the behaviour of regression
models (with an automated variable selection procedure based on the
likelihood ratio test) fitted on bootstrap samples drawn with
replacement and on subsamples drawn without replacement, with
respect to the number and type of included predictor variables. Our
study includes both extensive simulations and a real data example
from the NHANES study. The results indicate that models derived
from bootstrap samples include more predictor variables than models
fitted on original samples and that categorical predictor variables
with many categories are preferentially selected over categorical
predictor variables with fewer categories and over metric predictor
variables. We conclude that using bootstrap samples to select
variables for multivariable regression models may lead to overly
complex models with a preferential selection of categorical
predictor variables with many categories. We suggest the use of
subsamples instead of bootstrap samples to bypass these drawbacks.
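
The two resampling schemes compared above, and the tallying of how often each predictor is selected, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the function names, the subsample size m, the number of repetitions and the placeholder selector toy_select are all hypothetical, and a real analysis would replace toy_select with backward elimination based on the likelihood ratio test.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable, List


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Draw n row indices with replacement, so duplicates are expected."""
    return [rng.randrange(n) for _ in range(n)]


def subsample_indices(n: int, m: int, rng: random.Random) -> List[int]:
    """Draw m distinct row indices without replacement (requires m <= n)."""
    return rng.sample(range(n), m)


def inclusion_frequencies(
    n: int,
    draw: Callable[[int], List[int]],
    select: Callable[[List[int]], Iterable[str]],
    repeats: int,
) -> Dict[str, float]:
    """Repeat resampling and selection; return each variable's share of
    repetitions in which it was selected."""
    counts: Counter = Counter()
    for _ in range(repeats):
        rows = draw(n)                     # resampled row indices
        counts.update(set(select(rows)))   # count each variable once per run
    return {name: k / repeats for name, k in counts.items()}


def toy_select(rows: List[int]) -> List[str]:
    # Hypothetical placeholder: a real selector would fit a regression model
    # on the given rows and run the variable selection procedure.
    return ["x1"]


rng = random.Random(0)
n = 200   # assumed size of the original sample
m = 126   # assumed subsample size, chosen only for illustration

boot_freq = inclusion_frequencies(
    n, lambda k: bootstrap_indices(k, rng), toy_select, repeats=50
)
sub_freq = inclusion_frequencies(
    n, lambda k: subsample_indices(k, m, rng), toy_select, repeats=50
)
```

Comparing boot_freq with sub_freq for a real selector is one way to see whether a variable's inclusion frequency depends on the resampling scheme. The only difference between the two draws is that bootstrap samples contain repeated observations while subsamples do not.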
