The behaviour of random forest permutation-based variable importance measures under predictor correlation ~ Medizin - Open Access LMU

Background: Random forests (RF) have been increasingly used in
applications such as genome-wide association and microarray studies
where predictor correlation is frequently observed. Recent works on
permutation-based variable importance measures (VIMs) used in RF
have come to apparently contradictory conclusions. We present an
extended simulation study to synthesize results.

Results: When predictor correlation was present and predictors
were associated with the outcome (H(A)), the unconditional RF VIM
attributed a higher share of importance to correlated predictors,
while under the null hypothesis that no predictors are associated
with the outcome (H(0)) the unconditional RF VIM was unbiased.
Conditional VIMs showed lower values for correlated predictors than
the unconditional VIMs under H(A) and were unbiased under H(0).
Scaled VIMs were clearly biased under both H(A) and H(0).

Conclusions: Unconditional unscaled VIMs are a
computationally tractable choice for large datasets and are
unbiased under the null hypothesis. Whether the observed increased
VIMs for correlated predictors may be considered a "bias" - because
they do not directly reflect the coefficients in the generating
model - or if it is a beneficial attribute of these VIMs is
dependent on the application. For example, in genetic association
studies, where correlation between markers may help to localize the
functionally relevant variant, the increased importance of
correlated predictors may be an advantage. On the other hand, we
show examples where this increased importance may result in
spurious signals.
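
As a rough illustration of the unconditional, unscaled permutation
importance discussed above, the following sketch shuffles one
predictor at a time and records the increase in mean squared error.
It is not the simulation design of the study: the toy data, the
function names and the hand-written `predict` function (a stand-in
for a fitted model that happens to split the signal between two
correlated predictors) are all assumptions made for illustration. A
random forest would compute the same quantity per tree on its
out-of-bag samples and average over trees.

```python
import random

def mse(predict, rows, y):
    """Mean squared error of `predict` over the rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

def permutation_importance(predict, rows, y, j, rng):
    """Unconditional, unscaled importance of predictor j:
    error after shuffling column j minus the baseline error."""
    shuffled = [r[j] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, shuffled)]
    return mse(predict, permuted, y) - mse(predict, rows, y)

rng = random.Random(0)

# Toy data (assumption): only x0 is in the generating model,
# x1 is correlated with x0, and x2 is independent noise.
rows, y = [], []
for _ in range(500):
    x0 = rng.gauss(0, 1)
    x1 = x0 + rng.gauss(0, 0.3)
    x2 = rng.gauss(0, 1)
    rows.append([x0, x1, x2])
    y.append(2.0 * x0 + rng.gauss(0, 0.5))

# Hypothetical stand-in for a fitted model: it uses x0 and x1
# equally and ignores x2.
def predict(r):
    return 1.0 * r[0] + 1.0 * r[1]

vim = [permutation_importance(predict, rows, y, j, rng) for j in range(3)]
for j, value in enumerate(vim):
    print(f"x{j}: {value:.3f}")
```

In this toy setting x1 receives a clearly positive importance even
though it has no coefficient in the generating model, while the
ignored predictor x2 receives none. Whether that counts as a bias or
as a useful property depends on the application, as argued above.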


