Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index

Description

19 years ago
Evidence for variable selection bias in classification tree
algorithms based on the Gini Index is reviewed from the literature
and embedded into a broader explanatory scheme: Variable selection
bias in classification tree algorithms based on the Gini Index can
be caused not only by the statistical effect of multiple
comparisons, but also by an increasing estimation bias and variance
of the splitting criterion when plug-in estimates of entropy
measures like the Gini Index are employed. The relevance of these
sources of variable selection bias in the different simulation
study designs is examined. Variable selection bias due to the
explored sources applies to all classification tree algorithms
based on empirical entropy measures like the Gini Index, Deviance
and Information Gain, and to both binary and multiway splitting
algorithms.
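
The two sources named in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the sample size, the number of repetitions, and the choice of a binary response with uninformative predictors of 2 and 10 levels are all assumptions made for illustration. It relies on one known fact: for a binary response with class probability p, the plug-in estimate 2*p_hat*(1 - p_hat) has expectation (n - 1)/n times the true Gini Index 2*p*(1 - p), so it underestimates impurity in small nodes. Together with the larger number of candidate cutpoints, this should favour the predictor with more levels even though neither predictor carries information.

```python
import random


def gini(labels):
    """Plug-in Gini Index 2 * p_hat * (1 - p_hat) for a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_hat = sum(labels) / n
    return 2.0 * p_hat * (1.0 - p_hat)


def gini_gain(left, right):
    """Impurity reduction achieved by splitting a node into left and right."""
    n = len(left) + len(right)
    parent = gini(left + right)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return parent - children


def best_gain(x, y):
    """Largest Gini gain over all binary splits of the form x <= c."""
    best = 0.0
    for c in sorted(set(x))[:-1]:  # the largest value yields no split
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        best = max(best, gini_gain(left, right))
    return best


def simulate(n=50, reps=2000, seed=1):
    """Average plug-in Gini and best gain for uninformative predictors.

    The response is a fair coin flip (true Gini Index 0.5) and each
    predictor is drawn independently of it, so any gain is spurious.
    """
    rng = random.Random(seed)
    mean_plugin = 0.0
    mean_gain = {2: 0.0, 10: 0.0}  # keyed by number of predictor levels
    for _ in range(reps):
        y = [rng.randint(0, 1) for _ in range(n)]
        mean_plugin += gini(y) / reps
        for k in mean_gain:
            x = [rng.randrange(k) for _ in range(n)]
            mean_gain[k] += best_gain(x, y) / reps
    return mean_plugin, mean_gain


if __name__ == "__main__":
    plugin, gains = simulate()
    print("mean plug-in Gini (true value 0.5):", round(plugin, 4))
    for k, g in gains.items():
        print(f"mean best gain, {k}-level predictor:", round(g, 4))
```

Under these assumptions, the average plug-in Gini falls below 0.5 and the average best gain is positive for both predictors and larger for the 10-level one. The simulation does not separate the contribution of multiple comparisons from that of estimation bias and variance; doing so would require a design that holds the number of cutpoints fixed.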
