Conditional variable importance for random forests
16 years ago
Background: Random forests are becoming increasingly popular in
many scientific fields because they can cope with "small n, large p"
problems, complex interactions and even highly correlated predictor
variables. Their variable importance measures have recently been
suggested as screening tools for, e.g., gene expression studies.
However, these variable importance measures show a bias towards
correlated predictor variables.

Results: We identify two mechanisms responsible for this finding:
(i) a preference for the selection of correlated predictors in the
tree building process and (ii) an additional advantage for correlated
predictor variables induced by the unconditional permutation scheme
that is employed in the computation of the variable importance
measure. Based on these considerations, we develop a new, conditional
permutation scheme for the computation of the variable importance
measure.

Conclusion: The resulting conditional variable importance reflects
the true impact of each predictor variable more reliably than the
original marginal approach.
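The abstract contrasts the unconditional (marginal) permutation scheme with a conditional one. Below is a minimal, standard-library-only Python sketch of that contrast. It rests on several simplifying assumptions and is not the authors' implementation. First, instead of a random forest it uses a hypothetical stand-in model, `fit_univariate_ensemble`, which averages one least-squares line per predictor and so spreads credit across correlated predictors. Second, the paper conditions on a partition derived from the fitted trees, whereas this sketch swaps in plain quantile bins of the conditioning covariates (`quantile_groups`, also hypothetical). Third, importance is measured as the mean increase in squared error on the training data rather than on out-of-bag observations. All names and the simulated data are illustrative.

```python
import random
import statistics


def make_data(n, copy_noise=0.1, seed=0):
    """Simulated data (illustrative): y depends only on x1; x2 is a noisy copy of x1."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a + rng.gauss(0, copy_noise) for a in x1]
    y = [a + rng.gauss(0, 0.5) for a in x1]
    return [[a, b] for a, b in zip(x1, x2)], y


def fit_univariate_ensemble(X, y):
    """Hypothetical stand-in for a forest: average of one least-squares line per predictor."""
    my = statistics.fmean(y)
    coefs = []  # (column mean, slope) for each predictor
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = statistics.fmean(col)
        sxy = sum((a - mx) * (b - my) for a, b in zip(col, y))
        sxx = sum((a - mx) ** 2 for a in col)
        coefs.append((mx, sxy / sxx))

    def predict(row):
        # Mean of the per-predictor fits my + slope * (x_j - mean_j).
        return my + statistics.fmean(s * (row[j] - mx) for j, (mx, s) in enumerate(coefs))

    return predict


def mse(predict, X, y):
    return statistics.fmean((predict(r) - t) ** 2 for r, t in zip(X, y))


def permuted_copy(X, j, groups, rng):
    """Copy X and shuffle column j separately within each group of row indices."""
    Xp = [row[:] for row in X]
    for idx in groups:
        vals = [X[i][j] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            Xp[i][j] = v
    return Xp


def quantile_groups(X, cond_cols, n_bins):
    """Group rows into cells given by quantile bins of the conditioning columns."""
    n = len(X)
    keys = [[] for _ in range(n)]
    for c in cond_cols:
        order = sorted(range(n), key=lambda i: X[i][c])
        for rank, i in enumerate(order):
            keys[i].append(rank * n_bins // n)
    cells = {}
    for i, k in enumerate(keys):
        cells.setdefault(tuple(k), []).append(i)
    return list(cells.values())


def permutation_importance(predict, X, y, j, cond_cols=(), n_bins=10, n_rep=20, seed=1):
    """Mean increase in MSE after permuting column j (marginally, or within cells)."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    if cond_cols:
        groups = quantile_groups(X, cond_cols, n_bins)  # conditional scheme
    else:
        groups = [list(range(len(X)))]  # unconditional: one group holding every row
    return statistics.fmean(
        mse(predict, permuted_copy(X, j, groups, rng), y) - base for _ in range(n_rep)
    )


if __name__ == "__main__":
    X, y = make_data(2000)
    predict = fit_univariate_ensemble(X, y)
    marginal_x2 = permutation_importance(predict, X, y, j=1)
    conditional_x2 = permutation_importance(predict, X, y, j=1, cond_cols=[0])
    print(f"x2 marginal: {marginal_x2:.3f}  conditional on x1: {conditional_x2:.3f}")
```

Here x2 has no effect on y of its own, yet the stand-in model uses it as a proxy for x1. Permuting x2 unconditionally also destroys its correlation with x1, so its marginal importance comes out large. Permuting it only within bins of x1 keeps most of that correlation intact, and its conditional importance shrinks accordingly. That is the second mechanism the abstract describes, reproduced here under the stated simplifications.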