Unbiased split selection for classification trees based on the Gini Index

Description

The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides formal support for the variable selection bias in favor of variables with a high amount of missing values when the Gini gain is used as the split selection criterion, and we suggest using the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation and real-data studies from veterinary gynecology in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method is extendible to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion.
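For illustration, the following is a minimal Python sketch of the quantity whose exact distribution the paper derives: the maximally selected Gini gain of a continuous predictor for binary labels. The function names and the brute-force scan over cutpoints are illustrative assumptions, not the authors' implementation, and the sketch does not reproduce the exact distribution or the proposed p-value criterion.

import numpy as np

def gini_impurity(y):
    # Gini impurity of a binary 0/1 label vector: 2 * p * (1 - p).
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def max_gini_gain(x, y):
    # Maximally selected Gini gain of a continuous predictor x
    # for binary labels y, scanning all possible cutpoints.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)
    parent = gini_impurity(y_sorted)
    best_gain, best_cut = 0.0, None
    # Candidate cutpoints lie between consecutive distinct x values.
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue
        left, right = y_sorted[:i], y_sorted[i:]
        child = (i / n) * gini_impurity(left) + ((n - i) / n) * gini_impurity(right)
        gain = parent - child
        if gain > best_gain:
            best_gain = gain
            best_cut = 0.5 * (x_sorted[i - 1] + x_sorted[i])
    return best_gain, best_cut

Because the maximum is taken over all cutpoints, a predictor with fewer missing (and hence more observed) values offers more candidate splits, which is the source of the selection bias the paper addresses; the proposed remedy is to compare predictors via the p-value of this maximum under its exact distribution rather than via the raw gain.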
