Learning from high dimensional data based on weighted feature importance in decision tree ensembles
Journal article   Peer reviewed

Nayiri Galestian Pour and Soudabeh Shemehsavar
Computational statistics, Vol.39(1), pp.313-342
2024

Abstract

Mathematics; Physical Sciences; Science & Technology; Statistics & Probability
Learning from high-dimensional data has been applied in areas such as computational biology, image classification, and finance. Most classical machine learning algorithms fail to give accurate predictions in high-dimensional settings because of the enormous feature space. In this article, we present a novel ensemble of classification trees based on weighted random subspaces that aims to adjust the distribution of selection probabilities. In the proposed algorithm, base classifiers are built on random feature subspaces, and the probability that influential features will be selected for the next subspace is updated through a weighting function that incorporates grouping information from previous classifiers. As an interpretation tool, we show that variable importance measures computed by the new method can identify influential features efficiently. We provide theoretical reasoning for the different elements of the proposed method, and we evaluate its usefulness through simulation studies and real data analysis.
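
The general idea of a weighted random subspace ensemble can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the weighting function or how grouping information enters, so the sketch assumes a simple additive update (a feature's weight grows by the training accuracy above chance of the base learner that used it) and uses one-feature decision stumps in place of full classification trees. All function names, the `eta` step size, and the update rule are hypothetical.

```python
import random

def weighted_sample(rng, weights, k):
    """Draw k distinct feature indices with probability proportional to weights."""
    pool = list(range(len(weights)))
    w = list(weights)
    chosen = []
    for _ in range(k):
        i = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(i))
        w.pop(i)
    return chosen

def fit_stump(X, y, features):
    """Return (accuracy, feature, threshold, sign) of the best one-split stump."""
    best = None
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for sign in (1, -1):
                preds = [1 if sign * (row[j] - t) > 0 else 0 for row in X]
                acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, j, t, sign)
    return best

def fit_weighted_subspace_ensemble(X, y, n_estimators=40, subspace_size=3,
                                   eta=1.0, seed=0):
    """Fit stumps on random subspaces; reweight features after each stump.

    Assumed update rule (not taken from the article): the weight of the
    feature a stump splits on increases by eta * (accuracy - 0.5).
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [1.0] * n_features  # start from uniform selection weights
    stumps = []
    for _ in range(n_estimators):
        subspace = weighted_sample(rng, weights, subspace_size)
        fitted = fit_stump(X, y, subspace)
        if fitted is None:  # every feature in the subspace was constant
            continue
        acc, j, t, sign = fitted
        stumps.append((j, t, sign))
        weights[j] += eta * max(acc - 0.5, 0.0)
    total = sum(weights)
    # Normalized weights double as a crude feature-importance score.
    return stumps, [w / total for w in weights]

def predict(stumps, X):
    """Majority vote over the stumps (ties go to class 1)."""
    out = []
    for row in X:
        votes = sum(1 if sign * (row[j] - t) > 0 else 0 for j, t, sign in stumps)
        out.append(1 if 2 * votes >= len(stumps) else 0)
    return out
```

On data where a single feature determines the label, the returned selection probabilities should concentrate on that feature, which is the behavior the abstract attributes to its importance measure. The stump learner and the additive update were chosen only to keep the sketch short and dependency-free.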

Metrics

InCites Highlights

These are selected metrics from the InCites Benchmarking & Analytics tool related to this output.

Citation topics
4 Electrical Engineering, Electronics & Computer Science
4.61 Artificial Intelligence & Machine Learning
4.61.145 Classification Algorithms
Web of Science research areas
Statistics & Probability
ESI research areas
Mathematics