Abstract
Learning from high-dimensional data arises in applications such as computational biology, image classification, and finance. Many classical machine learning algorithms fail to give accurate predictions in high-dimensional settings because of the enormous feature space. In this article, we present a novel ensemble of classification trees based on weighted random subspaces, in which the distribution of feature selection probabilities is adapted as the ensemble grows. In the proposed algorithm, base classifiers are built on random feature subspaces, and the probability that influential features will be selected for the next subspace is updated through a weighting function that incorporates grouping information from previous classifiers. As an interpretation tool, we show that variable importance measures computed by the new method can identify influential features efficiently. We provide theoretical reasoning for the different elements of the proposed method, and we evaluate its usefulness through simulation studies and real data analysis.
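The weighted subspace idea described above can be illustrated with a minimal sketch. The function names (`sample_subspace`, `update_weights`), the exponential weighting rule, and the use of a generic importance vector are illustrative assumptions, not the paper's actual specification:

```python
import numpy as np

def sample_subspace(weights, k, rng):
    """Draw a subspace of k distinct features, with selection
    probabilities proportional to the current weights."""
    p = weights / weights.sum()
    return rng.choice(len(weights), size=k, replace=False, p=p)

def update_weights(weights, subspace, importance, gamma=1.0):
    """Boost the selection weights of features that proved influential
    in the most recent base classifier (hypothetical update rule)."""
    w = weights.copy()
    w[subspace] *= np.exp(gamma * importance)
    return w

rng = np.random.default_rng(0)
n_features, k = 10, 4
weights = np.ones(n_features)            # start from uniform selection
sub = sample_subspace(weights, k, rng)
imp = rng.random(k)                      # stand-in for importance from a fitted tree
weights = update_weights(weights, sub, imp)
```

In this sketch, features with higher importance in one round receive a larger weight, and hence a higher probability of entering the subspace used by the next base classifier.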