Abstract
Adverse Drug Events (ADEs) are a major cause of morbidity and mortality globally, leading to more than 1.3 million emergency hospitalizations per year in the United States alone. Timely identification of ADEs from text sources (e.g., social media, clinical reports) is essential to avoid serious health consequences and improve patient outcomes. Traditional manual monitoring procedures for ADE detection are labour-intensive and error-prone. Machine learning algorithms help overcome these obstacles by enabling automated, scalable analysis of large volumes of text with higher accuracy. While prior research has explored machine learning for ADE detection, the optimal combination of feature extraction techniques and classification models remains understudied, leading to suboptimal outcomes in real-world applications. Previous works have focused on single pairings, such as Bag of Words with Support Vector Machines or TF-IDF with Logistic Regression, whereas comparative evaluation of these pairings against hybrid feature extraction methods has received less attention. Moreover, computational efficiency is rarely considered. This research performs a detailed comparison of feature extraction techniques, including Bag of Words, TF-IDF, Word2Vec, and hybrid techniques, paired with Logistic Regression and Random Forest classifiers. Performance was evaluated in terms of accuracy, precision, recall, F1-score, training time, and testing time. The findings show that Bag of Words with Random Forest provides the best accuracy, 89.88%. Hybrid approaches, such as combining TF-IDF with Word2Vec, were not superior to single methods, indicating that more straightforward strategies are better suited to ADE detection tasks.
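To make the difference between two of the compared feature representations concrete, the following is a minimal sketch of Bag of Words and TF-IDF vectors computed on a toy corpus. The sentences, the whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions chosen for illustration; the abstract does not specify the dataset, preprocessing, or implementation used in the study.

```python
import math
from collections import Counter

# Toy, made-up sentences used only for illustration; not data from the study.
docs = [
    "patient developed severe rash after taking the drug",
    "the drug was well tolerated",
    "severe headache reported after the second dose",
]

def tokenize(text):
    """Lowercase whitespace tokenization (an assumed, minimal preprocessing step)."""
    return text.lower().split()

def build_vocab(tokenized_docs):
    """Map each distinct term to a column index, in sorted order."""
    terms = sorted({t for doc in tokenized_docs for t in doc})
    return {term: i for i, term in enumerate(terms)}

def bag_of_words(tokenized_docs, vocab):
    """One row per document holding raw term counts."""
    rows = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        rows.append([counts.get(term, 0) for term in vocab])
    return rows

def tf_idf(tokenized_docs, vocab):
    """One row per document: length-normalized term frequency times smoothed IDF."""
    n_docs = len(tokenized_docs)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(t for doc in tokenized_docs for t in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        rows.append([counts.get(t, 0) / len(doc) * idf[t] for t in vocab])
    return rows

tokens = [tokenize(d) for d in docs]
vocab = build_vocab(tokens)
bow = bag_of_words(tokens, vocab)
tfidf = tf_idf(tokens, vocab)
```

In this sketch, Bag of Words gives "the" and "rash" the same count in the first sentence, while TF-IDF weights "rash" higher because it occurs in only one document. Either matrix could then be passed to a classifier such as Logistic Regression or Random Forest.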