Abstract
Understanding consumer attitudes toward specific products is crucial for boosting sales in the e-commerce industry. Targeting customers with popular products on the basis of reviews requires classifying consumer feedback, but this classification is difficult when the class labels are imbalanced, which often results in suboptimal performance. This study builds on previous efforts that used the Amazon Fine Food Reviews dataset for classification tasks. Although those attempts showed promise, they were hindered by either poor embeddings or the prevalent class imbalance. This research addresses both problems by classifying reviews with word embeddings from RoBERTa, a pre-trained transformer-based language model. An XGBoost classifier was also trained on embeddings from the language model. Losses were first calculated with equal weights for all class labels; a re-weighted loss was subsequently adopted to balance the contribution of each class to the loss function during training. Combining RoBERTa and XGBoost with class-label re-weighting improved the capture of intricate word relationships within reviews. As a result, the approach achieves substantially higher accuracy than earlier work in both binary and multiclass classification: 83.84% in multiclass and 93.29% in binary classification, marking a substantial advance in consumer review analysis.
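The difference between the equal-weight loss and the re-weighted loss can be illustrated with a minimal sketch. The abstract does not state the weighting scheme, so the example assumes inverse-frequency weights of the form n_samples / (n_classes × class_count), the common "balanced" heuristic. The function names and the toy labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Assumed scheme (not specified in the abstract): rarer classes
    receive larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights=None):
    """Weighted mean negative log-likelihood.

    probs:   list of dicts mapping class -> predicted probability.
    weights: dict mapping class -> weight; None means equal weights.
    """
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = 1.0 if weights is None else weights[y]
        total += -w * math.log(p[y])
        norm += w
    return total / norm


# Toy, made-up data: 8 positive reviews and 2 negative ones.
labels = ["pos"] * 8 + ["neg"] * 2
weights = inverse_frequency_weights(labels)

# A hypothetical classifier that always favours the majority class.
probs = [{"pos": 0.9, "neg": 0.1} for _ in labels]

equal_loss = weighted_cross_entropy(probs, labels)
reweighted_loss = weighted_cross_entropy(probs, labels, weights)
print(weights, equal_loss, reweighted_loss)
```

With equal weights, the majority class dominates the average, so a classifier that neglects the minority class is penalised only lightly. Under the re-weighted loss, both classes contribute the same total weight, and the same predictions incur a larger loss.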