Fraud Transaction Detection using Machine Learning on Financial
Datasets
Affiliations
1 Department of Business Administration, Westcliff University, 400 Irvine, CA 92614, USA
2 Department of Business Administration, International American University, Los Angeles, CA 90010, USA
Abstract
Financial fraud poses a significant threat to the digital economy, with credit card fraud being a
prevalent challenge. This study evaluates the performance of Logistic Regression (LR) and
Extreme Gradient Boosting (XGBoost) models in detecting fraudulent transactions in
financial datasets. The study uses real-world data comprising 284,807 transactions, of which
only 492 are fraudulent; this class imbalance is addressed using the Synthetic Minority
Oversampling Technique (SMOTE). Our findings show that XGBoost with Random Search
hyperparameter selection outperforms Logistic Regression on every reported metric. XGBoost
achieved an accuracy of 99.96%, precision of 95.11%, recall of 79.61%, and F1 score of 86.61%,
while Logistic Regression achieved 99.92%, 88.1%, 60.5%, and 71.7%, respectively. The AUC of
0.98 for XGBoost, against 0.97 for LR, indicates that XGBoost has the better discriminative
power. The results suggest that XGBoost is the more suitable of the two models for real-time
fraud detection. However, its computational demands and limited explainability should be
considered. For future work, we suggest investigating semi-supervised and supervised learning
approaches and working with larger datasets to improve fraud detection in financial systems.
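To put the reported accuracy figures in context, the short sketch below works through the arithmetic behind the class counts and the F1 score quoted above. It is an illustration only: the helper names `f1_score` and `majority_baseline_accuracy` are hypothetical, the baseline is computed on the full dataset rather than on whatever test split the study used, and F1 values recomputed from rounded precision and recall need not match the reported F1 exactly.

```python
# Counts and percentages are taken from the abstract; the helper names are
# hypothetical and exist only for this illustration.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(total: int, positives: int) -> float:
    """Accuracy of a trivial classifier that labels every transaction legitimate."""
    return (total - positives) / total


fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS
baseline = majority_baseline_accuracy(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)

print(f"fraud rate: {fraud_rate:.4%}")
print(f"all-legitimate baseline accuracy: {baseline:.4%}")
print(f"F1 from LR precision/recall: {f1_score(0.881, 0.605):.4f}")
print(f"F1 from XGBoost precision/recall: {f1_score(0.9511, 0.7961):.4f}")
```

Because fraud makes up roughly 0.17% of the transactions, a model that never flags fraud would already reach about 99.83% accuracy on the full dataset, which is why precision, recall, F1, and AUC are the more informative comparisons between the two models.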
Keywords: Fraud Detection, Machine Learning, XGBoost, Logistic Regression, Imbalanced Dataset (SMOTE)
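As background on the oversampling step, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified, standard-library illustration, not the implementation used in the study; the function names (`k_nearest`, `smote_sketch`), the parameters, and the toy points are all assumptions made for the example.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def k_nearest(index: int, samples: List[Point], k: int) -> List[int]:
    """Indices of the k samples closest (Euclidean) to samples[index], excluding itself."""
    distances = [
        (math.dist(samples[index], other), j)
        for j, other in enumerate(samples)
        if j != index
    ]
    distances.sort()
    return [j for _, j in distances[:k]]


def smote_sketch(
    minority: List[Point], n_new: int, k: int = 5, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic minority points by interpolating towards a near neighbour."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))           # pick a minority sample at random
        j = rng.choice(k_nearest(i, minority, k))  # pick one of its k nearest neighbours
        gap = rng.random()                         # position along the segment i -> j
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Toy 2-D minority class (made-up values, not drawn from the study's dataset).
fraud_points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(fraud_points, n_new=6, k=2)
print(new_points)
```

Every synthetic point lies between two real minority samples, so the oversampled class fills in the region the minority already occupies rather than duplicating existing rows. In practice a library implementation, such as the `SMOTE` class in imbalanced-learn, would normally be used instead of a hand-written loop.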