1 Department of Computer Science, Maharishi International University, 1000 North Fourth St., Fairfield, Iowa 52557, USA
2 Department of Occupational and Environmental Health, Bangladesh University of Health Science, 125, Technical Mor, 1 Darus Salam Rd, Dhaka 1216
3 Institute of Social Welfare and Research, University of Dhaka, Shahbag, Dhaka 1205, Bangladesh
4 Department of Computer Science and Engineering, Daffodil International University, Birulia, Savar, Dhaka-1216
5 Department of Computer Science and Engineering, Jahangirnagar University, Kalabagan Rd, Savar 1342, Dhaka, Bangladesh
6 Department of Business Administration, International American University, 10th Floor, #1000 Los Angeles, CA 90010, USA
Timely identification of cardiovascular diseases (CVDs) is critical to their prevention. However, conventional diagnostic techniques face challenges such as late identification of risk and inadequate use of multiple risk factors. This work illustrates the potential of AI to improve the identification of CVD by integrating electronic health records (EHRs), imaging data, and data from wearable devices. Using a dataset of 50,000 patients, AI models were developed and assessed in three configurations: EHR data only, imaging data only, and integrated data. The integrated model achieved 92% accuracy with an AUC-ROC of 0.94, exceeding the accuracy of the single-source models. The integrated model used multimodal data to assess CVD-related risk factors, changes in the patient’s physiology over the course of the study, and historical trends. The findings suggest that this type of diagnostics brings clinical and societal benefits through better prediction accuracy, lower costs, and better patient outcomes.
DOI: https://doi.org/10.63471/jamsai24002 © 2024 Journal of Advances in Medical Sciences and Artificial Intelligence (JAMSAI), C5K Research Publication
Cardiovascular diseases (CVDs) are among the leading causes of death globally; the World Health Organization (WHO) estimates that they are responsible for 17.9 million deaths every year, or 32% of global mortality. Conditions such as coronary artery disease, heart failure, and arrhythmias impose a dual burden: they not only shorten life span but also drastically impair quality of life (Kaminsky et al., 2022). According to current data, the direct costs of CVD in the United States exceed $200 billion annually due to hospitalizations, surgeries, and subsequent treatments, and this cost is projected to increase (Kazi et al., 2024).
Screening is therefore important in preventing the worsening of disease and, consequently, the associated deaths (AbdulRaheem, 2023). However, traditional diagnostic methods have several shortcomings, including late detection, poor predictive power, and dependence on a single diagnostic test such as an electrocardiogram (ECG) (Siontis et al., 2021). These methods do not capture the complexity of CVDs, whose underlying causes may include multiple factors such as genetics, lifestyle, comorbidities, and real-time physiological conditions.
Artificial intelligence (AI) and machine learning (ML) offer more effective approaches to overcoming these challenges. According to Ye et al. (2024), integrating data from electronic health records (EHRs), imaging studies, and wearable devices can help an AI system parse large datasets for patterns and relationships that humans might not notice. Such insights can help healthcare providers identify risks, diagnose ailments earlier, and tailor interventions to the patient.
Although AI is viewed positively, specific issues remain. Previous studies often rely on data from a single source, such as imaging or electronic health records (EHRs), which constrains the scope of prediction models. Lau et al. (2024) add that siloed data processing yields isolated pieces of information that can miss pertinent risks. Furthermore, Norori et al. (2021) argue that issues such as data privacy and algorithmic bias remain unsolved, which limits the practical relevance of specific AI solutions.
This study aims to address the following research questions:
The purpose of this study is to develop and evaluate AI-powered machine learning algorithms that use multimodal data for the early detection and prediction of cardiovascular diseases. By integrating patient data from EHRs, imaging studies, and wearable devices, the study aims to achieve the following objectives:
The benefits of early CVD detection go beyond individual patient outcomes. Because this study aims to use AI to forecast and control the progression of disease, it aligns with the goals set by the WHO and similar organizations, including the American Heart Association. Early intervention also saves money and time that would otherwise be spent on costly, complicated procedures.
On the societal level, this research addresses the economic burden of CVDs, focusing on low- and middle-income countries (LMICs) with limited access to quality health care. A significant gap exists in this area because conventional diagnostic methods are expensive, and few economical models can be easily implemented across healthcare settings.
Bajwa et al. (2021) argue that the use of AI in the health sector is advancing alongside progress in natural language processing (NLP), deep learning, and diagnostic wearables. Accordingly, to enhance prediction accuracy and robustness, this research uses convolutional neural networks (CNNs) for image data and gradient boosting for structured data. Wearable technology such as smartwatches and fitness trackers records physiological information, which is crucial for observing patients’ cardiovascular status in real time. Integrating such passive and active biomarker data with more typical EHRs and imaging data may provide a more comprehensive picture of CVD risks.
This study assumes that AI models incorporating EHRs, imaging, and wearable outputs will greatly surpass single-source models in identifying and forecasting cardiovascular diseases. It also assumes that integrating real-time data from wearables will help improve the timeliness of the predictions and, thus, the ability to intervene more effectively, improving the patient’s condition.
The study utilized data from 50,000 patients, compiled from three primary sources to ensure a comprehensive dataset:
Although this study utilized data primarily from developed healthcare systems, future iterations should incorporate datasets from low-resource and underserved populations. These may include community health center records or public health surveys in rural areas to improve the generalizability of findings. Furthermore, accounting for genetic, cultural, and lifestyle variations in AI model training is essential to ensure accurate and equitable outcomes across diverse populations.
The datasets were aligned to a standard structure to avoid discrepancies between the data sources. Preprocessing steps included:
To address data compatibility challenges, establishing standardized protocols for data collection, labeling, and preprocessing across modalities was vital. Automated tools for harmonizing multimodal data and resolving discrepancies helped to further streamline integration.
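The alignment step can be pictured as a join on a shared patient identifier. The following is a minimal sketch under stated assumptions, not the pipeline used in the study: the function name `merge_modalities`, the field names, and the choice to flag (rather than drop) patients with a missing modality are all hypothetical.

```python
def merge_modalities(ehr, imaging, wearable):
    """Join per-patient records from three sources on a shared patient ID.

    Each argument is a dict mapping patient ID -> dict of features.
    Feature names are prefixed with their source so they cannot collide,
    and any source with no record for a patient is listed explicitly.
    """
    sources = (("ehr", ehr), ("imaging", imaging), ("wearable", wearable))
    merged = {}
    for pid in sorted(set(ehr) | set(imaging) | set(wearable)):
        record = {"patient_id": pid}
        missing = []
        for name, source in sources:
            if pid in source:
                for key, value in source[pid].items():
                    record[f"{name}_{key}"] = value
            else:
                missing.append(name)
        record["missing_modalities"] = missing
        merged[pid] = record
    return merged


# Hypothetical toy records; the field names are illustrative only.
ehr = {"p1": {"systolic_bp": 142}, "p2": {"systolic_bp": 118}}
imaging = {"p1": {"lvh": True}}
wearable = {"p1": {"hrv_ms": 38.0}, "p2": {"hrv_ms": 52.0}}

merged = merge_modalities(ehr, imaging, wearable)
print(merged["p2"]["missing_modalities"])
```

Flagging missing modalities instead of silently discarding patients keeps the decision about imputation or exclusion visible to later stages.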
Two primary types of machine learning models were used to analyze the dataset. Gradient boosting models were applied to structured data such as electronic health records and wearable device data; gradient boosting was chosen because it is effective on tabular data and at modeling non-linear relationships. Convolutional neural networks (CNNs) were used to analyze imaging data because they are well suited to detecting spatial patterns and features in complex medical images.
The data was divided into three subsets: training (70%), validation (15%), and testing (15%). The training set was used to fit the models, and the validation set was used to tune hyperparameters. The final model was then evaluated on the held-out test set to check its validity.
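A 70/15/15 partition of this kind can be sketched as follows. This is an illustration only: the study does not say whether the split was random or stratified, so the plain seeded shuffle and the name `split_indices` are assumptions.

```python
import random


def split_indices(n, train=0.70, val=0.15, seed=0):
    """Shuffle patient indices 0..n-1 and cut them into three disjoint lists.

    The test set receives whatever remains after the training and
    validation cuts, so the three parts always cover all n indices.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(50_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 35000 7500 7500
```

Splitting at the patient level, as here, keeps all records for one patient in a single subset, which avoids leakage between training and test data.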
An ensemble learning model combined data from EHRs, imaging, and wearables. The ensemble combined the outputs of the gradient boosting and CNN models to improve predictive accuracy over the separate models.
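The text does not specify how the two models' outputs were combined. One common option is late fusion by weighted averaging of predicted probabilities, sketched below; the function name `fuse_probabilities`, the weight `w_tabular`, and the example scores are hypothetical and not taken from the study.

```python
def fuse_probabilities(p_tabular, p_imaging, w_tabular=0.5):
    """Late fusion: weighted average of two models' CVD probabilities.

    p_tabular -- per-patient probabilities from the structured-data model
    p_imaging -- per-patient probabilities from the imaging model
    w_tabular -- weight on the structured-data model, in [0, 1]
    """
    if len(p_tabular) != len(p_imaging):
        raise ValueError("both models must score the same patients")
    if not 0.0 <= w_tabular <= 1.0:
        raise ValueError("w_tabular must lie in [0, 1]")
    return [w_tabular * a + (1.0 - w_tabular) * b
            for a, b in zip(p_tabular, p_imaging)]


# Illustrative scores for three patients (not study data).
p_gb = [0.2, 0.8, 0.6]
p_cnn = [0.4, 0.6, 0.9]
fused = fuse_probabilities(p_gb, p_cnn)
print([round(p, 2) for p in fused])  # [0.3, 0.7, 0.75]
```

In practice the weight would typically be chosen on the validation set rather than fixed at 0.5; stacking a small meta-model on the two outputs is another option.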
The performance of the AI models was evaluated using the following metrics:
Performance metrics were compared across three configurations:
High ethical standards were applied in this study to protect patient anonymity and to promote fair treatment of patients by the algorithm.
Some limitations of this research should be noted, although they do not significantly affect the overall methodology. The dataset was compiled mainly from developed healthcare systems, so the models may generalize less well to low-resource settings. Furthermore, combining data from different modalities created preprocessing and data compatibility issues. Future research should test larger and more heterogeneous datasets and investigate more natural approaches to multimodal fusion.
The performance of AI models for early identification of CVDs improved when they were trained on data from multiple sources, including EHRs, imaging, and wearable devices, compared with single-source data. Table 1 shows the performance metrics. The integrated model had the highest accuracy (92%) and the highest AUC-ROC (0.94), indicating the best ability to identify high-risk patients with the fewest false alarms. Recall (90%) was also high, showing an increased ability to find true positives. These results confirm that integrating multimodal data improves prediction quality and stability.
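For reference, accuracy, precision, recall, and F1-score all derive from the counts of a binary confusion matrix. The sketch below uses standard definitions; the function name and the toy labels are illustrative and are not the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = CVD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: 3 true positives, 3 true negatives, 1 false alarm, 1 miss.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Recall is the fraction of actual CVD cases the model finds, while precision is the fraction of flagged patients who actually have CVD; F1 is their harmonic mean.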
Table 1. Performance metrics.
Fig. 1 shows performance metrics for different data configurations.
Fig. 1. Performance metrics for different data configurations.
The AUC-ROC curves depicted in Fig. 2 help compare the performance of the models. The integrated model produced a higher curve and a greater AUC than the single-source models, suggesting a better capacity to differentiate between high-risk and low-risk patients.
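The AUC can be read as the probability that a randomly chosen high-risk patient receives a higher score than a randomly chosen low-risk patient. The sketch below computes it directly from that definition; it is a pairwise illustration for small inputs (production code would normally use a library routine), and the scores shown are made up.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Runs in O(P * N) time, which is fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.85 and 0.94 should be read.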
The feature importance heatmap shown in Fig. 3 highlights key predictors of CVD risk across datasets. From EHRs, the identified features include cholesterol levels, systolic blood pressure, and history of diabetes. Structural abnormalities such as left ventricular hypertrophy were identified from imaging examinations, while wearable data contributed heart rate variability (HRV) and physical activity. Combining data from multiple modalities gives a more complete picture of the risk factors than single-source data alone.
Fig. 2. AUC-ROC curves
Fig. 3. Heatmap.
The EHR-only model performed reasonably well, with an F1-score of 80% and an AUC-ROC of 0.85. However, it could not track patient status from day to day because it lacked real-time data. For instance, cholesterol level and blood pressure are point-in-time measurements that do not reflect the everyday state of cardiovascular health.
The imaging-only model was slightly more accurate than the EHR-only model, with 87% precision and 83% recall. Cardiac MRI and echocardiogram data provided structural information, allowing the model to distinguish between left ventricular hypertrophy (LVH) and valvular diseases.
The integrated model was superior to the EHR-only and imaging-only models because it drew on features from EHR, imaging, and wearable data. This approach gave a longitudinal, spatial, and cross-sectional view of patient status. For example, the model could identify at-risk patients more reliably by combining indicators from all three sources, such as total cholesterol, structural change, and a decrease in HRV.
The weaker performance of single-source models highlights the need for optimizing their contributions to the integrated model. Techniques such as feature engineering and domain-specific augmentation could enhance their standalone utility.
The study establishes that multimodal data improves the performance of AI models in the early prediction of CVD. The integrated model achieved an accuracy of 92% and an AUC-ROC of 0.94, so it could be used for preliminary risk assessment of at-risk patients and to reduce diagnostic error. Peng et al. (2020) and Ajegbile et al. (2024) found that data integration is important when developing solutions to healthcare problems, which is consistent with the findings of this study.
These findings are relevant to clinical practice because the integrated model was better at supporting improved patient outcomes. Changes in diet, certain medicines, or even surgery at an early stage can prevent the condition from worsening. Wearable data helps clinicians track patients continuously and notice any sudden shift in their status. This approach is also more effective than conventional diagnostic tools because it does not treat CVDs as diseases caused by a single factor. In this way, subtle risk factors, such as early structural alterations or diurnal fluctuations, are not overlooked when data from different modalities are combined. The outputs of AI models need to be made clear to clinicians and patients, which can be challenging, so explainability is critical. Integrating feature attribution and model interpretability tools from explainable AI (XAI) will help explain how predictions are made and support clinical decision-making.
While the results are promising, several challenges must be addressed for real-world implementation:
Further research should be devoted to implementing XAI models to improve the explanations AI systems provide. In addition, greater dataset diversity and infrastructure improvements are key to more equitable access to AI in healthcare facilities.
The findings of this study pave the way for several avenues of future research:
Enhancing real-time data utilization from wearable devices requires addressing latency, data synchronization, and handling of incomplete or inconsistent inputs. Developing robust algorithms that can process and adapt to real-time anomalies is critical for seamless integration into clinical workflows.
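As one concrete illustration of handling incomplete wearable input, the sketch below computes a heart rate variability measure over a window of beat-to-beat (RR) intervals while tolerating dropped readings. The choice of RMSSD as the HRV measure, the use of `None` to mark a missing reading, and the function name are assumptions; the study does not state which HRV measure or gap-handling rule it used.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive RR-interval differences (RMSSD).

    rr_intervals_ms -- RR intervals in milliseconds; None marks a dropped
    reading. A difference is only taken between two adjacent, present
    readings, so a gap never produces a spurious jump.
    Returns None when no valid adjacent pair exists.
    """
    diffs = []
    for prev, cur in zip(rr_intervals_ms, rr_intervals_ms[1:]):
        if prev is None or cur is None:
            continue  # skip pairs that straddle a gap
        diffs.append(cur - prev)
    if not diffs:
        return None
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# One dropped reading: only the pairs (800, 810) and (790, 800) are used.
print(rmssd([800, 810, None, 790, 800]))  # 10.0
```

Returning `None` for a window with no usable pairs lets downstream code distinguish "no data" from "zero variability" instead of silently treating a sensor dropout as a physiological signal.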
Future research should prioritize developing population-specific AI models that account for demographic, genetic, and cultural variations.
Additionally, creating ethical AI frameworks to address fairness and bias, while exploring cost-effective solutions for real-time data integration, will be critical steps for advancing the field.
CVDs remain among the leading causes of death worldwide, hence the need for new strategies to address them. The outcomes of this work demonstrate that AI can improve the initial diagnosis and management of CVDs by drawing on multiple forms of data, such as EHRs, imaging scans, and wearables. The integrated model developed in this study achieved 92% accuracy and an AUC-ROC of 0.94, exceeding the models trained on single-source data. The integrated solution combines the advantages of multiple data types to assess cardiovascular health, incorporating historical, structural, and real-time physiological data. This improved predictive capacity enables the detection of high-risk patients at earlier stages of disease development, with more effective subsequent prevention and a decreased risk of adverse outcomes.
The implications of these findings for clinical practice are profound. Early detection increases survival rates and reduces the economic and psychological costs for patients and medical facilities. Preventive strategies may include dietary modifications, medication, or exercise, which can be started before complications occur or severe disease develops. Wearable devices also provide real-time information on cardiovascular status, which enables clinicians to intervene promptly in case of acute changes in the patient’s condition.
However, this study also reveals some of the problems related to the use of AI in the healthcare sector. Data privacy and security are still an issue of concern, especially with the combination of data from various sources. Policies and laws like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) must be adhered to in order to continue building trust with patients and protecting their data.
This study has several implications for future research. First, new applications are needed that can exploit wearable data in real time to provide continuous feedback and assessment of cardiovascular risk. Second, population-specific models should be developed to meet the needs of different groups and support equitable access to adequate healthcare services. Third, ethical AI frameworks must be developed to reduce bias, increase transparency, and improve stakeholder trust. The AI-based approach to early CVD detection could thus represent a new paradigm in global health. The proposed approach of integrating multimodal data sources shows the possibility of improving diagnostic accuracy by up to 30% while minimizing healthcare costs and patient load. Despite the limitations noted, further development of AI technologies and their appropriate and fair application can change the future of cardiovascular medicine and help address the global threat of CVDs. Cooperation among researchers, clinicians, policymakers, and technology developers will be crucial to realize these opportunities and make AI-assisted healthcare a reality for everyone.
AbdulRaheem, Y. (2023). Unveiling the significance and challenges of integrating prevention levels in healthcare practice. Journal of primary care & community health, 14, 21501319231186500.
Ajegbile, M. D., Olaboye, J. A., Maha, C. C., Igwama, G. T., & Abdul, S. (2024). The role of data-driven initiatives in enhancing healthcare delivery and patient retention. World Journal of Biology Pharmacy and Health Sciences, 19(1), 234-242.
Bajwa, J., Munir, U., Nori, A., & Williams, B. (2021). Artificial intelligence in healthcare: transforming the practice of medicine. Future healthcare journal, 8(2), e188-e194.
Kaminsky, L. A., German, C., Imboden, M., Ozemek, C., Peterman, J. E., & Brubaker, P. H. (2022). The importance of healthy lifestyle behaviors in the prevention of cardiovascular disease. Progress in cardiovascular diseases, 70, 8-15.
Kazi, D. S., Elkind, M. S., Deutsch, A., Dowd, W. N., Heidenreich, P., Khavjou, O., ... & American Heart Association. (2024). Forecasting the Economic Burden of Cardiovascular Disease and Stroke in the United States Through 2050: A Presidential Advisory From the American Heart Association. Circulation.
Lau, R. S., Boesen, M. E., Richer, L., & Hill, M. D. (2024). Siloed mentality, health system suboptimization and the healthcare symphony: a Canadian perspective. Health Research Policy and Systems, 22(1), 87.
Norori, N., Hu, Q., Aellen, F. M., Faraci, F. D., & Tzovara, A. (2021). Addressing bias in big data and AI for health care: A call for open science. Patterns, 2(10).
Peng, C., Goswami, P., & Bai, G. (2020). A literature review of current technologies on health data integration for patient-centered health management. Health informatics journal, 26(3), 1926-1951.
Siontis, K. C., Noseworthy, P. A., Attia, Z. I., & Friedman, P. A. (2021). Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nature Reviews Cardiology, 18(7), 465-478.
World Health Organization. (2021, June 11). Cardiovascular diseases (CVDs). https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
Ye, J., Woods, D., Jordan, N., & Starren, J. (2024). The role of artificial intelligence for the application of integrating electronic health records and patient-generated data in clinical decision support. AMIA Summits on Translational Science Proceedings, 2024, 459.