Transforming Healthcare Decisions in the U.S. Through Machine Learning

Authors
Affiliations

1 Department of Business Analytics, Trine University, Indiana, USA

2 Department of Public Health, California State University Long Beach, CA 90840, USA

3 Department of Nursing, 855 N Vermont Ave, Los Angeles, CA 9002, USA

4 Department of Education, Westcliff University, 400 Irvine, CA 92614, USA

5 Department of Special Education and Counseling, California State University, Long Beach, CA 90840, USA

6 Department of Computer Science, Westcliff University, 400 Irvine, CA 92614, USA

7 Department of Psychology, St. Francis College, USA

8 Department of Business Administration, Westcliff University, 400 Irvine, CA 92614, USA

Abstract


In the United States, early detection of diseases is critical to ensuring timely and effective treatment, as many conditions, if not diagnosed promptly, can become untreatable or even fatal. As a result, there is a growing reliance on advanced technologies to analyze complex medical data, reports, and images with both speed and precision. In many cases, subtle abnormalities in medical imaging may go unnoticed by the human eye, which is where machine learning (ML) has become indispensable. ML techniques are increasingly used in healthcare for data-driven decision-making, uncovering hidden patterns and anomalies that traditional methods might miss. Although developing such algorithms is complex, the greater challenge lies in optimizing them for higher accuracy while reducing processing time. Over the years, the integration of ML into biomedical research has significantly advanced the field, paving the way for innovations like precision medicine, which customizes treatments based on a patient’s genetic profile. Today, machine learning supports nearly every stage of care delivery, from extracting critical information from electronic health records to diagnosing diseases through medical image analysis. Its role extends to patient management, resource optimization, and treatment development. In particular, deep learning, powered by modern high-performance computing, has shown remarkable accuracy and reliability in these applications. It is now evident that in the U.S. healthcare system, computational biology and clinical decision making are deeply intertwined with machine learning, making it a core component of artificial intelligence in medicine. This paper explores the current applications, challenges, and potential of machine learning in supporting healthcare decision-making in the United States, with a focus on diagnosis, medical imaging, and personalized treatment strategies.

DOI: https://doi.org/10.103/xxx © 2025 Journal of Advances in Medical Sciences and Artificial Healthcare Intelligence (JAMSAI), C5K Research Publication


1. Introduction

Artificial Intelligence consists of a wide range of methods and technologies including machine learning, machine reasoning, and robotics. Among these, machine learning has gained the most attention in the United States healthcare sector due to its extensive applicability in solving complex medical challenges (Alanazi, 2022). This review places emphasis on machine learning, which is being applied through various algorithms to support healthcare systems in clinical decision making. The use of machine learning in clinical contexts is considered revolutionary, as it enables systems to analyze vast amounts of medical data and generate informed recommendations for improving or maintaining patient health.

When applied in healthcare, machine learning systems collect and interpret a range of patient-related data, such as clinical records, medical images, and genetic profiles (Shehab & others, 2022). These systems reason through the information to suggest potential actions that can lead to better health outcomes. Initially, machine learning models are not highly accurate or efficient. However, through repetitive exposure to similar tasks and the accumulation of data, the models gradually improve their accuracy and reliability. This process of learning from data allows the systems to adapt and perform more effectively over time.

Clinical decision-making supported by machine learning can follow two main approaches. The first is the intuitive or rapid method, which relies on pattern recognition and is often used in emergency medical situations (Jayatilake & Ganegoda, 2021). While this approach enables quick responses, it carries a higher risk of error and may overlook important details. The second approach is more deliberate and analytical, requiring time and intellectual resources. Although slower, it produces more accurate and comprehensive outcomes. Both methods benefit significantly from machine learning, which enhances the precision and speed of decision-making by processing and interpreting large and complex datasets (Sanchez-Martinez & others, 2022).

Healthcare data in the United States is increasingly heterogeneous, coming from sources such as electronic health records, medical imaging systems, wearable devices, and real-time monitoring technologies (Babarinde et al., 2023). As the volume and complexity of this data grow, the need for advanced computational tools becomes critical. Machine learning provides solutions that efficiently manage and analyze such data, facilitating improvements in diagnostic accuracy, patient care, and overall healthcare delivery.

The applications of machine learning in healthcare extend far beyond disease diagnosis and prediction. In the United States, these technologies support critical activities such as patient management, treatment research, hospital resource allocation, public health planning, and policymaking. The COVID-19 pandemic highlighted the urgent need for intelligent systems capable of handling diverse healthcare tasks under time constraints (Debnath & others, 2020). During this period, machine learning proved valuable in supporting rapid testing, treatment planning, and outbreak forecasting. This has led to a growing interest in the field of emergency machine learning, which aims to develop models that respond effectively to healthcare crises.

Despite its many advantages, the use of artificial intelligence in healthcare brings forward ethical considerations. These include concerns about the transparency and accountability of decisions made by algorithms, the risk of biased outcomes, and the shifting roles of healthcare professionals. In the United States, such concerns have resulted in regulatory measures that restrict the autonomous use of machine learning for final clinical decisions (Lysaght et al., 2019). Instead, these tools are employed as decision support systems that assist healthcare providers without replacing human judgment. This cautious approach ensures that the use of technology remains ethical and aligned with clinical standards.

Artificial intelligence systems in healthcare are capable of performing predictive analysis by filtering, organizing, and identifying patterns in large datasets (Ahmed et al., 2020). These datasets are often drawn from multiple sources and require sophisticated models to produce accurate and timely insights. While these systems are not permitted to make final decisions independently in most jurisdictions, they play a critical role in supporting clinicians through enhanced diagnostic capabilities and treatment recommendations.

This review aims to examine the role of machine learning in transforming computational decision-making in healthcare. The discussion begins with the initial introduction of machine learning in computational biology and follows its evolution to the present day, where it plays a central role in the development of precision medicine. In precision medicine, treatments are tailored based on a patient’s genetic information, lifestyle, and environmental factors, marking a significant shift from the traditional one-size-fits-all approach to healthcare. The upcoming sections of this paper will explore various machine learning techniques currently used in the United States healthcare system. Topics will include disease prediction and detection, medical imaging, biomedicine applications, biomedical event extraction, polypharmacology, and drug repurposing using systems biology. These discussions will highlight how different machine learning models contribute to clinical efficiency and the delivery of patient-centered care.

The discussion section will present a comparison of various machine learning algorithms based on their performance in healthcare applications. Factors such as prediction accuracy, processing time, and scalability will be evaluated. Special attention will be given to methods used for improving model performance and the ability of these models to scale across large healthcare systems. Scalable machine learning algorithms are particularly important for widespread implementation in hospitals, clinics, and research institutions across the United States.

The concluding section will summarize the findings and emphasize the growing dependence of modern healthcare on machine learning technologies. As the demand for more accurate, efficient, and personalized care continues to rise, machine learning will remain an essential part of healthcare innovation. From early diagnosis and individualized treatment to public health planning and crisis management, machine learning is shaping the future of healthcare by delivering data-driven solutions that improve outcomes and optimize resources.

2. Machine Learning Approaches

Machine learning is a scientific field that focuses on enabling computers to learn from data and continuously enhance their performance over time. It is rooted in probability and statistics but often proves more powerful than traditional statistical methods, particularly in decision-making. The inputs provided to machine learning algorithms, known as features, play a critical role in determining the accuracy of predictions. The effectiveness of a model is highly dependent on the quality of these features. Therefore, one of the primary responsibilities of a machine learning developer is to identify a subset of features that best support the algorithm’s purpose, which can significantly improve accuracy (Calamuneri et al., 2017). This task is complex and typically requires ongoing experimentation to refine the selection of relevant features.

Applying machine learning involves three essential stages: training, testing, and validation. The training phase is crucial because the quality of the training data directly influences model performance. During testing, the algorithm’s effectiveness is assessed, with attention given to minimizing both bias and variance to ensure generalizability. An optimal model balances this bias-variance trade-off effectively. Finally, the model is evaluated using a validation dataset to verify its real-world applicability. Understanding different machine learning approaches and key algorithms commonly used for classification and clustering is essential for anyone entering the field.

2.1 Supervised Learning

Supervised learning involves using a labeled dataset where the input data is associated with known outcomes. This approach is primarily divided into two tasks: classification and regression. Classification methods assign input data to specific categories, while regression deals with predicting continuous output values. The performance of classification models is often evaluated using accuracy metrics, whereas regression models are typically assessed using root mean square error (Deo, 2015).

Supervised learning aims to build predictive models based on historical data, enabling the system to forecast known outcomes. These tasks are often ones that a trained human expert can perform, but supervised models can process much larger datasets and identify hidden relationships more efficiently (Bharat et al., 2018). In healthcare and biomedical applications, supervised learning is frequently used for risk estimation and to uncover associations not immediately evident to clinicians (Gu & others, 2023).

2.1.1 K-Nearest Neighbor (KNN)

KNN is a widely used supervised classification algorithm applied in various domains including pattern recognition and anomaly detection [13]. Its straightforward implementation and strong performance make it popular, though it can be computationally expensive. Both training and test data must be stored, leading to high memory usage. To classify a new data point, the algorithm identifies the most similar instances in the dataset using a distance metric—commonly the Euclidean distance—and assigns a label based on the majority class (mode) or average (mean) among the nearest neighbors.
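As a brief illustration of this voting scheme, the sketch below fits scikit-learn's KNeighborsClassifier on the library's bundled breast-cancer dataset; the choice of dataset, k = 5, and the default Euclidean metric are illustrative assumptions rather than settings from the cited work.

```python
# Minimal KNN classification sketch with scikit-learn (illustrative data and parameters).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)          # labeled feature matrix and targets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# k = 5 neighbors with Euclidean distance (the default Minkowski metric, p = 2).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)                            # "training" simply stores the data
print("Test accuracy:", knn.score(X_test, y_test))   # majority vote among the 5 nearest neighbors
```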

2.1.2 Support Vector Machine (SVM)

SVM is a powerful supervised learning algorithm primarily used for classification but also capable of handling regression tasks. In SVM, data points are plotted in an n-dimensional space, and the algorithm identifies the optimal hyperplane that separates the classes with the maximum margin (Chauhan et al., 2019; Jabin et al., 2024). One of the strengths of SVM is its ability to map input features into higher dimensions using kernel functions, which enables it to solve non-linear classification problems. While SVM generally delivers high accuracy, it is more effective with smaller datasets. Performance may degrade in the presence of noisy data or large datasets due to increased computational complexity.
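A minimal sketch of kernel-based SVM classification, assuming scikit-learn and a synthetic non-linearly separable dataset (make_moons); the RBF kernel and regularization value are illustrative choices.

```python
# SVM with an RBF kernel on a non-linearly separable problem (illustrative data).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs; the RBF kernel implicitly maps the inputs
# into a higher-dimensional space where a separating hyperplane can be found.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```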

2.1.3 Decision Trees (DTs)

Decision trees operate using a tree-like structure in which each internal node represents a decision based on an attribute, each branch represents the outcome of that decision, and each leaf node corresponds to a class label (Mishra et al., 2019). This model is intuitive and easy to interpret, making it suitable for simple problems and small datasets. However, decision trees are prone to overfitting and can produce biased results when dealing with imbalanced data. Despite these limitations, they are effective for modeling both linear and non-linear relationships.

2.1.4 Classification and Regression Trees (CARTs)

CART is a predictive modeling technique that uses a binary tree structure to make decisions. Each node in the tree represents an input feature and a threshold that splits the data, while the leaves contain the predicted outcomes (Charbuty & Abdulazeez, 2021). CART models are versatile and can be used for both classification and regression tasks. They work by recursively dividing the dataset based on feature values that maximize information gain or minimize error.

2.1.5 Logistic Regression (LR)

Logistic regression is a statistical modeling technique widely used in machine learning, especially in epidemiology and binary classification problems (Nusinovici & others, 2020). It uses a logistic function to model the probability of a binary outcome. The model consists of two main components: a linear component that calculates the weighted sum of inputs, and a link function that maps this sum to a probability value. The goal is to find the optimal coefficients by minimizing a cost function, which measures the difference between predicted and actual outcomes.
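To make the two components concrete, the sketch below implements the linear term and the logistic link function directly in NumPy; the coefficients are arbitrary illustrative values rather than parameters fitted to any real dataset.

```python
import numpy as np

def sigmoid(z):
    """Logistic link function: maps the linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative intercept b and weights w -- in practice these are learned by
# minimizing a cost function (log-loss) over the training data.
w = np.array([0.8, -1.2, 0.3])
b = -0.5

def predict_proba(x):
    z = b + np.dot(w, x)       # linear component: weighted sum of the inputs
    return sigmoid(z)          # link function: probability of the positive class

x_patient = np.array([1.4, 0.7, 2.1])   # hypothetical feature vector
print("P(outcome = 1):", predict_proba(x_patient))
```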

2.1.6 Random Forest Algorithm (RFA)

Random Forest is a popular ensemble learning method that can handle both classification and regression tasks (Ao et al., 2019). It builds multiple decision trees during training and uses a voting or averaging mechanism to make final predictions. This algorithm employs the bagging technique to reduce variance and improve accuracy. Random Forest is known for its robustness against overfitting, ability to handle noisy data, and effectiveness with imbalanced datasets. It is widely used in bioinformatics and healthcare analytics for its reliability and accuracy.

2.1.7 Naive Bayes (NB)

Naive Bayes is a probabilistic classifier based on Bayes’ theorem, commonly used for binary and multiclass classification (Farid et al., 2014). Despite its simplicity, it performs well in many real-world situations, particularly when the assumption of feature independence is valid. In this method, the probability of each class is computed given the input features, and the input is assigned to the class with the highest probability. Naive Bayes is computationally efficient and well-suited for high-dimensional datasets such as text classification and genetic data analysis.

2.1.8 Artificial Neural Network (ANN)

Artificial Neural Networks are inspired by the structure of biological neurons and are particularly effective in complex pattern recognition tasks such as image classification (Toraman et al., 2020). ANNs consist of three types of layers: input, hidden, and output. Each neuron in a layer is connected to every neuron in the next layer. The learning process involves adjusting weights through iterative training to minimize the error between predicted and actual outcomes. Key components include the error function, which evaluates model performance, the search function, which explores possible improvements, and the update function, which adjusts the network accordingly. Increasing the number of hidden layers results in deeper networks, which can model more complex relationships.
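The sketch below trains a small feed-forward network with two hidden layers using scikit-learn's MLPClassifier on the bundled digits dataset; the architecture and dataset are illustrative assumptions, not the networks discussed in the cited studies.

```python
# Feed-forward neural network sketch: weights are adjusted iteratively to reduce
# the error between predicted and actual labels.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)                   # small image-classification stand-in
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),  # two hidden layers
)
mlp.fit(X_train, y_train)
print("Test accuracy:", mlp.score(X_test, y_test))
```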

2.2 Unsupervised Learning

Unsupervised learning is applied in situations where the data involved cannot be clearly labeled due to a lack of prior knowledge about the system. In such cases, machine learning algorithms autonomously identify similarities and differences among data points. This learning method does not rely on labeled datasets for training. Instead, it discovers existing patterns in the data and groups similar items accordingly. The central aim of unsupervised learning is to reveal natural patterns or groupings in the data without predefined classifications (Tyagi et al., 2022).

A major application of unsupervised learning is in precision medicine, where patients may be grouped based on genetic traits, environmental factors, or medical history. Through this process, patterns and relationships that were previously unnoticed can be revealed. Common algorithms used in unsupervised learning include k-means, mean shift, affinity propagation, DBSCAN (density-based spatial clustering of applications with noise), Gaussian mixture models, Markov random fields, ISODATA (iterative self-organizing data analysis technique), and fuzzy C-means.

Clustering is a key method in unsupervised learning (Azimpour et al., 2020). It involves dividing data into groups, or clusters, based on shared features, though the cluster memberships are not known beforehand (Tejasree & Agilandeeswari, 2024). Clustering techniques can be classified into different categories based on their methodology: partitioning, hierarchical, grid-based, density-based, and model-based. They can also process various data types, including numerical, discrete, and mixed types. An ideal clustering algorithm should offer fast processing, robustness to noise and outliers, low sensitivity to input order, and the ability to work with diverse data structures.

In biomedical applications, clustering algorithms are especially valuable due to the vast size and complexity of biological datasets (Houssein et al., 2023). These algorithms help in automatically analyzing large datasets, saving time and improving efficiency. A reliable clustering method should satisfy several criteria: scalability with large datasets, robustness against outliers, consistency regardless of data order, minimal user-specified parameters, ability to handle mixed data types, detection of irregularly shaped clusters, and stability in the presence of duplicate entries. In healthcare, these algorithms can simultaneously analyze various types of data, allowing the system to detect multiple health conditions in a patient during the diagnosis process, thus streamlining decision-making and enhancing patient outcomes.
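As a minimal illustration of two of these families, the sketch below runs a partitioning method (k-means) and a density-based method (DBSCAN) on synthetic unlabeled data; the cluster count, eps, and min_samples values are illustrative assumptions.

```python
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic, unlabeled data standing in for, e.g., patient feature vectors.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

# Partitioning approach: the number of clusters k must be chosen by the user.
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Density-based approach: no k required; points in sparse regions are labeled noise (-1).
dbscan_labels = DBSCAN(eps=0.4, min_samples=5).fit_predict(X)

print("k-means cluster sizes:", [int((kmeans_labels == c).sum()) for c in range(4)])
print("DBSCAN clusters found:", len(set(dbscan_labels)) - (1 if -1 in dbscan_labels else 0))
```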

2.2.1 Partition Clustering

Partition clustering divides data objects into multiple groups based on dissimilarities. This technique is particularly useful when the number of desired clusters is known in advance, such as in small gene expression datasets. However, a limitation is the need for users to specify the number of clusters manually. Despite this, partition-based methods are frequently applied in bioinformatics. Notable algorithms in this category include fuzzy k-means, COOLCAT, CLARA (clustering large applications), and CLARANS (clustering large applications based on randomized search) (Kocheturov et al., 2019).

2.2.2 Graph-Based Clustering

Graph-based clustering methods are commonly used in analyzing biological networks or interactomes. These techniques help predict complex relationships and sequence networks by treating data points as nodes in a graph. While powerful, these algorithms can be slow and sensitive to parameter settings defined by the user. Examples include superparamagnetic clustering (SPC), the Markov cluster algorithm (MCL), molecular complex detection (MCODE), and restricted neighborhood search clustering (RNSC) (Z. Zhang et al., 2018).

2.2.3 Hierarchical Clustering

Hierarchical clustering organizes data into a tree structure of nodes, where each node represents a cluster. Parent nodes can have multiple child nodes, and each data point can be traced through this tree. This method is widely used in bioinformatics due to its flexibility in analyzing data at various levels of detail (Nielsen, 2016). However, it has drawbacks such as a slower processing speed and irreversible errors when incorrect merges occur. These limitations can result in the loss of significant local patterns. Applications of hierarchical clustering include protein sequence family classification and gene similarity mapping. Prominent examples are Chameleon, ROCK (robust clustering using links), LIMBO (scalable information bottleneck), and spectral clustering.

2.2.4 Density-Based Clustering

Density-based clustering identifies groups in the data by locating dense regions separated by areas of lower density. It is particularly useful in bioinformatics for identifying tightly connected subspaces within interactome networks. This approach excels at detecting clusters of arbitrary shapes and offers time efficiency. While some density-based algorithms require user-defined parameters, they do not necessitate a predefined number of clusters. Common algorithms in this category include OPTICS (ordering points to identify the clustering structure), CLIQUE (clustering in quest), DENCLUE (density-based clustering), and CACTUS (clustering categorical data using summaries) (Bhattacharjee & Mitra, 2020).

2.2.5 Model-Based Clustering

Model-based clustering assumes that the data follows a certain statistical model. The structure of this model, which can be specified and modified by the user, determines how the algorithm identifies clusters (Bouveyron & Brunet-Saumard, 2014). This technique is used in bioinformatics for incorporating existing knowledge into the analysis of gene expression, sequence data, and interactomes. Although effective, it can be computationally intensive when applied to large datasets. Moreover, inaccurate assumptions about the model may lead to misleading results. Representative algorithms in this category include SVM-based clustering, COBWEB, and AutoClass. 

2.3 Semi-supervised Learning

Semi-supervised learning operates using a combination of labeled and unlabeled data. In this approach, only a portion of the training dataset contains known outcomes, while the remaining data lacks explicit labels. This technique is particularly useful when it is impractical to label an entire dataset but possible to make reliable predictions using partially labeled data. Semi-supervised learning combines the strengths of both supervised and unsupervised learning, making it a powerful tool for scenarios where data labeling is costly or time-consuming. It has become increasingly relevant in fields like healthcare, where vast amounts of data are often available, but not all of it is annotated.

2.4 Evolutionary Learning

Evolutionary learning draws inspiration from natural selection and is widely used in the biological sciences. This approach is employed to understand the behavior and survival patterns of biological organisms. It helps in predicting outcomes such as adaptability and fitness levels. By simulating evolutionary processes, the algorithm iteratively improves its predictions, making it suitable for problems where the objective is to find optimal solutions under changing conditions.

2.5 Active Learning

Active learning is a strategy where the model is trained with a limited number of labeled instances and selectively queries an external source—such as a human expert or a database—to label the most informative data points. This approach enables the algorithm to improve efficiently by focusing on uncertain or ambiguous samples. It is especially advantageous in situations with limited resources or labeling budgets. The dynamic interaction between the model and the information source allows it to refine its understanding and improve accuracy with minimal labeled data. Active learning is considered a modern, cost-effective machine learning method that supports intelligent decision making.

2.6 Deep Learning

Deep learning represents an advanced stage of machine learning and is centered around neural networks with multiple layers. These deep neural networks are capable of learning complex patterns and relationships in large and varied datasets. Deep learning models are highly effective in tasks such as image recognition, natural language processing, and biomedical data interpretation (X. Wang et al., 2020). They can generalize across different types of problems and provide accurate predictions, even when dealing with high-dimensional or unstructured data. This flexibility makes deep learning particularly valuable in healthcare and biomedical research, where data complexity is often a significant challenge.

2.7 Reinforcement Learning

Reinforcement learning is a unique machine learning paradigm where the model learns through trial and error in an interactive environment. The algorithm receives feedback in the form of rewards or penalties based on its actions and uses this feedback to improve future performance. This iterative learning process allows the system to autonomously adapt to dynamic conditions. Reinforcement learning has applications in areas such as robotics, personalized medicine, and adaptive treatment strategies, where continuous learning from interactions is essential.

After outlining these various machine learning approaches, it is helpful to explore how they are practically applied in the field of biomedicine. For instance, in neuroscience, machine learning classifiers are utilized to examine both the functional and structural aspects of the brain. In oncology, algorithms are used for cancer detection and prognosis, with support vector machines (SVMs) playing a role in diagnosing prostate cancer. Hierarchical clustering methods have been employed to study Alzheimer’s disease, while artificial neural networks (ANN) have been applied to classify subtypes of psychogenic nonepileptic seizures (Vasta & others, 2018). These examples illustrate the significant impact machine learning has on advancing biomedical research and clinical applications. With an understanding of the various machine learning techniques and their corresponding algorithms, the next step is to delve deeper into their practical uses in computational biology and medicine.

3. Machine Learning in Disease Prediction and Detection

Machine learning techniques have been widely adopted in the early detection and diagnosis of various diseases, as early intervention often results in simpler treatment protocols and significantly improves patient outcomes. However, the effectiveness of these algorithms varies based on multiple factors including the type of algorithm used, the quality of input features, and the training dataset. This section highlights specific diseases where machine learning has been successfully applied, emphasizing the importance of early diagnosis, the techniques utilized, and the feature sets involved. A detailed comparative analysis of these approaches is provided later in the discussion section.

3.1 Cancer

Cancer typically begins with abnormal cell behavior where signals controlling cell growth and division become faulty, leading to uncontrolled multiplication and tumor formation. Among non-invasive techniques, thermography has emerged as a promising diagnostic tool due to its safety and effectiveness. From thermal images, machine learning algorithms, supported by feature extraction techniques such as the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF), can detect potential cancerous growths. These features can be refined using Principal Component Analysis (PCA) to improve interpretability and accuracy.

3.1.1 Breast Cancer

Breast cancer is one of the most prevalent cancers among women and remains a leading cause of mortality. Early diagnosis through MRI, mammography, ultrasound, or biopsy significantly increases the chances of successful treatment. However, distinguishing between benign and malignant tumors can be challenging, which is where machine learning proves invaluable. These systems can autonomously improve over time, enhancing diagnostic accuracy.

The process typically involves three stages: preprocessing, feature extraction, and classification. Features such as image smoothness, coarseness, depth, and regularity are extracted using segmentation. While converting images to binary helps isolate information, some critical features are lost, prompting the use of grayscale formats instead. Discrete Wavelet Transformation (DWT) is employed to transform images into frequency domains, generating approximation and detail coefficient matrices used for classification by machine learning algorithms.
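The sketch below shows one way such wavelet coefficient matrices can be obtained with the PyWavelets library; the Haar wavelet, the random stand-in image patch, and the summary statistics are illustrative assumptions rather than the exact pipeline used in the cited work.

```python
import numpy as np
import pywt  # PyWavelets

def dwt_features(gray_image, wavelet="haar"):
    """Return simple statistics of the 2-D DWT sub-bands of a grayscale image."""
    # dwt2 yields the approximation matrix cA and the detail matrices (horizontal,
    # vertical, diagonal), which can feed a downstream classifier.
    cA, (cH, cV, cD) = pywt.dwt2(gray_image.astype(float), wavelet)
    feats = []
    for band in (cA, cH, cV, cD):
        feats.extend([band.mean(), band.std()])
    return np.array(feats)

# Hypothetical 128x128 grayscale mammogram patch (random values as a stand-in).
patch = np.random.rand(128, 128)
print(dwt_features(patch))
```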

3.1.2 Lung Cancer

Lung cancer originates in the respiratory system and can spread to other organs if undetected. Risk factors include tobacco use, air pollution, and underlying respiratory conditions. Early symptoms are often absent, making early diagnosis difficult and increasing the danger.

Computed Tomography (CT) imaging provides clearer images than MRI or X-rays and is commonly used in diagnostics. Image preprocessing includes grayscale conversion, noise reduction using median filters, and segmentation to isolate relevant areas. Key features like area, perimeter, and eccentricity are then extracted.
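A sketch of this preprocessing-and-feature-extraction sequence using SciPy and scikit-image; the Otsu threshold, 3x3 median filter, and random stand-in slice are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def lung_region_features(ct_slice):
    """Denoise a grayscale CT slice, segment it, and extract simple shape features."""
    denoised = median_filter(ct_slice, size=3)          # noise reduction
    mask = denoised > threshold_otsu(denoised)          # crude segmentation by thresholding
    labeled = label(mask)                                # connected-component labeling
    feats = []
    for region in regionprops(labeled):
        feats.append((region.area, region.perimeter, region.eccentricity))
    return feats

# Hypothetical CT slice (random values standing in for real imaging data).
slice_ = np.random.rand(256, 256)
print(lung_region_features(slice_)[:3])
```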

Detecting Small-Cell Lung Cancer (SCLC) is particularly difficult due to its visual similarity to healthy tissues. Deep learning methods, particularly Convolutional Neural Networks (CNN), offer a solution but require large training datasets. The Entropy Degradation Method (EDM) addresses this issue by converting histograms into scores and using logistic functions for classification. Although the accuracy of this approach is promising, it can be further enhanced with larger datasets and deeper network structures.

3.1.3 Acute Lymphoblastic Leukemia (ALL)

ALL is a rapidly progressing blood cancer characterized by the accumulation of immature lymphocytes, which hinders the production of healthy blood cells. It can be fatal within weeks if untreated. Symptoms include fatigue, pale skin, fever, joint pain, and swollen lymph nodes.

Several machine learning models have been used for detection, including KNN, SVM, Naive Bayes, RBFN, and MLP. The workflow usually involves preprocessing, feature extraction, model training, and performance evaluation. Cropping highlights the region of interest, and Gaussian blur is used to enhance image quality. Features used for classification include color, geometry, texture, image moments, and local binary patterns.

3.2 Diabetes

Diabetes is a chronic condition caused by elevated blood glucose levels and can significantly affect quality of life if not diagnosed early. It is categorized into Type 1, Type 2, and gestational diabetes.

Discriminant Analysis (DA) is often used to classify diabetes by deriving equations based on input features such as blood pressure, glucose levels, insulin ratio, skinfold thickness, age, and more. Machine learning models like GNB, Logistic Regression, KNN, CART, RFA, and SVM have shown potential in predicting Type 2 diabetes using data from electronic medical records (EMRs).

Neural networks, particularly feed-forward models trained via backpropagation, have demonstrated higher accuracy. Key input features include number of pregnancies, insulin levels, BMI, and plasma glucose levels. Deep neural networks (DNNs), trained with five-fold and ten-fold cross-validation, have achieved up to 97% accuracy in diabetes prediction.
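The sketch below illustrates ten-fold cross-validated prediction with a small neural network in scikit-learn; the synthetic feature table and network size are illustrative stand-ins, and this toy setup is not expected to reproduce the reported 97% accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for tabular diabetes features (pregnancies, insulin, BMI, glucose, ...).
X, y = make_classification(n_samples=800, n_features=8, n_informative=5, random_state=0)

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0))

# Ten-fold stratified cross-validation, as described in the text.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("Mean CV accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```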

3.3 Heart Diseases

Heart diseases are critical health conditions often caused by blocked coronary arteries. Risk factors include high blood pressure, smoking, lack of exercise, and age. Symptoms may include fatigue, breathlessness, and swollen extremities.

Precision medicine in cardiology addresses diagnostics and therapeutic planning. It supports personalized interventions based on individual characteristics, including genomics and gender-specific differences. Technologies such as patient monitoring and clinical decision support systems (CDSSs) benefit from machine learning integration.

Blood and genetic tests are essential in identifying heart disease, especially ischemic conditions. Precision cardiology focuses on areas such as cardiac genetics and cardiac oncology. Machine learning methods such as CNNs, RNNs, SVMs, and LSTMs, together with natural language processing (NLP) techniques, are used to enhance CDSS capabilities.

Heart disease prediction involves preprocessing data, selecting features, validating models, and evaluating classifier performance. Preprocessing techniques include handling missing data and normalizing input. Feature selection using Relief, mRMR, and LASSO improves model efficiency and accuracy (Silva & Ramos, 2025).
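Relief and mRMR require third-party packages, so the sketch below illustrates only the LASSO-style route, using an L1-penalized logistic regression as a feature selector in scikit-learn; the synthetic data and penalty strength are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a heart-disease feature table (age, blood pressure, cholesterol, ...).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6, random_state=0)

# L1-penalized logistic regression drives uninformative coefficients to zero, so
# SelectFromModel keeps only the surviving features (a LASSO-style selector).
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5, random_state=0)
)

clf = make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))
clf.fit(X, y)
kept = clf.named_steps["selectfrommodel"].get_support().sum()
print("Features kept:", int(kept), "of", X.shape[1])
```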

3.4 Chronic Kidney Disease (CKD)

CKD gradually impairs kidney function, potentially leading to failure. Although diagnosis typically involves lab tests, imaging, and biopsy, these methods can be invasive and costly. Machine learning offers a non-invasive alternative (Liu et al., 2020).

While SVM is widely used in many medical applications, research on its use in CKD is limited. Instead, ANN, Decision Trees, and Logistic Regression are commonly applied. Among these, ANN has shown superior diagnostic performance for CKD detection.

3.5 Parkinson’s Disease (PD)

PD is a neurodegenerative disorder that affects movement and coordination due to reduced dopamine production. Symptoms include tremors, stiffness, and postural instability. There is no known cure, and treatment options are limited.

Machine learning has been applied to video analysis and voice recordings to distinguish between PD patients and healthy individuals (Saikia et al., 2020). Feature selection methods like PCA and Genetic Algorithms (GA) are used. GA, inspired by natural selection, evaluates potential solutions through mutation and crossover processes to find optimal features. PCA helps reduce dimensionality and extract meaningful patterns from complex datasets.
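A minimal sketch of the PCA step, assuming a hypothetical matrix of acoustic voice features and a 95% explained-variance target; both are illustrative choices rather than values from the cited study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix of voice-recording features (rows = recordings, cols = acoustic measures).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))

# Standardize, then keep enough principal components to explain 95% of the variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print("Original dimensions:", X.shape[1])
print("Retained components:", pca.n_components_)
print("Explained variance ratio (first 3):", np.round(pca.explained_variance_ratio_[:3], 3))
```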

3.6 Dermatological Diseases

Dermatological conditions are diverse and often difficult to diagnose due to limited expertise. Early detection is crucial for conditions like eczema, herpes, melanoma, and psoriasis.

A common approach involves three phases: data collection and augmentation, model training, and image analysis (Giger, 2018). Data augmentation techniques include SMOTE and various image preprocessing methods like greyscaling, sharpening, noise reduction, and contrast adjustment. A well-trained CNN model can mitigate overfitting and improve prediction.

In the final stage, features from the last CNN layer are passed to an SVM classifier. For this to work, the SVM must first be trained using these CNN-generated features, which it converts into vectors for classification.
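One common way to wire this up is to treat a pretrained CNN as a fixed feature extractor and train the SVM on its pooled activations. The sketch below assumes TensorFlow/Keras with MobileNetV2 and random stand-in images; the specific network and dermatology dataset used in practice would differ.

```python
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

# Pretrained CNN used as a fixed feature extractor: the global-average-pooled
# activations of the last convolutional block serve as feature vectors.
extractor = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, pooling="avg", input_shape=(224, 224, 3)
)

def cnn_features(images):
    x = tf.keras.applications.mobilenet_v2.preprocess_input(images.astype("float32"))
    return extractor.predict(x, verbose=0)

# Random stand-ins for skin-lesion images and their labels.
images = np.random.randint(0, 255, size=(32, 224, 224, 3))
labels = np.random.randint(0, 2, size=32)

features = cnn_features(images)                 # shape: (32, 1280)
svm = SVC(kernel="rbf").fit(features, labels)   # SVM trained on CNN-generated feature vectors
print("Training accuracy:", svm.score(features, labels))
```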

4. Machine Learning in Medical Imaging

Medical imaging has emerged as one of the fastest-growing domains in biomedical research due to its pivotal role in diagnosing a wide range of diseases. The integration of machine learning with medical imaging has further accelerated advancements in this field by enabling the automated extraction and classification of critical features from images. The typical workflow involves segmenting the input image to focus on regions of interest, applying feature extraction techniques to highlight relevant patterns, eliminating noise, and finally classifying the features to make diagnostic predictions.

In today’s medical landscape, accurately analyzing large volumes of imaging data is crucial for disease diagnosis and treatment planning. Machine learning has been extensively applied across various biomedical tasks, including classifying patient data based on attributes, examining medical records, detecting diseases, generating treatment recommendations, and enhancing diagnostic accuracy through image analysis (Vayadande et al., 2024).

Medical imaging has also significantly improved surgical planning, helping physicians tailor procedures to individual patient conditions. For example, the complexity of skull-base surgery has traditionally posed significant challenges due to anatomical variability. However, the adoption of endonasal endoscopic techniques allows surgeons to visualize the skull base and associated neurovascular structures through the nasal cavity, minimizing brain displacement and offering enhanced visibility with multi-angled, close-up views.

Magnetic resonance imaging (MRI) is particularly effective in planning treatments for conditions such as rectal cancer, as it provides detailed visualization of tumor extent and essential prognostic data. This information guides the development of personalized therapeutic strategies for each patient. Image-guided surgeries, in comparison to conventional methods, offer benefits such as reduced invasiveness, more precise targeting, and improved outcomes. These surgeries rely heavily on imaging for pre-operative planning, intra-operative navigation, and post-operative assessment. In neurosurgery, where precision is paramount, medical imaging ensures accurate localization of targets and supports instrument guidance to minimize damage to surrounding tissues. Modalities such as MRI, CT, ultrasound, PET, SPECT, and fluoroscopy are routinely used to support procedures like biopsies, tumor resections, epilepsy surgery, and vascular interventions (Folorunso et al., 2020). Advances in 3D imaging have also facilitated virtual modeling of anatomical structures, enhancing both understanding and execution of surgical procedures. Technologies such as 3D ultrasound and fetal MRI are increasingly being adopted in clinical practice.

Machine learning plays a transformative role in medical imaging by uncovering subtle patterns not easily perceived by human observers. Unsupervised learning techniques like clustering allow for analysis of large imaging datasets to support surgical decisions. These algorithms can help identify missed anomalies or validate the appropriateness of chosen surgical approaches.

Due to the complexity of anatomical structures, standard mathematical modeling often falls short in medical image interpretation. Machine learning overcomes this by applying pixel-level analysis, which does not require conventional feature extraction or segmentation. This enables effective information retrieval even from low-contrast images, although the approach requires extended training time due to high data dimensionality. Techniques such as histogram equalization (HE) are frequently used to enhance contrast, and its variants have further improved algorithmic performance.

Common machine learning methods applied in image analysis include linear discriminant analysis (LDA), support vector machines (SVM), and decision trees (DT). Additionally, texture-based descriptors like local binary patterns and the application of neural networks provide powerful tools for interpreting biological images. These techniques are also embedded in medical expert systems to support clinical decision-making.

Convolutional Neural Networks (CNNs) are particularly effective for image-based tasks. With multiple convolutional layers, CNNs can transform input images into meaningful features. Image classification involves identifying overall patterns associated with diseases, while object classification focuses on smaller, specific regions within medical images. Deep learning techniques such as CNNs enable disease pattern recognition, categorization, and quantification at an advanced level.

One of the key advantages of computers is their ability to perform complex tasks consistently and efficiently. In recent years, machine learning has demonstrated a remarkable capacity to tackle problems once deemed too intricate for automation. These systems can even detect patterns beyond human perception.

In the context of medical imaging, a few core terms are essential to understand. "Classification" refers to labeling pixels or regions with specific categories. A "model" comprises the decision-making rules learned during training, while an "algorithm" outlines the steps used to create the model. "Labeled data" provide examples for learning specific classes. The "training set" is used to teach the algorithm, while a "validation set" helps evaluate its performance. In neural networks, a "node" is a unit combining inputs with an activation function, and a "layer" consists of multiple interconnected nodes. "Segmentation" divides an image into regions of interest. "Overfitting" occurs when a model is overly tailored to the training data and fails to generalize. "Features" are numerical values that represent characteristics of an image, such as pixel intensity, edge strength, or regional variance. Effective feature selection is essential for building accurate models.

Image recognition and time-series classification in biomedical applications often involve nonlinear classification challenges. Traditional algorithms and feature extraction methods struggle with highly nonlinear data. Deep neural networks (DNNs), however, overcome this by adding multiple layers and neurons, allowing for complex function approximation. Ensemble learning is another powerful approach in which multiple classifiers are combined to form a more robust decision model. In both SVM and ensemble learning, nonlinear functions are formed using combinations of kernel methods. Ensemble strategies typically fall into four categories: modifying training samples (e.g., bagging, boosting, cross-validation), altering input features (e.g., random subspace, feature decimation), adjusting class labels (e.g., output coding, label switching), and introducing randomness into learning algorithms (e.g., backpropagation with randomized weights). The goal is to create diverse classifiers that, when combined, deliver superior overall performance.
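As an example of the first strategy (modifying training samples), the sketch below bags decision trees with scikit-learn and compares the ensemble against a single tree; the synthetic dataset and ensemble size are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)

# Each tree sees a different bootstrap sample of the training set; their votes are
# combined, which reduces variance relative to a single tree. The default base
# learner of BaggingClassifier is a decision tree.
bagged = BaggingClassifier(n_estimators=50, random_state=0)

single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
ensemble = cross_val_score(bagged, X, y, cv=5).mean()
print("Single tree CV accuracy: %.3f   Bagged ensemble: %.3f" % (single, ensemble))
```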

Machine learning continues to revolutionize medical imaging, enabling more accurate diagnoses, efficient surgical planning, and data-driven treatment strategies that were once only possible through human expertise.

5. Machine Learning in Biomedicine

Gene expression datasets play a vital role in biomedical research as they capture dynamic changes in gene activity over time. These datasets typically consist of numerical matrices, representing the increasing or decreasing expression levels of specific genes across various time points or tissue samples. Similarly, in protein-protein interaction (PPI) networks, nodes denote biomolecules and edges signify interactions between them. When applying clustering algorithms in such domains, minimizing user-defined parameters is critical, as inaccurate input values can reduce the effectiveness and accuracy of the model.

Machine learning has become an essential component of modern biomedical science, offering powerful tools to address complex challenges. In biomedicine, its applications are diverse, ranging from predicting protein structure and function based on genetic sequences to identifying optimal dietary plans tailored to an individual’s clinical profile and microbiome. Moreover, machine learning is instrumental in analyzing real-time, high-resolution physiological data used in various medical applications. There are three primary areas where machine learning significantly contributes to biomedicine. First, it enhances prognostic models. Traditional prognosis tools often rely on a limited number of manually entered variables, whereas machine learning models can extract and analyze thousands of features directly from electronic health records (EHRs), improving predictive accuracy. Second, it streamlines the workload of radiologists and pathologists by automating image analysis. Machine learning algorithms can interpret medical images with remarkable precision—often surpassing human capabilities—while operating continuously without fatigue. Third, it improves diagnostic accuracy by reducing human error. However, a significant challenge in this area is that the output is not always binary, which complicates algorithm training and requires complex, structured data preprocessing to handle the often-unstructured format of EHRs.

While raw data alone holds little value, machine learning algorithms can interpret, analyze, and extract actionable insights from it. This capability has made machine learning tools indispensable in clinical practice. Many traditional computer-based systems in medicine are expert systems that apply a predefined set of rules to clinical scenarios, resembling the approach used by medical students learning through general principles. In contrast, machine learning does not rely on handcrafted rules. Instead, it learns patterns and associations directly from patient-level data, processing vast numbers of variables to find predictive combinations. A key strength of machine learning lies in its ability to manage thousands of predictors—sometimes even more than the number of observations—and synthesize them in nonlinear, interactive ways to make accurate predictions. For reliable evaluation, machine learning models must be tested on truly independent validation datasets drawn from different populations and timeframes, ensuring no overlap with the training data. Failure to use independent datasets may lead to overfitting and poor generalization. High-quality, high-volume datasets are essential for optimal algorithm performance. However, biased datasets can compromise both accuracy and applicability. It is important to note that machine learning cannot resolve fundamental challenges related to causal inference in observational studies. While predictive accuracy may be high, these models often reflect correlations rather than causative relationships.

Many current studies in medical machine learning focus on binary outcomes, such as whether a patient has a particular disease. Some also assess disease staging, but most are disease-specific and do not address multiple conditions simultaneously. To overcome this limitation, a technique known as Ensemble Label Power-set Pruned Dataset Joint Decomposition (ELPPJD) has been proposed. ELPPJD improves on the Label Power-set (LP) method, which accounts for label correlations but suffers from increased time complexity and imbalanced class distributions as label sets grow (Folorunso et al., 2020; Yoganathan et al., 2023). ELPPJD addresses these issues by dividing the training dataset into disjoint subsets and using similarity thresholds to cluster similar labels. This approach employs two main strategies for subset partitioning: Size Balanced (SB) and Label Similarity (LS).

Alternative multilabel classification methods include Random k-label sets (RAKEL) and Hierarchy of Multilabel Classifiers (HOMER). RAKEL operates on the MEKA framework and utilizes the C4.5 algorithm, while HOMER is based on the MULAN platform and employs Random Forest (RF) classifiers (Zhou & Zhong, 2015). Among the two, RAKEL generally performs better. However, ELPPJD—particularly when using the LS partitioning strategy—has demonstrated superior performance compared to both RAKEL and HOMER, making it a promising method for handling multilabel biomedical classification tasks.

6. Machine Learning in Biomedical Event Extraction

The relationships between diseases and drugs, diseases and genes, drug-drug interactions, and protein-protein interactions represent highly complex biological events. Accurately and efficiently extracting such events from scientific literature is essential for advancing biomedical research. Due to the exponential growth of unstructured and semi-structured data in biomedical publications, text mining techniques have become increasingly important (Lee et al., 2016).

While pattern-based methods have traditionally been used for extracting relationships in biomedical texts, they are not widely applied in event extraction due to their limitations. Biomedical event extraction systems are generally categorized into two main types: rule-based systems and machine learning-based systems. In the machine learning approach, event extraction is typically formulated as a classification problem. However, one of the primary challenges in this domain is dealing with highly imbalanced datasets, which can skew model performance. Support Vector Machines (SVMs) help address this issue through class weighting strategies that balance the training process (Tao & others, 2019). Machine learning-based event extraction models are further divided into three architectures. The first is the pipeline model, which has achieved promising results but suffers from error propagation, as each step is dependent on the output of the previous one. The second is the joint model, which resolves this issue by processing all tasks simultaneously, although it demands significantly more computational resources. The third is the pairwise model, which combines features of both pipeline and joint models. It offers improved speed compared to the joint model and higher accuracy than the pipeline model, effectively managing multiclass and multi-label classification without being hindered by data imbalance.
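The class-weighting remedy mentioned above maps directly onto scikit-learn's SVC; in the sketch below, an imbalanced synthetic dataset stands in for rare event-trigger instances, and the linear kernel is an illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Imbalanced data: roughly 5% positives, standing in for rare event-trigger instances.
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

plain = SVC(kernel="linear").fit(X_train, y_train)
weighted = SVC(kernel="linear", class_weight="balanced").fit(X_train, y_train)  # up-weights the minority class

print("F1 without class weights:", round(f1_score(y_test, plain.predict(X_test)), 3))
print("F1 with balanced weights:", round(f1_score(y_test, weighted.predict(X_test)), 3))
```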

A comprehensive machine learning system designed to extract biomedical events from imbalanced datasets typically begins with text preprocessing. This step involves analyzing token-level, sentence-level, syntactic dependency, and external resource features. Following preprocessing, a sample selection phase identifies frequent sequential patterns within the text to capture recurring biological events. These patterns are essential for detecting multi-argument events. The final output is generated through a joint scoring mechanism, where tools like sentence2vec, based on convolutional deep structured semantic models (C-DSSMs), compute semantic relevance scores to enhance interpretation (Javaid et al., 2022).

SVMs are commonly used to divide biomedical event extraction into multiple classification tasks. These tasks include identifying trigger words that signal events and detecting the associated entities—such as genes and proteins—that participate in these events. While supervised learning models perform well in these tasks, they often struggle with sparse or limited training data, particularly when dealing with rare or previously unseen features.

To address these limitations, researchers have turned to semi-supervised and unsupervised learning methods. Large volumes of unlabeled data, such as those found in repositories like PubMed, offer valuable information that can supplement labeled datasets. A strategy known as event feature coupling generalization has been proposed to bridge the gap, allowing features derived from labeled data to be enhanced by incorporating insights from unlabeled data. This hybrid approach improves performance and mitigates the issues caused by sparse training features, enabling more robust event extraction systems.

In addition to event extraction, understanding protein function is another critical area in biomedical research. Proteins are the end products of gene expression and play a central role in biological systems. Despite the existence of extensive protein sequence databases, many proteins remain functionally uncharacterized due to the slow pace of experimental annotation. This discrepancy highlights the need for computational approaches capable of analyzing vast protein datasets with minimal labeled data. Machine learning provides an effective solution for this challenge. By analyzing protein sequences, machine learning models can infer structural, functional, and evolutionary characteristics. Protein classification aims to identify these traits with precision, and the use of machine learning has become indispensable in mining functional information from large-scale protein databases. These algorithms have significantly accelerated the ability of researchers to classify proteins, thereby enhancing the understanding of gene functions at the proteome level.

7. Machine Learning Approaches to Polypharmacology

Polypharmacology is a growing field in biomedical research that focuses on developing treatments capable of interacting with multiple molecular targets rather than a single receptor. Drug efficacy and toxicity are shaped by a range of complex interactions involving pharmacodynamic and pharmacokinetic properties, as well as genetic, epigenetic, and environmental influences—even when the drug is designed to act on one or several specific targets. To effectively analyze and predict drug responses both in vitro and in vivo, advanced computational tools, particularly those based on machine learning, are essential (Kabir & Muth, 2022).

Accurately identifying drug-target interactions on a proteome-wide scale is a fundamental step in understanding and predicting drug responses. Beyond genetic and epigenetic variations, the cellular environment—including intercellular communication and variability between individual cells—must be factored into predictive models. Systems biology approaches provide a comprehensive framework for mapping the interactions among these components, enabling more informed drug discovery efforts.

New computational strategies are needed to model protein-ligand binding events more precisely, particularly by calculating free-energy landscapes associated with the association and dissociation of molecular complexes (Q. Zhang et al., 2022). These calculations support the investigation of both high- and low-affinity binding events across the entire proteome. Self-organizing maps, a form of unsupervised machine learning, are commonly used to cluster drug compounds. Combining the structural features of receptors with chemical fingerprints of ligands allows the development of machine learning models capable of predicting drug-target interactions. Four computational areas are especially crucial for advancing polypharmacology: large-scale drug-target interaction prediction, quantitative modeling of protein-ligand interactions, integrated analysis of biological networks, and the dynamic simulation of network behavior, including stoichiometry and kinetics.

Next-generation sequencing (NGS) encompasses both DNA and RNA sequencing technologies. It works by breaking genetic material into smaller fragments and determining the order of nucleotide bases within each segment (M. Wang, 2021). NGS can be categorized into whole-genome sequencing (WGS), whole-exome sequencing (WES) of coding regions, and targeted sequencing of specific genes linked to disease. In clinical practice, WGS and WES are increasingly used for diagnosing complex neurodevelopmental disorders such as autism, epilepsy, and intellectual disabilities.

Several commercially available NGS platforms employ different techniques to generate sequencing data, and ongoing technological improvements have reduced error rates considerably. Despite these advancements, Sanger sequencing remains the gold standard for validating genetic variants due to its superior accuracy.

When machine learning techniques are applied to clinical data, they enable the creation of prediction models that can assist in various aspects of medical practice—from early warning systems to advanced imaging diagnostics that rival expert human performance. These models generate predictions based on patterns in existing data. However, a well-known cautionary example is the failure of Google Flu Trends, which illustrated the pitfalls of using limited historical data for time-series forecasting.

Research in clinical decision support systems has shown that relying solely on large volumes of historical data does not necessarily improve prediction accuracy. In many cases, more accurate results are obtained by focusing on the most recent year of data rather than attempting to model long-term trends. The primary goal in evaluating prediction models is not to replicate past outcomes, but to forecast future events with precision.

While machine learning can outperform traditional regression techniques by uncovering nonlinear and complex relationships in the data, there are fundamental limitations. Even the most powerful algorithms cannot extract information that is not present in the dataset. Consequently, the predictive power of these models is restricted when they rely solely on clinical data. Integrating additional, relevant data streams can improve prediction performance, but there are inherent constraints. Small discrepancies or rounding errors—often considered negligible—can accumulate over time and significantly distort long-term predictions. This highlights the unpredictability of complex systems and the challenges associated with forecasting in medicine, even with advanced computational tools.
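As a toy illustration of how tiny discrepancies compound in an iterated forecast, the snippet below runs the same chaotic update rule (a logistic map, not a medical model) from two starting values that differ by one part in a million.

```python
# Two iterated trajectories that start 1e-6 apart diverge after enough steps,
# illustrating how negligible errors can distort long-horizon predictions.
x_a, x_b = 0.400000, 0.400001
for step in range(40):
    x_a = 3.9 * x_a * (1 - x_a)
    x_b = 3.9 * x_b * (1 - x_b)
print(f"After 40 steps: {x_a:.4f} vs {x_b:.4f} (difference {abs(x_a - x_b):.4f})")
```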

8. Machine Learning for Drug Repurposing Using Systems Biology

More than 90% of drugs that enter the early phases of clinical trials ultimately fail, primarily due to adverse reactions, undesirable side effects, or insufficient efficacy (Jain et al., 2023). To address these challenges, drug repurposing has emerged as a promising alternative. Drug repurposing strategies can be categorized as either drug-centric or disease-centric. A drug that exhibits a strong negative correlation with a disease, meaning it counteracts the disease's gene expression profile, is often considered a viable candidate for repurposing.

One of the earliest efforts in this area was the Connectivity Map project, which aimed to establish functional links between drugs and between drugs and diseases. Systems biology plays a vital role in this context by analyzing how drugs influence complex biological systems, including gene interactions and cellular pathways (Živanović & Filipović, 2024). In these models, drugs are ranked based on the extent to which they perturb disease-associated genes.

A commonly used framework in drug repurposing is the Drug-Disease Network (DDN), which integrates information about disease-related genes, drug targets, signaling pathways, and gene-gene interactions (Conte & others, 2020). The DDN maps out all known interactions relevant to a particular disease as defined by sources like the Kyoto Encyclopedia of Genes and Genomes (KEGG). To determine whether a drug could be repurposed for a specific disease, a repurposing score is calculated using the Pearson correlation coefficient between the gene perturbation signatures of the drug and the disease. This coefficient ranges from -1 to 1. A high positive score indicates similar biological effects, while a high negative score suggests the drug may effectively counteract the disease, making it a strong candidate for treatment.
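A minimal sketch of this scoring step, with invented gene names and signature values, might look as follows; a strongly negative Pearson correlation flags a potential repurposing candidate.

```python
# Minimal sketch of the repurposing score described above: the Pearson correlation
# between a drug's and a disease's gene perturbation signatures (values are invented).
import numpy as np
from scipy.stats import pearsonr

genes = ["TP53", "EGFR", "TNF", "IL6", "BRCA1", "VEGFA"]
disease_signature = np.array([ 1.8,  2.1,  1.5,  2.4, -0.7,  1.2])  # up/down-regulation in disease
drug_signature    = np.array([-1.5, -1.9, -1.2, -2.0,  0.5, -0.9])  # perturbation induced by the drug

score, _ = pearsonr(disease_signature, drug_signature)
print(f"Repurposing score: {score:.2f}")   # strongly negative -> drug may counteract the disease
```

In the DDN framework, this correlation would be computed over the full set of disease-associated genes rather than a handful.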

This framework illustrates how machine learning supports decision making throughout the continuum of patient care. From the moment a patient is diagnosed, each step—whether it involves identifying the disease, uncovering comorbid conditions, or selecting appropriate treatments—is guided by machine learning models that help clinicians make timely and accurate decisions. These tools assist in disease prediction, diagnosis, clinical decision support, and evaluating drug efficacy and compatibility.

Even after a patient recovers, machine learning continues to play a role in preventive healthcare. By analyzing electronic health records (EHRs), machine learning algorithms can identify potential future health risks, allowing for early intervention. In this way, machine learning not only enhances diagnosis and treatment but also contributes to long-term patient monitoring and preventive care strategies.

9. Discussion

 This review has explored the evolving role of machine learning in healthcare, focusing on its applications in disease detection, medical imaging, drug repurposing, and precision medicine. Among the core observations, the performance of a machine learning algorithm is primarily judged by its classification accuracy and log loss—higher accuracy and lower log loss indicate a more effective model. However, algorithm performance also depends heavily on the dataset, feature selection, preprocessing, and hardware capabilities, making algorithm selection an iterative and context-specific process.
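For reference, both metrics can be computed in a few lines with scikit-learn; the labels and probabilities below are made up.

```python
# Minimal example of the two evaluation metrics mentioned above.
from sklearn.metrics import accuracy_score, log_loss

y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.6, 0.4]                    # predicted probability of class 1
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print("Accuracy:", accuracy_score(y_true, y_pred))     # higher is better
print("Log loss:", log_loss(y_true, y_prob))           # lower is better
```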

In clustering tasks, especially when dealing with biomedical data that is often high-dimensional and non-spherical, user-defined parameters like the number of clusters or starting points can significantly impact outcomes. Automatic Density Clustering with Multiple Kernels (ADCMK) addresses this by automatically determining kernel weights, cutoff distances, and centroids, leading to more consistent clustering results. This method is particularly beneficial for unsupervised learning where label data is unavailable.
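ADCMK itself is not available in standard libraries, so the sketch below uses DBSCAN from scikit-learn as a simpler density-based stand-in: it can recover non-spherical clusters without a preset cluster count, although it still requires neighborhood parameters.

```python
# DBSCAN as a simple density-based stand-in for the clustering scenario above
# (non-spherical data, no pre-specified number of clusters).
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.06, random_state=0)   # non-spherical toy data
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print("Clusters found:", len(set(labels) - {-1}), "| noise points:", int((labels == -1).sum()))
```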

Algorithm effectiveness varies across medical domains. For example, artificial neural networks (ANNs) have shown superior performance in diagnosing kidney disease, while support vector machines (SVMs) perform well in lung cancer detection and staging. In breast cancer prediction, deep neural networks (DNNs) have outperformed ANNs, SVMs, and k-nearest neighbors (KNN). Meanwhile, logistic regression is less suitable for complex disease modeling due to its simplicity, and KNN struggles with large-scale cancer datasets.

Convolutional Neural Networks (CNNs) are highly effective in extracting features from both structured and unstructured medical data. CNN-based unimodal and multimodal disease risk prediction models (CNN-UDRP and CNN-MDRP) further enhance prediction accuracy by representing test results using word embeddings from natural language processing (NLP).

Medical imaging is another area where machine learning has significantly improved diagnostic processes. CNNs with adaptive sliding window fusion provide robust, high-accuracy classification, especially for tumor detection. Deep learning combines unsupervised pretraining with supervised fine-tuning, enabling better learning from data with minimal labels. For biomedical time series, where traditional CNNs may fall short, multi-channel CNNs offer improved performance.

The integration of 3D printing with medical imaging has revolutionized surgical planning and prosthetic design. By converting medical image data into physical models, clinicians can perform complex surgeries with higher precision. Advanced devices such as bionic eyes, antibacterial teeth, and hyperelastic bones have been successfully developed using 3D printing. These innovations, summarized in Table 1, highlight how machine learning and biomedical data analytics are contributing to personalized medicine and bio-prosthetic advancements.

To effectively analyze high-dimensional biomedical data, techniques such as t-distributed stochastic neighbor embedding (t-SNE) and ADCMK have shown promising results in visualizing and clustering unlabeled data. In protein classification and biomedical event extraction, limited and imbalanced data present significant challenges. Semi-supervised learning techniques like the transductive SVM (TSVM) and expectation-maximization models have been introduced to enhance performance by leveraging unlabeled datasets.
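As one concrete example of the dimensionality-reduction step mentioned above, the following hedged sketch embeds a synthetic 50-dimensional dataset into two dimensions with t-SNE; real inputs might be gene expression profiles or image embeddings.

```python
# Hedged sketch: visualizing unlabeled high-dimensional data with t-SNE.
# The "biomedical" matrix here is synthetic, with three latent groups.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
# 300 samples from three latent groups in a 50-dimensional space.
X = np.vstack([rng.normal(loc=mu, scale=1.0, size=(100, 50)) for mu in (0.0, 3.0, 6.0)])

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(embedding.shape)   # (300, 2) coordinates suitable for a scatter plot
```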

In polypharmacology, the integration of machine learning must be approached cautiously. Even identical genetic and environmental conditions can lead to unpredictable outcomes, highlighting the limitations of purely data-driven models. Although machine learning excels at prediction, it often lacks interpretability and does not inherently provide causal explanations. Nevertheless, it supports clinicians in resource allocation and decision-making by highlighting trends and risk factors more efficiently than manual review.

Improving the efficiency and accuracy of machine learning models involves multiple strategies. Principal Component Analysis (PCA) and Genetic Algorithms (GA) have proven effective for feature selection, improving metrics such as positive predictive value, negative predictive value, sensitivity, and specificity. Ensemble learning methods, such as bagging, boosting, and majority voting, combine multiple weak classifiers into a strong classifier, enhancing performance through collaborative decision-making. Feature selection is essential for reducing model complexity and preventing overfitting. Approaches are commonly grouped into filter, wrapper, and embedded categories; wrapper-style methods such as forward feature selection, backward elimination, and recursive feature elimination search for a compact, non-redundant subset of features, improving both computational efficiency and model accuracy.
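A possible arrangement of these pieces, on synthetic data and with arbitrary parameter choices, is a PCA step feeding a soft-voting ensemble of several base classifiers:

```python
# Illustrative pipeline (synthetic data): PCA for feature reduction followed by
# a soft-voting ensemble of heterogeneous classifiers.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=60, n_informative=10, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
model = make_pipeline(PCA(n_components=20), ensemble)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```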

Deep learning, particularly DNNs with more than 20 layers, is now used in tasks ranging from image recognition to genotypic and phenotypic classification. CNNs amplify important image features through convolution and pooling layers, eliminating the need for manual feature extraction. Activation functions like ReLU, sigmoid, and softmax play key roles in forming nonlinear layers that improve learning in deep architectures. Table 2 lists various open-source libraries across languages that support machine learning development, with Python being the most widely adopted. As shown in Figure 1, the adoption of machine learning frameworks varies by programming language, with Python and C++ dominating deep learning research. As computational power and algorithms improve, the applications of machine learning in biology and medicine continue to expand. Precision medicine, which tailors treatment to individual patients based on their genetic, environmental, and lifestyle data, relies heavily on machine learning. Analyzing vast biomedical datasets, extracting knowledge from unstructured records, and identifying patterns are tasks best handled through unsupervised and semi-supervised learning. Given that over 80% of healthcare decisions are now data-driven, the integration of machine learning in computational biology and medicine is critical.
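The convolution, pooling, ReLU, and softmax components described above can be assembled into a minimal (and deliberately shallow) network; the sketch below uses PyTorch, with an arbitrary 1-channel 64x64 input and three output classes chosen only for illustration.

```python
# Hedged sketch of the CNN building blocks discussed above: convolution,
# pooling, ReLU nonlinearities, and a softmax over class scores.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)   # 64x64 input -> 16x16 feature maps

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)          # raw logits; softmax applied below

model = SmallCNN()
images = torch.randn(4, 1, 64, 64)         # a dummy batch of 4 grayscale images
probs = torch.softmax(model(images), dim=1)
print(probs.shape)                          # torch.Size([4, 3]); each row sums to 1
```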

Ultimately, machine learning serves as a decision support tool, whether the task is disease detection, risk prediction, treatment planning, or drug repurposing. It supports clinicians by offering data-backed insights, enabling faster and more confident decisions. As healthcare increasingly adopts these technologies, ensuring accuracy, interpretability, and ethical use will remain paramount to maintaining trust and improving patient outcomes.

Table 1. Roles of Deep Learning Techniques in Computational Biology 

Table 2. ML Libraries Categorized by Programming Language 

Figure 1. Analysis of Commonly Used Deep Learning Frameworks


10. Conclusions

As this paper comes to a close, it is evident that machine learning, a key part of artificial intelligence, has greatly influenced the field of computational biology and made a significant impact on the healthcare system in the United States. Machine learning has enabled faster, more accurate, efficient, and affordable decision making in a variety of applications. It plays a vital role in disease diagnosis and prediction, medical imaging, drug repurposing, biomedical event analysis, and more. Over the years, the integration of machine learning into healthcare has reached an advanced stage, now contributing to personalized treatment strategies through precision medicine. In the United States, one of the most striking examples of this progress came during the COVID-19 pandemic, when machine learning tools supported patient care, treatment research, hospital resource management, and planning for future healthcare demands. These developments clearly show that artificial intelligence has become a foundational element in healthcare decision making and is now deeply embedded in the country's medical systems.

References

Ahmed, Z., Mohamed, K., Zeeshan, S., & Dong, X. (2020). Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database, 2020, baaa010.

Alanazi, A. (2022). Using machine learning for healthcare challenges and opportunities. Informatics in Medicine Unlocked, 30, 100924. https://doi.org/10.1016/j.imu.2022.100924

Ao, Y., Li, H., Zhu, L., Ali, S., & Yang, Z. (2019). The linear random forest algorithm and its advantages in machine learning assisted logging regression modeling. Journal of Petroleum Science and Engineering, 174, 776–789. https://doi.org/10.1016/j.petrol.2018.11.067

Azimpour, P., Etemadfard, H., & others. (2020). Hyperspectral image clustering with Albedo recovery Fuzzy C-Means. International Journal of Remote Sensing, 41(16), 6117–6134. https://doi.org/10.1080/01431161.2020.1736728

Babarinde, A. O., Ayo-Farai, O., Maduka, C. P., Okongwu, C. C., & Sodamade, O. (2023). Data analytics in public health, A USA perspective: A review. World Journal of Advanced Research and Reviews, 20(3), 211–224.

Bharat, A., Pooja, N., & Reddy, R. A. (2018). Using machine learning algorithms for breast cancer risk prediction and diagnosis. 2018 3rd International Conference on Circuits, Control, Communication and Computing (I4C), 1–4. https://doi.org/10.1109/CIMCA.2018.8739696

Bhattacharjee, P., & Mitra, P. (2020). A survey of density based clustering algorithms. Frontiers of Computer Science, 15(1), 151308. https://doi.org/10.1007/s11704-019-9059-3

Bouveyron, C., & Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis, 71, 52–78. https://doi.org/10.1016/j.csda.2012.12.008

Calamuneri, A., Donato, L., Scimone, C., Costa, A., D’Angelo, R., & Sidoti, A. (2017). On machine learning in biomedicine. Life Safety and Security, 5(12), 96–99.

Charbuty, B., & Abdulazeez, A. (2021). Classification based on decision tree algorithm for machine learning. Journal of Applied Science and Technology Trends, 2(01), 20–28.

Chauhan, V. K., Dahiya, K., & Sharma, A. (2019). Problem formulations and solvers in linear SVM: a review. Artificial Intelligence Review, 52(2), 803–855.

Conte, F., & others. (2020). A paradigm shift in medicine: A comprehensive review of network-based approaches. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, 1863(6), 194416. https://doi.org/10.1016/j.bbagrm.2019.194416

Debnath, S., & others. (2020). Machine learning to assist clinical decision-making during the COVID-19 pandemic. Bioelectronic Medicine, 6, 1–8. https://doi.org/10.1186/s42234-020-00050-8

Deo, R. C. (2015). Machine learning in medicine. Circulation, 132(20), 1920–1930.

Farid, D. M., Zhang, L., Rahman, C. M., Hossain, M. A., & Strachan, R. (2014). Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks. Expert Systems with Applications, 41(4), 1937–1946. https://doi.org/10.1016/j.eswa.2013.08.089

Folorunso, S. O., Fashoto, S. G., Olaomi, J., & Fashoto, O. Y. (2020). A multi-label learning model for psychotic diseases in Nigeria. Informatics in Medicine Unlocked, 19, 100326. https://doi.org/10.1016/j.imu.2020.100326

Giger, M. L. (2018). Machine Learning in Medical Imaging. Journal of the American College of Radiology, 15(3, Part B), 512–520. https://doi.org/10.1016/j.jacr.2017.12.028

Gu, X., & others. (2023). Beyond supervised learning for pervasive healthcare. IEEE Reviews in Biomedical Engineering, 17, 42–62. https://doi.org/10.1109/RBME.2023.3296938

Houssein, E. H., Hosney, M. E., Emam, M. M., Younis, E. M. G., Ali, A. A., & Mohamed, W. M. (2023). Soft computing techniques for biomedical data analysis: open issues and challenges. Artificial Intelligence Review, 56(2), 2599–2649. https://doi.org/10.1007/s10462-023-10585-2

Jabin, J. A., Khondoker, M. T. H., Sobuz, M. H. R., & Aditto, F. S. (2024). High-temperature effect on the mechanical behavior of recycled fiber-reinforced concrete containing volcanic pumice powder: An experimental assessment combined with machine learning (ML)-based prediction. Construction and Building Materials, 418, 135362. https://doi.org/10.1016/j.conbuildmat.2024.135362

Jain, R., Janakiraman, S., & Rathore, A. S. (2023). A review of therapeutic failures in late-stage clinical trials. Expert Opinion on Pharmacotherapy, 24(3), 389–399. https://doi.org/10.1080/14656566.2022.2161366

Javaid, M., Haleem, A., Singh, R. P., Suman, R., & Rab, S. (2022). Significance of machine learning in healthcare: Features, pillars and applications. International Journal of Intelligent Networks, 3, 58–73. https://doi.org/10.1016/j.ijin.2022.05.002

Jayatilake, S. M. D. A. C., & Ganegoda, G. U. (2021). Involvement of machine learning tools in healthcare decision making. Journal of Healthcare Engineering, 2021(1), 6679512.

Kabir, A., & Muth, A. (2022). Polypharmacology: The science of multi-targeting molecules. Pharmacological Research, 176, 106055. https://doi.org/10.1016/j.phrs.2021.106055 

Kocheturov, A., Pardalos, P. M., & Karakitsiou, A. (2019). Massive datasets and machine learning for computational biomedicine: trends and challenges. Annals of Operations Research, 276(1), 5–34. https://doi.org/10.1007/s10479-018-2891-2

Lee, J., Kim, H., Kim, N.-r., & Lee, J.-H. (2016). An approach for multi-label classification by directed acyclic graph with label correlation maximization. Information Sciences, 351, 101–114. https://doi.org/10.1016/j.ins.2016.02.037

Liu, K.-Z., Tian, G., Ko, A. C. T., Geissler, M., Brassard, D., & Veres, T. (2020). Detection of renal biomarkers in chronic kidney disease using microfluidics: progress, challenges and opportunities. Biomedical Microdevices, 22(2), 29. https://doi.org/10.1007/s10544-020-00484-6

Lysaght, T., Lim, H. Y., Xafis, V., & Ngiam, K. Y. (2019). AI-assisted decision-making in healthcare: the application of an ethics framework for big data in health and research. Asian Bioethics Review, 11, 299–314. https://doi.org/10.1007/s41649-019-00096-0

Mishra, V., Singh, Y., & Rath, S. K. (2019). Breast cancer detection from thermograms using feature extraction and machine learning techniques. 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), 1–5. https://doi.org/10.1109/I2CT45611.2019.9033713

Nielsen, F. (2016). Hierarchical Clustering. In Introduction to HPC with MPI for Data Science (pp. 195–211). Springer International Publishing.

Nusinovici, S., & others. (2020). Logistic regression was as good as machine learning for predicting major chronic diseases. Journal of Clinical Epidemiology, 122, 56–69. https://doi.org/10.1016/j.jclinepi.2020.03.002

Saikia, A., Hussain, M., Barua, A. R., & Paul, S. (2020). An insight into Parkinson’s disease: researches and its complexities. In S. Paul & D. Bhatia (Eds.), Smart Healthcare for Disease Diagnosis and Prevention (pp. 59–80). Academic Press.

Sanchez-Martinez, S., & others. (2022). Machine learning for clinical decision-making: challenges and opportunities in cardiovascular imaging. Frontiers in Cardiovascular Medicine, 8, 765693. https://doi.org/10.3389/fcvm.2021.765693

Shehab, M., & others. (2022). Machine learning in medical applications: A review of state-of-the-art methods. Computers in Biology and Medicine, 145, 105458. https://doi.org/10.1016/j.compbiomed.2022.105458

Silva, L., & Ramos, J. (2025). Hybrid Approach to Voice-Based Classification of Parkinson’s Disease. In P. Novais & others (Eds.), Ambient Intelligence – Software and Applications – 15th International Symposium on Ambient Intelligence (pp. 189–199). Springer Nature Switzerland.

Tao, X., & others. (2019). Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification. Information Sciences, 487, 31–56. https://doi.org/10.1016/j.ins.2019.02.062

Tejasree, G., & Agilandeeswari, L. (2024). An extensive review of hyperspectral image classification and prediction: techniques and challenges. Multimedia Tools and Applications, 83(34), 80941–81038. https://doi.org/10.1007/s11042-024-18562-9

Živanović, M. N., & Filipović, N. (2024). System Biology Modeling for Drug Optimization. In N. Filipović (Ed.), In Silico Clinical Trials for Cardiovascular Disease: A Finite Element and Machine Learning Approach (pp. 105–137). Springer Nature Switzerland.

Toraman, S., Alakus, T. B., & Turkoglu, I. (2020). Convolutional CapsNet: A novel artificial neural network approach to detect COVID-19 disease from X-ray images using capsule networks. Chaos, Solitons & Fractals, 140, 110122. https://doi.org/10.1016/j.chaos.2020.110122

Tyagi, K., Rane, C., Sriram, R., & Manry, M. (2022). Chapter 3 – Unsupervised learning. In R. Pandey, S. K. Khatri, N. K. Singh, & P. Verma (Eds.), Artificial Intelligence and Machine Learning for EDGE Computing (pp. 33–52). Academic Press.

Vasta, R., & others. (2018). The application of artificial intelligence to understand the pathophysiological basis of psychogenic nonepileptic seizures. Epilepsy & Behavior, 87, 167–172. https://doi.org/10.1016/j.yebeh.2018.09.008

Vayadande, K., Bhosle, A. A., Pawar, R. G., Joshi, D. J., Bailke, P. A., & Lohade, O. (2024). Innovative approaches for skin disease identification in machine learning: A comprehensive study. Oral Oncology Reports, 10, 100365. https://doi.org/10.1016/j.oor.2024.100365

Wang, M. (2021). Next-Generation Sequencing (NGS). In S. Pan & J. Tang (Eds.), Clinical Molecular Diagnostics (pp. 305–327). Springer Singapore.

Wang, X., Zhao, Y., & Pourpanah, F. (2020). Recent advances in deep learning. International Journal of Machine Learning and Cybernetics, 11(4), 747–750. https://doi.org/10.1007/s13042-020-01096-5

Yoganathan, K., Malek, N., Torzillo, E., Paranathala, M., & Greene, J. (2023). Neurological update: structural and functional imaging in epilepsy surgery. Journal of Neurology, 270(5), 2798–2808. https://doi.org/10.1007/s00415-023-11619-z

Zhang, Q., Nannan, Z., Xiaoxiao, M., Fansen, Y., Xiaojun, Y., & Liu, H. (2022). The prediction of protein–ligand unbinding for modern drug discovery. Expert Opinion on Drug Discovery, 17(2), 191–205. https://doi.org/10.1080/17460441.2022.2002298

Zhang, Z., Song, J., Tang, J., Xu, X., & Guo, F. (2018). Detecting complexes from edge-weighted PPI networks via genes expression analysis. BMC Systems Biology, 12(4), 40. https://doi.org/10.1186/s12918-018-0565-y

Zhou, D., & Zhong, D. (2015). A semi-supervised learning framework for biomedical event extraction based on hidden topics. Artificial Intelligence in Medicine, 64(1), 51–58. https://doi.org/10.1016/j.artmed.2015.03.004 
