INTRODUCTION
Alzheimer’s disease (AD) is a global health crisis that affects millions of people around the world. This debilitating condition erodes the brain’s ability to comprehend, remember, and perform basic functions, ultimately leading to death (Nawaz et al., 2021). With projections indicating that the number of AD patients will increase from 50 million to 152 million by 2050, it is imperative to act now to address this growing health crisis (Maurer et al., 1997). The cost of treating AD is already staggering, with global expenses expected to have reached nearly $186 billion in 2018 (Richards and Hendrie, 1999), and this figure is only projected to grow in the coming years, placing an enormous burden on healthcare systems (Yiannopoulou and Papageorgiou, 2020). According to the established Clinical Dementia Rating, the disorder is split into four stages: early mild cognitive impairment (EMCI), mild cognitive impairment (MCI), late mild cognitive impairment (LMCI), and AD (Morris et al., 2001; Yang et al., 2021). Early diagnosis of dementia disorders is crucial for patient recovery and for reducing treatment expenses, since the cost of treating patients differs between the EMCI and LMCI stages (Nozadi and Kadoury, 2018; Sheng et al., 2021). Definitive diagnosis of AD has typically been possible only after the onset of “Alzheimer’s dementia,” since the underlying pathological changes in patients could not be assessed earlier (DeTure and Dickson, 2019; Porsteinsson et al., 2021). The initial diagnostic criteria for AD were formulated in 1984 and relied solely on clinical symptoms (Alzheimer’s Association Report, 2018; Nasreddine et al., 2023). With the discovery of biomarkers such as cerebrospinal fluid, magnetic resonance imaging (MRI), and positron emission tomography (PET) data, the International Working Group proposed a new approach in 2014, which served as the model for the criteria of the National Institute on Aging and Alzheimer’s Association (NIA-AA) (Jack et al., 2018; Mankhong et al., 2022).
Biomarker data are used under the NIA-AA criteria to connect the clinical conditions of dementia or mild cognitive loss to intrinsic AD pathological changes with high, moderate, or low likelihood (Jellinger et al., 1990; Scheltens et al., 2016; Jack et al., 2018). Imaging biomarkers used to assess AD include computed tomography, functional magnetic resonance imaging (fMRI), MRI, and PET scans (Sheikh-Bahaei et al., 2017). The hippocampus and entorhinal cortex show extremely early changes in AD that are consistent with its pathology, but it is still uncertain which structure is best suited for early diagnosis. The physiology of dementia and its differential diagnosis have benefited greatly from structural and functional imaging, which also holds considerable potential for tracking the course of the disease (O’Brien, 2007). Numerous reports have documented imaging methods that can be used to detect AD (Kim et al., 2022; Prasath and Sumathi, 2023). In volumetric MRI, patterns distinguishing diseased from healthy subjects were identified using feature-based morphometry (Toews et al., 2010). Recently, computerized medical image processing with convolutional neural networks (CNNs) has achieved major advances (Yamashita et al., 2018). As a result, various CNN models, including Visual Geometry Group (VGG), MobileNet, AlexNet, and ResNet, are available for object detection and segmentation. Although CNNs are a renowned deep learning technique, their effectiveness is hampered by the absence of extensive medical imaging datasets (Chan et al., 2020). Transfer learning is among the most efficient methods for building a deep CNN without overfitting when the amount of data is small (Xiao et al., 2018). Transfer learning builds on a pre-trained network, so the proposed method can reuse the most useful learned features rather than training a task-specific CNN from scratch.
To categorize AD into five classes, the proposed research study used three pre-trained networks: VGG16, ResNet50, and DenseNet121. The main contributions of this research paper toward detecting and classifying the AD stages are as follows:
Identification of an image dataset and ensuring that the identified dataset is in a format suitable for artificial neural networks (ANNs)
Conversion of this image dataset into the JPEG format
Application of different normalization techniques on the dataset to remove ambiguities
Application of different data augmentation techniques on the normalized dataset
Ensemble of different deep learning approaches on normalized dataset to detect and diagnose AD stages
Finally, comparison of the efficiency of the deep learning models, which found that VGG16 and DenseNet121 outperform ResNet50 and other models
The research focuses on utilizing transfer learning and deep learning techniques, specifically VGG16, ResNet50, and DenseNet121, to detect and classify AD stages. By employing pre-trained networks, the study aims to overcome limitations in the dataset size and optimize classification accuracy. The identified contributions include dataset preparation, normalization, and augmentation, followed by the application of ensemble deep learning approaches for classification. The results highlight the superior performance of VGG16 and DenseNet121 compared to ResNet50 and other models, demonstrating their efficacy in AD stage detection. Through this research, significant advancements are made in AD diagnosis, addressing the pressing need for accurate and efficient classification methods in the field of medical imaging and neurology.
LITERATURE REVIEW
A literature review on the use of machine learning techniques in AD research shows a growing trend in the development of models that can assist in early diagnosis, predict disease progression, and improve the understanding of the underlying biological mechanisms of AD. One of the most common approaches in AD research is the use of MRI scans to study brain changes associated with the disease. CNNs have been used to classify and differentiate between healthy brains and those with AD based on MRI scans. Some of the promising research studies in detecting early signs of AD, which can help in early intervention and improve patient outcomes, are described as follows:
An automated framework was developed by Acharya et al. (2019) to evaluate whether a baseline brain scan shows any evidence of AD. Wang and Liu (2019) integrated genomic data from six different brain areas using support vector machine (SVM) learning techniques to find AD biomarkers. Mahyoub et al. (2018) predicted AD at various stages relying on characteristics including lifestyle, medical history, demographics, and other considerations. Rueda et al. (2014) suggested a fusion-based image processing technique that identifies discriminative brain patterns connected to the presence of neurodegenerative disorders; once the discriminative patterns had been identified, the effectiveness of SVM classification was assessed on several datasets. A classification approach based on multilayer brain divisions was presented by Li and Zhang (2016), in which histogram-based parameters from MRI data were used with an SVM to categorize the various brain levels.
Payan and Cruz’s (2015) contributions lie in their application of three-dimensional (3D) CNNs to predict AD using neuroimaging data. By leveraging advanced deep learning techniques, the study demonstrates the potential of CNNs in analyzing complex 3D brain images to aid in AD diagnosis. This pioneering work highlights the role of machine learning algorithms in identifying patterns indicative of AD pathology, offering promise for early detection and intervention strategies. Similarly, Liu et al. (2015) proposed a multimodal neuroimaging feature learning approach for multiclass diagnosis of AD. This method integrates information from multiple neuroimaging modalities to enhance diagnostic accuracy. Through comprehensive feature learning, the study advances the field by providing a robust framework for the classification of AD across multiple stages, thereby aiding in early detection and personalized treatment strategies. Researchers like Hosseini-Asl et al. (2015) developed a 3D deeply supervised adaptable CNN for AD diagnostics. This innovative approach harnesses the power of deep learning to analyze 3D MRI data, enabling more accurate and efficient detection of AD-related brain changes. By leveraging deep supervision techniques, the proposed network enhances feature representation and classification performance, advancing the capabilities of automated diagnostic systems for AD.
Giraldo et al. (2018) proposed an automated technique for identifying structural abnormalities in the thalamus, planum temporale, amygdala, and hippocampal areas. Nawaz et al. (2021) devised a computer-aided framework for real-time AD diagnosis and suggested identifying the stages of AD. For deep feature modeling and extraction, researchers have used classification algorithms such as K-nearest neighbor, random forest (RF), and SVM (Thanh Noi and Kappas, 2017; Sheth et al., 2022). Large datasets were necessary for classification and for extracting deep features while avoiding overfitting problems. To attain the maximum accuracy in early AD diagnosis, deep learning techniques have been recommended over previous approaches for their depth and timeliness (Gupta et al., 2019). Still, there is currently no treatment for AD arising from any medical reasoning/algorithmic approach, nor for any stage of its complications (Zhao et al., 2023). Therefore, researchers in the field of artificial intelligence have turned their interest to developing suitable algorithms in AD-related areas. Two families of methods are utilized: conventional machine learning and deep learning. Conventional machine learning includes SVMs, RF, linear regression, naïve Bayes, ANNs, etc., while deep learning includes CNNs, recursive neural networks, etc. (Zhao et al., 2023).
The main contribution of Sarraf et al. (2016) lies in the utilization of deep CNNs for AD classification based on MRI and fMRI data. By leveraging advanced neural network architectures, the study aims to enhance the accuracy of AD diagnosis, potentially enabling earlier detection and intervention. This approach demonstrates the potential of deep learning techniques in leveraging neuroimaging data for improved understanding and management of AD. Suk et al. (2014) contributed to AD and MCI diagnoses. Through hierarchical feature representation and multimodal fusion using deep learning techniques, the study enhances the accuracy and reliability of AD/MCI diagnosis. By leveraging deep learning algorithms to integrate diverse data sources, such as MRI and PET scans, the paper provides a comprehensive framework for improving early detection and understanding the underlying mechanisms of AD/MCI.
To attain high accuracy, Sørensen et al. (2018) proposed focusing on a nonlinear SVM with a radial basis function when developing a computerized machine learning approach for categorizing AD phases. Maqsood et al. (2019) developed a transfer learning approach to identify AD, suggesting that the AD category be broken down into different divisions. Since AD is an incurable ailment, it is an emerging topic for research globally. The contributions of researchers across the globe toward the detection and diagnosis of this disorder are listed in Table 1.
Literature review for AD detection.
Dataset | Classification | Results | Reference |
---|---|---|---|
MIAS dataset | Binary | 95% | Chowdhary et al. (2020) |
Retinal photographs | Binary | 93% | Cheung et al. (2022) |
MNIST | Binary | 85% | Nagabushanam et al. (2022) |
ADNI | Binary | 96% | Amoroso et al. (2018) |
ADNI | Binary | 85% | Mirabnahrazam et al. (2022) |
ADNI | Binary | 88% | Hashemifar et al. (2022) |
ADNI | Multi | 96% | Ning et al. (2021) |
Abbreviations: ADNI, Alzheimer’s Disease Neuroimaging Initiative; MIAS, Mammographic Image Analysis Society; MNIST, Modified National Institute of Standards and Technology.
In addition to diagnosis and progression prediction, machine learning techniques have also been applied to understand the biological mechanisms underlying AD. This includes the analysis of genomic data, protein expression data, and other biological markers to identify potential drug targets and predict disease outcome.
The methodology employed in this study aligns with recent literature on the use of machine learning techniques in AD research. Similar to the reviewed studies, this research focuses on utilizing machine learning models, particularly CNNs and SVMs, to analyze MRI scans and classify different stages of AD. The study also acknowledges the importance of large datasets for classification accuracy and emphasizes the need for advanced techniques to mitigate overfitting issues (Mirabnahrazam et al., 2023).
Furthermore, like some of the referenced works, this study incorporates transfer learning approaches to enhance the classification of AD phases. Transfer learning has been increasingly recognized as a valuable technique in AD research, allowing models to leverage pre-trained features and adapt them to specific datasets. Additionally, the study emphasizes the importance of nonlinear SVMs for accurate categorization of AD phases, aligning with previous research that highlights the effectiveness of nonlinear approaches in complex classification tasks (Hashemifar et al., 2023).
This literature review highlights the potential of machine learning techniques in advancing the understanding and treatment of AD. While the field is still in its early stages, the results to date are promising, and continued research and development is necessary to fully realize the potential of these approaches.
TRANSFER LEARNING
A model created for one task is used as the basis for another through the machine learning technique known as transfer learning. Deep learning tasks in computer vision and natural language processing are commonly built on pre-trained models; compared to building neural network models from scratch, they are both cheaper and faster to train, and they perform remarkably well on related tasks. Transfer learning means learning a new task more effectively by applying what has already been learned about a related one (Olivas et al., 2010). For this approach to be practical, the features must be generic, i.e., applicable to both the base task and the target task (Yosinski et al., 2014; Han et al., 2019). CNNs, often known as ConvNets, are a subset of deep neural networks and are most frequently applied to the processing of medical images. The fundamental structure of a CNN is shown in Figure 1. Various pre-trained deep learning models with transfer learning approaches have been proposed in the literature; VGG16, ResNet50, and DenseNet121 were used in this study.
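As a rough sketch of how such a pre-trained backbone is adapted (not the authors' exact code): load the convolutional base without its 1000-class ImageNet head, freeze it, and attach a new classifier for the five AD stages. `weights=None` is used here only so the sketch runs offline; in practice `weights="imagenet"` would be passed, and the head sizes are assumptions.

```python
import tensorflow as tf

NUM_CLASSES = 5  # CN, EMCI, MCI, LMCI, AD

# Pre-trained convolutional base with the ImageNet classifier removed.
# weights=None keeps this sketch offline; use weights="imagenet" in practice.
base = tf.keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False  # freeze the learned features; only the new head trains

# New classification head for the five AD stages (layer sizes are assumed).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

out = model(tf.zeros((1, 224, 224, 3)))  # one dummy 224x224 RGB image
```

Only the head's weights are updated during training, which is what makes transfer learning viable on small medical imaging datasets.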

Basic CNN architecture for AD detection procedure. Abbreviations: AD, Alzheimer’s disease; CNN, convolutional neural network.
VGG16
VGG16 is a CNN with 16 layers. A version of the network pre-trained on more than a million images from the ImageNet database is available (Rayar, 2017). The pre-trained model can categorize images into 1000 distinct object categories, and the network has therefore acquired rich feature representations for a wide variety of images. Because of its pre-training on a large and diverse dataset, the VGG16 model can extract meaningful features from images even if it has not been specifically trained on the target task. This property makes it well suited for transfer learning, where the pre-trained model is fine-tuned on a smaller dataset for a specific classification task.
One of the key characteristics of VGG16 is its depth, with 16 layers of trainable parameters. This depth allows the network to learn complex features and patterns from input images, making it particularly effective for image classification tasks (Simonyan and Zisserman, 2015).
ResNet 50
ResNet50 is a variant of the ResNet model architecture, characterized by a structure consisting of 48 convolution layers, 1 MaxPool layer, and 1 average pool layer. The model is notable for its depth and efficiency, requiring approximately 3.8 × 10⁹ floating-point operations. ResNet50 has gained widespread adoption due to its effectiveness in various computer vision tasks, and the architecture has been extensively studied, with detailed analyses conducted to understand its design principles and performance characteristics (Fuse et al., 2018).
DenseNet121
DenseNet121 belongs to a class of CNNs known as densely connected networks, in which each layer is connected to every other layer in a feed-forward fashion. This architecture enables direct connections between all layers, totaling L(L + 1)/2 connections among L layers. Unlike traditional CNNs, DenseNet addresses the issue of vanishing gradients by restructuring the network architecture to facilitate streamlined connectivity between layers. This design enhances gradient flow throughout the network, promoting effective feature propagation and mitigating the degradation issues encountered in deeper architectures (Huang et al., 2016).
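The dense wiring above can be illustrated with a toy numpy sketch (not DenseNet itself): each "layer" consumes the channel-wise concatenation of every earlier feature map, and the connection count follows L(L + 1)/2.

```python
import numpy as np

def num_connections(L: int) -> int:
    """Direct connections in a dense block with L layers: L(L + 1)/2."""
    return L * (L + 1) // 2

def dense_block(x, num_layers, growth):
    """Toy dense block: each layer sees the concatenation of all previous
    feature maps (channels-last), mimicking DenseNet's connectivity."""
    features = [x]
    rng = np.random.default_rng(0)
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)      # all earlier outputs
        w = rng.standard_normal((inp.shape[-1], growth))
        features.append(np.maximum(inp @ w, 0.0))    # linear map + ReLU
    return np.concatenate(features, axis=-1)

x = np.ones((8, 8, 4))                  # toy 8x8 feature map with 4 channels
out = dense_block(x, num_layers=3, growth=2)
# output channels: 4 input + 3 layers x growth 2 = 10
```

Because each layer's output is carried forward by concatenation rather than summation, gradients have a short path to every earlier layer, which is the property the text credits for mitigating vanishing gradients.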
THE PROPOSED WORK AND ITS EXPERIMENTAL EVALUATION
The MRI images employed in this study are sourced from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, obtained from http://adni.loni.usc.edu/ (Cuingnet et al., 2011). ADNI is a longitudinal multicenter collaborative research effort designed to identify biomarkers for the early detection and monitoring of AD progression. The dataset used here contains 3400 images (680 from each class), each measuring 224 × 224. The full ADNI dataset comprises MRI scans along with other neuroimaging, clinical, cognitive, and genetic data from both AD patients and healthy control subjects. These MRI images provide valuable insights into the structural and functional changes in the brain associated with AD, enabling researchers to investigate disease mechanisms, develop diagnostic tools, and evaluate treatment efficacy. The research flow of the proposed work is shown as a flowchart in Figure 2.

The basic flowchart of the proposed work. Abbreviations: AD, Alzheimer’s disease; CN, control normal; EMCI, early mild cognitive impairment; LMCI, late mild cognitive impairment.
The images from each AD stage are selected and given as input to the specified models. The data are divided into training, validation, and testing data. The complete information regarding each stage is listed in Table 2.
The images given as inputs to the model.
AD stage | Training data | Test data | Validation data | Total |
---|---|---|---|---|
CN | 500 | 90 | 90 | 680 |
EMCI | 500 | 90 | 90 | 680 |
MCI | 500 | 90 | 90 | 680 |
LMCI | 500 | 90 | 90 | 680 |
AD | 500 | 90 | 90 | 680 |
Abbreviations: AD, Alzheimer’s disease; CN, control normal; EMCI, early mild cognitive impairment; LMCI, late mild cognitive impairment; MCI, mild cognitive impairment.
Data balancing
Data balancing is essential for a model to predict accurately; unbalanced data lead to overfitting or underfitting, so the data need to be balanced. Herein, we use downsampling techniques to balance the data. Figures 3a and 3b show the data before and after sampling.
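A minimal numpy sketch of the downsampling idea (the class labels and counts here are illustrative, not the study's actual distribution): every class is randomly subsampled to the size of the smallest class.

```python
import numpy as np

def downsample(labels, rng=None):
    """Return indices that reduce every class to the size of the smallest
    class, yielding a balanced subset of the dataset."""
    rng = rng if rng is not None else np.random.default_rng(42)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        keep.append(rng.choice(idx, size=n_min, replace=False))
    return np.sort(np.concatenate(keep))

# Toy unbalanced labels: 700 "CN" scans vs 680 "AD" scans (hypothetical).
y = np.array(["CN"] * 700 + ["AD"] * 680)
idx = downsample(y)  # indices of a balanced 680-per-class subset
```

The returned indices would then be used to slice the image array and its labels together, so images and targets stay aligned.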
Data augmentation
The size of the dataset is significant for deep learning models, which predict more accurately and give better accuracy results on large datasets. A major drawback of medical image datasets is that they are rarely available at a large size; the data therefore need to be augmented to enlarge the dataset for the models. We applied different data augmentation techniques to the dataset, such as horizontal flipping of the images, rotation of the images by 5°, and width and height shifts of the images. In this study, data augmentation was applied using the ImageDataGenerator of the Keras API. Figure 4 shows the effect of the data augmentation techniques on brain MRI images.
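A sketch of the described augmentation pipeline with Keras' `ImageDataGenerator`; the shift fractions and the rescale normalization are assumptions, since the text does not specify them.

```python
import numpy as np
import tensorflow as tf

# Augmentations described in the text: horizontal flips, rotations up to
# 5 degrees, and small width/height shifts. Shift fractions are assumed.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=5,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    rescale=1.0 / 255.0,   # assumed normalization to [0, 1]
)

# Toy batch standing in for 224x224 MRI slices replicated to 3 channels.
images = np.random.randint(0, 256, size=(4, 224, 224, 3)).astype("float32")
labels = np.eye(5)[np.random.randint(0, 5, size=4)]  # one-hot, 5 stages

batch_x, batch_y = next(datagen.flow(images, labels, batch_size=4))
```

Because the generator transforms images on the fly, every epoch sees slightly different versions of each scan, effectively enlarging the dataset without storing extra files.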
RESULT EVALUATION
The dataset used in this paper is divided into testing, training, and validation data. A total of 2900 images were used in this research: 2000 images for training (400 from each class), 450 for testing (90 from each class), and 450 for validation (90 from each class). We applied transfer learning using pre-trained CNN models such as DenseNet121 and VGG16 with ImageNet weights. For multiclass classification, we utilized RMSProp as the optimizer with a learning rate of 0.00001 and categorical cross-entropy as the loss function, with accuracy as the evaluation metric, yielding training and validation loss and accuracy values.
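The reported training configuration can be sketched as follows; the tiny stand-in model below is only a placeholder for the transfer-learning networks described earlier, and the optimizer and loss match the values stated in the text.

```python
import tensorflow as tf

# Minimal stand-in model; in the study this would be the DenseNet121 or
# VGG16 transfer-learning network with a 5-class softmax head.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# Configuration reported in the text: RMSProp at a learning rate of 1e-5,
# categorical cross-entropy loss, and accuracy as the tracked metric.
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

probs = model(tf.zeros((2, 224, 224, 3)))  # softmax outputs for 2 dummy images
# Training would then be: model.fit(train_data, validation_data=val_data, ...)
```

Categorical cross-entropy pairs with one-hot labels such as those produced by the augmentation generator above; with integer labels, `sparse_categorical_crossentropy` would be used instead.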
DenseNet121
DenseNet121 comprises one 7 × 7 convolution layer, fifty-eight 3 × 3 convolution layers, sixty-one 1 × 1 convolution layers, four average-pooling layers, and one fully connected layer. The performance of the classification models on a particular set of test data is assessed using a confusion matrix (Fig. 5).

Confusion matrix generated by the DenseNet121 model with an overall accuracy of 97.33%. Abbreviations: AD, Alzheimer’s disease; CN, control normal; EMCI, early mild cognitive impairment; LMCI, late mild cognitive impairment; MCI, mild cognitive impairment.
The basic architecture, confusion matrix with accuracy, and loss plot generated, respectively, by the DenseNet121 model are displayed in Figures 6–8.
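Confusion matrices and per-class reports of the kind shown in the figures and tables can be produced with scikit-learn; the ground-truth and predicted labels below are hypothetical stand-ins for the model's test-set output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

stages = ["CN", "EMCI", "MCI", "LMCI", "AD"]

# Hypothetical true and predicted stage indices for 10 test scans.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 3, 4, 0])

# Rows are true stages, columns are predicted stages; the diagonal
# counts the correctly classified scans.
cm = confusion_matrix(y_true, y_pred)

# Per-class precision, recall, F1-score, and support, as in Tables 3-4.
report = classification_report(y_true, y_pred, target_names=stages)
print(report)
```

In the study, `y_true` and `y_pred` would come from the test generator's labels and `np.argmax(model.predict(test_data), axis=1)`, respectively.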
VGG16
The VGG16 model, comprising 16 layers, is applied to an input image of dimensions 224 × 224 and reduces it to a 7 × 7 feature map, with dense layers producing the five-class output. The overall accuracy of the model is 96.0%, as shown in the confusion matrix in Figure 8. The loss and accuracy over 100 epochs are shown in Figure 9, and Table 3 describes the classification report generated by the VGG16 model.

Accuracy and loss plot generated by the VGG16 model over 100 epochs. Abbreviation: VGG, Visual Geometry Group.
Classification report generated by the VGG16 model.
Classification report | Precision | Recall | F1-score | Support |
---|---|---|---|---|
Final AD jpeg | 0.90 | 1.00 | 0.95 | 90 |
Final CN jpeg | 0.94 | 0.89 | 0.91 | 90 |
Final EMCI jpeg | 0.98 | 0.92 | 0.95 | 90 |
Final LMCI jpeg | 0.97 | 0.99 | 0.97 | 90 |
Final MCI jpeg | 0.98 | 0.96 | 0.97 | 90 |
Accuracy | 0.95 | 450 | ||
Macro average | 0.95 | 0.95 | 0.95 | 450 |
Weighted average | 0.95 | 0.95 | 0.95 | 450 |
Abbreviations: AD, Alzheimer’s disease; CN, control normal; EMCI, early mild cognitive impairment; LMCI, late mild cognitive impairment; MCI, mild cognitive impairment; VGG, Visual Geometry Group.
ResNet50
The input image size of 224 × 224 is reduced to 7 × 7 by the ResNet50 model, comprising 50 layers, with dense layers producing the five-class output. The model’s accuracy is measured using different parameters such as recall, F1-score, and precision. The basic architecture, confusion matrix, and accuracy and loss plots are shown in Figures 10–12, respectively. Finally, the classification report generated by the model on the specified dataset is shown in Table 4.

Basic architecture of the ResNet50 mode for the detection of AD stages. Abbreviation: AD, Alzheimer’s disease.

Confusion matrix generated by the ResNet model with an accuracy of 62.22%. Abbreviations: AD, Alzheimer’s disease; CN, control normal; EMCI, early mild cognitive impairment; LMCI, late mild cognitive impairment; MCI, mild cognitive impairment.
The classification report generated by the ResNet50 model.
Classification report | Precision | Recall | F1-score | Support |
---|---|---|---|---|
Final AD jpeg | 0.77 | 0.74 | 0.76 | 90 |
Final CN jpeg | 0.52 | 0.64 | 0.57 | 90 |
Final EMCI jpeg | 0.86 | 0.47 | 0.60 | 90 |
Final LMCI jpeg | 0.49 | 1.00 | 0.66 | 90 |
Final MCI jpeg | 1.00 | 0.22 | 0.36 | 90 |
Accuracy | 0.62 | 450 | ||
Macro average | 0.73 | 0.62 | 0.59 | 450 |
Weighted average | 0.73 | 0.62 | 0.59 | 450 |
Abbreviations: AD, Alzheimer’s disease; CN, control normal; EMCI, early mild cognitive impairment; LMCI, late mild cognitive impairment; MCI, mild cognitive impairment.
DISCUSSION AND SIGNIFICANCE OF THE WORK
The proposed model evaluates the efficiency of the models using different performance metrics, such as the confusion matrix, accuracy, loss, F1-score, precision, recall, receiver operating characteristic, and sensitivity. The general formulae used to calculate these parameters are as follows:
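In terms of per-class true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) taken from the confusion matrix, the standard definitions of these metrics are:

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall (Sensitivity)} &= \frac{TP}{TP + FN} \\
\text{F1-score}  &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```

For the five-class problem, these are computed per class in a one-vs-rest fashion and then averaged (the macro and weighted averages reported in Tables 3 and 4).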
The evaluation of the results in this study involved the utilization of a dataset divided into testing, training, and validating data, consisting of a total of 2900 images. Transfer learning was applied using pre-trained CNN models such as DenseNet121 and VGG16 with ImageNet weights. For multiclass classification, RMSProp was employed as the optimizer with a learning rate of 0.00001, and categorical cross-entropy was used as the loss metric, while accuracy metrics provided training and validation results as well as loss and accuracy values. The DenseNet121 model demonstrated an overall accuracy of 97.33%. The VGG16 model achieved an overall accuracy of 96.0%. The loss and accuracy over 100 epochs were displayed in the corresponding plots, and a classification report detailing precision, recall, F1-score, and support for each class was provided. Similarly, the ResNet50 model converted input images to 7 × 7 dimensions, with an accuracy of 62.22%. Overall, the results showcase the effectiveness of the employed models in accurately classifying AD stages based on MRI images, with each model demonstrating varying levels of accuracy and performance metrics. The performance analysis comparison of the applied models is shown in Figure 13.
CONCLUSIONS
The utilization of transfer learning in medical image analysis, particularly in the context of AD diagnosis, has shown significant promise in recent years. This study, employing the ResNet50, VGG16, and DenseNet121 CNNs, aimed to classify AD patients into multiple stages with notable success, achieving an accuracy of 96.6%. Despite these advancements, challenges persist, notably regarding the processing of large datasets and the cognitive workload for clinicians interpreting scans. Moreover, variations in image quality and resolution may lead to potential misinterpretations, underscoring the necessity for further advancements in imaging technology and analysis techniques to mitigate these issues. The study’s methodology involved employing pre-trained strategies to predict AD phases, yielding an impressive accuracy rate of 97.23%. The model developed using the ADNI data through the Keras API segmented MRI images into five categories: CN, EMCI, MCI, LMCI, and AD. Through the examination of underfitting and overfitting problems, the study addressed key issues in model optimization, leading to enhanced performance. Notably, the proposed model, leveraging the VGG16, DenseNet121, and ResNet50 networks, significantly outperformed existing approaches. Moving forward, the study suggests exploring the application of this model to other disorders utilizing similar data modalities, with a primary focus on enhancing classification results. Overall, this research underscores the potential of transfer learning in advancing AD diagnosis and highlights the ongoing need for innovation to address existing challenges in medical image analysis.