INTRODUCTION
The global population is aging at an accelerated rate, leading to an unprecedented rise in the prevalence of neurodegenerative disorders, among which dementia is one of the most challenging conditions to manage. Dementia is a complex syndrome encompassing a wide range of disorders and is defined as a noticeable reduction in cognitive function with aging (Chouliaras and O’Brien, 2023). Alzheimer’s, vascular, Lewy body, and frontotemporal dementia are among the most common forms (McKeith et al., 2005). Although the main feature of dementia is cognitive decline, functional impairments are also frequently present, affecting a person’s capacity to perform daily tasks independently. These impairments include problems with memory, speech, problem-solving, and self-care. As dementia worsens, patients may need increasing care and support from their caregivers in order to continue living a fulfilling life and remain involved in the community. Figure 1 shows typical dementia and normal brain images. Because dementia can considerably limit a person’s capacity to participate in activities of daily living, it is considered a disability, and many governments and organizations acknowledge it as a disabling illness, offering resources and support services to those affected.
Dementia is characterized by its subtle course, a relentless downward spiral in cognitive function that significantly impairs many aspects of a person’s mental clarity (McKeith et al., 2005). Memory, reasoning, language ability, and the ability to perform everyday tasks are among the areas where this decline is evident (McDade and Hachinski, 2009). Caretakers, healthcare systems, and society at large face many challenges as a result of the complex interactions among the conditions caused by dementia, and these have a very strong impact on the quality of life of the affected community (Alves et al., 2012). Taking the seriousness of dementia into account is especially important as populations grow older and the condition remains complex (Nichols et al., 2019). Dementia touches every part of an affected person’s life, interfering with far more than memory or thinking: as it progresses, even simple everyday activities become difficult. This creates a ripple effect that disrupts not only the life of the person with dementia but also the lives of their friends and family. Dementia can also affect communication adversely, as finding the right words at the right time becomes difficult (Forlenza et al., 2013).
As dementia progresses, it becomes even clearer how important it is to recognize and differentiate between its various forms as early as possible. Differentiating between the forms that dementia can take is a crucial step toward individualized treatment plans, predicting the course of the illness, and providing affected individuals and their families with specialized support. Amidst the complexities of dementia, understanding the minute variations among its manifestations is essential to guarantee that therapeutic approaches are not only successful but also mindful of the unique challenges each type poses. Conventional diagnostic methods are undeniably important; however, they often struggle with their restricted ability to identify the subtle differences in the pathophysiology of dementia. Diagnosis of dementia is usually carried out using highly complex neuroimaging techniques, among which brain magnetic resonance imaging (MRI) is one of the most promising and frequently used. These cutting-edge technologies allow a deeper exploration of how the various forms of dementia alter the structure and function of the brain, providing more accurate diagnostic and classification insights. Neuroimaging is made even more powerful and significantly improved by the use of brain MRI.
RELATED STUDIES
Dementia is an increasingly common neurodegenerative disease that negatively impacts several important cognitive functions and is a major global health concern affecting many people worldwide (Forlenza et al., 2013). Prompt and accurate identification of dementia is critical because it allows interventions to be started at the right time, improving the prognosis for affected individuals and facilitating efficient treatment (Buccellato et al., 2023). Unfortunately, conventional diagnostic methods such as neuroimaging analysis and cognitive function assessments are difficult to use; they require specialized knowledge to perform correctly and are often expensive, labor-intensive, and dependent on subjective judgment (McKhann et al., 2011a; Banerjee et al., 2020). Clinical assessments are very important in the diagnosis of dementia and take a variety of forms, ranging from cognitive testing, detailed clinical history, and patient interviews to activity monitoring and interviews with family members. However, these approaches can differ significantly among evaluators, owing to the subjectivity of their opinions and expertise, and require a considerable amount of time both to administer effectively and to interpret accurately (Scheltens et al., 2002; Petersen et al., 2001). Conventional dementia diagnosis is also supported by computerized systems: in recent years, the accuracy of dementia classification has been enhanced by equipping medical professionals with specialized computer systems, commonly known as computer-aided diagnosis (CAD) systems.
CAD systems may use medical images such as computed tomography and MRI scans to help medical professionals with dementia diagnosis (Reisberg et al., 2000; Mendez et al., 2007).
Computer-aided dementia diagnosis systems are used in two main ways: either entirely autonomously, without any human help, or alongside doctors in the conventional diagnostic setup. Although helpful, these systems also pose some problems and difficulties. It is important to remember that human experts play a large role in building these systems: most of them extract salient details from different medical images using hand-crafted algorithms and mathematical rules (Reisberg et al., 2000). However, relying too heavily on such rules can hold these systems back, making them less adaptable to different situations and degrading their performance. This can be a serious limitation, especially when dealing with a condition as complicated as dementia (Reisberg et al., 2000).
Artificial intelligence (AI) is playing a pivotal role in changing the landscape of medical diagnosis. In recent years, AI has been dominated by deep learning-based solutions in almost every domain, including the health sciences, and in automated dementia diagnosis, deep learning-based approaches are the latest trend. Their distinguishing capacity is to autonomously discriminate complex and layered features from extensive datasets (Fareed et al., 2022). Among deep learning models, convolutional neural networks (CNNs) in particular have shown remarkable performance in feature extraction from medical images for dementia classification. These CNN-based models have proven their superiority over traditional hand-crafted feature-based methods in the diagnostic domain (Fareed et al., 2022; Bucholc et al., 2023; Li et al., 2023). Such progress has improved the accuracy of automated dementia classification, and recent successes with deep learning-based approaches have provided new hope in this domain.
In AI-supported dementia diagnosis, systems are often developed for binary classification, i.e. to classify patients on the basis of their input data (clinical data, medical images, etc.) as demented or non-demented (Bansal et al., 2020; Tufail et al., 2020; Triapthi et al., 2021; Bharati et al., 2022; AlShboul et al., 2023; Javeed et al., 2023). This type of classification is very significant as an initial screening tool: it helps determine the presence or absence of dementia and lays the foundation for more detailed diagnostic measures aimed at identifying specific subtypes of dementia. The preference for binary classification arises primarily from two factors: the scarcity of subtype-labeled data and the emphasis on early-stage detection. First, the procurement of detailed datasets that accurately label the various dementia categories remains a significant hurdle, and this limited availability restricts the development of AI-based algorithms that differentiate between multiple types of dementia with high accuracy. Second, symptoms usually overlap among the different types of dementia, making binary classification an adequately precise technique for initial diagnosis. It assumes even greater importance in environments with scarce resources and limited access to state-of-the-art diagnostic equipment.
Multiclass classification of the subtypes of dementia has also been considered an interesting research area (Mehmood et al., 2020; Asanomi et al., 2021; Sharma et al., 2021a; Biswas and Gini, 2023; Lampe et al., 2023). By classifying subtypes, it offers more than a simple confirmation of the presence or absence of dementia in a patient’s input data. However, multiclass classification of dementia comes with associated challenges, among them the scarcity of diverse and broad datasets and the ethical and privacy concerns of data sharing. Creating reliable and accurate multiclass classification systems requires comprehensive datasets that represent a wide range of patients with different conditions and stages of the disease. Advanced AI models can discriminate effectively only if detailed patient data are shared between different medical institutions, and it is crucial that such sharing follow ethical guidelines and strict data protection rules to preserve patient privacy.
The biggest challenge that AI researchers face in dementia diagnosis is the availability of a large amount of unbiased and correct data. The latest AI models are very complex and need a large amount of good-quality data for accurate training. The gap between what is available and what these models require is partly bridged by pretrained AI models. These pretrained models, including VGG16, ResNet50, and InceptionV3, have demonstrated notable success in categorizing various stages of dementia through transfer learning. Such models are first trained on extensive, diverse datasets and subsequently fine-tuned on dementia-specific datasets, thus capitalizing on their previously acquired knowledge. This method significantly enhances performance and curtails the training time required compared with constructing new models from the ground up (Deepanshi et al., 2021; Sharma et al., 2021b; Suganthe et al., 2021; Torghabeh et al., 2023; Assmi et al., 2024).
As communities adopt these new technologies, it is important to consider the ethical issues of using deep learning for dementia diagnosis. There is a risk of bias both in data collection and in model development, which can lead to biased and unfair outcomes; it is therefore important to ensure fairness and inclusiveness from every perspective. Keeping patients’ information safe is equally important. To improve dementia diagnosis, researchers are actively developing advanced multiclass classification systems to achieve better accuracy and reliability in various situations. They are not just sticking to the usual methods but are exploring new ways to tackle challenges such as limited data, trying out techniques like data augmentation and transfer learning to overcome these limitations. Researchers are also studying how deep learning can be combined with other modalities, such as genetic or cerebrospinal fluid analysis, to make better dementia diagnoses. They are likewise working on explainable AI to make it easier to understand why models make certain decisions, which is important for trust and clarity in deep learning used for dementia care.
It is evident from the literature that deep learning is a powerful tool that is widely used in many fields, including healthcare, where it has become particularly important (Almufareh et al., 2023a, b, c, 2024a, b; Sahu et al., 2024). In healthcare, one use case of deep learning, and the focus of this article, is dementia diagnosis, where it makes diagnoses more efficient and reliable. Starting with binary classification is a good first step, but more fine-grained classification systems need to be developed. We also need to consider ethics, such as sharing data carefully and keeping patients’ information private, as the field moves forward.
METHODOLOGY
The methodology section offers a detailed elucidation of the experimental setup and the various methods employed in the binary classification of MRI images for dementia research. It covers the entire process, detailing stages such as data procurement, the preprocessing techniques employed, the rationale for selecting specific models, details of the model training phase, and an in-depth analysis of the evaluation methods applied. Figure 2 shows the AI-supported dementia diagnosis framework.
In this research, we have acquired a dataset (Kaggle, n.d.) comprising MRI scans used specifically for the classification of dementia. The original version of this dataset was divided into four groups: people without dementia and three levels of dementia severity, i.e. mild, very mild, and moderate. However, there was a significant imbalance in the distribution of data across these classes, which can substantially and adversely affect model training. One obvious way to deal with this imbalance is data augmentation; however, it is important to highlight the general reservations about applying data augmentation to health-related images. Therefore, for the empirical study presented in this article, the dataset is regrouped into two classes: images indicating dementia and those without indications of dementia. Figure 3 shows a few samples of MRI scans of demented and non-demented brains. This regrouping resulted in a harmoniously balanced dataset, strengthening the robustness and reliability of our binary classification efforts. Despite this focus, it is pertinent to note that there exists an intriguing potential for hierarchical classification systems. Such systems could further scrutinize images already determined to manifest signs of dementia during the initial binary classification phase and segregate them into the aforementioned three subtypes, yielding a more refined and granular diagnostic insight into disease progression. However, it should be made clear that this paper places its emphasis squarely on binary classification, with an exclusive concentration on differentiating between cases with dementia and those without.
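The regrouping described above can be sketched as a simple label mapping. This is an illustrative, hypothetical sketch: the class-folder names used here are assumptions based on the dataset description, not names confirmed by the paper.

```python
# Hypothetical sketch of regrouping the four original severity classes
# into a binary demented / non-demented split. The class names below
# are assumed, not taken from the actual dataset.

DEMENTED = {"VeryMildDemented", "MildDemented", "ModerateDemented"}

def to_binary_label(original_class: str) -> str:
    """Map one of the four original classes to a binary label."""
    return "Demented" if original_class in DEMENTED else "NonDemented"

original = ["NonDemented", "VeryMildDemented", "MildDemented", "ModerateDemented"]
binary = [to_binary_label(c) for c in original]
print(binary)  # three severity classes collapse into one "Demented" class
```

Collapsing the three severity levels into one class is what rebalances the dataset, since the non-demented group alone is roughly as large as the three demented groups combined.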
The ambition of this study lies in developing and rigorously assessing binary classification models by applying transfer learning paradigms to MRI scans for dementia diagnosis. This paper focuses on how transfer learning can improve dementia diagnosis and thereby the tools used in medical care.

Samples of MRI images: a. Demented; b. non-demented. Abbreviation: MRI, magnetic resonance imaging.
For our research, we regrouped the dataset specifically for sorting dementia cases into two groups, with 5069 MRI images for training: half of these images show signs of dementia and the other half show no signs. Additionally, we set aside 1098 images, with an equal number from each group, for checking the accuracy of our model. In the t-distributed stochastic neighbor embedding (t-SNE) plot of the dementia vs. no-dementia dataset (Figure 4), the points representing dementia and non-dementia cases appear very close together. This proximity indicates low interclass variance, posing a challenge for classification models and suggesting that distinguishing between dementia and non-dementia cases based solely on the plotted features may be difficult.
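A t-SNE projection of the kind shown in Figure 4 can be produced with scikit-learn. The sketch below uses random vectors as a stand-in for the real MRI feature vectors, so the resulting structure is synthetic; it only illustrates the mechanics of the embedding.

```python
# Illustrative t-SNE sketch. Random features stand in for the real
# flattened MRI features, so this reproduces only the procedure, not
# the actual Figure 4 plot.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(60, 32))       # stand-in feature vectors
labels = np.array([0] * 30 + [1] * 30)     # 0 = non-demented, 1 = demented

# Project the 32-D features down to 2-D for visualization.
embedding = TSNE(n_components=2, perplexity=10.0,
                 random_state=0).fit_transform(features)
print(embedding.shape)
```

In a scatter plot of `embedding` colored by `labels`, heavily interleaved point clouds correspond to the low interclass variance observed in the text.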
Before the models can start learning, each image goes through several preparation steps that ensure all the images have a consistent form and work well with the learning algorithms. These preprocessing stages encompass an array of methods, such as resizing images to a standard scale, normalizing pixel values for consistent intensity ranges across all images, and applying grayscale adjustments. This is important for improving model performance as well as for preventing overfitting (where a model performs well on training data but poorly on unseen data). In search of the most effective deep learning model for this article, an assortment of highly regarded pretrained models is trained. These models have already undergone prior training on the vast and varied collection of images in the ImageNet dataset. Among them, the experiments performed in this study use VGG16, VGG19, Inception V3, DenseNet, EfficientNet, and ResNet101. Such pretrained models serve as an excellent basis for feature extraction, owing to their demonstrated expertise in discerning generic features across a broad spectrum of image data, which makes them well suited for further tailoring to specific tasks.
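The three preprocessing steps named above can be sketched as follows. This is a minimal NumPy-only illustration, assuming a nearest-neighbour resize and channel averaging for grayscale; a real pipeline would more likely use an image library such as Pillow or OpenCV, and the target size of 128 is an assumed example value.

```python
# Minimal sketch of the described preprocessing: grayscale conversion,
# resize to a standard scale, and pixel-value normalization.
# Pure NumPy for illustration only; the target size is an assumption.
import numpy as np

def preprocess(image: np.ndarray, size: int = 128) -> np.ndarray:
    # Grayscale: average the channels if the image is RGB.
    if image.ndim == 3:
        image = image.mean(axis=2)
    # Naive nearest-neighbour resize to size x size.
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    image = image[rows][:, cols]
    # Normalize pixel intensities from [0, 255] to [0, 1].
    return image.astype(np.float32) / 255.0

scan = np.random.randint(0, 256, size=(208, 176, 3))  # dummy scan-sized array
out = preprocess(scan)
print(out.shape)  # (128, 128)
```

Applying the same `preprocess` to every image guarantees the uniform shape and intensity range the models expect.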
Each pretrained model goes through a careful fine-tuning process on the dementia MRI dataset. This is important for adapting the generalized features previously learned from ImageNet data to the unique attributes of dementia classification. The fine-tuning process includes systematic adjustments to the established parameters of each pretrained model; these adjustments are performed during training on the dementia dataset while carefully preserving the weights initially acquired from the extensive ImageNet dataset. To comprehensively assess the usefulness of different pretrained models as well as various transfer learning approaches, a series of systematic experiments is carried out. These experiments involve thorough training of the different models on the dementia dataset, within which we explore various fine-tuning methods to ensure robust training. The effectiveness and reliability of each model are measured using well-established evaluation metrics, including but not limited to accuracy, precision, and recall, which quantify how well each model can distinguish MRI images of people with dementia from those without. Figure 5 represents the dementia classification framework. This article explores different levels of transfer learning, from pure feature extraction to fine-grained adjustments of different parts of the pretrained models. By freezing and then unfreezing certain parts of the models during training, we can carefully observe how each part affects model performance and learning speed. In the next phase, we analyze and discuss the experimental outcomes, which reveal a great deal about how well pretrained models and transfer learning methods work for sorting MRI scans for dementia. This article compares the different models, pointing out their strengths and where they might need improvement.
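The freezing and unfreezing strategy can be expressed framework-agnostically. In Keras this would correspond to setting each layer's `trainable` flag; the helper below is a simplified stand-in in which a list of booleans represents the layers of a backbone plus its new classification head. The layer counts are illustrative assumptions, not figures from the experiments.

```python
# Framework-agnostic sketch of the freeze/unfreeze strategy described
# above. A list of booleans stands in for a real model's layers; in
# Keras the same effect comes from setting `layer.trainable`.

def freeze_plan(n_pretrained: int, n_unfrozen: int, n_custom: int) -> list:
    """Trainable flags for a model: the last `n_unfrozen` pretrained
    layers and all `n_custom` new dense layers train; the rest stay
    frozen, preserving their ImageNet-learned weights."""
    frozen = [False] * (n_pretrained - n_unfrozen)
    unfrozen = [True] * n_unfrozen
    custom = [True] * n_custom
    return frozen + unfrozen + custom

# Example: an assumed 100-layer backbone with its top 20 layers
# unfrozen plus 6 custom dense layers.
plan = freeze_plan(n_pretrained=100, n_unfrozen=20, n_custom=6)
print(sum(plan))  # 26 trainable layers out of 106
```

Setting `n_unfrozen=0` recovers pure feature extraction, while `n_unfrozen=n_pretrained` corresponds to the "all pretrained model layers trainable" configurations reported in the results tables.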
We then suggest directions that future research in this area could take, based on our findings. In the next section, a detailed analysis of the results is presented.
RESULTS AND DISCUSSION
This comprehensive study on the diagnosis of dementia using MRI scans presents the experimental findings of deep learning models and assesses how well they diagnose dementia. The goal of the experiments is to examine several state-of-the-art architectures and evaluate their performance in terms of classification accuracy, precision, and recall. First, we experimented with the lightweight and promising EfficientNet B0 architecture. This model was trained for 15 epochs with three custom dense trainable layers. Its performance did not meet our expectations, yielding an accuracy of only about 58%. This result demonstrated the intrinsic difficulty of the task and the need for more complex architectures. In order to increase accuracy, a subsequent trial enhanced the depth of the EfficientNet B0 model by incorporating six custom dense layers. However, contrary to our expectation, this increase led to a degradation in performance, with accuracy falling to 42%, alongside less promising precision and recall rates. To improve accuracy further, we moved to the pretrained EfficientNet B4 model, which is well known for its deeper architecture and enhanced capabilities. With six trainable layers, this model exhibited an accuracy of approximately 58%. Notably, its precision and recall rates showed significant improvements over its B0 counterpart, indicating the effectiveness of deeper architectures in capturing the complexity of dementia diagnosis from MRI images.
Experimental results for EfficientNet B0

| Training duration | Trainable layers | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 15 epochs (Experiment 1) | 3 custom dense layers | 58.29 | 58.29 | 100.00 |
| 15 epochs (Experiment 2) | 6 custom dense layers | 41.71 | 41.71 | 100.00 |

Experimental results for EfficientNet B4

| Training duration | Trainable layers | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 15 epochs (Experiment 3) | 6 custom dense layers | 58.29 | 58.29 | 100.00 |
| 10 epochs (Experiment 4) | All pretrained model layers + 2 custom dense layers | 74.23 | 83.24 | 69.84 |

Experimental results for EfficientNet B7

| Training duration | Trainable layers | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 10 epochs (Experiment 5) | All pretrained model layers + 2 custom dense layers | 77.78 | 86.00 | 73.91 |
| 15 epochs (Experiment 6) | All pretrained model layers + 2 custom dense layers | 77.69 | 84.96 | 75.00 |
Another series of experiments aimed at improving the EfficientNet B4 model’s performance. We added more layers and changed the number of training epochs to 10; with no layer of the pretrained model frozen, the accuracy increased significantly, reaching 74%, and the model showed improved precision and recall. This emphasizes how crucial architectural changes are for improving diagnostic precision. We also examined a more complex architecture, the EfficientNet B7 model. We made significant progress through layer addition and further training, keeping all layers of the pretrained model trainable and adding custom trainable layers as well. After 10 training epochs, the accuracy was about 78%. However, performance decreased slightly when training was extended to 15 epochs without freezing any layers, demonstrating how important it is to strike a balance between training time and model complexity. In addition to the EfficientNet series, we examined several popular pretrained models, including VGG16, VGG19, ResNet101, InceptionV3, and DenseNet. Among these, the InceptionV3 model, trained for 30 epochs, emerged as the top performer, attaining an impressive accuracy of 80.69%. This achievement highlights the significance of model selection and training duration for optimizing dementia detection via MRI images.
Experimental results for DenseNet121

| Training duration | Trainable layers | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 15 epochs (Experiment 7) | All pretrained model layers + 2 custom dense layers | 79.05 | 87.68 | 74.53 |
| 30 epochs (Experiment 8) | All pretrained model layers + 6 custom dense layers | 79.05 | 87.68 | 74.53 |
| 30 epochs (Experiment 9) | 20 pretrained model layers + 6 custom dense layers | 76.96 | 85.77 | 72.50 |

Experimental results for ResNet101

| Training duration | Trainable layers | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 30 epochs (Experiment 10) | All pretrained model layers + 6 custom dense layers | 77.14 | 86.49 | 72.03 |
| 30 epochs (Experiment 11) | 20 pretrained model layers + 6 custom dense layers | 77.14 | 86.49 | 72.03 |

Experimental results for VGG16

| Training duration | Trainable layers | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 30 epochs (Experiment 12) | All pretrained model layers + 6 custom dense layers | 58.29 | 58.29 | 100.00 |
| 30 epochs (Experiment 13) | 20 pretrained model layers + 6 custom dense layers | 58.29 | 58.29 | 100.00 |

Experimental results for VGG19

| Training duration | Trainable layers | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 30 epochs (Experiment 14) | All pretrained model layers + 6 custom dense layers | 58.29 | 58.29 | 100.00 |
| 30 epochs (Experiment 15) | 20 pretrained model layers + 6 custom dense layers | 58.29 | 58.29 | 100.00 |

Experimental results for InceptionV3

| Training duration | Trainable layers | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 30 epochs (Experiment 16) | All pretrained model layers + 6 custom dense layers | 79.42 | 89.35 | 73.44 |
| 30 epochs (Experiment 17) | 20 pretrained model layers + 6 custom dense layers | 80.69 | 89.34 | 75.94 |
Experiments are evaluated in terms of accuracy, precision, and recall. These performance measures are defined as follows: Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP), and Recall = TP / (TP + FN), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
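These standard measures reduce to simple functions of the confusion-matrix counts; a minimal sketch, with example counts that are purely illustrative and not taken from the experiments:

```python
# Standard evaluation metrics computed from confusion-matrix counts.
# The example counts below are illustrative only.

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

tp, tn, fp, fn = 400, 430, 70, 98  # hypothetical binary evaluation split
print(round(accuracy(tp, tn, fp, fn), 3),
      round(precision(tp, fp), 3),
      round(recall(tp, fn), 3))
```

Note that a model predicting "demented" for every image scores 100% recall while its accuracy and precision collapse to the positive-class proportion, which explains the 58.29 / 58.29 / 100.00 rows in the tables above.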
Figures 6–8 show the comparative results of the different experiments in terms of accuracy, precision, and recall. Figures 9 and 10 show the results of the best-performing Inception V3 model.

ROC curve for Inception V3. Abbreviations: AUC, area under the curve; ROC, receiver operating characteristic.
It can be observed that changing the number of epochs does not have a strong effect on the achieved accuracy, whereas the number of layers used for training does affect the results. The general observation is that deeper and more complex models performed better than shallower classification models. Overall, the experiments showed that different deep learning models had different levels of success in classifying dementia and non-dementia cases from MRI images: some models did well, while others found it difficult to tell the two classes apart. These results highlight how important it is to choose the right model and adjust its structure to improve dementia identification. More research is needed to find new methods and improvements that make deep learning models even better at diagnosing dementia, which is crucial for medical purposes.
CONCLUSION
This paper presents a detailed empirical analysis of how transfer learning techniques can be utilized for the classification of dementia from MRI images. Through a set of experiments and detailed empirical analysis, it shows that transfer learning has the potential to significantly enhance classification accuracy using various well-known deep learning models, such as VGG, ResNet, InceptionNet, EfficientNet, and DenseNet. The paper highlights the significance of pretrained models and transfer learning in overcoming the challenges of limited and biased data in dementia diagnosis. Moreover, the study investigates the impact of different transfer learning strategies on classification accuracy, providing valuable insights for improving model performance in dementia diagnosis. These findings are important not only for the improvement of automated diagnostic tools but also for supporting better intervention strategies, which can lead to improved patient care outcomes. The paper suggests that exploring more fine-grained levels of classification and incorporating diverse input data can further enhance classification accuracy.