1. INTRODUCTION
The brain, a highly specialized organ with as many as 86 billion nerve cells, is susceptible to a wide range of diseases and conditions that can substantially influence its function and health [1]. These diseases include infections, injuries, strokes, seizures, and tumors, all of which can impair daily activities and overall quality of life [2]. Infections including meningitis, encephalitis, and brain abscesses cause inflammation and can lead to severe complications [3]. Seizure disorders, particularly epilepsy, manifest various symptoms ranging from mild to debilitating [4]. Traumatic events, such as concussions and traumatic brain injuries, can result in temporary or permanent damage [5]. Tumors, whether benign or malignant, together with conditions including hydrocephalus and pseudotumor cerebri, increase intracranial pressure and consequently affect brain function [6]. Vascular conditions such as strokes result from disrupted blood flow and may potentially lead to brain cell damage [7]. The outcomes of these brain diseases vary widely according to their type, location, and severity, thus underscoring the importance of early diagnosis and treatment.
Medical imaging enables objective evaluation of brain structure and function. Among the various imaging modalities, magnetic resonance imaging (MRI) is notable for its ability to detect early abnormalities and lesions at the molecular level [8–10]. MRI can accurately quantify white and gray matter volumes; assess atrophy; and provide high-resolution images of brain structures, free from interference by skull base artifacts.
Generative artificial intelligence (GAI), a transformative approach to modeling data distributions from training datasets [11], is aimed at understanding and replicating the data generation process, thereby producing new data that mimic the training data. In medical imaging, particularly the diagnosis of various brain diseases through brain MRI images, GAI can learn the disease’s characteristics and the associated structural changes in the brain [12]. By comparing MRI images between healthy individuals and patients with various brain diseases, GAI can help clinicians make more accurate diagnoses.
This review first introduces five foundational GAI models frequently used in diagnosing brain diseases through brain MRI images. Subsequently, it explores the applications of these models in four critical areas: data preprocessing, image segmentation, interpretable features, and diagnostic support across various disease stages, with an emphasis on early diagnosis. Finally, the discussed content is synthesized, and future directions are proposed for the application of GAI in MRI imaging for various brain diseases.
2. TYPES OF GENERATIVE ARTIFICIAL INTELLIGENCE MODELS
2.1. Generative adversarial networks
Generative adversarial networks (GANs) provide a sophisticated approach to learning generative models of data distribution through adversarial training [13,14]. This method involves a dynamic interplay between two neural networks: the generator and the discriminator [15]. Figure 1 illustrates this process, highlighting the two components and their interaction within the GAN framework.
The generator’s primary objective is to create samples that are indistinguishable from real data [16]. The aim is to produce realistic data points that mimic the original dataset as closely as possible. In contrast, the discriminator’s role is to accurately classify whether a given sample is real (i.e., from the original dataset) or fake (i.e., produced by the generator) [17]. The adversarial nature of GANs arises from this opposition: the generator continually tries to deceive the discriminator by increasing the realism of its generated samples, whereas the discriminator enhances its ability to detect fake samples [18,19]. This ongoing adversarial game leads both networks to improve iteratively. In an ideal scenario, the generator becomes sufficiently proficient that the discriminator can no longer distinguish between real and generated samples, thereby achieving a state in which the synthetic data are virtually indistinguishable from real data [20].
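As a minimal illustration of this adversarial interplay, the sketch below implements a toy GAN training step in PyTorch. The network sizes, learning rates, and the flattened 64 × 64 single-channel "slice" shape are illustrative assumptions rather than a published configuration.

```python
# Minimal GAN training sketch (PyTorch). All sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 64 * 64

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),           # outputs a flattened 64x64 image
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                            # real/fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    batch = real_batch.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator: classify real images as real and generated images as fake.
    z = torch.randn(batch, latent_dim)
    fake = generator(z).detach()
    loss_d = bce(discriminator(real_batch), real_labels) + bce(discriminator(fake), fake_labels)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator: try to make the discriminator label its samples as real.
    z = torch.randn(batch, latent_dim)
    loss_g = bce(discriminator(generator(z)), real_labels)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Usage with a dummy batch standing in for real, flattened MRI slices:
losses = train_step(torch.randn(8, img_dim))
```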
2.2. Diffusion models
In physics, diffusion refers to the movement of particles or energy from a region of higher concentration to a region of lower concentration [21]. For example, in a drop of ink spreading through a transparent gel, the ink is initially concentrated in one area and diffuses throughout the gel over time, until it is uniformly distributed. Although reversing this natural process is impossible in reality, diffusion models aim to conceptually reverse this process within a data distribution [22].
The core idea underlying diffusion models is systematically and gradually degrading the structure of a data distribution through an iterative forward diffusion process. The model then learns a reverse diffusion process to restore the original structure, thereby resulting in a flexible and tractable generative model [23]. In practice, diffusion models replicate the diffusion process by iteratively adding noise to the original data, such as an image [24,25]. This noise addition continues until the image becomes pure noise, in a process governed by a Markov chain of events. A Markov chain describes a sequence of events wherein each step depends only on the previous one, formalized as follows:

$$P(X_{t+1} \mid X_t, X_{t-1}, \ldots, X_0) = P(X_{t+1} \mid X_t)$$

Any sequence of random variables $X_0, X_1, X_2, \ldots, X_n$ satisfying this condition can be considered a Markov chain. This Markovian assumption makes learning of the added noise tractable. After the model is trained to predict the noise at each time step, it can generate high-resolution images from Gaussian noise inputs. In summary, the diffusion model operates in two stages: the forward diffusion process, wherein noise is incrementally added to the image until it becomes pure noise, and the reverse diffusion process, wherein a neural network is trained to remove the noise and reconstruct the original image [26].
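The sketch below illustrates the forward (noising) stage under common illustrative assumptions: a linear variance schedule over 1,000 steps and the closed-form sampling of x_t given x_0 that the Markov property permits. The schedule values are not taken from any specific study.

```python
# Minimal sketch of the forward (noising) diffusion process described above.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # per-step noise variance (illustrative)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)        # cumulative product of alphas up to step t

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form (the Markov chain collapsed)."""
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t]
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    return x_t, noise                             # a network is trained to predict `noise`

# Usage: noise a dummy 64x64 "slice" at step t = 500.
x0 = torch.randn(1, 1, 64, 64)
x_t, eps = q_sample(x0, t=500)
```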
2.3. Transformer models
Mainstream large models predominantly use the transformer network architecture, thus underscoring its critical importance [27]. In the realm of computer vision, these large models are akin to the human brain, with transformers serving as the neural network structures within the brain [28]. Central to this architecture are the encoder and decoder components, which enable a deep understanding of the input image and generate outputs through intricate signal transmission and processing mechanisms (Figure 2).
The encoder’s initial step involves converting the input image into numerical vectors, each representing an image block and encompassing its feature information. This process is analogous to converting biological samples into analyzable numerical data [29]. These vectors capture not only the appearance features of the image blocks, but also their spatial relationships, structural details, and contextual information. A critical innovation of the transformer is the self-attention mechanism, which functions similarly to a spotlight highlighting each image block during analysis. This mechanism illuminates the image block and reveals its relationships with other blocks in the image. The self-attention mechanism operates by generating three vectors: a query, key, and value. The query vector indicates the level of attention the current image block has toward other blocks, the key vector represents the features of all image blocks, and the value vector carries the actual image information. The attention score is calculated through comparison of the query vectors and key vectors; higher scores indicate a closer connection between image blocks. Finally, the self-attention layer produces a new, more contextual image representation by weighting the value vector according to the attention score [30].

The multi-head attention mechanism is akin to multiple independent analysis groups, each examining the relationships between image blocks at different levels. This framework is similar to various experimental groups studying different aspects of the same phenomenon in biological experiments. By integrating the outputs of these groups, the model enables a comprehensive understanding of the image.

Positional encoding addresses the challenge of the transformer’s inability to directly process the order of image blocks. By adding position vectors to each image block embedding vector, the model captures the relative positional relationships among image blocks [31]. Consequently, positional encoding maintains the stability of positional information even if the order of images changes.
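The following sketch shows the query/key/value computation described above for a sequence of image-patch embeddings. The embedding dimension, patch count, and random projection matrices are illustrative placeholders.

```python
# Minimal sketch of scaled dot-product self-attention over image-patch embeddings.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (num_patches, dim) patch embeddings; w_*: (dim, dim) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.size(-1) ** 0.5)        # attention scores between patches
    weights = F.softmax(scores, dim=-1)           # higher weight = closer connection
    return weights @ v                            # value vectors weighted by attention

dim, num_patches = 32, 16
x = torch.randn(num_patches, dim)
w_q, w_k, w_v = (torch.randn(dim, dim) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)            # new, context-aware patch representation
```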
When generating output, the decoder uses a masking self-attention mechanism to ensure that each image block focuses only on previously generated blocks, thereby preventing the model from using future information [32]. This framework is analogous to researchers in biological experiments inferring conclusions from existing data without predicting future results. The encoder-decoder attention mechanism allows the decoder to refer to the input representation generated by the encoder, thereby ensuring the output’s consistency and coherence with the original image. This framework mirrors biological research in which the researchers refer to existing literature and data during experiments. Finally, the decoder transforms internal representations into actual output image blocks sequentially, similarly to the final analysis and reporting of experimental results [33,34].
Through the self-attention and multi-head attention mechanisms, the transformer model achieves a deep understanding of the overall image and its nuanced details, and excels in tasks such as object detection, image segmentation, and image generation. Its core advantage lies in its ability to capture context and comprehend visual information, thus revolutionizing the field of computer vision.
2.4. Variational autoencoders
Figure 3 illustrates the principles of variational autoencoders (VAEs). An autoencoder, as its name implies, transforms a set of real samples into an ideal data distribution through an encoder network [35]. This transformed data distribution is then fed into a decoder network, which generates a new set of samples. If these generated samples closely resemble the real samples, the autoencoder model is considered successfully trained [36]. VAEs extend the standard autoencoder model by incorporating variational processing. This enhancement enables the encoder’s output to correspond to the mean and variance of the target distribution, and allows for more robust and realistic data generation [37].
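As a minimal sketch of this idea, the code below builds a toy VAE in PyTorch whose encoder outputs a mean and log-variance and whose decoder reconstructs the input from a sampled latent code. The layer sizes and flattened 64 × 64 input are illustrative assumptions.

```python
# Minimal VAE sketch (PyTorch) with the reparameterization trick.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, img_dim=64 * 64, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL to a standard normal
    return recon_loss + kl

model = TinyVAE()
x = torch.rand(4, 64 * 64)                      # dummy batch of flattened slices
recon, mu, logvar = model(x)
loss = vae_loss(x, recon, mu, logvar)
```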
2.5. Autoregressive models
Autoregressive (AR) models are a sophisticated class of generative models that are tailored for sequence data, and leverage past outputs to inform the generation of subsequent elements. This recursive approach ensures that each step in the sequence generation process is contextually grounded in the results of the preceding steps; therefore, AR models are particularly effective for applications such as language generation and time series forecasting. The term “autoregressive” reflects this method: “auto” signifies the model’s self-referential use of its own prior observations, whereas “regressive” emphasizes the dependence on historical data to predict current values. This dependency is crucial for capturing the temporal trends and correlations often inherent in time series data.
An AR model of order p, denoted AR(p), is mathematically expressed as

$$X_t = \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$

where $\varphi_1, \ldots, \varphi_p$ are model parameters, and $\varepsilon_t$ represents white noise. To ensure that the model remains weak-sense stationary, certain parameter constraints must be met; without these constraints, even AR(1) processes can become non-stationary. Consequently, autoregressive models are invaluable in analytical contexts requiring understanding and prediction of temporal dependencies.
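The following NumPy sketch simulates a zero-mean AR(2) process and produces a one-step-ahead forecast; the coefficients are illustrative values chosen to satisfy the stationarity constraints mentioned above.

```python
# Minimal AR(2) sketch: x_t = phi_1 * x_{t-1} + phi_2 * x_{t-2} + eps_t
import numpy as np

rng = np.random.default_rng(0)
phi = np.array([0.6, 0.3])                      # illustrative, stationary coefficient choice
n = 500
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.normal(scale=1.0)

# One-step-ahead forecast: the conditional mean given the two most recent values.
one_step_forecast = phi[0] * x[-1] + phi[1] * x[-2]
```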
2.6. Applications of GAI models based on MRI images in brain diseases
GAI models have distinct advantages over traditional machine learning methods in brain MRI analysis. Unlike traditional methods, which often require large amounts of labeled data and face data scarcity challenges, GAI models can create synthetic data to augment limited datasets. This capability improves the robustness and accuracy of deep learning models [38]. These models also excel in enhancing image quality through advanced denoising and artifact reduction techniques, tasks that traditional methods might not handle as effectively [39]. Furthermore, these models can integrate multimodal data, thereby providing a more comprehensive view of brain diseases and enabling more nuanced insights [40]. This flexibility and ability to learn complex data distributions make GAI models a superior choice in medical imaging. Recent advancements in brain MRI image analysis have highlighted the transformative potential of GAI models.
GANs have been extensively explored for their ability to enhance deep learning model capabilities in lesion segmentation. Conte et al. have demonstrated the use of GANs to synthesize missing T1 and FLAIR MRI sequences, thus significantly improving brain tumor segmentation outcomes [41]. Li et al. have addressed data scarcity in age estimation for neurodegenerative diseases by proposing a GAN-based image synthesis method, which has been demonstrated to effectively estimate brain age [42]. Furthermore, Hu et al. have applied GANs to CT-to-MR image synthesis for detecting brain lesions in patients with acute ischemic stroke; these findings underscore the utility of GANs in enhancing diagnostic capabilities [43]. Ghaffari et al. have introduced a 3D conditional GAN to decrease motion artifacts in brain MRI, thereby achieving improvements in image quality and diagnostic accuracy [44]. Additionally, Cui et al. have presented a GAN-segNet model for precise brain tumor semantic segmentation; their findings highlight the roles of generative networks in accurately delineating intratumor regions [45].
Diffusion-based models have also shown promise. Coutts et al., in the DOUBT study, have assessed brain ischemia in patients with transient symptoms through diffusion MRI [46]. Miao et al. have focused on decreasing susceptibility artifacts in participants with metallic orthodontic braces, and enhancing the accuracy of diffusion-prepared DTI [47]. Toschi et al. have studied age-related white matter degeneration and revealed earlier brain changes in males than females, according to diffusion MRI [48]. Uus et al. have developed a multi-channel registration pipeline for neonatal brain development analysis, and Palombo et al. have introduced a compartment-based model for diffusion-weighted MRI, thus highlighting the inclusion of the soma compartment in brain microstructure models [49]. Zhang et al. have presented a deep learning method, DDMReg, for the accurate registration of diffusion MRI datasets through leveraging comprehensive fiber orientation information [50].
Transformer-based models have revolutionized MRI analysis, as demonstrated by Jun et al.’s medical transformer, which effectively models 3D volumetric brain images as sequences of 2D slices for tasks such as disease diagnosis, age prediction, and tumor segmentation [51]. Zhang et al.’s TW-Net, a transformer-weighted network, outperforms existing algorithms in neonatal brain MRI segmentation by incorporating long-range dependency information [52]. In brain tumor segmentation, Ting et al. have introduced multimodal transformer networks and clinical knowledge-driven models to improve segmentation accuracy by leveraging incomplete multimodal MRI data and radiologists’ insights [53]. Alharthi et al. have emphasized the roles of transformer models in diagnosing autism spectrum disorder, and advocated for further research to uncover underlying causes and biomarkers [54].
VAEs have been used effectively in brain MRI analysis. Volokitin et al. have combined a 2D slice VAE with a Gaussian model to capture relationships in 3D MR brain volumes [55]. Lyu et al. have proposed a cascade model for brain tumor subregional segmentation by using VAEs and attention gates, which have been found to achieve high performance on the BraTS 2020 dataset [56]. Sasagasako et al. have used VAEs for predicting postoperative outcomes in patients with glioblastoma by integrating clinical data and MRI features [57]. Kim et al. have applied VAEs to neonatal functional connectome data and successfully captured individual developmental trajectories [58]. Kapoor et al. have proposed a multiscale metamorphic VAE framework for high-fidelity 3D brain MRI generative modeling [59].
Autoregressive models have also been instrumental. Hashemzehi has introduced a hybrid paradigm combining neural autoregressive distribution estimation with convolutional neural networks for brain tumor detection [60]. Belaoucha has leveraged brain structural connectivity from diffusion MRI to enhance EEG/MEG source reconstruction by using autoregressive models [61]. Kook et al. have developed BVAR-connect, which integrates diffusion tensor imaging data into a Bayesian multi-subject vector autoregressive model for effective brain connectivity network analysis [62]. Ruf et al. have explored autoregressive features in functional MRI time series for classifying depression and anxiety, and highlighted the value of autocorrelation in fMRI data analysis [63]. Table 1 compares the advantages and disadvantages of the above five GAI models.
Table 1. Summary of GAI methods for medical image segmentation.

Name | Advantages | Disadvantages |
---|---|---|
GANs | Generation of realistic images; improved segmentation accuracy; effective handling of missing data | Training instability; mode collapse; requirement for large datasets |
Diffusion | Robustness to noise; preservation of fine details; effective data denoising | High computational cost; slow generation process |
VAEs | Efficient latent space representation; good anomaly detection; capture of complex distributions | Potential generation of blurry images; lower image quality than that with GANs |
Transformers | Good handling of sequential data; capture of long-range dependencies; versatility across tasks | Computational expense; requirement for large labeled datasets |
Autoregressive | Modeling of temporal dynamics; good sequence prediction; integration of multi-modal data | High computational cost; complex model training |
3. PRACTICAL APPLICATIONS OF GENERATIVE ARTIFICIAL INTELLIGENCE MODELS
3.1. Clinical workflow for using GAI to analyze MRI images
As shown in Figure 4, incorporating a GAI model for MRI image analysis into routine clinical workflows begins with the seamless integration of high-quality medical imaging data into existing systems. This process involves ensuring compatibility with clinical databases and imaging software. Data preprocessing, including noise removal and normalization, is essential to maintain accuracy and consistency. Next, the GAI model must be rigorously trained, in a process requiring substantial computational resources. Implementing robust evaluation methods, such as cross-validation, is critical to ensure the model’s reliability. Once trained, the model can be deployed for tasks such as image classification, object detection, and segmentation analysis, tailored to clinical needs.
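As a hedged illustration of the cross-validation step mentioned above, the sketch below evaluates a placeholder classifier with five-fold cross-validation in scikit-learn; the feature matrix, labels, and logistic-regression model are stand-ins rather than a validated clinical pipeline.

```python
# Minimal five-fold cross-validation sketch on placeholder image-derived features.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.random.rand(100, 64)                     # stand-in for features extracted from MRI scans
y = np.random.randint(0, 2, size=100)           # stand-in for diagnostic labels

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(f"5-fold accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```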
Regulatory challenges include compliance with healthcare standards and obtaining necessary approvals from medical authorities. Ethical considerations involve maintaining patient privacy and ensuring informed consent for data usage. Logistically, integrating the GAI system requires adequate infrastructure and training for clinical staff to effectively use the new technology. Continuous monitoring and updates are also necessary to address emerging issues and improve the model’s performance over time.
3.2. Data preprocessing
Data preprocessing, a cornerstone of data analysis and AI-based diagnostics, provides the foundation for reliable modeling and the overall integrity of the analytical outcomes. In the realm of GAI models, preprocessing typically encompasses three critical stages: data repair, normalization, and augmentation. Each of these stages is crucial for ensuring data precision, consistency, and diversity, which are essential for training robust and reliable models [64].
Medical imaging datasets, particularly those sourced from various hospitals, often exhibit substantial variability in size, format, and clarity, because of differences in equipment and storage conditions [65]. For instance, MRI images may have issues such as blurriness, indistinct feature points, or damage, which can impair the effectiveness of neural network training. GAI models, with their ability to learn complex patterns from data, can be used to enhance and repair brain MRI images. The enhancement process typically involves several key steps. Initially, MRI images are preprocessed to standardize dimensions and normalize intensity values. The generative model is then trained on these preprocessed images, to learn to generate high-quality MRI images from noisy or incomplete inputs. Through iterative training, the model progressively improves its ability to produce images that closely resemble the original data. Finally, the output images are further refined to ensure that they meet consistency and accuracy standards. The use of generative models in MRI image enhancement offers several advantages, such as effectively restoring missing or damaged regions, decreasing noise, and correcting blurriness. Additionally, these models can standardize MRI images from different sources, thereby ensuring uniformity in training datasets [66].
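A minimal sketch of the standardization steps that typically precede generative enhancement is shown below: it resamples a volume to a fixed grid and applies z-score intensity normalization using NumPy and SciPy. The target shape, interpolation order, and normalization scheme are common illustrative choices, not a prescribed standard.

```python
# Minimal MRI preprocessing sketch: resample to a common grid, then normalize intensities.
import numpy as np
from scipy.ndimage import zoom

def preprocess_volume(volume, target_shape=(128, 128, 128)):
    # Resample so that images from different scanners share the same dimensions.
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    resampled = zoom(volume, factors, order=1)          # trilinear interpolation

    # Z-score intensity normalization within a rough foreground mask.
    mask = resampled > resampled.mean()
    mu, sigma = resampled[mask].mean(), resampled[mask].std() + 1e-8
    return (resampled - mu) / sigma

volume = np.random.rand(160, 192, 144)                  # stand-in for a loaded MRI volume
prep = preprocess_volume(volume)
```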
Normalization is critical in balancing image resolution. High-resolution images might lead to excessive feature extraction, and result in prolonged computation times and overfitting risk [67]. In contrast, low-resolution images might lack sufficient detail, and cause underfitting and poor classification accuracy. Thus, selecting an optimal resolution for normalization is essential to maintaining both training efficiency and model performance. Generative models facilitate this process by learning the ideal resolution and image quality parameters directly from the data. The procedure typically begins with the standardization of MRI images with respect to dimensions and intensity values, thus establishing a consistent starting point. The model is then trained on these preprocessed images, to learn to generate normalized images that strike a balance between detail and computational efficiency. Through iterative training, the model identifies and produces images at optimal resolution, while ensuring that they are neither overly detailed (which might cause overfitting) nor too sparse (which might cause underfitting) [68]. The generated images are further refined to meet desired quality and consistency standards. Automating the normalization process with generative models substantially decreases the manual effort required for data preparation, while ensuring that all images used in training are at optimal resolution [69]. This automation offers several advantages: it decreases computation time, thereby allowing models to train more quickly; it prevents both overfitting and underfitting, thereby enhancing classification accuracy; and it standardizes MRI images from various sources, thereby ensuring uniformity and comparability across datasets [70].
The complexity of early brain changes, substantial individual variability, inconsistent disease progression, and nonspecific alterations in MRI images make diagnosing various brain diseases, such as Alzheimer’s disease (AD), exceptionally challenging [71,72]. These factors contribute to the variability across Alzheimer’s disease stages, particularly in its early phases, for which brain MRI images are less commonly available than those of healthy older individuals. To address data scarcity and enhance the generalization capability of diagnostic models, generating diverse training data has become a crucial preprocessing step. Generative models have shown remarkable performance in data augmentation, significantly increasing the efficacy of diagnostic models [73]. Generative models can provide a substantial number of realistic brain MRI images, encompassing a wide spectrum of changes from normal aging to various stages of Alzheimer’s disease. This data diversity enables the training model to better capture the distinct features of lesions, thereby enhancing its ability to recognize and differentiate among disease stages [74]. Furthermore, generative models can simulate the brain characteristics of different individuals, a process essential for addressing interindividual differences. Given the variability in brain structure and disease progression among individuals, generative models enhance the robustness of diagnostic models by generating image data that reflect these differences, even in scenarios with high interindividual variability. Additionally, the images generated by generative models can incorporate non-specific changes, which are common in real clinical settings [75,76]. Including these non-specific changes in the training set enables the model to learn to distinguish Alzheimer’s disease from other similar neurodegenerative conditions, thereby improving its diagnostic accuracy and reliability. The ability of generative models to create diverse and representative data not only alleviates the issue of data scarcity but also ensures that diagnostic models are better equipped to handle the complexities of Alzheimer’s disease, and ultimately leads to more accurate and reliable diagnosis [77].
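As a hedged sketch of such augmentation, the code below draws extra synthetic samples for an under-represented disease stage from a label-conditional generator. The `generator` function here is a stub standing in for an already-trained conditional model (e.g., a conditional GAN or class-conditional diffusion model), and the stage labels are illustrative.

```python
# Hedged sketch: rebalance a dataset with synthetic samples from a conditional generator.
import torch

num_classes, latent_dim = 4, 100                # e.g., normal, early, moderate, severe stages

def generator(z, labels):
    # Stub standing in for a trained conditional generator mapping (noise, stage label) -> image.
    return torch.randn(z.size(0), 1, 64, 64)

def augment_stage(stage_label, n_samples):
    z = torch.randn(n_samples, latent_dim)
    labels = torch.full((n_samples,), stage_label, dtype=torch.long)
    return generator(z, labels)                 # synthetic images for that stage

extra_early_stage = augment_stage(stage_label=1, n_samples=64)
```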
The applications of GAI models in enhancing MRI images have been described in several real-world scenarios, particularly in medical institutions and research settings. For instance, researchers have presented a novel three-dimensional generative model for synthesizing realistic and morphologically accurate brain images, to address the challenges of data scarcity in medical imaging. The model generates high-resolution samples that preserve biological and disease phenotypes and therefore is suitable for various applications in healthcare AI. By demonstrating superior morphological preservation to existing methods, the research highlights the potential of synthetic data to enhance the training of AI models in privacy-sensitive contexts [78]. Another study focused on generating diverse training datasets for Alzheimer’s disease diagnostics. By using GANs, the researchers synthesized a wide variety of MRI images representing different stages of the disease. The generated images enabled more robust training of classification algorithms and consequently improved their ability to detect subtle changes associated with the disease. The authors have also proposed a novel architecture combining attention mechanisms to improve the segmentation of complex structures. The significant improvements in segmentation quality highlight the potential of this approach for clinical applications in neuroimaging [79].
However, implementing GAI models for MRI enhancement is not without challenges. One major issue is the need for high-quality training data. The effectiveness of the generative models heavily relies on the diversity and quality of the input data. To address this challenge, researchers often perform extensive data preprocessing, including normalization and augmentation, to ensure that the training datasets are representative of various imaging conditions and patient demographics [80]. Another challenge is the risk of overfitting, wherein the model becomes too tailored to the training data and fails to generalize to new cases [81]. This risk can be mitigated through techniques such as dropout regularization and the introduction of noise during training, which help the models maintain performance across diverse datasets. Additionally, the integration of GAI models into existing clinical workflows poses operational hurdles, including the need to train personnel and adapt software systems. Collaboration among data scientists, radiologists, and IT departments is essential for seamless integration and for ensuring that the models yield actionable insights without disrupting established practices [82]. Although GAI models show great promise in enhancing MRI images, ongoing research and collaboration are necessary to address challenges including data quality, model generalization, and practical implementation in clinical settings.
3.3. Image segmentation
The region of interest (ROI) refers to a specific area selected from an image that serves as the main subject for subsequent image processing and analysis [83]. In machine vision and image processing, ROIs are typically used to delineate the image parts that require special attention or processing for deeper analysis and feature extraction. In brain MRI images of Alzheimer’s disease, the primary ROI is usually the temporal lobe, particularly the hippocampus, which may show deposition of beta-amyloid plaques and neurofibrillary tangles; these changes lead to synaptic and neuronal loss and severe atrophy in the affected brain regions, often beginning in the hippocampus [84,85].
In this application, training the generative model involves inputting the original MRI images and generating segmented images through the network, to clearly delineate the hippocampal region. This process requires the model to understand both the global context of the image and local features, such as the shape, position, and texture of the hippocampus. Generative models have been shown to efficiently distinguish between lesion areas and normal tissues in the segmented images, thereby providing reliable data for subsequent diagnosis and analysis [86,87].
To enhance segmentation accuracy, generative models can integrate multiple loss functions, such as cross-entropy loss and Dice coefficient loss, thus optimizing the segmentation outcomes [86]. For instance, cross-entropy loss can effectively handle classification tasks by comparing the predicted class probabilities with the true class labels; therefore, this method is highly suitable for binary and multi-class segmentation problems [88]. In contrast, the Dice coefficient loss directly measures the overlap between the predicted segmentation and the ground truth, and is particularly beneficial in medical image segmentation tasks in which the region of interest is small and imbalanced [89]. Combining these loss functions allows the model to leverage the strengths of each, thereby improving both the precision and recall of the segmentation. Whereas cross-entropy loss ensures that the model learns to classify each pixel correctly, Dice loss focuses on the overall shape and area of the segmented region, thus balancing pixel-wise accuracy and the structural coherence of the segmentation [90]. Table 2 comprehensively summarizes various loss functions that can be used in medical image segmentation, including descriptions, advantages, and disadvantages.
Table 2. Summary of loss functions for medical image segmentation.
Loss name | Description | Advantages | Disadvantages |
---|---|---|---|
Cross Entropy Loss (CE Loss) | Commonly used for binary and multi-class classification; measurement of the difference between the predicted probability distribution and the true distribution | Simple to implement, well-suited for classification tasks | Sensitive to class imbalance, might not capture spatial information well |
Weighted Cross Entropy Loss (WCE Loss) | Extension of CE Loss that includes weights to handle class imbalance by assigning more importance to minority classes | Balances class distribution, improves performance on imbalanced datasets | Might not address all types of imbalance, computationally complex |
Focal Loss | Designed to address class imbalance by focusing on hard-to-classify examples through a modulating factor | Effective for handling hard examples, improves model robustness | Requires tuning of additional parameters, can be computationally intensive |
Dice Loss | Measurement of the overlap between predicted and true segmentation masks, directly optimizing for segmentation accuracy | Effective for handling class imbalance, focuses on overall segmentation shape | Might have diminished effectiveness for large regions with little overlap, might be sensitive to small objects |
Generalized Dice Loss | Extension of Dice Loss for multi-class segmentation; adjustment for class imbalance by weighting classes differently | Robust to class imbalance, applicable to multi-class problems | Computational complexity, might require careful tuning of weights |
Generalized Wasserstein Dice Loss | Incorporation of Wasserstein distance to account for the semantic relationships among classes, thus enhancing segmentation quality | Captures semantic relationships, improves segmentation accuracy | High computational complexity, challenging to implement |
Tversky Loss | A variant of Dice Loss; enables adjustment of false positives and false negatives through weighting factors | Flexible; can be tuned to penalize false positives or false negatives differently | Requires careful tuning of parameters, can be complex to configure |
DiceCE Loss | Combines Dice Loss and Cross-Entropy Loss, leveraging the strengths of both | Balances pixel-wise accuracy and overall segmentation quality | High computational complexity, requires tuning of multiple parameters |
DiceFocal Loss | Combines Dice Loss and Focal Loss, addressing both class imbalance and hard-to-classify examples | Comprehensively handles class imbalance and difficult examples | Complex implementation, requires extensive tuning of multiple parameters |
Generalized Dice Focal Loss | Integration of Generalized Dice Loss with Focal Loss for enhanced performance in complex segmentation tasks | Highly robust, effective for complex and imbalanced datasets | High computational complexity, challenging parameter tuning |
Mean Squared Error Loss (MSE Loss) | Commonly used for regression tasks; measurement of the average squared difference between predicted and actual values | Simple to implement, widely used in regression problems | Sensitive to outliers, not suitable for classification tasks |
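As a minimal sketch of the loss combination discussed above, the code below implements a weighted sum of binary cross-entropy and Dice loss in PyTorch; the equal weighting and tensor shapes are illustrative choices that would normally be tuned.

```python
# Minimal combined Dice + cross-entropy loss sketch (binary segmentation case).
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, dice_weight=0.5, eps=1e-6):
    """logits, target: (batch, 1, H, W); target values in {0, 1}."""
    ce = F.binary_cross_entropy_with_logits(logits, target)   # pixel-wise classification term

    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * intersection + eps) / (union + eps)            # overlap with the ground truth
    dice_loss = 1.0 - dice.mean()

    return (1 - dice_weight) * ce + dice_weight * dice_loss

logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = dice_ce_loss(logits, target)
```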
3.4. Interpretable features
GAI models have shown remarkable potential in extracting highly interpretable features from medical images, particularly in applications involving brain MRI scans for Alzheimer’s disease [91]. This high interpretability is due primarily to the inherent capability of generative models to reconstruct input data [92]. This reconstruction process necessitates a deep understanding of the data’s distribution and structure, to ensure that the model captures essential features of the original images. For instance, models such as VAEs and GANs use a low-dimensional latent space to represent data, and consequently can generate new samples that closely resemble the original distribution of brain MRI scans. Each point in this space can be seen as a combination of different features of the original MRI images. In the context of Alzheimer’s disease, different latent variables might correspond to various anatomical structures or pathologies within the brain, such as hippocampal atrophy or cortical thinning, which are critical markers of the disease. By manipulating these latent variables, researchers can generate MRI scans with specific characteristics, thereby facilitating a better understanding of how different features contribute to the overall diagnosis. Furthermore, GAI models’ excellence in disentangling features greatly enhances interpretability. For example, in brain MRI analysis, one latent variable might control the degree of atrophy in the hippocampus, whereas another might influence the presence of amyloid plaques. This disentanglement enables clearer insights into how different pathological changes are represented in the images, and provides a more nuanced understanding of various brain diseases’ progression.
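A hedged sketch of such latent-variable manipulation is given below: a single latent dimension of a trained decoder is swept across a range of values and decoded, revealing which image factor that dimension controls. The `decoder` here is a stub standing in for a pretrained VAE- or GAN-style decoder, and the dimension index is arbitrary.

```python
# Hedged sketch of a latent traversal: vary one latent dimension and decode the results.
import torch

latent_dim = 32

def decoder(z):
    # Stub standing in for a trained decoder mapping latent codes to flattened 64x64 images.
    return torch.rand(z.size(0), 64 * 64)

def traverse(base_code, dim, values):
    codes = base_code.repeat(len(values), 1)
    codes[:, dim] = torch.tensor(values)         # sweep a single latent variable
    return decoder(codes)                        # images differing only in that factor

base = torch.zeros(1, latent_dim)
images = traverse(base, dim=5, values=[-3.0, -1.5, 0.0, 1.5, 3.0])
```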
Compared with discriminative models, which focus primarily on classification tasks, GAI models offer a distinct advantage by emphasizing the data generation process. In this process, the model must develop a comprehensive understanding of the image structure, thus leading to the extraction of more interpretable features [93]. In Alzheimer’s disease, generative models can not only classify the presence or absence of the disease, but also generate MRI scans that reflect the underlying pathology and enable a richer interpretation of the disease’s effects on brain structure [93].
GAI models also support feature visualization, a powerful tool for interpretability. One advanced approach to visualizing features involves generating activation maps, such as those produced by the LayerCAM algorithm [94]. This detailed process involves identifying and highlighting the regions of an MRI scan that are most influential in the model’s decision-making process. By overlaying these activation maps on the original images, researchers can visually inspect which anatomical structures or pathological changes are emphasized by the model. This process not only aids in understanding how the model interprets different regions of the brain but also provides insights into which features are most relevant for diagnosing brain diseases. Additionally, techniques such as t-distributed stochastic neighbor embedding and uniform manifold approximation and projection further enhance feature visualization by reducing the dimensionality of the latent space [95]. These methods enable the high-dimensional latent representations learned by generative models to be projected into two or three dimensions. This dimensionality reduction facilitates the exploration and interpretation of complex feature relationships within data. For instance, clusters formed in the reduced-dimensional space can reveal distinct patterns or stages of brain diseases, thereby aiding identification and understanding of disease progression on the basis of the extracted features.
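The sketch below illustrates this kind of dimensionality-reduction visualization with scikit-learn's t-SNE, projecting placeholder latent features to two dimensions and coloring points by diagnostic label; the features and labels are random stand-ins for codes extracted by a generative model.

```python
# Minimal t-SNE visualization sketch for latent features from a generative model.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.rand(200, 64)              # stand-in for latent codes of 200 scans
labels = np.random.randint(0, 3, size=200)      # stand-in for labels, e.g., healthy / MCI / AD

embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="viridis", s=12)
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.title("Latent features projected to 2D")
plt.show()
```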
3.5. Early diagnosis and diagnosis at different stages
GAI models, compared with traditional diagnostic tools, significantly enhance the early detection rates of various brain diseases, such as AD, primarily through their ability to identify subtle pathological features in brain MRI images. Early-stage AD often presents with minute structural changes that are difficult to detect through conventional image analysis methods. GAI models excel in recognizing these early signs, such as slight variations in gray and white matter, and decreases in hippocampal volume.
Several studies have demonstrated the effectiveness of GAI models in this context. For instance, one study has presented Smile-GAN, a semi-supervised deep-learning framework designed to identify several early-stage neuroanatomical patterns associated with AD from MRI data. By analyzing 8,146 scans from 2,832 participants, the framework discovered four distinct neurodegeneration patterns, which revealed two pathways of disease progression. These patterns have been shown to enhance precision diagnostics and predict clinical outcomes more effectively than traditional biomarkers [96]. Moreover, GAI models can address the challenge of limited labeled data, which often hinders early AD diagnosis. In one study, a GAN was used to classify AD with minimal data: by generating synthetic MRI images from a limited dataset, the authors achieved classification accuracy exceeding 80% with a pretrained convolutional neural network (CNN) [97]. This approach allows for better training of models, and enhances sensitivity and specificity in recognizing early disease manifestations.
The adaptability of GAI models also allows Bayes’ theorem to be applied: the class-conditional distributions learned by a generative model can be converted into posterior class probabilities, thereby yielding discriminative predictions for multi-class classification. This capability enables more accurate differentiation among various stages of AD, which is critical in personalized medical strategies.
In summary, GAI models not only enhance the detection of early pathological features but also improve overall diagnostic performance in early-stage Alzheimer’s disease, as supported by empirical studies illustrating their advantages over traditional methods.
4. SUMMARY AND PERSPECTIVES
This review provided a comprehensive overview of the applications of GAI in the analysis of brain MRI images for various brain diseases. We explored five foundational GAI models: GANs, diffusion models, transformer models, VAEs, and autoregressive models. These models are not limited to single functions but instead are versatile tools capable of performing a wide range of tasks essential for brain disease diagnosis and research. Although this review was aimed at thoroughly describing the applications of GAI in brain MRI image analysis and exploring the potential of various models, several areas warrant further discussion. This article focused primarily on specific GAI models and application scenarios, which might not encompass all latest developments and interdisciplinary research findings in the field. Additionally, although the integration and applications of GAI technology in clinical practice were described, challenges such as system compatibility, physician acceptance, and the complexities of regulatory approval were not explored in detail. Our discussion of ethical and privacy issues was also relatively general; future research should further analyze the potential effects of these issues on clinical applications. Additionally, although we included some model comparisons and analyses, we acknowledge that room for improvement exists regarding the details of performance evaluation and model selection for specific application scenarios. We hope that these initial discussions will lay a groundwork for more in-depth research and encourage further empirical studies to verify and expand the application of these technologies in clinical practice.
The future application of GAI models in brain MRI imaging for brain diseases holds tremendous potential across various dimensions. Brain diseases are multifaceted, and involve structural, functional, and molecular changes in the brain [98,99]. Integrating MRI data with other imaging modalities such as positron emission tomography, as well as genetic, clinical, and biomarker data, may provide a comprehensive view of the disease. Future GAI models should be designed to handle and fuse these diverse data types, and leverage their complementary strengths to increase diagnostic accuracy and offer deeper insights into disease mechanisms. Such a holistic approach might lead to the identification of novel biomarkers and the development of more targeted therapeutic strategies.

Additionally, the integration of GAI into clinical workflows as real-time diagnostic support systems is expected in the future. These systems could continually analyze incoming MRI scans and provide instantaneous feedback to clinicians. This capability may be particularly valuable in emergency settings or during routine check-ups, when timely decision-making is critical. Future work should focus on optimizing these systems for speed and accuracy, and ensuring that they are user-friendly and can seamlessly integrate with the existing medical infrastructure. Additionally, incorporating explainability features will help clinicians understand and trust the AI’s recommendations [100].

Moreover, brain diseases progress differently in each patient, thereby necessitating personalized treatment approaches. GAI models can analyze individual patient data, including longitudinal MRI scans, to predict disease progression and response to various treatments. By identifying specific patterns and trajectories in brain changes, these models can assist in tailoring treatment plans optimized for each patient’s unique condition. Future research should be aimed at enhancing the predictive capabilities of GAI models, as well as incorporating patient-specific variables, such as comorbidities, lifestyle factors, and genetic predispositions.

Finally, as GAI becomes more integrated into the diagnosis and treatment of brain diseases, ethical and privacy considerations become paramount [101]. The use of sensitive medical data necessitates stringent measures to protect patient privacy and confidentiality. Future research should focus on developing robust data encryption and anonymization techniques to safeguard patient information. Ethical considerations also include ensuring fairness and avoiding biases in AI models, by training models on diverse datasets, and implementing mechanisms to detect and mitigate biases that could lead to unequal treatment outcomes. Transparency in AI decision-making processes and obtaining informed consent from patients are crucial to maintaining trust and ensuring ethical use of AI [102].

In conclusion, the integration of GAI in the diagnosis and treatment of brain diseases through brain MRI imaging is poised to revolutionize the field. Continued interdisciplinary collaboration and innovation will be essential to fully unlock the potential for transforming brain disease diagnosis and care.