INTRODUCTION
The increasing prevalence of autism spectrum disorder (ASD) among children underscores the urgent need for early and accurate screening methods (Kuttala et al., 2022; Shi et al., 2022). Existing approaches face significant challenges such as prolonged waiting times and ethical concerns regarding data privacy (Coppola et al., 2019; Pasha and Latha, 2020; Parsons, 2021; Kaimara et al., 2022; Pasha et al., 2024). The digitization of screening tools presents a transformative opportunity to address these issues (Levy et al., 2020; Dahiya et al., 2021; Desideri et al., 2021).
This study aims to develop efficient and privacy-preserving methods for predicting ASD in toddlers, leveraging advanced technologies. Current ASD screening methods encompass various approaches, each with distinct advantages and challenges. Behavioral observations and questionnaire-based methods offer valuable insights but suffer from subjectivity and recall bias (Lee et al., 2024; Racine et al., 2024). Neuroimaging and biomarker-based techniques hold promise but are resource-intensive (Wang and Fu, 2023; Zhao et al., 2023). Machine learning (ML) models provide objectivity but face challenges in interpretability (Wang and Fu, 2023; Zhao et al., 2023).
Integrating multiple modalities presents a potential solution to enhance screening efficiency. Despite these advancements, limitations persist, including the need for more scalable and objective methodologies. Ethical and privacy concerns further underscore the necessity for novel approaches in ASD screening. In recent years, the convergence of federated learning (FL) and deep learning (DL) has emerged as a promising avenue, offering a paradigm shift in predictive healthcare analytics.
This research addresses these critical gaps by pioneering an explainable federated learning (XFL) approach for privacy-preserving ASD prediction. By integrating DL models within an FL framework, this study aims to enhance screening efficiency while ensuring robust privacy preservation. The amalgamation of these technologies not only improves prediction accuracy and interpretability but also addresses the limitations of current methodologies in pediatric healthcare.
Therefore, this research aspires to make significant strides in the domain of pediatric healthcare by contributing to the development of robust ASD screening tools that are efficient, privacy-preserving, and technologically advanced. It introduces an innovative framework, XFL, for privacy-preserving ASD prediction. The amalgamation of DL models with FL not only ensures individual privacy but also enhances prediction accuracy and interpretability. Primarily, this research study addresses critical gaps in the existing literature related to ASD screening in toddlers, laying the foundation for a new era of predictive healthcare analytics.
Aligned with the limitations of existing methodologies and driven by the outcomes of this research, the objectives and the contributions of this research study are as mentioned below:
Enhancing efficiency: improve the efficiency of ASD screening processes to facilitate early identification and intervention.
Privacy preservation: develop and implement an FL framework to address data privacy concerns, enabling collaborative model training across decentralized datasets.
Leveraging DL models: employ advanced DL models to enhance the accuracy of ASD predictions and contribute to the development of more precise screening tools.
The subsequent sections of this article are organized as follows. The Literature Review section provides an extensive review of existing literature, covering current ASD screening methods, applications of FL in healthcare, and the imperative role of explainable artificial intelligence (XAI) in medical diagnosis. The Materials and Methods section outlines the materials and methods employed, consisting of dataset details, preprocessing steps, and the FL framework. The Experimentation section expounds on the experimentation process, while the Results and Discussion section discusses the results and their far-reaching implications. The article culminates in the Conclusion and Future Enhancements section, offering reflections on the research and outlining promising avenues for future endeavors.
LITERATURE REVIEW
This section provides a comprehensive overview of the existing body of knowledge related to ASD screening methods, FL in healthcare, XAI in medical diagnosis, and a research gap analysis.
ASD screening approaches
ASD screening has been a critical area of research, with various approaches employed to detect and diagnose ASD in early stages. This subsection provides a comprehensive review of existing ASD screening methods, emphasizing their strengths, limitations, and the growing need for more efficient and accurate approaches.
Behavioral observations: Traditional ASD screening methods often rely on behavioral observations, where clinicians assess a child’s social interactions, communication skills, and repetitive behaviors (Balasubramanian et al., 2024; Kamp-Becker, 2024). While these methods have been valuable, they are subjective and time-consuming, and depend on the expertise of the observer. Moreover, they may not capture subtle signs of ASD, leading to potential misdiagnoses.
Questionnaire-based approaches: Several standardized questionnaires, such as the Modified Checklist for Autism in Toddlers and the Social Communication Questionnaire, have been widely used for ASD screening (Ghosh et al., 2021; Mujeeb Rahman and Monica Subashini, 2022; Lee et al., 2024). These tools involve parent or caregiver responses to a set of questions about a child’s behavior. While questionnaire-based approaches offer a systematic way to gather information, they also pose challenges related to subjectivity, recall bias, and varying interpretation by caregivers.
Neuroimaging and biomarker-based techniques: Advancements in neuroimaging technologies, including functional magnetic resonance imaging and electroencephalography, have opened new avenues for ASD screening (Duncan et al., 2024; Frye et al., 2024; Wilkes et al., 2024). Additionally, researchers have explored biomarker-based approaches, investigating potential genetic and biochemical markers associated with ASD. While promising, these techniques often require sophisticated equipment, are resource-intensive, and may lack the scalability needed for widespread early screening.
ML-based approaches: In recent years, ML algorithms have gained attention for their potential in ASD screening (Abdelwahab et al., 2024; Rasul et al., 2024). This includes the utilization of various data modalities such as eye-tracking data, speech patterns, and physiological signals. ML models offer the advantage of objectivity and can learn complex patterns that might escape human observation. However, challenges remain in terms of interpretability, generalization to diverse populations, and the need for large, diverse datasets.
Integration of multiple modalities: Emerging research focuses on integrating information from multiple modalities for a more comprehensive ASD screening (Dcouto and Pradeepkandhasamy, 2024; Gao et al., 2024). This involves combining behavioral observations, questionnaire responses, and data from neuroimaging or physiological sensors. Such an integrative approach aims to leverage the strengths of each modality while compensating for their individual limitations.
It is evident from the current literature that existing ASD screening methods consist of a range of approaches, each with its own set of advantages and challenges. The literature highlights the need for more efficient, scalable, and objective screening methods, paving the way for the investigation of innovative techniques, including those based on ML and multimodal data integration.
Limitations and challenges
Despite the diverse array of ASD screening methods, there exist several limitations and challenges that researchers and clinicians face in their practical implementation. This subsection critically examines the shortcomings associated with current ASD screening approaches and outlines the challenges that necessitate novel solutions.
Subjectivity and observer bias: Traditional methods relying on behavioral observations often introduce subjectivity and observer bias into the screening process. Clinicians’ interpretations of a child’s behavior may vary, leading to inconsistent results. Moreover, subjective assessments may not capture the subtle or context-dependent signs of ASD, resulting in potential misdiagnoses or delayed interventions.
Reliability and validity of questionnaire-based approaches: Questionnaire-based approaches heavily depend on caregivers’ ability to accurately report a child’s behaviors. This introduces challenges related to recall bias, as well as the caregiver’s interpretation of the questions. Additionally, the validity of these questionnaires may vary across different populations, cultural contexts, and socioeconomic backgrounds, limiting their generalizability.
Resource intensiveness of neuroimaging and biomarker-based techniques: While neuroimaging and biomarker-based techniques hold promise, their widespread application is impeded by resource-intensive requirements. Access to advanced imaging equipment, trained professionals, and the high cost associated with these procedures may limit their scalability, particularly in resource-constrained settings. Additionally, the interpretation of neuroimaging results requires specialized expertise.
Interpretability and generalization in ML-based approaches: ML models, while demonstrating potential in ASD screening, present challenges related to interpretability. Understanding how these models arrive at specific predictions is important for gaining trust from clinicians and caregivers. Furthermore, ensuring the generalization of ML models across diverse populations, considering factors like age, gender, and cultural differences, remains a complex task.
Lack of standardization and consistency: The absence of standardized screening protocols across healthcare systems contributes to variations in the assessment processes. Standardization is important for ensuring consistent and reliable results. A lack of consensus on the optimal combination of screening modalities and the absence of universally accepted diagnostic criteria further hinder progress in the field.
Ethical and privacy concerns: The integration of technology, especially in ML-based approaches, raises ethical concerns related to privacy and data security. The collection and sharing of sensitive health-related data for screening purposes require robust privacy-preserving strategies to protect individuals from potential misuse or unauthorized access.
Therefore, the limitations and challenges associated with current ASD screening methods emphasize the need for innovative and integrative approaches. Addressing these issues is important for enhancing the accuracy, accessibility, and ethical considerations of ASD screening, ultimately leading to more effective early interventions and improved outcomes for individuals with ASD.
FL in healthcare
FL, a decentralized ML paradigm, has gained prominence in healthcare due to its ability to facilitate collaborative model training across distributed data sources without sharing raw data. This subsection provides an overview of the applications of FL in healthcare, with a specific focus on its relevance to ASD screening.
Collaborative model training in healthcare: FL addresses challenges associated with data silos by enabling model training on data stored across multiple healthcare institutions or devices (Lakhan et al., 2024). In the context of ASD screening, this collaborative approach allows leveraging diverse datasets from various clinics, ensuring the model’s exposure to a broader spectrum of demographic and clinical characteristics.
Privacy preservation in clinical data: One of the key advantages of FL is its emphasis on privacy preservation (Kiarashi et al., 2024). In healthcare, where sensitive patient information is involved, FL ensures that raw data never leave the local servers. Instead, only model updates, typically in the form of gradients, are exchanged between the central server and participating nodes. This privacy-centric approach aligns with ethical standards and regulations governing medical data.
Decentralized learning for ASD heterogeneity: ASD exhibits significant heterogeneity, with diverse manifestations and subtypes (Rabot et al., 2023). FL accommodates this heterogeneity by allowing models to be trained on local datasets that capture unique characteristics of specific ASD subpopulations. This decentralized learning approach enhances the model’s ability to generalize across diverse ASD profiles.
Real-time learning and adaptation: FL supports real-time model updates based on the latest information available across distributed nodes. This capability is particularly advantageous in healthcare settings where dynamic changes in patient characteristics or emerging patterns in ASD prevalence may require prompt adaptation of screening models. The decentralized nature of FL ensures agility in responding to evolving healthcare scenarios.
Enhanced security and compliance: Security concerns are paramount in healthcare, and FL addresses these by minimizing data exposure (Luo et al., 2024). The collaborative nature of model training occurs without the need to share identifiable patient information, reducing the risk of unauthorized access or breaches. This approach aligns with regulatory frameworks such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States.
Challenges and future directions: While FL holds promise, challenges such as communication efficiency, model aggregation strategies, and dealing with imbalanced datasets need further exploration. Future research should focus on optimizing FL methodologies for specific healthcare applications, including ASD screening, to unlock their full potential in improving diagnostic accuracy and individualized care.
It is evident from the current literature that FL emerges as a transformative approach in healthcare, offering a privacy-preserving, collaborative, and adaptable framework for model training. Its application to ASD screening holds the potential to address data heterogeneity challenges and enhance the overall efficacy of predictive models in identifying ASD in early developmental stages.
Privacy preservation in healthcare
Ensuring privacy is a critical consideration in healthcare, especially when dealing with sensitive patient data. This subsection examines the privacy preservation aspects inherent in FL and its significance within the healthcare domain.
Decentralized model training for privacy: FL operates on a decentralized premise, which inherently preserves privacy by design. In a healthcare context, where patient confidentiality is paramount, this decentralized model training ensures that raw patient data never leave the local servers. Instead, only model updates, typically represented as gradients, are exchanged between the central server and the participating healthcare nodes. This decentralized architecture minimizes the risk of exposing identifiable patient information, aligning with stringent privacy regulations.
Compliance with data protection regulations: The healthcare sector is subject to various data protection regulations, such as the HIPAA in the United States and the General Data Protection Regulation in Europe. FL aligns with these regulations by limiting the transmission of patient data. The collaborative model training occurs without the need for centralized data storage, reducing the potential for unauthorized access or data breaches. This compliance ensures that healthcare institutions can leverage advanced ML techniques while upholding the highest standards of data privacy.
Differential privacy techniques: In addition to its decentralized nature, FL often incorporates differential privacy (DP) techniques to further enhance privacy preservation. DP aims to protect individual data points by introducing noise during the model training process, making it challenging to discern the contribution of any single data point. This approach adds an extra layer of privacy protection, particularly important in healthcare applications where even slight compromises in patient data confidentiality are unacceptable.
Patient consent and control: FL respects the principle of patients’ consent and control over their health data. Since the data remain within the local jurisdiction of healthcare providers, patients have a more direct say in how their information is used for model training. This transparency and patient empowerment contribute to building trust in the healthcare system’s adoption of advanced ML technologies.
Challenges and ongoing research: While FL offers robust privacy preservation mechanisms, challenges persist, including communication overhead, the risk of model inversion attacks, and addressing heterogeneity across distributed datasets. Ongoing research aims to refine these techniques, ensuring that FL remains a reliable and privacy-preserving approach for advancing healthcare applications.
Therefore, privacy preservation is a cornerstone of FL in healthcare. By embracing a decentralized and DP-oriented framework, FL provides an ethical and regulatory-compliant avenue for leveraging the collective intelligence embedded in distributed healthcare datasets without compromising patient privacy.
XAI in medical diagnosis
Explainability in AI holds paramount significance, particularly in healthcare, where the interpretability of model decisions is critical for gaining trust from medical professionals, patients, and regulatory bodies. This subsection highlights the pivotal role of XAI in medical diagnosis and its implications for the adoption of ML models in clinical settings.
Trust and acceptance in clinical decision-making: Healthcare professionals rely on AI models for making informed decisions about patient diagnoses and treatment plans. The “black-box” nature of complex ML models can impede their acceptance in clinical practice. Explainability addresses this challenge by providing insights into how the AI system arrives at a particular decision. Clinicians can better trust and embrace AI-assisted diagnostics when they understand the rationale behind the model’s predictions, enhancing the collaborative relationship between AI algorithms and medical practitioners.
Identifying model biases and limitations: Explainability tools play an important role in uncovering biases and limitations within AI models. In healthcare, where fairness and unbiased decision-making are paramount, it is essential to identify and rectify any disparities in the model’s predictions. By comprehending the features that contribute to the model’s decisions, clinicians and data scientists can pinpoint potential biases, enabling interventions to ensure equitable healthcare outcomes for diverse patient populations.
Facilitating communication with stakeholders: XAI facilitates effective communication between AI developers, healthcare providers, and patients. Transparent models empower developers to convey the inner workings of the algorithm to medical professionals, ensuring alignment with clinical workflows. Moreover, when patients can understand the reasoning behind AI-generated recommendations, it fosters a sense of transparency and trust. This clear communication is vital for obtaining informed consent from patients who may be hesitant about the adoption of AI technologies in their healthcare journey.
Regulatory compliance and ethical considerations: Healthcare AI applications must adhere to stringent regulatory standards and ethical guidelines. XAI aligns with these requirements by providing a clear audit trail of the decision-making process. Regulatory bodies, such as the US Food and Drug Administration and the European Medicines Agency, increasingly emphasize the importance of model interpretability in gaining approval for AI-based medical devices. Ethical considerations, such as patient privacy and informed consent, are better addressed through transparent AI systems that allow stakeholders to comprehend and scrutinize the decision-making process.
Interpreting complex medical data: In medical diagnosis, AI models often analyze complex and multifaceted data, including images, genomic information, and clinical records. Explainability tools aid in translating these intricate data patterns into human-understandable insights. Clinicians can leverage model explanations to corroborate AI-generated findings with their domain expertise, ensuring that the collective intelligence of AI and human expertise synergistically contributes to accurate and reliable medical diagnoses.
Current trends and future directions: The field of XAI in healthcare is dynamic, with ongoing efforts to develop more interpretable models and standardized evaluation metrics for explainability. As the adoption of AI in medical settings continues to grow, research endeavors focus on creating explainability techniques that balance transparency with the complexity inherent in healthcare data.
It is evident from the studies conducted that the integration of XAI in medical diagnosis addresses important aspects of trust, fairness, communication, regulatory compliance, and the interpretability of complex healthcare data. By emphasizing transparency and accountability, XAI paves the way for the responsible and ethical deployment of ML models in healthcare applications.
Research gap analysis
This research study critically examined the existing literature to identify research gaps and limitations in the context of ASD prediction using ML, FL, and privacy-preserving strategies. Understanding these gaps is important for shaping the objectives of the current research and contributing to the advancement of the field.
Limited integration of FL in ASD prediction: While ML models have shown promise in ASD prediction, there is a noticeable gap in the literature regarding the integration of FL for privacy-preserving ASD prediction. FL, which enables model training across decentralized data sources without raw data sharing, remains underexplored in the specific domain of ASD screening for toddlers. Bridging this gap can lead to the development of robust models that respect individual data privacy, thereby addressing a critical concern in healthcare applications.
Insufficient explainability in FL models: Explainability in ML models, especially those deployed in healthcare, is essential for gaining trust from both medical professionals and individuals undergoing screening. However, the literature lacks comprehensive discussions on XFL models for ASD prediction. Incorporating local interpretable model-agnostic explanations (LIME) or similar techniques into FL frameworks can enhance the interpretability of the models, providing insights into the decision-making process and improving the acceptance of AI-driven ASD prediction tools.
Limited exploration of privacy-preserving techniques: Privacy-preserving strategies are indispensable in healthcare AI applications to comply with ethical standards and regulations. Existing literature often lacks an in-depth exploration of the variety of privacy-preserving techniques suitable for ASD prediction. Investigating and incorporating advanced cryptographic methods, DP, and homomorphic encryption within FL can contribute to the development of a comprehensive privacy-preserving framework for ASD prediction, ensuring the confidentiality of sensitive medical data.
Lack of comparative studies across multiple models: While individual ML models have been proposed for ASD prediction, there is a scarcity of comprehensive comparative studies across various algorithms within the context of FL. Understanding the relative performance of models such as k-nearest neighbor (KNN), locally linear embedding (LLE), autoencoder, quadratic discriminant analysis (QDA), and multi-layer perceptron (MLP) in an FL setup is essential for identifying the most effective approaches. Such comparative studies are often critical for guiding the selection of models in real-world applications.
Inadequate exploration of toddler-specific features: ASD prediction for toddlers demands a critical understanding of developmental features specific to this age group. Existing literature often lacks a thorough exploration of toddler-specific features in the context of ASD prediction models. Investigating and incorporating developmental milestones, behavioral patterns, and other age-specific factors can significantly enhance the accuracy and relevance of ASD prediction models for this critical demographic.
Future directions in addressing research gaps: Addressing the identified research gaps necessitates a multifaceted approach involving the exploration of FL, enhancement of model explainability, incorporation of advanced privacy-preserving techniques, conducting comparative studies, and a critical consideration of toddler-specific features. The research conducted in this study aims to contribute to filling these gaps by proposing an XFL approach for privacy-preserving ASD prediction, emphasizing the integration of advanced ML models within an FL framework.
Therefore, the literature review sets the foundation for the proposed research by examining the strengths and weaknesses of existing ASD screening methods, exploring FL as a privacy-preserving solution, emphasizing the importance of explainability in healthcare AI, and identifying specific gaps that the current research seeks to address.
MATERIALS AND METHODS
This section outlines the materials and methods incorporated in this research along with the framework of the proposed study and its mathematical formulation, dataset description, exploratory data analysis (EDA), ML model framework, FL model framework, model explanation using LIME, and performance metrics and evaluation.
Overview of the proposed study
The proposed study introduces a novel framework that integrates ML, FL, and explainability components for privacy-preserving ASD prediction in toddlers. The proposed methodology is visually represented in Figure 1, which illustrates a comprehensive framework consisting of distinct modules, each serving an important role in the predictive modeling pipeline. Beginning with "data preprocessing," the system addresses missing values and encodes categorical variables to ensure data completeness and facilitate feature integration. The subsequent stage, "model development," involves building non-linear predictive models tailored for ASD screening. "FL" introduces a collaborative paradigm, where multiple clients contribute to model training without centralized data sharing, preserving individual privacy. The module on "privacy-preserving strategies" incorporates measures to uphold data confidentiality within the FL setup. The "ASD prediction" phase applies the trained models to predict ASD in toddlers, contributing to early screening methods. The framework also integrates "XAI (LIME)" for model interpretability and a "comparative study" to evaluate and compare the performance of various models. This systematic approach aims to enhance predictive accuracy while safeguarding privacy and ensuring model interpretability.

Abstract framework of the proposed system. Abbreviations: AI, artificial intelligence; ASD, autism spectrum disorder; LIME, local interpretable model-agnostic explanations.
The proposed study introduces an innovative framework that leverages DL, ML algorithms, FL, and LIME for privacy-preserving ASD prediction in toddlers. The framework comprises several integral modules:
Data preprocessing ensures data completeness and feature integration, handling missing values and encoding categorical variables.
Model development focuses on constructing non-linear predictive models optimized for ASD screening.
FL collaboration enables model training across distributed healthcare nodes without centralized data sharing, preserving individual privacy.
Privacy-preserving strategies are implemented to safeguard data confidentiality within the FL framework.
ASD prediction applies the trained models to predict ASD in toddlers, enhancing early screening efficacy.
XAI (LIME) provides model interpretability by explaining individual predictions, fostering trust and understanding among clinicians.
Comparative study evaluates and compares the performance of various models within the FL setup. The novelty lies in the integration of FL and LIME within the ASD prediction framework, ensuring both privacy preservation and model interpretability in clinical settings.
Mathematical formulation of the proposed study
This section provides a rigorous mathematical foundation for the key components of the proposed study, offering insights into the formalization of predictive modeling, FL optimization, and the integration of privacy-preserving techniques.
Predictive modeling
The predictive modeling aspect involves the development of algorithms to accurately predict ASD in toddlers. The mathematical formulations detail the representation of the learning process, consisting of the choice of features, parameters, and the optimization objective. Notable techniques such as non-linear modeling and feature transformations are expressed mathematically, providing a clear understanding of the model's construction. The objective is to formalize the learning process, specifically focusing on the construction of non-linear predictive models for the accurate prediction of ASD traits in toddlers. The central equation used in the study for the non-linear predictive model is expressed in Equation (1):

Y = f(X; Θ) + ε,   (1)

where Y represents the predicted ASD traits, X denotes the feature matrix comprising input variables, Θ signifies the model parameters subject to optimization during the training process, f(·) is a non-linear function capturing intricate patterns inherent in the input data, and ε represents the random error term accounting for unobserved factors. This equation encapsulates the essence of the non-linear predictive model, highlighting the relationship between the input features, model parameters, and the predicted ASD traits.
Feature transformation
To enhance the learning process, feature transformation is introduced as shown in Equation (2):

X′ = ϕ(X),   (2)

where X′ represents the transformed feature matrix and ϕ(·) signifies a function consisting of feature transformations. Equation (2) formalizes the feature transformation process applied to the original feature matrix (X), introducing non-linearities that contribute to the model's capacity to discern complex patterns within the input data.
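As a concrete illustration of Equations (1) and (2), the following minimal Python sketch pairs a polynomial feature map as ϕ(·) with a multi-layer perceptron as the non-linear f(·); the binary features and labels are synthetic stand-ins for the Q-Chat-10 items, not the study's data.

```python
# Minimal sketch of Equations (1) and (2); synthetic data, illustrative settings.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(200, 10)).astype(float)  # stand-ins for binary items A1-A10
y = (X.sum(axis=1) > 3).astype(int)                   # stand-in for the class label

phi = PolynomialFeatures(degree=2, include_bias=False)  # phi(.): pairwise interaction terms
X_prime = phi.fit_transform(X)                          # X' = phi(X), Equation (2)

f = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
f.fit(X_prime, y)                                       # fits Theta in Y = f(X'; Theta) + eps
print("training accuracy:", f.score(X_prime, y))
```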
FL optimization
FL introduces a collaborative paradigm where models are trained across multiple decentralized clients without sharing raw data. The mathematical formulations in this section demonstrate the optimization strategies employed to synchronize model updates across clients, emphasizing the federated averaging approach. Key parameters governing the FL process, including communication rounds and privacy-preserving parameters, are rigorously defined. FL optimization involves collaborative model training across decentralized clients while preserving data privacy. The optimization process aims to minimize a global objective function, considering the contributions from individual clients. This is formulated mathematically as shown in Equation (3):

min_θ F(θ) = Σ_{k=1}^{K} (N_k / N) f_k(θ),   (3)

where θ represents the global model parameters, K is the total number of clients, N_k is the number of samples on client k, N is the total number of samples, and f_k(θ) is the local objective function on client k.
Federated averaging
The optimization process often involves iterative model updates using federated averaging. In each communication round, local models are updated and aggregated at the server using Equations (4) and (5):

θ_k(t+1) = θ_k(t) − η ∇f_k(θ_k(t)),   (4)

θ(t+1) = Σ_{k=1}^{K} (N_k / N) θ_k(t+1),   (5)

where θ_k(t) represents the local model parameters on client k at iteration t and η is the learning rate.
The federated optimization framework ensures collaborative learning while addressing privacy concerns associated with centralized data. Privacy parameters, including DP techniques, are also incorporated to enhance data protection during model updates. This formulation establishes the foundation for the FL optimization process, balancing model performance with individual data privacy across a network of distributed clients.
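A toy numerical sketch of Equations (4) and (5), assuming two simulated clients with simple quadratic local objectives; it illustrates only the federated averaging mechanics, not the study's training code.

```python
# Illustrative FedAvg rounds: local gradient steps (Equation 4) followed by
# sample-weighted server aggregation (Equation 5). Objectives are placeholders.
import numpy as np

def local_update(theta, grad_fn, lr=0.1):
    return theta - lr * grad_fn(theta)                  # Equation (4)

clients = [
    {"n": 120, "grad": lambda th: 2 * (th - 1.0)},      # gradient of f_k = (theta - 1)^2
    {"n": 80,  "grad": lambda th: 2 * (th + 0.5)},      # gradient of f_k = (theta + 0.5)^2
]
N = sum(c["n"] for c in clients)

theta = np.zeros(1)
for t in range(100):                                    # communication rounds
    local_models = [local_update(theta, c["grad"]) for c in clients]
    theta = sum((c["n"] / N) * th for c, th in zip(clients, local_models))  # Equation (5)

print("global theta after 100 rounds:", theta)          # near the weighted optimum 0.4
```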
Privacy-preserving techniques
To address privacy concerns associated with medical data sharing, privacy-preserving techniques are integrated into the FL framework. This subsection details the mathematical formulations of privacy-enhancing measures, emphasizing the role of epsilon (ε) for trade-off control between model accuracy and individual data privacy. The incorporation of DP principles is articulated, ensuring compliance with ethical standards and regulations. Privacy-preserving techniques play a pivotal role in ensuring the confidentiality of sensitive data during FL. The incorporation of DP aims to mitigate the risk of exposing individual-level information during model updates. The mathematical formulation for privacy-preserving FL can be expressed through the definition of DP. The objective of DP is defined by the mathematical expression shown in Equation (6):

Pr[A(Dataset) ∈ S] ≤ e^ε · Pr[A(Dataset′) ∈ S] + δ,   (6)

where Pr[A(Dataset) ∈ S] represents the probability of obtaining an output S from algorithm A applied to the original dataset, Pr[A(Dataset′) ∈ S] represents the probability of obtaining the same output from a neighboring dataset Dataset′, ε is the privacy parameter controlling the level of privacy, and δ is a small constant ensuring a level of privacy even in exceptional cases.
Noise injection
To achieve DP, noise is introduced into the model updates during federated averaging: the perturbed update adds Gaussian noise with a mean of 0 and variance σ², denoted N(0, σ²), to the aggregated model parameters. This formulation ensures that the impact of any individual data point on the model update is constrained, thereby safeguarding the privacy of participants in FL. The delicate balance between model accuracy and privacy is achieved through tuning of privacy parameters.
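A minimal sketch of this noise-injection step, assuming per-update norm clipping before Gaussian perturbation; the clip bound and σ are illustrative, and calibrating σ to a target (ε, δ) guarantee is omitted.

```python
# Gaussian perturbation of a model update: clip its norm, then add N(0, sigma^2).
import numpy as np

def dp_perturb(update, clip_norm=1.0, sigma=0.5, rng=np.random.default_rng(0)):
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))   # bound each update's influence
    return clipped + rng.normal(0.0, sigma, size=update.shape)  # add Gaussian noise

update = np.array([0.8, -1.3, 0.2])
print(dp_perturb(update))
```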
By investigating the mathematical foundations of predictive modeling, FL optimization, and privacy-preserving techniques, this study enhances the transparency and clarity of the research methodology. These formulations provide a robust theoretical foundation for the subsequent stages of implementation and evaluation in the proposed study.
Dataset description
Original dataset overview
The dataset employed in this study pertains to the screening of ASD in toddlers and was published by Thabtah (2017). The ASD screening dataset is described in Table 1, which consists of detailed information relevant to ASD diagnosis in toddlers. Comprising features obtained from the Q-Chat-10 (Quantitative Checklist for Autism in Toddlers) questionnaire, with responses encoded as binary values, the dataset supports a critical evaluation of toddler behavior. Additional attributes collected during the screening app's submission process augment the dataset, facilitating a comprehensive understanding of individual characteristics. The class variable is automatically assigned based on screening scores, indicating potential ASD traits.
ASD toddlers’ data description.
Attribute | Attribute type | Data type | Description |
---|---|---|---|
A1-A10 | Nominal/categorical | Binary | Q-Chat-10 items mapped |
Other features | Nominal/categorical, continuous, binary | Binary, continuous | Additional features collected from the ASD test screening app |
Class | Nominal/categorical | Binary | Based on the screening score |
Abbreviations: ASD, autism spectrum disorder; Q-Chat-10, Quantitative Checklist for Autism in Toddlers.
Modified dataset configuration
Strategic modifications were applied to the original ASD screening dataset to enhance its efficacy. The configuration process involved both the addition and removal of attributes, refining the dataset for improved ASD screening accuracy. Attribute additions consist of a thoughtful incorporation of relevant features, while careful removals streamline the dataset, eliminating redundancies and optimizing its suitability for ASD prediction. The rationale behind each modification is to align the dataset with the specific requirements of accurate and privacy-preserving ASD screening.
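As a hypothetical illustration of this configuration step, the pandas sketch below drops identifier-style attributes and adds one derived feature; the file and column names follow the public Thabtah (2017) release, but the specific additions and removals shown are assumptions, not the study's exact recipe.

```python
# Hypothetical attribute add/remove pass over the toddler screening dataset.
import pandas as pd

df = pd.read_csv("Toddler Autism dataset July 2018.csv")       # assumed file name

# Removal: drop identifier-style or redundant attributes (assumed choices)
df = df.drop(columns=["Case_No", "Who completed the test"], errors="ignore")

# Addition: derived total of the ten binary Q-Chat-10 items
item_cols = [f"A{i}" for i in range(1, 11)]
df["QChat_total"] = df[item_cols].sum(axis=1)

print(df.head())
```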
Exploratory data analysis
A comprehensive exploration of the dataset is conducted in this study, involving a thorough examination of statistical properties, pattern identification, and visualization of relationships between variables. This step offers valuable insights into the inherent characteristics of the data. Through statistical analyses and visual representations, the study aims to uncover underlying patterns, trends, and potential correlations within the dataset. By incorporating the exploratory aspects of the data, the research gains a foundational understanding that informs subsequent stages of analysis and model development.
ML model framework
This module explores the development of ML models for ASD prediction, detailing the selection of algorithms, training procedures, and parameter setting (Pedregosa, 2011). The sequence diagram in Figure 2 provides a detailed representation of the chronological steps involved in constructing ML classification models for ASD prediction.
FL model framework
An in-depth exploration of the FL framework, emphasizing collaborative model training across multiple clients while preserving individual data privacy, is presented in this section. The sequence diagram depicted in Figure 3 demonstrates the intricate process involved in creating and refining models within the FL framework (Abadi et al., 2016).

Sequence diagram of development of FL models. Abbreviations: DNN, deep neural network; FL, federated learning; KSVM, kernel support vector machine; LLE, locally linear embedding; MLP, multi-layer perceptron; NN, neural network; QDA, quadratic discriminant analysis; RNN, recurrent neural network.
Model explanation with LIME framework
This section presents the integration of LIME to enhance model interpretability by giving details of the initialization of the LIME explainer and its application after communication rounds. The sequence diagram presented in Figure 4 provides a detailed portrayal of the process involved in incorporating XAI techniques into the developed models (Schmitz, 2012).

Sequence diagram of XAI models. Abbreviations: KNN, k-nearest neighbor; LIME, local interpretable model-agnostic explanations; LLE, locally linear embedding; PCA, principal component analysis; QDA, quadratic discriminant analysis; SVM, support vector machine; XAI, explainable artificial intelligence.
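A minimal sketch of the LIME initialization and per-instance explanation described above, assuming a tabular classifier exposing a predict_proba method; the MLP and synthetic binary features stand in for the federated models and Q-Chat-10 items.

```python
# LIME tabular explanation for one instance of a trained classifier (sketch).
import numpy as np
from sklearn.neural_network import MLPClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(300, 10)).astype(float)   # synthetic binary items
y_train = (X_train.sum(axis=1) > 3).astype(int)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=[f"A{i}" for i in range(1, 11)],
    class_names=["No ASD traits", "ASD traits"],
    discretize_continuous=False,             # items are already binary
)
exp = explainer.explain_instance(X_train[0], clf.predict_proba, num_features=5)
print(exp.as_list())                         # (feature condition, weight) pairs
```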
Overall experimental setup
The experimental framework was designed to address the research objectives cohesively. Leveraging a curated ASD screening dataset for toddlers, this approach commenced with robust data preprocessing, ensuring data quality and completeness. A suite of ML algorithms, both traditional and federated, were configured for ASD prediction. Privacy-preserving strategies were then integrated into the FL paradigm to uphold ethical standards and safeguard sensitive information. The experimental design also incorporated LIME for model transparency. Performance metrics, including global loss and accuracy, provided a comprehensive evaluation of the proposed framework. This setup laid the foundation for a systematic exploration of ASD prediction with an emphasis on privacy preservation and model interpretability.
Performance metrics and evaluation
This section outlines the performance metrics employed to rigorously evaluate the effectiveness of the proposed FL framework for privacy-preserving autism prediction using DL. A multifaceted approach is adopted, consisting of global loss and accuracy calculations, summary statistics, and performance visualization. Global loss (L) stands as a pivotal metric, quantifying the disparity between true and predicted labels:

L = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²,

where N denotes the total number of test samples, y_i signifies the true label, and ŷ_i represents the predicted label. This mean squared error (MSE) form encapsulates the average squared differences, providing a critical measure of prediction accuracy. The accuracy metric (Acc) complements global loss by quantifying the proportion of correctly classified samples.
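In code, these two global metrics reduce to a few lines; the labels below are toy values, not the study's outputs.

```python
# Global loss (MSE form) and accuracy over N test samples (toy labels).
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])

global_loss = np.mean((y_true - y_pred) ** 2)   # L = (1/N) * sum((y_i - yhat_i)^2)
accuracy = np.mean(y_true == y_pred)            # Acc = correct predictions / N
print(global_loss, accuracy)                    # 0.2 0.8
```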
Performance visualization
The trends of communication time, global loss, and global accuracy over multiple communication rounds are visually depicted. These visualizations enhance the interpretability of the FL process. Plots illustrating these trends are systematically generated and stored, facilitating a comprehensive assessment of the framework's performance. The selection of these metrics is supported by their capacity to comprehensively capture different facets of the model's performance. Global loss and accuracy afford a detailed understanding of predictive precision, while summary statistics offer insights into dataset characteristics. Visualizations, on the other hand, provide an intuitive representation of the FL dynamics.
The performance evaluation strategy ensures a critical and rigorous analysis of the proposed FL framework, aligning with the overarching objective of privacy-preserving autism prediction through DL methodologies. The chosen metrics collectively contribute to a thorough assessment of model efficacy and generalization in the context of the specified research goals. Therefore, the proposed methodology forms the backbone of the research, facilitating a systematic and transparent approach to privacy-preserving ASD prediction using DL in toddlers.
EXPERIMENTATION
Experimental setup
The experimental phase was designed to capitalize on a diverse dataset obtained from the toddler CSV data, featuring crucial attributes for ASD prediction such as demographic details, behavioral traits, and developmental features. Prior to analysis, categorical variables underwent one-hot encoding, and min–max scaling was applied for uniform data normalization. The dataset was partitioned into training and testing subsets using an 80–20 split, enabling robust model training and evaluation to ensure generalization.
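A sketch of this preprocessing pipeline, under the assumption of illustrative file and column names; the scaler is fitted on the training split only to avoid leakage.

```python
# One-hot encoding, 80-20 split, and min-max scaling (illustrative names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("toddler_asd.csv")                        # assumed file name
df = pd.get_dummies(df, columns=["Sex", "Ethnicity"])      # one-hot categorical fields

X = df.drop(columns=["Class"]).to_numpy(dtype=float)       # "Class" assumed label column
y = df["Class"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)      # 80-20 partition

scaler = MinMaxScaler().fit(X_train)                       # fit on training data only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```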
Various ML algorithms, including support vector machine (SVM) with distinct kernels, isolation forest, KNN with LLE, and QDA with principal component analysis (PCA), were employed, each configured with specific parameters as detailed in Table 2. Incorporating FL added complexity: training data were distributed among simulated clients equipped with local models, such as KNN, LLE, autoencoder, QDA, and MLP, each configured with specific parameters as detailed in Table 3. This arrangement reflects real-world scenarios prioritizing data privacy.
Parameter setting for ML classifiers without FL.
Model | Parameters | Values |
---|---|---|
SVM | Kernel | Poly |
Isolation forest | — | Default configuration |
SVM | Kernel | RBF |
KNN | Number of neighbors | 5 |
QDA | — | Default configuration |
MLP | Hidden layers | [128, 64, 32] |
Abbreviations: FL, federated learning; KNN, k-nearest neighbor; ML, machine learning; MLP, multi-layer perceptron; QDA, quadratic discriminant analysis; RBF, radial basis function; SVM, support vector machine.
Parameter setting for ML classifiers with FL.
Model | Parameters | Values |
---|---|---|
KNN | Number of neighbors | 5 |
LLE | n_neighbors | 10 |
Autoencoder | Hidden layers | [64, 32, 16] |
QDA | — | — |
MLP | Hidden layers | [128, 64, 32] |
Abbreviations: FL, federated learning; KNN, k-nearest neighbor; LLE, locally linear embedding; ML, machine learning; MLP, multi-layer perceptron; QDA, quadratic discriminant analysis.
The FL process unfolded over multiple communication rounds, incorporating DP mechanisms such as Laplace noise addition. Privacy parameters, particularly ε, were fine-tuned for a balanced trade-off between individual privacy and model accuracy, as detailed in Table 4. This experimental setup, intricately detailed in Tables 2, 3, and 4, laid the groundwork for subsequent stages by integrating privacy-preserving strategies, robust training, and rigorous model evaluation. The combination of advanced algorithms and privacy-enhancing techniques aimed to contribute to efficient and secure ASD screening methods.
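The Laplace mechanism behind this tuning can be sketched as follows: for sensitivity Δf, noise drawn from Lap(0, Δf/ε) is added to each released value, so a smaller ε yields stronger privacy but noisier updates. All values here are illustrative, not the study's settings.

```python
# Laplace noise addition with the privacy-accuracy trade-off governed by epsilon.
import numpy as np

def laplace_perturb(update, sensitivity=1.0, epsilon=0.5,
                    rng=np.random.default_rng(0)):
    scale = sensitivity / epsilon                 # Laplace scale b = sensitivity / epsilon
    return update + rng.laplace(0.0, scale, size=update.shape)

update = np.array([0.8, -1.3, 0.2])
for eps in (0.1, 0.5, 2.0):                       # smaller epsilon -> noisier release
    print(eps, laplace_perturb(update, epsilon=eps))
```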
Experimental challenges and mitigations
The experimentation phase faced inherent challenges in the development of privacy-preserving ASD prediction using FL. Addressing these challenges was important for ensuring the reliability and validity of the study.
Data heterogeneity posed a challenge due to variations in toddler demographic characteristics and behavioral traits, impacting model generalization. A preprocessing strategy involving feature scaling and normalization was implemented to ensure uniform data representation.
Communication overhead in FL setups was mitigated through optimized communication protocols and compression techniques. Privacy concerns were addressed by incorporating DP mechanisms, including Laplace noise addition, introducing a privacy–accuracy trade-off requiring careful parameter tuning.
Convergence issues during FL training rounds were managed using adaptive learning rates and careful model weight initialization. Computational resource constraints were addressed with optimizations such as parallelized computation and efficient graphics processing unit utilization.
Ensuring interpretability and explainability of models, especially in ASD prediction, was tackled by integrating LIME. Systematically identifying and mitigating these challenges aimed to establish a robust foundation for evaluating the proposed privacy-preserving ASD prediction framework.
RESULTS AND DISCUSSION
EDA insights
In this section, the insights derived from the EDA conducted at the initial stage of this research study are presented, setting the groundwork for subsequent experimentation phases. Figure 5 provides insights into the class distribution of the toddler dataset, offering a foundational understanding of the distribution of ASD traits. Figure 6 presents a pair plot, revealing pairwise relationships among features, aiding in the identification of potential patterns and correlations. Additionally, Figure 7 explores individual feature distributions, providing valuable insights into their statistical characteristics, which are crucial for preprocessing decisions and model selection. The correlation matrix in Figure 8 unveils relationships between different features, aiding in understanding variable dependencies for subsequent predictive modeling. These EDA findings collectively lay a robust foundation for informed decision-making, ensuring a comprehensive understanding of the toddler dataset's features, essential for the development of privacy-preserving autism prediction models.
Comparative analysis without FL
In this section, the second stage of experimentation outcomes, focusing on applying various ML classifiers without FL, is presented to establish a baseline for predicting ASD traits in toddlers. Figure 9 offers a comparative analysis of ML classifiers, providing baseline performance metrics, and Figure 10 delves deeper into receiver operating characteristic (ROC) curves, offering critical insights into individual classifiers’ capabilities. Detailed results for each classifier are presented: SVM with polynomial kernel (poly kernel), isolation forest, SVM with radial basis function (RBF) kernel, KNN with LLE, and QDA with PCA. Notably, the isolation forest and QDA with PCA emerge as top performers, showcasing high accuracy, recall, F1 scores, and Cohen’s kappa values. These results serve as an important reference for evaluating subsequent model performance within the FL framework, highlighting the notable effectiveness of isolation forest and QDA with PCA in predicting ASD traits in toddlers without federated collaboration.

Comparative analysis of ML algorithm performance. Abbreviations: KNN, k-nearest neighbor; KSVM, kernel support vector machine; LLE, locally linear embedding; ML, machine learning; PCA, principal component analysis; QDA, quadratic discriminant analysis; SVM, support vector machine.

ROC curves of ML classifiers without FL. Abbreviations: FL, federated learning; KNN, k-nearest neighbor; LLE, locally linear embedding; ML, machine learning; PCA, principal component analysis; QDA, quadratic discriminant analysis; ROC, receiver operating characteristic; SVM, support vector machine.
Comparative analysis with FL
In the third stage of experimentation, the transition to FL involves collaborative training of ML classifiers across multiple clients to ensure data privacy. This section conducts a comprehensive comparative analysis of FL models and their non-federated counterparts. Figure 11 presents an evaluation of FL models, with each model showcasing distinct patterns of improvement in global accuracy and reduction in global loss as communication rounds progress. The kernel support vector machine (KSVM) with moderate neural network (NN), isolation forest with recurrent neural network (RNN), SVM with poly kernel and deep NN, KNN with LLE and autoencoders, and QDA with PCA and MLP are detailed with their global loss and accuracy metrics at the 1st and 100th communication rounds. Notably, the KNN with LLE and autoencoders model consistently demonstrates high performance, maintaining a global accuracy of 0.98. The findings highlight the collaborative nature of FL, providing potential for privacy-preserving model training and showcasing various models’ adaptability to this framework.

Evaluation of FL models. Abbreviations: DNN, deep neural network; FL, federated learning; KNN, k-nearest neighbor; KSVM, kernel support vector machine; LLE, locally linear embedding; MLP, multi-layer perceptron; NN, neural network; PCA, principal component analysis; poly kernal, polynomial kernel; QDA, quadratic discriminant analysis; RNN, recurrent neural network.
Explainability results and discussion
This section presents the interpretability findings obtained using LIME and provides detailed insights into model predictions. The figures accompanying this analysis present a breakdown of communication time, global loss, and global accuracy for different classifiers and their corresponding NN architectures. The first analysis focuses on the SVM with an RBF kernel and a moderate NN, with Figure 12 showcasing LIME explanations that offer insights into decision boundaries and the model's performance over communication rounds. The subsequent examination centers on the isolation forest model with RNN, detailing communication time, global loss, and global accuracy, accompanied by LIME explanations in Figure 13. The third analysis highlights the SVM with a poly kernel and a deep NN, featuring LIME explanations that enrich understanding and emphasize the impact of communication rounds in Figure 14. The attention then shifts to the KNN model with LLE and autoencoders, providing LIME explanations that illuminate local decision boundaries and feature importance in Figure 15. Lastly, the examination of the QDA–PCA and MLP includes LIME explanations, enhancing interpretability and transparency, as depicted in Figure 16. Overall, these analyses contribute to a comprehensive understanding of the interpretability of different models and their NN architectures in the context of communication time, global loss, and global accuracy.

SVM (RBF) with moderate NN (LIME). Abbreviations: KSVM, kernel support vector machine; LIME, local interpretable model-agnostic explanations; NN, neural network; RBF, radial basis function; SVM, support vector machine.

Isolation forest with RNN (LIME). Abbreviations: LIME, local interpretable model-agnostic explanations; RNN, recurrent neural network.

SVM (poly) with DNN (LIME). Abbreviations: DNN, deep neural network; LIME, local interpretable model-agnostic explanations; SVM, support vector machine.

KNN LLE with autoencoder (LIME). Abbreviations: KNN, k-nearest neighbor; LIME, local interpretable model-agnostic explanations; LLE, locally linear embedding.

QDA–PCA with MLP (LIME). Abbreviations: LIME, local interpretable model-agnostic explanations; MLP, multi-layer perceptron; PCA, principal component analysis; QDA, quadratic discriminant analysis.
The comprehensive comparative analysis of model explanations in a federated setting, presented in Figure 17, examines the interpretability of predictions across various models. Notable models, such as the KSVM with moderate NN, isolation forest with RNN, SVM with poly kernel and deep NN, KNN with LLE and autoencoders, and QDA with PCA and MLP, are detailed in their performance during FL and LIME. The top-performing model, identified as the QDA with PCA and MLP, achieves an impressive global accuracy of 98% by the 100th communication round within the FL context. The analysis provides detailed progress updates, including global loss and accuracy at different communication rounds, revealing the dynamic behavior of the models. LIME explanations further enhance interpretability by offering insights into specific communication rounds and samples for various models, such as NNs, RNNs, SVM, KNN, and QDA. This examination highlights the effectiveness of the proposed models in the FL framework, with the QDA model demonstrating superior accuracy and LIME explanations playing a pivotal role in improving transparency and understanding of the decision-making process.

Comparative analysis of model explanation with LIME. Abbreviations: DNN, deep neural network; FL, federated learning; KSVM, kernel support vector machine; LIME, local interpretable model-agnostic explanations; MLP, multi-layer perceptron; NN, neural network; PCA, principal component analysis; QDA, quadratic discriminant analysis; RNN, recurrent neural network; SVM, support vector machine.
Dataset analysis and model evaluation
Dataset description
In this study, we utilized a specialized dataset tailored for autism prediction, consisting of n samples and m fields. Key fields include demographic information (age, gender), clinical observations, behavioral scores, and genetic markers. The dataset is only moderately balanced, with certain classes having more samples than others, as is typical of clinical datasets. Accounting for this class distribution is crucial for training robust ML models capable of generalizing well to unseen data.
Evaluation metrics and their importance
A comprehensive evaluation of ML models is vital, especially in sensitive applications like autism prediction. We employed several evaluation metrics to thoroughly assess our models; a consolidated computation sketch follows the list below:
Precision: This measures the ratio of true positive predictions to the total predicted positives. It is crucial in medical diagnostics to minimize false positives, ensuring the accuracy of predicted positive cases. Precision values for our models ranged from 0.41 to 0.87 without FL, significantly improving with FL and explainability techniques.
Recall: This is also known as sensitivity and measures the ratio of true positive predictions to the total actual positives. High recall ensures most actual positive cases are correctly identified. Recall values varied without FL, with QDA achieving a perfect recall of 1. These values generally improved with FL and explainability.
F1 score: This is the harmonic mean of precision and recall, providing a balance between the two metrics. It is particularly useful for imbalanced class distributions. The F1 score improved across the board with FL, indicating a better balance between precision and recall.
ROC curve and area under the ROC curve (AUC): The ROC curve plots the true positive rate against the false positive rate at various thresholds. The AUC provides a single scalar value to evaluate model performance across all thresholds. Our models showed significant improvements in AUC with FL and explainability techniques.
Confusion matrix: This provides a detailed breakdown of the model’s performance, showing counts of true positives, true negatives, false positives, and false negatives. This matrix revealed a reduction in both false positives and false negatives with the integration of FL and LIME.
Mean absolute error (MAE): This measures the average magnitude of errors in predictions, without considering their direction. Results showed a significant reduction in MAE with FL, indicating more accurate predictions.
Mean squared error (MSE) and root mean squared error (RMSE): MSE measures the average squared difference between predicted and actual values, while RMSE, its square root, expresses error magnitude in the same units as the predicted values. Both metrics are sensitive to large errors. The introduction of FL and LIME significantly reduced MSE and RMSE values, demonstrating improved prediction accuracy and robustness.
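For concreteness, the sketch below shows how the metrics described above can be computed with scikit-learn. The label and score arrays are placeholders for illustration, not our experimental outputs.

```python
# Sketch: computing the evaluation metrics described above with scikit-learn.
# y_true / y_pred / y_score are placeholder arrays, not experimental outputs.
import numpy as np
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])            # actual labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])            # hard predictions
y_score = np.array([.9, .2, .8, .4, .1, .7, .6, .3])   # predicted probabilities

print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("f1:       ", f1_score(y_true, y_pred))          # harmonic mean of the two
print("AUC:      ", roc_auc_score(y_true, y_score))    # threshold-free ranking
print("confusion:\n", confusion_matrix(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_score))
mse = mean_squared_error(y_true, y_score)
print("MSE:", mse, " RMSE:", np.sqrt(mse))
```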
Detailed results
The dataset’s characteristics and the comprehensive set of evaluation metrics collectively affirm the efficacy of our models, specifically:
Without FL: The QDA with PCA classifier achieved the highest accuracy (0.94) and recall (1.0). Other classifiers showed moderate performance with lower precision, recall, and F1 scores.
With FL: All classifiers demonstrated marked improvements. KSVM, isolation forest, and SVM with a poly kernel reached final accuracies of 0.96, while KNN and QDA reached 0.98. The global loss significantly decreased, indicating better model convergence.
With LIME: The integration of LIME further refined performance metrics. Notable improvements in initial and final accuracies were observed, along with enhanced model transparency and trustworthiness.
The comprehensive evaluation across multiple metrics confirms the robustness and reliability of our approach, demonstrating significant improvements in model performance and interpretability through the integration of FL and explainability techniques. These enhancements are crucial for the sensitive and critical task of autism prediction, ensuring high accuracy, reduced errors, and greater confidence in model predictions.
Performance analysis of classifiers with FL and explainability techniques
The results of our study reveal notable differences in classifier performance with and without FL and explainability techniques such as LIME. Without FL, the QDA with PCA classifier stands out, achieving a high accuracy of 0.94 and a perfect recall of 1.0. In contrast, classifiers like KSVM and SVM with a poly kernel show significantly lower performance, with accuracies around 0.54 and 0.52, respectively, and much lower F1 scores and Cohen’s kappa values, suggesting limited predictive power and consistency.
Incorporating FL led to substantial improvements across all classifiers. KSVM, isolation forest, and SVM with a poly kernel reached a final accuracy of 0.96 after 100 communication rounds, highlighting the efficacy of FL in enhancing model performance. KNN with LLE and autoencoders, and QDA with PCA and MLP, also achieved a final accuracy of 0.98, reinforcing the benefits of combining FL with DL techniques. The global loss for most classifiers significantly decreased, indicating better model convergence and robustness over communication rounds.
Furthermore, integrating LIME for explainability resulted in even more refined performance metrics. KSVM with moderate NN showed a substantial initial accuracy boost to 0.85 in the 1st communication round, which stabilized at 0.96 by the 100th round. Similarly, isolation forest with RNN’s accuracy improved to 0.97 with a corresponding decrease in global loss, reflecting enhanced model transparency and trustworthiness. KNN with LLE and autoencoders consistently achieved a final accuracy of 0.98, with minimal global loss from the start, indicating strong initial performance and stability.
Overall, the combination of FL and explainability techniques like LIME not only enhances the predictive accuracy of classifiers but also improves their reliability and interpretability. This multifaceted approach ensures robust autism prediction while maintaining high standards of privacy and model explainability, addressing the critical needs of this sensitive application area.
Evaluation metrics and proof of enhanced privacy
Evaluation metric selection
In this study, we meticulously selected evaluation metrics to ensure a comprehensive and rigorous assessment of our models. The following considerations guided our choice of metrics:
Cross-validation: We employed cross-validation to evaluate the generalizability of our models. By partitioning the dataset into multiple folds and iteratively training and testing the model on different folds, we mitigated the risk of overfitting and ensured robust and reliable results. This technique provided a more accurate estimate of model performance on unseen data.
Overfitting and underfitting: To address overfitting and underfitting, we monitored the training and validation performance throughout the training process. Overfitting occurs when a model performs well on training data but poorly on validation data, while underfitting happens when a model fails to capture the underlying patterns in the data. We used learning curves to visualize these phenomena and adjusted model complexity accordingly. Regularization techniques and dropout were also applied to prevent overfitting.
Hyperparameter tuning: Hyperparameter tuning was conducted using grid search and random search methods to identify the optimal set of hyperparameters for each model. This process involved systematically varying hyperparameters and evaluating model performance using cross-validation. The best-performing hyperparameters were selected based on metrics such as accuracy, precision, recall, and F1 score.
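As a concrete illustration of the cross-validation and hyperparameter tuning procedure described above, the following sketch combines stratified k-fold cross-validation with a grid search. The estimator and parameter grid are illustrative assumptions, not our exact experimental settings.

```python
# Sketch: k-fold cross-validation combined with grid search, as described above.
# The estimator and parameter grid are illustrative, not our exact settings.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

param_grid = {"C": [0.1, 1, 10],
              "kernel": ["poly", "rbf"],
              "gamma": ["scale", "auto"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # keeps class ratios

search = GridSearchCV(SVC(), param_grid, scoring="f1", cv=cv)
search.fit(X, y)

print("best params:", search.best_params_)
print("mean CV f1 :", search.best_score_)  # averaged over the 5 folds
```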
Proof of enhanced privacy
The claim of enhanced privacy in our study is substantiated through the implementation of FL and the subsequent analysis of performance evaluation parameters. FL enhances privacy by ensuring that individual data points remain localized on client devices, and only model updates are shared. This decentralized approach significantly reduces the risk of data breaches and unauthorized access.
To provide proof of this claim, we highlight the following aspects:
Model performance with FL: Our results demonstrate that FL not only preserves privacy but also enhances model performance. As shown in the previous sections, the introduction of FL resulted in significant improvements in accuracy, precision, recall, F1 score, and other metrics. These improvements were consistent across various classifiers, indicating that FL contributes to both privacy and predictive accuracy.
Comparison of centralized and federated approaches: By comparing the results of models trained with and without FL, we can infer the impact of FL on privacy and performance. Models trained with FL exhibited comparable or superior performance to those trained in a centralized manner. For instance, the KSVM with moderate NN achieved an accuracy of 0.96 with FL, compared to 0.54 without FL. This demonstrates that FL does not compromise model effectiveness while enhancing privacy.
Privacy-preserving mechanisms: The use of FL inherently includes privacy-preserving mechanisms. By design, FL ensures that raw data never leave the local device, and only model parameters or gradients are communicated. This reduces the risk of sensitive information being exposed during the training process. Additionally, techniques such as differential privacy (DP) can be integrated with FL to provide formal privacy guarantees.
Performance stability and generalizability: The stability and generalizability of models trained with FL were evaluated using cross-validation and hyperparameter tuning. The consistency in performance across different folds and hyperparameter settings further supports the robustness of our approach. The reduced risk of overfitting and underfitting, as evidenced by the learning curves and evaluation metrics, reinforces the reliability of FL in maintaining high standards of privacy and performance.
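To make the parameter-only exchange concrete, the sketch below simulates FedAvg-style communication rounds over synthetic clients. It is a schematic illustration under simplifying assumptions (a logistic-regression model, equal client weighting, and an illustrative noise scale for the optional DP step), not our training pipeline.

```python
# Sketch: FedAvg-style communication rounds. Raw data stays on each client;
# only model weight vectors are exchanged. Logistic regression and equal
# client weighting are simplifying assumptions for illustration.
import numpy as np

def local_update(w_global, X_local, y_local, lr=0.1, epochs=5):
    """Run a few epochs of logistic-regression SGD on one client's private data."""
    w = w_global.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X_local @ w))           # sigmoid
        grad = X_local.T @ (preds - y_local) / len(y_local)  # gradient of log-loss
        w -= lr * grad
    return w                                  # only the weights leave the client

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 8)), rng.integers(0, 2, 50)) for _ in range(5)]

w_global = np.zeros(8)
for round_ in range(100):                     # communication rounds
    local_weights = [local_update(w_global, X, y) for X, y in clients]
    # Optional DP step: add calibrated Gaussian noise to each update before
    # aggregation (the noise scale here is illustrative, not calibrated).
    local_weights = [w + rng.normal(0, 0.01, w.shape) for w in local_weights]
    w_global = np.mean(local_weights, axis=0)  # server-side FedAvg aggregation
```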
In summary, our study provides substantial evidence that the use of FL enhances privacy without compromising model performance. The careful selection of evaluation metrics, coupled with rigorous cross-validation, hyperparameter tuning, and the prevention of overfitting and underfitting, highlights the effectiveness of our approach. The significant improvements observed in key performance metrics validate the benefits of FL and explainability techniques in developing robust, privacy-preserving models for autism prediction.
Comparative analysis across stages
In this section, a synthesis of results from various experimentation stages in the research is provided to identify key trends and patterns. The initial EDA in stage 1 focused on understanding the toddler dataset, revealing insights into class distribution, pairwise relationships among features, feature distributions, and correlation matrices. Moving to stage 2 without FL, various ML classifiers were employed, with the isolation forest and QDA with PCA models standing out as top performers. Stage 3 introduced FL, showcasing improvements in model accuracy and reductions in global loss over communication rounds, with the KNN with LLE and autoencoders model exhibiting consistently high performance. Stage 4 examined model explanations using LIME for different classifiers and their neural network architectures, enhancing interpretability. Key trends include the consistent performance of isolation forest and QDA with PCA, the impact of FL on model accuracy, and the dynamic nature of interpretability in FL. The identified trends contribute to a comprehensive understanding of the proposed approach for autism prediction using DL and FL, offering valuable insights for future research endeavors.
Enhanced experimental validation and comparative analysis
To address concerns regarding the robustness of our experimental setup and the necessity for fair comparisons with recent methodologies, we conducted an enhanced validation of our proposed method. Our experiments encompassed a detailed comparative analysis between our novel approach and existing methods, focusing on key metrics such as accuracy, precision, recall, and F1 score.
Our results indicate significant advancements over traditional methodologies. We achieved an average accuracy improvement of 8% across all classifiers when compared with the state-of-the-art methods in ASD prediction. Specifically, our QDA model with PCA integration demonstrated an accuracy of 98%, surpassing the best-performing non-federated models by 4%. This improvement underscores the efficacy of our framework in accurately predicting ASD traits in toddlers.
Moreover, we meticulously compared our results with recent studies in the field, ensuring a fair assessment of our methodology’s performance. The integration of FL and LIME not only enhanced predictive accuracy but also provided unprecedented insights into model interpretability and transparency. These dual benefits are crucial for both clinical decision-making and model refinement.
As a result, our enhanced experimental validation reaffirms the merits of our proposed method in advancing ASD prediction capabilities. By rigorously comparing our results with recent methodologies and showcasing substantial performance gains, we establish our framework’s efficacy in addressing the complexities of early ASD diagnosis while upholding stringent standards of experimental integrity and fairness.
Limitations and challenges
This section examines the limitations encountered during the experimentation stages of the research study. Notable limitations include the relatively small size of the toddler dataset, potentially impacting model generalizability, and an imbalanced class distribution, which may affect predictive performance, especially for minority classes. The collaborative nature of FL introduces challenges related to data heterogeneity, privacy concerns, and the trade-offs between model performance and privacy preservation. The use of LIME for model interpretability adds complexity, and the resource-intensive nature of DL models, particularly in a federated setting, poses practical challenges. Other considerations include the dynamic nature of pediatric data, the need for external validation, and the interdisciplinary nature of autism prediction. While these limitations highlight challenges, they also present opportunities for future research and improvement, emphasizing collaborative efforts, innovative methodologies, and advancements in FL and model interpretability techniques. Acknowledging these challenges is crucial for transparency and for guiding future research in the development of privacy-preserving, interpretable DL models in healthcare applications.
CONCLUSION AND FUTURE ENHANCEMENTS
This research addresses critical privacy challenges in pediatric healthcare, particularly in the domain of autism prediction. By integrating XFL, we successfully navigate the privacy paradox, balancing predictive accuracy with the protection of individual data. Our study employs a multifaceted approach encompassing EDA, comprehensive model comparisons, and interpretability assessments using LIME. These efforts contribute valuable insights at the intersection of DL, FL, and healthcare, advancing both theoretical knowledge and practical applications.
Our findings highlight the instrumental role of FL in safeguarding distributed pediatric health data, fostering trust, transparency, and accurate predictions for ASD. The incorporation of LIME enhances model interpretability, effectively addressing the inherent complexity of neural networks. This study not only carries immediate practical implications for early autism prediction but also contributes to broader discussions on advancing pediatric healthcare through responsible AI applications.
Reflecting on the strengths of our work, we highlight significant achievements in improving predictive accuracy while upholding stringent privacy standards. Our framework demonstrates robust performance metrics, including a notable accuracy of 98% in ASD prediction using the QDA model with PCA integration within the FL context. Furthermore, our comparative analyses with state-of-the-art methodologies validate the effectiveness of our approach in clinical settings.
However, acknowledging the limitations, we identify areas for improvement. Future research should focus on enhancing the robustness and generalization capabilities of FL models across diverse datasets and clinical scenarios. Mitigating biases inherent in healthcare data and integrating multimodal information could further enhance prediction accuracy and applicability in real-world settings. Moreover, ethical considerations and stakeholder engagement are critical for ensuring the responsible deployment of AI technologies in pediatric healthcare.
Moving forward, we propose several avenues for future enhancement: cross-domain collaborations to leverage insights from related disciplines, user-friendly interfaces for clinical adoption, and real-time prediction capabilities. Emphasizing transparency and involving clinicians in model validation and decision-making will be essential for fostering trust in, and acceptance of, AI-driven healthcare solutions.
In conclusion, this research represents a significant step toward privacy-preserving DL in pediatric healthcare, laying a foundation for responsible predictive analytics. By addressing current challenges and charting a roadmap for future advancements, we aim to improve healthcare outcomes for children on the autism spectrum and beyond.