INTRODUCTION
Autism spectrum disorder (ASD) is a common neurological and developmental disorder that affects people's emotional, cognitive, and social life and health (Devika Varshini and Chinnaiyan, 2020; Calderoni, 2023). It disturbs brain activity by affecting the nervous system and is mainly attributed to genetic and neurological mechanisms. ASD also frequently co-occurs with other disorders, such as bipolar disorder, attention deficit hyperactivity disorder, depression, sensory processing disorder, and disruptive behavior disorder. The prevalence of ASD is increasing, with men affected 4.5 times more often than women. Recent epidemiological studies from various regions of the world indicate that approximately 1 in 100 individuals has some form of autism. In Arab countries specifically, the prevalence has been estimated at between 1.4 and 29 per 10,000. Few studies have reported autism prevalence in developing nations. In Saudi Arabia, the prevalence is higher than that reported in developed nations: 42,500 cases have been confirmed, and many more remain undiagnosed (Khan et al., 2020). Given this increased prevalence, early treatment is vital for improving the quality of life of patients with ASD (Garg et al., 2022; Shinde and Patil, 2023). Currently, ASD diagnosis relies on a subjective assessment process that must be performed by qualified professionals, which makes it time-consuming and expensive. Technological advances help physicians store large amounts of data for diagnosing patients, and data mining is an important method for analyzing these data. In ASD identification, artificial intelligence (AI) has achieved notable results in identification and classification (Bohr and Memarzadeh, 2020; Jahanara and Padmanabhan, 2021).
Several conventional techniques are used for ASD identification, including machine learning (ML) and deep learning (DL) (Mohanty et al., 2022; Ismail et al., 2023). In ML, numerous algorithms have been applied to ASD identification, such as random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN). Many methods have used distinctive characteristics as features for identification, such as eye-tracking (Oliveira et al., 2021), image analysis (Jahanara and Padmanabhan, 2021), speech identification, data mining, and face identification. Among these, data mining is the fastest method for ASD identification. In one existing study, several ML algorithms (RF, SVM, KNN, LR, and NB) were used to detect ASD on the Quantitative Checklist for Autism in Toddlers (Q-CHAT) dataset. The outcome showed that SVM performed best, with an accuracy of 95% (Tartarisco et al., 2021). Similarly, a transfer learning-based DL method has been used to screen for ASD, employing the visual geometry group (VGG-16) and convolutional neural network (CNN) algorithms. The dataset in that study contains data from ASD-diagnosed children. This existing system achieved an F1-score of 0.95 and a classification accuracy of 95% (Lu and Perkowski, 2021). Although several conventional studies have attained good results in identifying ASD, they still fall short in handling large datasets and incur high computational cost.
To overcome this issue, the proposed model employs a modified multi-layer perceptron (MLP) with a cross-weighted attention mechanism to identify ASD in adults using the autism screening dataset. This dataset comprises survey results from more than 700 people, including people with and without ASD. First, the data from the autism screening dataset pass through the preprocessor, where data quality is improved by removing noise. The data are then split into training data and testing data, and the training data are fed to the identification mechanism. Although MLPs and attention mechanisms are well known in ML, the particular combination and modifications proposed in this research represent a new method for identifying ASD. A customized loss function takes into account the change ratio in the test data. This customized loss calculation goes beyond the typical loss functions used in MLP models and is intended to improve model training and robustness. ASD identification is performed by the modified MLP with the cross-weighted attention mechanism. Lastly, the classified data pass through the trained model and prediction phase, and the model's performance is calculated using evaluation metrics. Furthermore, the model is compared internally with other algorithms, namely MLP, RF, and NB, to assess its efficiency. In contrast to prevalent methods, which primarily center on diagnosing ASD in children, this research specifically targets adult datasets. Acknowledging the distinctive features of ASD presentation in adults and creating specific strategies for this group is a significant step beyond current methods.
The study aims to enhance the efficiency and accuracy of ASD identification by incorporating a cross-weighted attention mechanism. This signifies a progression from conventional MLP models, which may face difficulties when dealing with massive datasets or understanding intricate connections within the data.
Motivation of the research
In recent years, ASD has been spreading more rapidly than before. Detecting this neurological disorder has become complicated because several other mental illnesses share some of their symptoms with ASD. In addition, detecting autism traits with screening tests is time-consuming and expensive. At the same time, progress in DL- and ML-based models helps to predict autism at an early stage. ML models are typically reliable and effective in providing interpretable and precise outcomes. DL methodologies, on the other hand, expose high-level features through representation learning and provide better performance than conventional models. The main merit of DL is its innate ability to perform automated feature engineering without the involvement of domain experts. Owing to these advantages of ML and DL, various research studies have used different algorithms, such as neural network (NN) (Huang et al., 2020), KNN, RF, decision tree (DT), and SVM. Despite the efforts of existing systems, there is scope for further improvement in accuracy. Given these limitations in the reported results, further systematic study is needed to diagnose ASD optimally. Motivated by these factors, the present work aims to attain better accuracy in ASD diagnosis by exploring suitable ML and DL models.
Main contributions of the research
The research aims to identify ASD through the use of suitable ML and DL techniques, with the following objectives:
To investigate the effectiveness of a modified MLP in predicting and analyzing behavioral patterns in individuals with ASD.
To examine the impact of incorporating a cross-weighted attention mechanism in the modified MLP on the accuracy and performance of the predictions.
To compare the results of the modified MLP with the traditional MLP and other existing ML models commonly used in ASD behavioral research.
To contribute to the advancement of ASD behavioral research by introducing a novel approach that combines modified MLP and cross-weighted attention mechanism, potentially leading to more accurate and nuanced insights into the behaviors of individuals with ASD.
Significance and scope of the research
The medical implications of a precise diagnosis of autism are crucial. Early detection and intervention can positively influence individuals affected by ASD by giving them timely assistance and treatment plans. ML and DL models have become significant in ASD research: they provide a deeper understanding of the disorder and support identifying subgroups, developing customized treatments, and creating innovative interventions. As technology continues to advance, these models hold further possibilities for improving the lives of individuals with ASD and their families. By developing a well-performing model based on the modified MLP, the study intends to support researchers and clinicians in making informed and accurate decisions in ASD diagnosis. By exploiting the strengths of the cross-weighted attention mechanism, the framework offers significant potential for assisting early detection and intervention for people affected by ASD. In addition, precise classification of ASD cases could contribute to better healthcare decisions and patient outcomes. The automated nature of the proposed model minimizes manual trial and error, allowing practitioners and researchers to concentrate on applying and interpreting the model's results in real-world cases. Modified ML and DL models are at the forefront of ASD behavioral research. By leveraging AI, these models can provide a more comprehensive and nuanced understanding of the disorder, leading to improved diagnosis and treatment strategies.
Research questions
How does the use of modified multi-layer perceptron (MLP) with cross-weighted attention mechanism impact the accuracy and efficiency of ASD behavioral research?
What are the key differences and similarities between traditional MLPs and modified MLPs in the context of ASD research?
How does the cross-weighted attention mechanism contribute to the understanding and analysis of ASD behaviors in research?
In what ways can the modified MLP with cross-weighted attention mechanism be adapted and customized for different subtypes of ASD?
REVIEW OF THE LITERATURE
This section involves the review of recent studies that implement the identification of ASD using various algorithms. In addition, the problems identified by reviewing the studies are also discussed.
ASD is a developmental disorder that disturbs the functions of the brain by affecting the nervous system (Thabtah et al., 2019; Shahamiri et al., 2022). Adults with ASD have difficulties (Mengash et al., 2023) in communication, learning, concentration, memory, etc. Several techniques that have been developed for detecting ASD through ML (Bala et al., 2022) and DL (Mujeeb Rahman and Monica Subashini, 2022) are reviewed and discussed in this section. In one existing study, an ML algorithm was used to detect ASD in adults. A behavioral diagnostic tool, the Autism Diagnostic Observation Schedule (ADOS), was used to extract five behavioral features for the detection model. The model was examined with ADOS module 4 on 673 samples of clinical data from 385 adults, comprising adults with and without ASD. The model achieved good performance in detecting ASD (Küpper et al., 2020).
Likewise, a DL model based on the CNN algorithm has been used for ASD screening, with the data collected through a mobile application. The identification system was compared with other algorithms to assess its efficacy, and the outcome revealed that the existing system (Shahamiri and Thabtah, 2020) achieved better performance in ASD detection. In the same way, an ML-based system has been used for ASD identification, in which SVM, LR, KNN, CNN, and NB were analyzed. A nonclinical ASD dataset comprising adolescent, child, and adult data was used for the identification. The outcome showed that CNN achieved the best performance (Raj and Masood, 2020). Correspondingly, another ML system for ASD identification used several models, namely SVM, KNN, AdaBoost, and NB. The ASD dataset used consisted of 16 attributes of 703 ASD-affected and nonaffected people. This analysis revealed that SVM and NB produced better performance with a lower error rate (Mashudi et al., 2021).
Similarly, an ML technique has been used for ASD detection in which diverse feature-scaling techniques were applied to produce feature-scaled datasets. Eight classifiers were used: AB, RF, DT, KNN, GNB, LR, SVM, and linear discriminant analysis. The dataset comprises data from toddlers, children, adolescents, and adults. The experimental outcome showed that AB achieved the best accuracy in detecting ASD (Hasan et al., 2022). Likewise, a DL model with particle swarm optimization (PSO) has been used to detect ASD. An artificial neural network (ANN)-based MLP was used for classification, while PSO was used for feature selection. The ASD dataset was used for identification, and the outcome was compared between the non-reduced and reduced feature sets. The results showed enhanced performance in the identification of ASD (Sahu and Verma, 2022).
In addition, ML and DL systems have been used to identify ASD from images. RF, SVM, and CNN were applied to the Autism Brain Imaging Data Exchange II (ABIDE II) dataset, and the analysis found that RF performed better than SVM and CNN in ASD identification (Saputra et al., 2023). In the same way, ML and DL (Fan et al., 2023; Qiang et al., 2023) have been used to detect ASD with CNN, ANN, DT, NB, LR, and SVM, using a nonclinical dataset as the input. The analysis showed that ANN performed best in ASD identification (Raju et al., 2022). Likewise, a DL technique based on a 1D CNN has been applied to the ASD dataset, which comprises adult, child, and adolescent data, and the results showed good performance (Kareem et al., 2023). Correspondingly, a DL-based model has been used to detect ASD in children, with MobileNet and two dense layers used for feature extraction and classification. The data comprised 3014 images of children with and without ASD. This existing model achieved an accuracy of 94.6% (Hosseini et al., 2022).
Furthermore, the study by Squarcina et al. (2021) aimed to identify ASD in a sample of 76 subjects, comprising 36 typically developing subjects and 40 ASD-diagnosed cases. Each child underwent magnetic resonance imaging (MRI) analysis. For classification, the extracted features were used as input to the classifier to find the ASD subjects via a "learning through instance" procedure. The best-performing features were then chosen by greedy forward feature selection. Lastly, the recommended framework underwent a cross-validation strategy. After training on 68 RoIs, 5 RoIs attained an accuracy of nearly 70%. Following this, a recursive feature selection process was used to find eight features with an accuracy of 84.2% (Voinsky et al., 2023).
Subsequently, the research (Rakić et al., 2020) proposed a classifier for ASD classification. The system relied on a network-based model comprising an MLP and auto-encoders (Yin et al., 2022). The significance of the multi-modal method was demonstrated by assessing the outcomes quantitatively and qualitatively. By considering varied kinds of information in the classifier, the results were statistically improved, with an accuracy of 85.06%. Furthermore, the functional connections associated with ASD were examined in the study (Spera et al., 2019), which exploited ML methods to highlight subtle variations between the functional connectivity profiles of ASD subjects and controls. The outcomes highlighted that, for a single sample, ASD subjects were differentiated by using functional connectivity patterns learned on three sites. The results showed an area under the curve (AUC) value of 0.83 (Ke et al., 2020).
Furthermore, the research (Huang et al., 2020) used a graph-based computational framework to identify ASD from rs-fMRI. A GBFS technique was proposed to emphasize the associations through internal and external measures. To take advantage of the topological information contained in the graph, a restricted path-based depth-first search was proposed to refine the connectivity matrices. Lastly, a three-layered deep belief network with automated hyperparameter tuning was used to perform classification. The analysis reported an accuracy of 0.764.
Correspondingly, an ML-based system has been used for ASD detection. Various classifiers were analyzed, such as MLP, sequential minimal optimization (SMO), LR, sequential learning, leaderboard, indicators of compromise, real AdaBoost, and LMT. Four types of data, collected from toddlers, children, adolescents, and adults, were used. The outcome showed that the MLP classifier with Relief F feature selection produced the best performance (Hossain et al., 2021). Several conventional methods have used behavioral symptoms and images as features for identification, and the ASD dataset has been the most widely used dataset. Some conventional methods have used data from toddlers (Achenie et al., 2019), children, adolescents, and adults. Most methods have addressed ASD identification in children, and only a few have focused on adult ASD identification.
One existing model surpasses state-of-the-art methods with an average accuracy of 87%, an AUC of 96%, a sensitivity of 87%, and an F1-score of 86%. In a thorough comparison study, various scoring methodologies consistently show the CC200 atlas outperforming other atlases in distinguishing individuals with autism, with individuals in the ASD group compared against those in the control group (Guttikonda et al., 2023). An updated ANN classifier model was reported to have an accuracy of 1.00, indicating enhanced performance; that model was proposed and designed to help doctors and scientists improve the diagnosis of ASD (Sha et al., 2023). A comparative analysis of existing studies noted that long short-term memory (LSTM) outperformed other AI algorithms, such as CNN and MLP, in ASD diagnosis, as it produced consistent results and reached high accuracy in fewer epochs while minimizing MSE and loss. Moreover, the proposed LSTM-based HDS for ASD attained 100% accuracy according to DSM-V, which was statistically confirmed on a sample comprising both ASD and TD individuals (Khullar et al., 2021).
Problem identification
The main pitfalls identified during the analysis of the conventional works are discussed in this section.
A deep CNN system has been used for the identification of ASD, but it falls short in identifying complex features (Shahamiri and Thabtah, 2020).
An ML-based system has been analyzed for the detection of ASD. However, it lacked the ability to handle larger datasets (Hossain et al., 2021).
An ML system has been used for ASD identification; nevertheless, the quantity of data was not sufficient for all age groups (Hasan et al., 2022).
Different ML and DL methodologies have been used in conventional works. Nevertheless, there is scope for improvement in accuracy (Spera et al., 2019; Huang et al., 2020; Rakić et al., 2020).
PROPOSED METHODOLOGY
ASD is considered one of the behavioral disorders that affect the quality of life of numerous people worldwide. Although it is recognizable early, some people are not diagnosed at an early stage, so effective identification is needed to begin treatment in adults. Several conventional techniques have sought to build fast ASD identification systems but fall short in accuracy, handling of large datasets, and computation. Therefore, the proposed system employs a modified MLP with a cross-weighted attention mechanism for the identification of ASD in adults. The overall working of the proposed system is shown in Figure 1. The attention mechanism helps the model concentrate on important characteristics and decreases the amount of processing required. The method also incorporates a customized loss function that takes into account the change ratio in the testing dataset.
Initially, the data from the autism screening dataset are passed through the preprocessor, where noise is removed and data quality is improved. The preprocessor also checks for missing values and other inconsistencies in the data. The data are then divided into training and testing sets, with 80% of the data used for training and 20% for testing. The training data are passed through the identification mechanism, in which ASD is identified by the modified MLP with the cross-weighted attention mechanism. The classified data then pass through the trained model and prediction phase. Furthermore, the modified MLP with the cross-weighted attention mechanism is compared internally with the NB, MLP, and RF systems to examine the efficacy of the proposed system. Finally, the model's efficacy is measured with metrics such as accuracy, precision, recall, and F1-score. Figure 2 shows the identification mechanism in the proposed system.
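The 80/20 split and the four evaluation metrics described above can be sketched as follows. This is a minimal illustration that assumes binary labels (1 for ASD, 0 for non-ASD); the function names `train_test_split` and `binary_metrics`, the shuffling seed, and the zero-division handling are assumptions made for the example and are not taken from the study.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and return (train, test), holding out test_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Return 0.0 when a denominator is empty (an assumption, to avoid errors).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With roughly 700 records, a split of this kind would hold out about 140 records for testing.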
In the proposed system, the cross-weighted attention mechanism is used to improve the efficacy of ASD identification. In the identification mechanism, weights are first assigned to the parameters of the input data. In the forward pass, the input data pass through the network to generate the prediction. By blending the modified MLP structure with the cross-weighted attention mechanism, the method offers a holistic approach to enhancing diagnostic accuracy and efficiency, especially among adult populations, and aims to overcome the drawbacks of current strategies by combining the advantages of NN structures and attention mechanisms. The loss is then calculated using a weighted cross-entropy that focuses on the important classes, which improves the model's performance. In the backward pass, the network reduces the loss by adjusting the weights. By incorporating attention mechanisms and multiple layers into a modified MLP, the method improves the feature representation of the input data. The cross-weighted attention mechanism helps the model concentrate on important characteristics while decreasing the computational burden, which results in improved processing efficiency and potentially quicker diagnosis.
Furthermore, the weights are updated by gradient descent. In each iteration, the gradient of the cross-entropy loss is calculated and the weights are moved against it; numerous iterations are needed to approach a local minimum with the required accuracy. Finally, the predicted data pass through the trained model and prediction phase to evaluate the model's efficacy. The cross-weighted attention mechanism in the identification process focuses on the important features of the data and discards unwanted data to improve the efficacy of the ASD identification system.
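The forward pass, weighted cross-entropy, and gradient descent update described above can be illustrated on a single sigmoid unit. This is a simplified sketch and not the authors' network: the class weights `w_pos` and `w_neg`, the learning rate, and the number of epochs are assumed values chosen only for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weighted_bce(y, p, w_pos, w_neg, eps=1e-12):
    """Weighted cross-entropy for one sample; w_pos up-weights the positive class."""
    return -(w_pos * y * math.log(p + eps)
             + w_neg * (1 - y) * math.log(1 - p + eps))

def train(X, y, w_pos=2.0, w_neg=1.0, lr=0.5, epochs=200):
    """Batch gradient descent on a single sigmoid unit (illustrative only)."""
    n, n_features = len(X), len(X[0])
    weights, bias, history = [0.0] * n_features, 0.0, []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * n_features, 0.0, 0.0
        for xi, yi in zip(X, y):
            # Forward pass: weighted sum of inputs followed by the sigmoid.
            p = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
            loss += weighted_bce(yi, p, w_pos, w_neg)
            # Gradient of the weighted loss with respect to the pre-activation.
            dz = w_neg * (1 - yi) * p - w_pos * yi * (1 - p)
            for j, v in enumerate(xi):
                grad_w[j] += dz * v
            grad_b += dz
        # Backward pass: move the weights against the mean gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        history.append(loss / n)
    return weights, bias, history
```

The returned `history` holds the mean loss per epoch, which makes it easy to check that the iterations are actually reducing the loss.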
Internal assessment
Generally, an internal comparison is undertaken to determine the most effective model for a task. In this study, the performance of NB, RF, and MLP is compared internally with that of the proposed model to determine its effectiveness in identifying ASD in adults. NB is considered because of its innate capability to handle discrete and continuous data; it scales well with the number of data points and predictors and is not sensitive to irrelevant features. RF is considered because of its scalability, versatility, robustness, accuracy, and ability to report feature significance. Moreover, RF minimizes overfitting by averaging multiple DTs, and it is comparatively robust to outliers and noise within the data. MLP, in turn, can handle both linearly and nonlinearly separable data. Owing to these advantages, NB, RF, and MLP are compared internally with the proposed model.
Naïve Bayes
This supervised learning system uses Bayes' theorem to solve classification problems. Its identification process is based on finding the probability that an object belongs to a class, and it is commonly used for text classification. In the NB identification process, the dataset is first converted into frequency tables. A likelihood table is then produced to find the probability of the features in the data, and Bayes' theorem is used to calculate the posterior probability. The NB identification system is used for both multi-class and binary classification. It has three variants: Gaussian, multinomial, and Bernoulli. Algorithm 1 shows the process in the NB system.
Algorithm 1: Naïve Bayes
Step 1: Training:
For every attribute in the data
    For every attribute value V
        For every class label S
            PROB(V|S) = (number of occurrences of value V with class label S) / (total number of occurrences of class label S)
        End for
    End for
End for
Step 2: Testing:
For every instance in the test data
    Compute the posterior probability with
    PROB(S|V) = (PROB(V|S) * PROB(S)) / PROB(V)
End for
Step 3: Assign the class label with the highest posterior probability to the test instance.
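The three steps above can be sketched for categorical features as follows. This is a minimal illustration: the function names are hypothetical, the Laplace smoothing term `alpha` and the assumed number of distinct values per feature `n_values` are additions that the algorithm above does not include, and PROB(V) is omitted because it is the same for every class and does not change which label scores highest.

```python
from collections import Counter

def train_nb(rows, labels):
    """Step 1: build frequency tables of class labels and (feature, value, label)."""
    class_counts = Counter(labels)
    value_counts = Counter()
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, value, label)] += 1
    return class_counts, value_counts

def predict_nb(model, row, alpha=1.0, n_values=2):
    """Steps 2 and 3: score each class and return the most probable label."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # PROB(S)
        for j, value in enumerate(row):
            # PROB(V|S), smoothed with alpha (an assumption, see lead-in).
            score *= ((value_counts[(j, value, label)] + alpha)
                      / (count + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For binary questionnaire answers, `n_values=2` matches the two possible responses; other encodings would need a different value.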
Random forest
RF is an ensemble tree-based system that uses bagging to improve detection. First, the number of trees, the number of features, and the node size are set. RF consists of DTs, each of which is trained on a sample drawn with replacement from the dataset, called a bootstrap sample. Because the DTs are fitted on diverse subsets of the dataset, their predictions are combined by voting or averaging: the system takes the majority prediction across the trees as the final output. Accuracy depends on the number of trees used in the system. Algorithm 2 shows the process in RF.
Algorithm 2: Random forest
Input: training dataset DS (N × P) and number of trees A
For every variable i ∈ P do
    For a = 1 to A:
        1. Draw a sample X of size M from the training data.
        2. Grow an RF tree F_a on 2/3 of the data.
        3. Classify the leftover (out-of-bag, OOB) data using the RF tree, and compute the OOB classification accuracy, accuracy_a.
        4. For variable i, permute the variable values and evaluate the accuracy (accuracy_b); subtract it from the original OOB result (h_a = accuracy_a – e_a). The decrease indicates the importance of the variable.
    End for
    Aggregate the accuracy differences from all trees and compute the mean and variance:
    $\hat{h} = \frac{1}{A}\sum_{k=1}^{A} h_k$ and $s_h^2 = \frac{1}{A-1}\sum_{k=1}^{A} (h_k - \hat{h})^2$
    Compute the importance of variable i: importance_i = $\hat{h}/s_h$
End for
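The bookkeeping in Algorithm 2 can be sketched without fitting actual trees. The helpers below are hypothetical and only illustrate three pieces: drawing a bootstrap sample and its out-of-bag indices, combining per-tree predictions by majority vote, and turning per-tree accuracy decreases h_k into the importance score ĥ/s_h. Note that a plain bootstrap leaves out roughly one third of the rows on average, which is an approximation of the 2/3 and leftover split named in the algorithm.

```python
import random
import statistics
from collections import Counter

def bootstrap_indices(n, rng):
    """Sample n row indices with replacement; unsampled rows are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

def majority_vote(predictions):
    """Final forest output: the most common label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

def variable_importance(h):
    """Importance = mean accuracy decrease divided by its standard deviation.

    statistics.stdev uses the (A - 1) denominator, matching s_h^2 above.
    """
    return statistics.mean(h) / statistics.stdev(h)
```

A large positive score means that permuting the variable consistently hurts accuracy across trees, which is the sense in which the variable is important.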
Multi-layer perceptron
The MLP is a feed-forward NN that maps a set of inputs to a set of outputs. It comprises three types of layers: an input layer, a hidden layer, and an output layer. Every node outside the input layer uses a nonlinear activation function, and the network is trained with back-propagation. The input data first pass through the input layer. The hidden layer performs the nonlinear transformation by applying weights to the input and passing the result through the activation function to the output, and the output layer then performs the identification of ASD. The MLP is widely used for identification because of its capability to solve difficult nonlinear problems, handle large datasets, and produce accurate results.
The overall architecture of the MLP is depicted in Figure 3. Each node in every layer of the MLP, including the bias node, is fully connected to all nodes in the succeeding layer. The number of nodes in the input layer equals the number of input parameters. The output layer, in turn, may contain several nodes, corresponding to the predictions that the network is required to make. The number of hidden layers and the number of nodes in each are tunable hyperparameters through which the model achieves the desired approximation and generalization ability. The overall working of this network is depicted in Algorithm 3.
Algorithm 3: MLP
Mechanism MLP (autism data, hidden layers, iterate)
Start: Train MLPs
For autism data = 1 to end of autism data do
    For hidden layers = 1 to 20 do
        For iterate = 1 to 20 do
            Train MLP
            MLP-identification <- save highest accuracy
        End for
    End for
    MLP-identification <- best MLP identification based on the autism data
End for
Return MLP-identification
End Mechanism
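The search in Algorithm 3 amounts to a grid search over the number of hidden layers and the number of training iterations that keeps the configuration with the highest accuracy. A minimal sketch is given below; `train_and_score` is a hypothetical callable that trains an MLP with the given settings and returns its accuracy, and it stands in for whatever training routine is actually used.

```python
from typing import Callable, Optional, Tuple

def select_best_mlp(
    train_and_score: Callable[[int, int], float],
    hidden_layer_options=range(1, 21),   # 1 to 20, as in Algorithm 3
    iteration_options=range(1, 21),      # 1 to 20, as in Algorithm 3
) -> Tuple[float, Optional[int], Optional[int]]:
    """Return (best accuracy, hidden layers, iterations) over the grid."""
    best = (-1.0, None, None)
    for hidden_layers in hidden_layer_options:
        for iterations in iteration_options:
            accuracy = train_and_score(hidden_layers, iterations)
            # Keep only the configuration with the highest accuracy so far.
            if accuracy > best[0]:
                best = (accuracy, hidden_layers, iterations)
    return best
```

In practice the accuracy passed back by `train_and_score` would normally be measured on held-out data, since selecting on training accuracy tends to favor overfitted configurations.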
Proposed model-modified MLP with cross-weighted attention mechanism
The MLP model is modified and cross-linked with the attention mechanism to obtain more precise results. The novelty of the proposed study is the modified MLP with the cross-weighted attention mechanism. The modified MLP is built by joining numerous layers in the NN, and it comprises the NN, feature fusion, and an attention mechanism so that the related features in the data are used. By analyzing large datasets from individuals with ASD, such models can identify distinct subgroups based on genetic, behavioral, and cognitive characteristics. The proposed model improves the attention system by modifying the loss function with weighted cross-entropy. Such models have the potential to improve the understanding and management of ASD, leading to earlier diagnosis, customized treatment, and improved outcomes for individuals with the disorder. The modified MLP comprises three types of layers, namely input, hidden, and fully connected (FC) layers, each made up of neurons. The hidden layers extract the parameter values from the input, and the number of neurons in the input layer depends on the number of input parameters. A sigmoid layer is employed to determine which values are passed on as output and which are not. Furthermore, the sequential layer collects the data and concatenates the generated weights. The cross-weighted attention mechanism is then applied to the hidden output, and the weighted cross-entropy loss is produced by adding up the outputs. The experimental results reported in the present study comprise the performance metrics and confusion matrices of the various algorithms, the performance metrics of the MLP and the modified MLP, and an internal comparison of the modified MLP with NB and RF. Figure 4 shows the process of the modified MLP with the cross-attention model.

Figure 4. Architectural representation of modified MLP with cross-weighted attention system. Abbreviation: MLP, multi-layer perceptron.
From Figure 4, it is inferred that, in the modified MLP, the nodes in the input layer take the input and forward their output to each of the three nodes of the hidden layer. The cross-weighted attention mechanism is proposed by combining multiple NN layers to capture the interrelationships in the input data. The parameters are assigned weights and biases, and the weighting is based on the significance of each part of the input. It is used to calculate an alignment score among the elements. The input matrix is converted to the sequence $x = [x_1, x_2, \ldots, x_d]$, and the query is denoted as $\text{query} \in \mathbb{R}^{d}$. The query represents the features taken from the autism screening of the adult dataset. Then, the softmax function converts the scores $[f(A, \text{query})]_{i=1}^{d}$ to the probability distribution $p(z \mid x, \text{query})$. Here, $z$ is an indicator of which element is significant to the query.
The attention mechanism in MLP is signified in Equation (1).
Then, the weighted element is selected for the parameter in the network based on its importance. The weighted element is represented in Equation (2).
The parameterized compatibility function is computed by the MLP system, and it is represented in Equation (3).
The parameters are $\text{weight}^{(1)} \in \mathbb{R}^{\text{dim} \times \text{dim}}$, $\text{weight}^{(2)} \in \mathbb{R}^{\text{dim} \times \text{dim}}$, and $\text{weight} \in \mathbb{R}^{\text{dim}}$. The dimensions are represented by A, and the activation function is represented by σ. The multiplicative attention mechanism uses cosine similarity as the support function. It is represented in Equation (4).
When the query is removed from the support function, the formulation is written as Equation (5).
The calculated weighted element is represented in Equation (6).
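The attention steps described above (alignment scores, softmax, weighted element) can be illustrated with a minimal sketch. It assumes an additive compatibility function of the form f(x_i, query) = weight · σ(weight(1) x_i + weight(2) query), with σ taken to be the sigmoid; the toy vectors, the identity weight matrices, and the helper names (`additive_score`, `attend`) are illustrative assumptions and are not taken from the study.

```python
import math


def softmax(scores):
    """Convert alignment scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def matvec(matrix, vector):
    """Multiply a dim x dim matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def additive_score(x_i, query, w1, w2, w):
    """Assumed compatibility: w . sigmoid(W1 x_i + W2 query)."""
    hidden = [sigmoid(a + b) for a, b in zip(matvec(w1, x_i), matvec(w2, query))]
    return sum(wk * hk for wk, hk in zip(w, hidden))


def attend(xs, query, w1, w2, w):
    """Return the attention probabilities and the weighted element."""
    scores = [additive_score(x_i, query, w1, w2, w) for x_i in xs]
    probs = softmax(scores)
    dim = len(xs[0])
    weighted = [sum(p * x_i[k] for p, x_i in zip(probs, xs)) for k in range(dim)]
    return probs, weighted


# Toy input (hypothetical values): three 2-dimensional elements and one query.
identity = [[1.0, 0.0], [0.0, 1.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
probs, weighted = attend(xs, query, identity, identity, [1.0, 1.0])
```

In this toy case the third element receives the largest attention probability because its alignment score with the query is the highest; in a trained network the weight matrices would be learned rather than fixed.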
The loss function is then customized by multiplying the original cross-entropy values by the absolute values of the change ratio of the test data. The cross-entropy for the binary classes in ASD identification is represented in Equation (7).
In Equation (7), $y_i$ represents the predicted probability of class $i$. The weighted cross-entropy of the proposed system is represented in Equation (8).
In Equation (8), change ratio$_i$ represents the change ratio of test data $i$. The system is thus designed to ensure accurate detection for large change ratios and to decrease the loss. Moreover, applying the weighted modified attention network selects the highly correlated features from the input, which tends to yield better empirical performance for ASD identification.
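The weighting of the loss can be sketched as follows. This is a minimal illustration that assumes the standard binary cross-entropy, scaled per sample by the absolute change ratio and averaged over samples; the averaging, the clipping constant `eps`, and the toy values are assumptions, since the exact forms of Equations (7) and (8) are not reproduced here.

```python
import math


def binary_cross_entropy(target, prob, eps=1e-12):
    """Standard binary cross-entropy for one sample (target is 0 or 1)."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def weighted_cross_entropy(targets, probs, change_ratios):
    """Scale each sample's cross-entropy by |change ratio| and average."""
    terms = [
        abs(ratio) * binary_cross_entropy(target, prob)
        for target, prob, ratio in zip(targets, probs, change_ratios)
    ]
    return sum(terms) / len(terms)


# Toy values (hypothetical): three samples with predicted ASD probabilities.
targets = [1, 0, 1]
probs = [0.9, 0.2, 0.6]
plain = weighted_cross_entropy(targets, probs, [1.0, 1.0, 1.0])
doubled = weighted_cross_entropy(targets, probs, [2.0, -2.0, 2.0])
```

With all change ratios equal to 1 the function reduces to the ordinary mean cross-entropy, and samples with a larger absolute change ratio contribute proportionally more to the loss.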
RESULTS AND DISCUSSION
The results obtained by the execution of the respective mechanism are presented in this section. Furthermore, the dataset description with the sample data, exploratory data analysis (EDA), performance metrics, experimental results, performance analysis, comparative analysis, and discussion of the existing techniques with the proposed work are presented.
Dataset description
The proposed system used the Autism Screening on Adults dataset to assist physicians with the primary diagnosis of adult ASD patients. The dataset comprises survey results from >700 people and contains information on age, gender, and other variables, together with labels that signify whether the person has been diagnosed with autism. The dataset is available at https://www.kaggle.com/datasets/andrewmvd/autism-screening-on-adults. It records the outcome of a screening test based on the autism-spectrum quotient (AQ) survey, a self-assessment designed to estimate the likelihood of ASD in adults. The survey comprises 50 questions in 5 categories: imagination, communication skills, social skills, attention to detail, and attention switching. Each contributor answers the statements in the survey on a 4-point scale. The purpose of the dataset is to provide a resource that helps examiners and physicians better understand ASD in adults. Table 1 lists a few of the survey questions from the Autism Screening on Adults dataset.
Questions provided in the survey of the dataset.
No. | AQ survey question |
---|---|
1. | I prefer to carry out tasks with others rather than on my own. |
2. | I prefer to carry out tasks the same way again and again. |
3. | If I try to visualize something, I find it easy to create a picture in my mind. |
4. | I am frequently so fascinated by a single item that I lose sight of other items. |
5. | I frequently notice minor sounds. |
6. | I notice car number plates or similar strings of information. |
7. | Others frequently tell me that what I have said is rude, even though I think it is respectful. |
8. | While reading, I imagine the characters in the story. |
9. | I am usually captivated by dates. |
10. | I notice numerous different conversations in a group. |
Abbreviation: AQ, autism-spectrum quotient.
The AQ questions in the dataset are used to gather data on the likelihood of ASD in adults, and the survey responses are used to estimate that likelihood. The questions are designed to assess communication and social skills, attention, and imagination, which are the skills that are challenging for adults with ASD. In this dataset, the target variable comprises two classes, 0 and 1, where 0 represents contributors who are not likely to have autism and 1 represents contributors who are more likely to have autism.
Exploratory data analysis
EDA examines the entire dataset apart from formal modeling and summarizes its main characteristics and key insights. The modified MLP with cross-weighted attention model used an autism screening dataset for ASD identification. Figure 5 shows the number of ASD patients in the autism screening dataset, broken down by ethnicity and country.

ASD rate per ethnicity utilized in the proposed system. Abbreviation: ASD, autism spectrum disorder.
Figure 5 presents the ASD patient data as ethnicity-wise counts. The largest share of the data comes from the United States. The data are acquired from countries such as the United States, United Kingdom, New Zealand, Australia, Canada, India, France, Brazil, Malaysia, Mexico, Italy, Netherlands, Austria, Russia, United Arab Emirates, Sweden, Afghanistan, and Germany. The amount of data acquired from the white-European ethnicity is high compared with that from other ethnicities. Furthermore, Figure 6 shows the class distribution in the dataset by gender and relation.
Following this, 10 scores are available in the dataset, corresponding to the survey taken by about 700 people, and the overall distribution of these scores is presented in Figure 7. These scores are derived from the individuals' responses to the AQ screening questions. Based on the scores, the probability of having ASD is predicted.
A correlation matrix is a statistical tool used to assess the association between pairs of variables within a dataset. It is a table in which each cell contains a correlation coefficient (as shown in Figure 8). In this matrix, a value of 1 denotes a perfect positive correlation between variables, and 0 indicates no linear correlation.
Furthermore, the correlation matrix is visualized as a network graph in which the variables are plotted as nodes and the correlations are plotted as edges connecting the nodes. The network graph for the 10 scores is shown in Figure 9.
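As a rough illustration of how a correlation matrix and its network-graph edges can be derived, the sketch below computes Pearson coefficients for a few binary score columns and keeps the pairs whose absolute correlation exceeds a threshold. The column names (`q1`, `q2`, `q3`), the toy values, and the threshold of 0.8 are hypothetical and are not taken from the dataset.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Build a nested dict: matrix[row][col] -> correlation coefficient."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


def graph_edges(matrix, threshold):
    """Keep variable pairs whose |correlation| reaches the threshold."""
    names = list(matrix)
    return [
        (a, b, matrix[a][b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= threshold
    ]


# Hypothetical binary answers of six respondents to three questions.
columns = {
    "q1": [1, 0, 1, 1, 0, 1],
    "q2": [1, 0, 1, 0, 0, 1],
    "q3": [0, 1, 0, 0, 1, 0],  # exact opposite of q1
}
matrix = correlation_matrix(columns)
edges = graph_edges(matrix, threshold=0.8)
```

Here `q3` is the exact opposite of `q1`, so their coefficient is -1; negative values indicate inverse association, which is why the edge filter uses the absolute value.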
Insights
The contribution of the ASD dataset in the detection and diagnosis of ASD in adults is provided in Table 2. It states the uses and function of the dataset and also provides the causes and treatment for ASD in adults.
Autism screening in adult dataset.
Abbreviations: ASD, autism spectrum disorder; ML, machine learning.
Performance metrics
The performance of the proposed mechanism is evaluated with the metrics F1-score, accuracy, recall, and precision. A precise explanation of these metrics, along with their mathematical forms, is given in Table 3.
Evaluation metrics.
Evaluation metric | Mathematical representation |
---|---|
Precision: Also known as the positive predictive value, it is the ratio of true positives to the sum of true positives and false positives. | $\text{Precision} = \frac{T_p}{T_p + F_p}$ |
Recall: It is the ratio of true positives to the sum of true positives and false negatives. Recall is also known as sensitivity. | $\text{Recall} = \frac{T_p}{T_p + F_n}$ |
Accuracy: It is the ratio of correct identifications to all identifications. | $\text{Accuracy} = \frac{T_p + T_n}{T_p + F_p + T_n + F_n}$ |
F1-score: It is the harmonic mean of recall and precision. A higher F1-score indicates a higher-quality classifier. | $\text{F1-score} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$ |
$T_n$ is true negative, $T_p$ is true positive, $F_n$ is false negative, $F_p$ is false positive.
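The formulas in Table 3 can be checked with a short sketch that computes the four metrics from confusion-matrix counts. The function name and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, accuracy, and F1-score from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    if precision + recall:
        f1 = 2 * recall * precision / (recall + precision)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical counts: 8 true positives, 5 true negatives,
# 2 false positives, and 1 false negative.
metrics = classification_metrics(tp=8, tn=5, fp=2, fn=1)
```

The guards against zero denominators are a design choice for degenerate cases, such as a classifier that never predicts the positive class.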
Experimental results
The results obtained by the modified MLP with cross-weighted attention system are discussed in this section. The model is evaluated with the performance metrics F1-score, accuracy, recall, and precision. Table 4 and Figure 10 provide the internal comparison of the modified MLP with cross-weighted attention against the RF, MLP, and NB systems on these metrics.
Internal comparison of the propounded system with conventional system.
Model | Accuracy (%) | Recall (%) | F1-score (%) | Precision (%) | ROC (%) |
---|---|---|---|---|---|
Modified MLP | 99 | 99 | 99 | 99 | 98 |
RF | 95 | 100 | 97 | 94 | 51 |
NB | 98 | 98 | 99 | 99 | 52 |
MLP | 66 | 75 | 78 | 82 | 52 |
Abbreviations: ML, machine learning; MLP, multi-layer perceptron; NB, naïve Bayes; RF, random forest.

Graphical representation of internal comparison of the propounded model. Abbreviations: MLP, multi-layer perceptron; ROC, receiver operating characteristic.
According to Table 4 and Figure 10, NB achieved an accuracy of 98%, an F1-score of 99%, a precision of 99%, a recall of 98%, and an ROC value of 52%. Similarly, RF achieved an accuracy of 95%, an F1-score of 97%, a precision of 94%, a recall of 100%, and an ROC value of 51%. In contrast, the MLP achieved an accuracy of 66%, an F1-score of 78%, a precision of 82%, a recall of 75%, and an ROC value of 52%. The modified MLP with cross-weighted attention mechanism achieved the best outcomes, with an accuracy of 99%, a recall of 99%, a precision of 99%, an F1-score of 99%, and an ROC value of 98%.
Performance analysis
The performance of the proposed system has been assessed using the confusion matrices of all the considered models, an internal comparison plot of MLP vs. modified MLP, and a comparison of the proposed model with NB, RF, and MLP. The corresponding outcomes are discussed in this section. First, the confusion matrix is plotted; it summarizes the prediction outcomes of a classification problem and gives insight into the errors made by the classifier.
In a confusion matrix, $T_p$ indicates that the model correctly predicts the positive class; both the actual and the predicted labels are positive. Conversely, $T_n$ denotes that the model correctly predicts the negative class; both the actual and the predicted labels are negative. $F_p$ denotes that the model wrongly predicts the positive class for a case that is actually negative, and $F_n$ denotes that the model wrongly predicts the negative class for a case that is actually positive. The confusion matrices corresponding to NB, RF, MLP, and modified MLP are depicted in Figure 11a–d.
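The four cells of a binary confusion matrix can be tallied directly from actual and predicted labels, as in the minimal sketch below. The label vectors are hypothetical and do not correspond to any model in Figure 11.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            counts["tp"] += 1  # predicted positive, actually positive
        elif guess == positive:
            counts["fp"] += 1  # predicted positive, actually negative
        elif truth == positive:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1  # predicted negative, actually negative
    return counts


# Hypothetical labels: 1 = likely ASD, 0 = not likely ASD.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
```

Every sample falls into exactly one cell, so the four counts always sum to the number of samples.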

(a) NB confusion matrix. (b) RF confusion matrix. (c) MLP confusion matrix. (d) Modified MLP confusion matrix. Abbreviations: MLP, multi-layer perceptron; NB, Naïve Bayes; RF, random forest.
Figure 11 presents the confusion matrices of NB, RF, MLP, and the modified MLP. Confusion matrices are essential for model evaluation, error analysis, and improving the overall accuracy and reliability of ML algorithms. Figure 11a–d illustrates the prediction of autism in an individual: Figure 11a depicts the predictions of the NB model, Figure 11b those of RF, Figure 11c those of the MLP, and Figure 11d those of the modified MLP. From these figures, it is concluded that the modified MLP predicts autism in an individual accurately.
According to Figure 11a, three cases were correctly identified as autism, but one case was misdiagnosed. The correct classification rate is higher than the misclassification rate, which indicates good performance of NB. According to Figure 11b, four cases were correctly identified as autism, whereas one unaffected person was mistakenly diagnosed as having autism. Again, the correct classification rate is greater than the misclassification rate, which indicates good performance of RF.
In Figure 11c, four cases are reported across the positive-diagnosis and misdiagnosis conditions, and the rate of correct predictions is higher than the rate of misclassifications, which indicates acceptable performance of the MLP. Figure 11d shows whether autism is predicted in an individual by the modified MLP model; compared with the other models, it predicts the condition most precisely.
In Figure 11d, eight cases were correctly classified with no misclassification. Likewise, 1 case was mispredicted, while 39 cases were correctly identified as individuals with autism. The correct classification rate is therefore higher than the misclassification rate, which demonstrates the better performance of the proposed modified MLP. Overall, the proposed model shows a higher rate of correct classification than NB, RF, and MLP, which confirms its efficacy in ASD classification.
Similarly, the performance of MLP and modified MLP has been internally compared with regard to F1-score, precision, accuracy, and recall. The respective outcomes are shown in Figure 12.
According to Figure 12, it is evident that modified MLP outperforms the MLP algorithm in terms of accuracy, recall, F1-score, and precision, when classifying ASD. This confirms the superior performance of modified MLP. Additionally, modified MLP has been pitted against algorithms such as RF and NB. The results achieved are depicted in Figure 13.

Internal analysis of modified MLP with NB and RF. Abbreviation: MLP, multi-layer perceptron; NB, naïve Bayes.
According to Figure 13, internal analysis has been conducted regarding recall, precision, and accuracy. In general, it has been discovered that modified MLP outperforms NB and RF in terms of the specified metrics. While modified MLP has demonstrated improved performance in internal testing, it is important to conduct external evaluations to validate its effectiveness in classifying ASD. Given this, the present research aims to conduct a comparative analysis with three recent studies to examine how effective the proposed model is in classifying ASD.
Comparative analysis
The proposed system is compared with traditional models, and the obtained results are discussed in this section. Initially, comparison has been made with algorithms namely LR and SVM. Table 5 and Figure 14 signify the comparative examination of the propounded model with the conventional model.
Comparative examination of propounded method with conventional method ( Farooq et al., 2023).
Model | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
SVM | 81 | 73 | 56 | 63 |
LR | 78 | 81 | 51 | 62 |
Proposed | 99 | 99 | 99 | 99 |
Abbreviations: LR, logistic regression; SVM, support vector machine.

Comparative examination of the propounded model with the traditional system ( Farooq et al., 2023). Abbreviations: LR, logistic regression; SVM, support vector machine.
According to Figure 14 and Table 5, the proposed MLP with cross-weighted attention achieved superior outcomes compared with the conventional approaches, namely SVM and LR. The SVM model has an accuracy of 81% and the LR model an accuracy of 78%, whereas the proposed model achieves an accuracy of 99%. Additionally, a comparison has been made with traditional algorithms such as MLP, SMO, SANN, DT, LR, and RF. Table 6 and Figure 15 show the comparison between the proposed model and these existing models.
Comparative examination of the propounded system with the conventional system ( Batsakis et al., 2022).
Model | ROC curve value |
---|---|
MLP | 0.85 |
SMO | 0.5 |
RF | 0.87 |
LR | 0.81 |
DT | 0.77 |
SANN | 0.87 |
Proposed | 0.98 |
Abbreviations: DT, decision tree; LR, logistic regression; MLP, multi-layer perceptron; RF, random forest.

Comparative examination of the propounded model with the traditional model ( Batsakis et al., 2022). Abbreviations: MLP, multi-layer perceptron; ROC, receiver operating characteristic.
Figure 15 and Table 6 present the performance evaluation against the conventional models using the receiver operating characteristic (ROC) curve. The traditional systems MLP, SMO, RF, LR, DT, and SANN achieved ROC values of 0.85, 0.5, 0.87, 0.81, 0.77, and 0.87, respectively, whereas the proposed system outperformed them with a ROC value of 0.98. Furthermore, a comparison has been conducted with existing approaches such as J48, PART, SVM, RF, ANN, Attribute Selected Classifier (AttselClass), Bayes, and AdaBoost. Table 7 and Figure 16 display the findings from the comparative analysis of the proposed model with these conventional models.
Comparative analysis of proposed system with existing system ( Peral et al., 2020).
Model | Accuracy |
---|---|
J48 | 92 |
RF | 95 |
Bayes | 96 |
AdaBoost | 94 |
PART | 97 |
ANN | 97 |
SVM | 97 |
AttselClass | 96 |
Proposed | 99 |
Abbreviations: ANN, artificial neural network; AttselClass, Attribute Selected Classifier; RF, random forest; SVM, support vector machine.

Comparative examination of propounded model with the traditional model ( Peral et al., 2020). Abbreviations: ANN, artificial neural network; AttselClass, Attribute Selected Classifier; SVM, support vector machine.
Figure 16 and Table 7 show that the proposed model achieves superior results compared with the traditional models. The accuracy rates for J48, RF, Bayes, AdaBoost, PART, ANN, SVM, and AttselClass are 92%, 95%, 96%, 94%, 97%, 97%, 97%, and 96%, respectively, whereas the proposed system achieves 99% accuracy. This demonstrates the efficient functioning of the proposed system. Although PART, SVM, and ANN were the best-performing existing models, the proposed system outperformed them by 2 percentage points in accuracy.
DISCUSSION
The propounded system has utilized modified MLP with cross-weighted attention mechanism for identifying ASD in adults. The MLP technique is utilized for its capacity to solve nonlinear problems and for handling huge datasets. Several studies have utilized MLP ( Rahman et al., 2020; Umamaheswari et al., 2021; Rahman and Mamun, 2022; Qureshi et al., 2023) for the identification process due to its advantages in several aspects like nonlinearity, flexibility, parallel processing, robustness, and universal approximator.
Nonlinearity: MLP is capable of modeling complex nonlinear associations amongst the input variables and output variables. This makes it appropriate for several applications.
Flexibility: MLP model possesses the innate ability to deal with several kinds of data inclusive of categorical and numerical variables.
Parallel processing: MLP could be trained and assessed in parallel. This can enhance the speed rate of the training process especially for huge datasets.
Robustness: MLP is tolerant of noisy data and can still produce reasonable detections at low noise levels.
Universal approximator: MLP is capable of approximating any continuous function, provided sufficient training and hidden neurons.
In spite of the several advantages of MLP, there exist certain issues with regard to overfitting, computational complexity, black-box nature, and sensitivity to the initial weights.
Overfitting: MLP is likely to overfit, particularly when training data seem to be limited or when the model is complex.
Computational complexity: Training MLP with several neurons and hidden layers could be time-consuming and computationally expensive.
Black-box nature: MLP model lacks interpretability. This makes it complex to comprehend the reasons behind its prognostications.
Sensitivity to the initial weights: The performance of MLP can be sensitive to the biases and initial weights assigned to neurons. This might demand careful tuning or initialization.
Moreover, input data are often complex and large. The data are typically processed by forwarding them through several NN layers, where each network consists of interconnected nodes organized into layers. Each node processes the data and forwards them to the subsequent layer, which allows the model to extract increasingly complicated features as the data pass through each layer. As a result, it can be difficult for the MLP model to process all the data and determine the relevant information. To address these issues, the proposed system uses a modified MLP with a cross-weighted attention mechanism. The cross-weighted attention model enhances the classification process by focusing on the important features in the input data and assigning appropriate weights to the parameters in the identification system. Used with the MLP, it allows the model to focus selectively on the significant parts of the input and ignore the irrelevant parts. This helps the model make precise predictions and run efficiently, thereby improving the prediction rate. To verify this, the proposed model has been internally assessed and externally compared with three recent works.
From the assessment, the proposed system achieved an accuracy of 99%, recall of 99%, F1-score of 99%, precision of 99%, and ROC value of 98%. In the internal assessment, RF attained an accuracy of 95% and NB attained an accuracy of 98%. In the external comparison, the proposed work also showed superior outcomes. Therefore, the results verify that the modified MLP with cross-weighted attention achieves better outcomes for ASD identification on the adult autism screening dataset.
Statistical tests
Statistical tests are typically considered to evaluate the significance of differences, associations, or patterns found in the data. Moreover, statistical tests assist in taking informed decisions regarding the generalizability and performance of models. With the comparison of observed and expected outcomes in accordance with the statistical tests, it is possible to determine if the performance of the model is statistically significant or whether it happens as a result of randomness.
The current study uses the chi-squared test because it is designed to assess categorical data, which makes it an appropriate choice for evaluating the associations between the categorical variables and the target variable. This provides insight into how different kinds of features affect the results. The test also does not depend on any assumptions regarding the underlying data distribution, which makes it suitable for several ML problems. It is particularly valuable when handling skewed or non-normal data.
Furthermore, the chi-squared test yields a P value, which denotes the statistical significance of the relationship between the variables and allows easy interpretation of the outcomes. To compute the chi-squared statistic, the squared difference between each observed value and its expected value is divided by the respective expected value. Depending on the data categories, two or more such values are obtained, and the chi-squared statistic is their sum. The chi-squared test results of the proposed work are presented in Table 8.
From Table 8, the chi-squared statistic is 6.591, whereas the P value is 0.0102. This indicates that the observed difference between the frequencies of the two variables is statistically significant at the 0.05 level.
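The computation described above can be sketched for a contingency table as follows. This is a minimal illustration that assumes a 2 × 2 table (one degree of freedom) and no continuity correction; the observed counts are hypothetical. For one degree of freedom the P value has the closed form erfc(sqrt(statistic / 2)), and applying it to the statistic reported in Table 8 gives a P value of about 0.0102, which appears consistent with the reported result.

```python
import math


def chi_squared_statistic(observed):
    """Sum (observed - expected)^2 / expected over all table cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic


def p_value_one_dof(statistic):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical 2 x 2 table: rows = feature category, columns = class 0 / class 1.
observed = [[10, 20], [20, 10]]
statistic = chi_squared_statistic(observed)
p_value = p_value_one_dof(statistic)
significant = p_value < 0.05
```

Tables with more rows or columns have more degrees of freedom, in which case the closed form above no longer applies and a general chi-squared distribution function is needed.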
CONCLUSION
ASD is a developmental and neurological disorder affecting a large number of people in the world. Early diagnosis is needed to improve the patient's quality of life. The screening of ASD is an expensive and time-consuming process that requires professionals. Moreover, most research has focused on children with ASD, and only a few methods have concentrated on adults with ASD. Therefore, the proposed system employed a modified MLP with a cross-weighted attention mechanism to enhance the efficacy of ASD identification through autism screening on adult datasets. The MLP was employed for its capability in handling huge datasets and improving the accuracy of the proposed system. The conventional MLP achieved good results but lacked speed, had overfitting issues, and required more parameters. To resolve these issues, the cross-weighted attention mechanism was used to enhance the computation by focusing on the important data features and selecting the appropriate weights in the system. Furthermore, the system was examined through internal and external comparisons of ASD identification. The results revealed that the proposed model achieved 99% accuracy. In addition, from the statistical analysis, the P value was found to be 0.0102, which indicates that the observed difference between the frequencies of the two variables is statistically significant at the 0.05 level.
In spite of the better performance of the proposed model, there are various avenues for enhancement that could be considered in future. First, clinical analysis can be undertaken with ML models to assess the real-time effectiveness of ML in identifying ASD in toddlers. Furthermore, other ML models could be tested using the significant features to determine whether precise classifiers can be built with minimal data. Different feature selection methods can also be considered to assess the outcomes. The current work can be improved by considering the processing of MRI brain-scan images. By doing so, the tedious process involved in answering the “AQ10” questionnaire could be avoided. Hence, this study is important for expediting further progress in the area of autism. Overall, this research makes a significant contribution to autism diagnosis by revealing the potential of the proposed model. Through this, practitioners and researchers working in this area can gain beneficial and relevant insights.
Researchers are also using modified ML and DL models to identify subtypes of ASD, which could help personalize treatment approaches. By analyzing large datasets from individuals with ASD, these models can identify distinct subgroups based on genetic, behavioral, and cognitive characteristics. This could lead to more targeted and effective interventions, tailored to the specific needs of each individual with ASD. The future of ASD research is bright, with modified ML and DL models at the forefront. These models have the potential to revolutionize our understanding and management of ASD, leading to earlier diagnosis, customized treatment, and improved outcomes for individuals with the disorder.