INTRODUCTION
Attention-deficit/hyperactivity disorder (ADHD) is a prevalent neurodevelopmental disorder in children, characterized by symptoms of inattention and hyperactivity. It significantly impacts the quality of life of affected children and places a prolonged burden on their families. Early diagnosis plays a crucial role in mitigating the adverse effects associated with this condition, enabling smoother transitions into puberty (Polanczyk and Jensen, 2008; Quaak et al., 2021). In recent decades, there have been significant advancements in ADHD neurobiological diagnostic tools. Machine learning (ML) and deep learning (DL) techniques have emerged as promising approaches for addressing the classification of ADHD, offering various effective methods that have substantially improved classification accuracy (Bakhtyari and Mirzaei, 2022; Loh et al., 2022).
ADHD affects around 5% to 7% of school-age children, and in a substantial proportion of cases, ranging from 30% to 50%, symptoms persist into adulthood. Timely diagnosis plays a crucial role in effectively treating ADHD, leading to a substantial reduction in the negative impacts of the condition and improving patients’ overall quality of life. Currently, ADHD diagnosis primarily relies on clinical symptoms, assessed through subjective reports utilizing a series of Hamilton scales (Safren et al., 2005; Polanczyk and Jensen, 2008). However, this symptom-based diagnostic approach requires skilled physicians and may introduce biases. In recent years, there has been growing interest in neurobiological diagnosis, which utilizes objective brain alterations. ML and DL techniques are widely employed to identify ADHD and detect associated neurobiological changes.
Researchers are now investigating risk factors with the aim of decreasing the occurrence of ADHD in children. Extensive research has demonstrated a significant association between genetic factors and ADHD (Visser et al., 2007; Faraone et al., 2021). It has been found that genetic influences account for approximately 75% of the risk of developing ADHD in early childhood (Brikell et al., 2015). Besides genetics, ADHD risk factors include brain damage, maternal alcohol or tobacco use, and preterm delivery. Previous research has linked ADHD in children to asthma, race, anxiety, depression, smoking, and socioeconomic status (Freeman-Fobbs, 2003; Agranat-Meged et al., 2005; Kollins et al., 2005; Stevens et al., 2005; Bazar et al., 2006; Bramlett and Blumberg, 2007; Visser et al., 2007; Cortese et al., 2008; Waring and Lapane, 2008; Brikell et al., 2015; Faraone et al., 2021). These investigations were specifically conducted to explore the risk factors associated with ADHD in children.
ML models offer a promising alternative to conventional methods for prediction purposes. These approaches have been successfully applied in various domains, including medical imaging, healthcare, and mental health, enabling identification and prediction tasks. Several ML classifiers have been employed to forecast ADHD in children (Kim et al., 2015; Duda et al., 2016, 2017; Yasumura et al., 2020; Kim et al., 2021; Zhang-James et al., 2021).
In a study conducted by Uluyagmur-Ozturk et al. (2016), the emotional well-being of children in Turkey was investigated, and they were categorized into autism spectrum disorder (ASD), ADHD, or control groups based on their diagnoses. A total of 61 children were included in the dataset utilized in this study, which was acquired from Marmara University Medical Hospital. Within this cohort of youngsters, a total of 18 individuals were identified as having ASD, 30 individuals were diagnosed with ADHD, and 13 individuals were classified as developing normally.
The Relief algorithm was used to identify the most informative features for ASD and ADHD. The authors then classified the children into ASD, ADHD, and healthy groups using five ML methods: decision tree (DT), random forest (RF), support vector machine (SVM), K-nearest neighbor, and adaptive boosting (AB). The results demonstrated an accuracy rate of 80% when using AB for discriminating between children with ASD, children with ADHD, and healthy children (Duda et al., 2017).
Contribution
The main aim of this research work is to develop a new approach for diagnosing and detecting ADHD by leveraging a new ADHD dataset. The efficacy of various ML and DL models in predicting ADHD is thoroughly examined using text-based features extracted from Reddit posts. This comprehensive analysis sheds light on the efficiency of different approaches in precisely detecting positive and negative cases of ADHD using textual data.
Furthermore, the inclusion of a unique ADHD dataset adds originality and diversity to the evaluation, enabling a more comprehensive assessment of model performance across multiple data sources. The insights gained from this research offer valuable understanding of the advantages and limitations of ML and DL methodologies in ADHD detection. These findings are expected to drive future advancements in the domain of mental health informatics.
The present research contributes to the advancement of knowledge about computational methodologies employed in the diagnosis of ADHD, as well as their prospective ramifications for early intervention and customized treatment approaches. It contributes to the broader comprehension and application of these techniques in the context of ADHD diagnosis.
Background of research
The diagnosis of ADHD is presently being advanced with the use of several data gathering methods and artificial intelligence algorithms. As an example, many research groups have used DL and ML algorithms to examine ADHD diagnosis using the Neuro Bureau ADHD-200 dataset (Riaz et al., 2020; Peng et al., 2021; Zhang et al., 2022; Zhao et al., 2022).
Peng et al. (2021) applied a convolutional neural network (CNN) DL method, achieving a diagnostic accuracy of 72.9% for ADHD. Chen et al. (2020) developed an ML approach based on SVM, resulting in an accuracy of 88.1% in diagnosing ADHD. A high-quality public dataset for MRI-based ADHD diagnostic research is available, enabling research teams to continuously improve their algorithms and achieve better research outcomes.
Another example involves studies on diagnosing ADHD using electroencephalogram (EEG) data (Chen et al., 2019; Vahid et al., 2019; Altinkaynak et al., 2020; Dubreuil-Vall et al., 2020; Tosun, 2021; Koh et al., 2022). Tosun (2021) achieved a classification accuracy of 92.2% by utilizing a long short-term memory (LSTM)-based DL system on a sample of 1088 ADHD patients and 1088 individuals without ADHD. Altinkaynak et al. (2020) utilized a multilayer perceptron (MLP)-based ML system to obtain an accuracy of 91.3%. They employed EEG data from 23 ADHD patients and 23 neurotypical persons.
Furthermore, some studies focus on analyzing findings from continuous performance tests (CPTs), commonly used in hospitals for ADHD diagnosis. The input data for researching ADHD categorization are derived from the results of the CPT (Koh et al., 2022). In their study, Slobodin et al. (2020) utilized an RF-based ML algorithm to analyze CPT findings from 213 ADHD patients and 245 neurotypical subjects. They were able to reach an ADHD classification accuracy of 87%.
Birnbaum et al. (2017) investigated the potential of social media in detecting persons with schizophrenia by integrating clinical assessments with social media information. The data were gathered by the researchers from publicly accessible Twitter tweets spanning the years 2012-2016, via a Twitter crawler known as GetOldTweetsAPI. The data included tweets from users who openly shared their diagnosis of schizophrenia. An approach was developed using fold cross-validation. The findings indicated that the classifier successfully distinguished genuine disclosures of schizophrenia from those made by control users, achieving an average accuracy of 88% and a peak performance of 95% based on the area under curve (AUC) parameter.
Jo et al. (2020) suggested using network analysis to distinguish between persons diagnosed with schizophrenia and those who are not. The researchers devised several ML models by leveraging the suggested attributes of brain networks.
The task of predicting anxiety in a clinical setting can be difficult due to its resemblance to major depressive disorder (Aleem et al., 2022). Sau and Bhakta (2017) employed ML to predict depression and anxiety in elderly individuals. They evaluated 10 different classifiers on a specific feature set, achieving 89% accuracy using the RF model. Sau and Bhakta (2019) conducted a study in which they used the hospital anxiety and depression scale to predict anxiety and depression among sailors. After evaluating five different ML classifiers, it was found that CatBoost performed better than RF.
Through the utilization of a graph attention network and the hierarchical structure of depression detection, Niu et al. (2021) were able to demonstrate the ability of a DL model to independently identify instances of depression (Gratch et al., 2014).
Yoon et al. (2022) focused on the detection of depression and developed a multimodal DL model. The study employed data obtained from a randomly selected multimodal dataset including 961 vlogs sourced from the online platform YouTube. The results indicated that the accuracy, recall, and F1-score of the tested models were 65.40%, 65.57%, and 63.50%, respectively. Xezonaki et al. (2020) utilized a hierarchical attention network to classify interviews conducted with patients who had received a diagnosis of depression. The authors considered the hierarchical structure of these interviews, which are composed of turns and words. The DAIC-WOZ depression dataset was used in that study.
Cho et al. (2020) employed data from Korea’s National medical checkup cohort to detect depression using an RF algorithm. To balance the two groups, down- or up-sampling techniques were applied, as only 0.02% of individuals exhibited depression while 99.8% did not. The experiment concluded with an AUC value of 0.849.
Sharma and Verbeke (2020) devised an ML technique that utilizes the Lifelines database to improve the identification of depression by integrating biomarker and self-reported depression data. Since the data were skewed, several resampling strategies were implemented. The extreme gradient boosting (XGBoost) algorithm was applied to each sample.
Tachmazidis et al. (2020) conducted a study involving 45 ADHD males and 24 ADHD females, utilizing the Diagnostic Interview for ADHD in Adults in conjunction with DTs and knowledge-based systems. The model was validated using leave-one-out cross-validation (LOOCV), achieving an accuracy of 95.7%.
Öztekin et al. (2021) examined 87 ADHD individuals and 75 normal controls using the behavior rating inventory of executive function-preschool version (BRIEF-P). They applied SVM with fivefold cross-validation, reporting an accuracy of 92.6%.
Christiansen et al. (2020) included 385 ADHD and 135 obese participants, using the Conners’ Adult ADHD Rating Scales (CAARS) and a DT model with a hold-out validation (70% training and 30% test), achieving an accuracy of 80.0%.
Yeh et al. (2020) explored 37 ADHD and 31 normal participants using a virtual reality game system, employing SVM with fivefold cross-validation and achieving an accuracy of 83.2%.
In studies using pupillometric data, Varela Casal et al. (2019) evaluated 21 ADHD and 21 normal participants using eye vergence features and SVM with 30-fold cross-validation, achieving an accuracy of 96.3%. Das and Khanna (2021) examined 28 ADHD and 22 normal participants using pupil-size dilation features and SVM with nested 10-fold cross-validation, reporting an accuracy of 76.1%.
In the domain of genetic data, Liu et al. (2021a) investigated 1033 ADHD and 950 normal individuals using a CNN with hold-out validation (75% training, 5% validation, and 20% test), identifying EPHA5 as a potential risk gene with a diagnostic accuracy of 90.2%. Liu et al. (2021b) also examined 116 ADHD and 408 normal participants using an MLP with hold-out validation (60% training and 40% test), highlighting that GRM1 and GRM8 genes have the highest weight in ADHD diagnosis. Cervantes-Henríquez et al. (2022) employed an ensemble model on 408 ADHD participants with hold-out validation (70% training and 30% test), identifying ADGRL3, DRD4, and SNAP25 genes as contributing to ADHD severity. Sudre et al. (2021) used an RF model on 362 ADHD participants, revealing that those with the highest polygenic risk scores for ADHD exhibited worsening symptoms.
METHODOLOGY
This section outlines the proposed methodology for developing an ADHD detection system using ML techniques. The system aims to accurately detect ADHD by analyzing social media posts and comments on the Reddit platform. The methodology consists of six phases: dataset collection, data preprocessing, feature extraction, classification, evaluation metrics, and results analysis. The proposed methodology is illustrated by the flowchart in Figure 1.
Data collection
In this work, the dataset utilized was acquired via the Kaggle (n.d.) platform. It comprises content from Reddit forums dedicated to ADHD and includes both Reddit posts and comments collected until February 2021. From a total of 3,229,944 samples, 10,000 samples were randomly selected and manually labeled by three domain experts as positive or negative cases. This labeled dataset provides valuable insights into the experiences and perspectives shared within the ADHD community on Reddit, contributing to a better understanding of attitudes toward ADHD.
Data preprocessing
In this phase, we conducted several preprocessing steps to ensure that the dataset is clean and well-structured, enabling it to be effectively classified by the ML model. The data preprocessing in this study involved the following steps:
Data cleaning
Punctuation removal: Punctuation marks were eliminated from the comments to focus solely on the textual content.
Lowercasing: All text contents were converted to lowercase to ensure consistency in the data.
Removal of special characters: Irrelevant special characters, such as emojis or symbols, were removed.
Tokenization
Each Reddit post was divided into individual words or tokens. This step aids in breaking down the text into smaller units for further processing.
Stop word removal
Stop words, such as “and,” “the,” or “is,” were removed from the content of the Reddit posts. These words do not carry significant meaning and their exclusion helps reduce noise in the data.
Removal of single-word Reddit posts
Any Reddit post containing only one word was deleted from the dataset. Figure 2 illustrates the distribution of classes in the ADHD dataset. After performing the preprocessing steps, the dataset consists of 9021 samples, including 5340 Reddit posts labeled as positive ADHD cases, represented by 1, and 3681 Reddit posts labeled as negative ADHD cases, represented by 0, in Figure 2.
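The preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the stop-word list, the whitespace tokenizer, the function names `preprocess` and `build_corpus`, and the choice to drop single-word posts after stop-word removal are all assumptions.

```python
import re
import string

# Illustrative stop-word list (an assumption; the study's list is not given).
STOP_WORDS = {"and", "the", "is", "a", "an", "of", "to", "in", "i", "it"}


def preprocess(post: str) -> list[str]:
    """Clean one post and return its remaining tokens."""
    text = post.lower()  # lowercasing
    # Punctuation removal.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace emojis and any other non-alphanumeric symbols with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()  # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


def build_corpus(posts: list[str]) -> list[list[str]]:
    """Preprocess every post and drop posts left with fewer than two tokens."""
    cleaned = (preprocess(p) for p in posts)
    return [tokens for tokens in cleaned if len(tokens) > 1]


corpus = build_corpus(["Hello!", "ADHD is hard", "I think the ADHD meds are working! 😀"])
```

A production pipeline would more likely rely on an established tokenizer and stop-word list; the sketch only makes the order of operations concrete.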
Feature extraction
By incorporating the term frequency-inverse document frequency (TF-IDF) technique as a feature extraction method (Ramos, 2003) in our methodology, our goal was to classify the most distinguishing words or terms associated with ADHD within the content of Reddit posts. This approach enabled us to generate a numerical representation of the text data, considering the significance of specific words in the context of ADHD detection. We utilized these TF-IDF representations as features for our ML models to train and classify posts as either ADHD-related or not. The TF-IDF method consists of two components: term frequency (TF) and inverse document frequency (IDF). The formula for TF is as follows:
Set D represents a collection of documents, where d functions as a document. In the context of d ∈ D, a document is defined as a subset of phrases and words w. Let nw (d) be the frequency of words w in document d. Hence, the calculation of the volume of document d may be calculated in the following manner:
The frequency of the word’s occurrence in the document may be determined using Equation (2) mentioned above. IDF is the second component utilized to calculate the ratio between the total number of documents in the corpus and the number of documents containing a certain phrase. The formula for determining IDF is as follows:
Hence, the calculation of TF-IDF for word w in relation to document d and corpus D may be accomplished using the below equation:
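The equations referred to in this subsection do not appear in the text as given. Under the definitions above (corpus $D$, document $d$, and word count $n_w(d)$), the standard TF-IDF formulation would read as follows. The exact variant used in the study, such as the logarithm base or any smoothing term, is not stated, so this should be read as an assumed reconstruction rather than the authors' own notation.

```latex
\begin{align}
|d| &= \sum_{w \in d} n_w(d) \\
\mathrm{TF}(w, d) &= \frac{n_w(d)}{|d|} \\
\mathrm{IDF}(w, D) &= \log \frac{|D|}{\left|\{\, d \in D : n_w(d) > 0 \,\}\right|} \\
\mathrm{TF\text{-}IDF}(w, d, D) &= \mathrm{TF}(w, d) \cdot \mathrm{IDF}(w, D)
\end{align}
```

Under this form, a word that occurs in every document receives an IDF of zero and therefore contributes nothing to the feature vector.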
The TF-IDF diagram illustrates the frequency of word repetition in the content for the ML classifier.
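To make the weighting concrete, the following sketch computes TF-IDF scores for a toy corpus using only the Python standard library. It assumes the plain, unsmoothed definition (term frequency times the natural logarithm of the inverse document frequency); the function name `tf_idf` and the example documents are hypothetical, and library implementations typically add smoothing and normalization.

```python
import math
from collections import Counter


def tf_idf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Return one {word: TF-IDF weight} mapping per tokenized document."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        size = len(doc)  # total number of tokens in the document
        weights.append({
            word: (count / size) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Toy tokenized documents, for illustration only.
docs = [["adhd", "focus", "meds"], ["focus", "work"], ["adhd", "adhd", "diagnosis"]]
scores = tf_idf(docs)
```

In this toy corpus, "adhd" receives a higher weight in the third document than in the first because it accounts for a larger share of that document's tokens.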
Classification
In the classification phase of the proposed ADHD detection system, various ML and DL models were employed, utilizing TF-IDF features, to accurately detect and classify Reddit posts as positive or negative ADHD cases. The TF-IDF approach converts the textual content of the posts into numerical representations that capture the importance of specific words within the context of ADHD discussions. For the ML models, different algorithms, such as SVM, RF, XGBoost, AB, and the Voting Classifier, were employed. These models utilized the TF-IDF features to learn patterns and relationships within the text data and make predictions regarding whether a Reddit post is associated with a positive or negative ADHD case.
ML models
In our experiment, we utilized the SVM model (Cortes and Vapnik, 1995) to categorize the content of Reddit posts as positive or negative cases of ADHD. SVM is a supervised learning approach that efficiently separates data points by employing hyperplanes in a high-dimensional space. Our objective in utilizing SVM was to identify the optimal decision boundary in the feature space that maximizes the margin between positive and negative ADHD instances. SVM is an excellent choice for our classification task as it is capable of handling high-dimensional data and capturing complex correlations between features.
Additionally, we employed the RF model (Breiman, 2001) to classify the content of Reddit posts as positive or negative examples of ADHD. The RF model combines multiple DTs and aggregates their predictions. Each DT is trained with a random subset of the data and features, resulting in the generation of different tree models. By leveraging the collective knowledge of these trees, the RF model effectively handles complex interactions and provides reliable classification outcomes. We found this model to be a valuable addition to our experiment due to its ability to handle high-dimensional data and capture feature importance. The mathematical formulation for the RF model is created using the Gini index and entropy formulas, as given below.
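The Gini index and entropy formulas mentioned above are not reproduced in the text. For reference, the standard node-impurity definitions are given here, assuming a tree node whose samples fall into $C$ classes with proportions $p_1, \dots, p_C$ (the symbols $C$ and $p_i$ are introduced for illustration and are not taken from the original):

```latex
\begin{align}
\mathrm{Gini} &= 1 - \sum_{i=1}^{C} p_i^{2} \\
\mathrm{Entropy} &= -\sum_{i=1}^{C} p_i \log_2 p_i
\end{align}
```

For the binary task considered here ($C = 2$), both measures are zero for a pure node and reach their maximum at $p_1 = p_2 = 0.5$, where the Gini index is 0.5 and the entropy is 1 bit.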
In our ADHD classification task, we employed XGBoost. XGBoost is an advanced gradient boosting technique that has gained significant popularity due to its robust predictive capabilities (Chen et al., 2024). It sequentially trains a group of weak prediction models while optimizing a specific loss function, resulting in high overall performance. XGBoost is particularly adept at handling both numerical and categorical features, automatically managing missing values, and offering effective regularization techniques. By utilizing XGBoost, our aim was to achieve accurate classification outcomes for the content of Reddit posts. We leveraged XGBoost’s ability to handle complex relationships and optimize efficiency to enhance the precision of our ADHD classification results.
The AB algorithm was also utilized to classify the content of Reddit posts as positive or negative cases of ADHD. The AB method is an ensemble learning technique that pools numerous weak classifiers in order to construct a more robust classifier mechanism (Schapire, 2013). It trains the classifiers incrementally, assigning more weight to incorrectly classified instances in each iteration, allowing subsequent classifiers to focus on challenging data samples. AB’s adaptability and ability to focus on challenging instances made it a valuable addition to our classification models.
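The reweighting idea described above can be illustrated with a single boosting round. The sketch below follows the classic discrete AdaBoost update for labels in {-1, +1}; the function name `adaboost_round` and the toy labels are hypothetical, and it is not claimed that the study's implementation uses exactly this variant.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round for labels in {-1, +1}.

    `weights` are the current sample weights and are assumed to sum to 1.
    Returns the weak classifier's vote weight (alpha) and the new weights.
    """
    # Weighted error of the weak classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Four samples with equal weights; the second one is misclassified.
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
```

After the update, the single misclassified sample carries half of the total weight, which is what pushes the next weak classifier to focus on it.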
In our ADHD classification experiment, we also utilized a Voting Classifier. The Voting Classifier combines the predictions of multiple individual models, including SVM, RF, XGBoost, and AB, to make the final classification decision. It aggregates the predictions through methods like majority voting or weighted voting, depending on the configuration. By leveraging the diversity of predictions from different models, the Voting Classifier aims to enhance the overall accuracy and robustness of the ADHD classification task. This ensemble approach enables us to leverage the strengths of multiple models and make more informed decisions during the classification process.
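The two aggregation schemes mentioned above, majority voting and weighted voting, can be sketched as follows. This is a simplified stand-alone illustration: the function names, the tie-breaking rule, and the 0.5 threshold are assumptions, and the text does not state which voting configuration was used in the experiments.

```python
def majority_vote(predictions):
    """Hard voting over binary labels (0 or 1); ties go to the positive class."""
    positives = sum(predictions)
    return 1 if 2 * positives >= len(predictions) else 0


def soft_vote(probabilities, weights=None):
    """Weighted average of positive-class probabilities, thresholded at 0.5."""
    weights = weights or [1.0] * len(probabilities)
    avg = sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
    return 1 if avg >= 0.5 else 0


# Hypothetical outputs of four base models (e.g. SVM, RF, XGBoost, AB) for one post.
hard_label = majority_vote([1, 0, 1, 1])
soft_label = soft_vote([0.9, 0.4, 0.45, 0.3])
```

The second call shows how soft voting can differ from hard voting: only one of the four models is above 0.5, yet its high confidence pulls the average to 0.5125 and the ensemble predicts the positive class.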
DL models
In our experiment, we also incorporated the LSTM and gated recurrent unit (GRU) models for the classification of Reddit post content into positive or negative ADHD cases.
LSTM excels at capturing sequential and long-term dependencies in data (Loh et al., 2022). By using its ability to remember relevant information over extended sequences, LSTM can effectively model and classify text data. In our context, LSTM can analyze the sequential nature of Reddit post content and capture the patterns and context related to ADHD symptoms. Figure 3 displays the architecture of the LSTM model.

Figure 3. Structure of the LSTM model. Abbreviations: ADHD, attention-deficit/hyperactivity disorder; LSTM, long short-term memory.
The embedding layer in the LSTM model converts the numerical indices of the vocabulary (word) features, represented by W0, W1, W2, and Wn in Figure 3, into dense vectors of fixed size. The LSTM layer processes the embedded sequences to capture temporal dependencies. The dense layer with a sigmoid activation function produces the final binary classification output. This makes LSTM a suitable choice for our classification task, allowing us to capture the temporal dynamics within the textual data. LSTM networks incorporate a cell state and gates to store, modify, and access learned temporal relationships. The cell state acts as a highway to transfer relevant information across sequence steps, functioning as the model’s memory. Gates modulate the cell state, controlling what to add or remove as data flow through the sequence. These gates are neural networks that learn to retain or discard information from the cell state during training.
The LSTM output at each timestep is determined by the current cell state, filtered by the output gate using a sigmoid activation to squeeze values from 0 to 1. This gate selects the next hidden state based on relevant memory in the cell. The hidden state then determines what information gets passed to the next step. The formal definition of the LSTM gates architecture is as follows:
The input gate:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
The forget gate:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
The memory cell:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
The output gate:
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
The hidden state:
$$h_t = o_t \odot \tanh(c_t)$$
The symbol $x_t$ denotes the input at a particular time step $t$, while $h_t$ corresponds to the hidden state at the same time step. The memory cell state at time $t$ is denoted as $c_t$. Additionally, $i_t$, $f_t$, and $o_t$ stand for the input, forget, and output gates, respectively. The network’s weights and biases are represented by $W$, $U$, and $b$, with a subscript indicating the gate they belong to. The sigmoid activation function is symbolized by $\sigma$, the hyperbolic tangent activation function is denoted by $\tanh$, and $\odot$ denotes element-wise multiplication. These components collectively define the LSTM’s architecture, enabling it to process sequential data with the ability to retain long-term dependencies. Table 1 summarizes different parameters used in the LSTM model structure.
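As an illustration of how the gates interact, the sketch below runs one time step of a single-unit LSTM with scalar weights, using the symbols named above (input, previous hidden state, previous cell state, and per-gate W, U, b). It follows the standard LSTM formulation; the `params` dictionary layout and the function names are assumptions made for this example, and a real model would use vectors and matrices from a DL framework.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM.

    `params` maps each of "i", "f", "o", "c" to a (W, U, b) tuple of scalars.
    Returns the new hidden state h_t and cell state c_t.
    """
    def pre_activation(gate):
        W, U, b = params[gate]
        return W * x_t + U * h_prev + b

    i_t = sigmoid(pre_activation("i"))         # input gate
    f_t = sigmoid(pre_activation("f"))         # forget gate
    o_t = sigmoid(pre_activation("o"))         # output gate
    c_tilde = math.tanh(pre_activation("c"))   # candidate cell content
    c_t = f_t * c_prev + i_t * c_tilde         # updated memory cell
    h_t = o_t * math.tanh(c_t)                 # hidden state passed onward
    return h_t, c_t


# With all-zero parameters every gate outputs 0.5 and the candidate is 0.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in "ifoc"}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
```

In this degenerate case the forget gate keeps half of the previous cell state, which shows how the cell state carries information forward independently of the current input.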
The GRU model is a specific architecture used in recurrent neural networks (Liu et al., 2021b). Our methodology uses this model to detect and classify ADHD by extracting features from Reddit posts and comments. Its ability to handle sequential data makes it well suited to analyzing the complex textual information inherent in social media interactions. The suggested architecture starts with an embedding layer that converts the textual information into a vector format. This allows the neural network to learn from the subtle semantic details contained in the text. The GRU layer enables the identification of temporal relationships in the sequences, detecting patterns and context throughout the Reddit post content. Figure 4 presents the structure of the GRU model.

Figure 4. Structure of the GRU model. Abbreviations: ADHD, attention-deficit/hyperactivity disorder; GRU, gated recurrent unit.
The gating mechanisms inherent in the GRU layer facilitate the selective retention and updating of pertinent information, thus enhancing the efficiency of information flow in comparison to conventional recurrent neural networks. The GRU layer is followed by the utilization of dense layers to further analyze and condense the acquired representations, ultimately resulting in the final classification layer. The model is trained to differentiate between positive and negative cases of ADHD using binary cross-entropy loss and RMSprop optimizer. This is achieved by dividing the dataset into training and validation subsets. During the training process, the model undergoes iterative refinement of its parameters in order to minimize classification error and improve accurate predictions.
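The binary cross-entropy loss mentioned above can be written out in a few lines. This sketch assumes labels of 0 or 1 and predicted probabilities for the positive class; the clipping constant `eps` and the function name are illustrative choices rather than details taken from the study.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


uncertain = binary_cross_entropy([1, 0], [0.5, 0.5])   # model is undecided
confident = binary_cross_entropy([1, 0], [0.9, 0.1])   # model is right and confident
```

An undecided model scores ln 2 (about 0.693), and the loss shrinks as the predicted probabilities move toward the correct labels, which is the quantity the optimizer minimizes during training.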
RESULTS AND DISCUSSION
Data split
The ADHD dataset has been divided into 80% training and 20% testing as part of this stage. A training set is utilized for the purpose of training an ML model, while a testing set is applied to evaluate the model’s performance.
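An 80/20 split of this kind can be sketched as follows. The text does not say whether the split was shuffled, stratified, or seeded, so the shuffling, the `seed` value, and the function name `split_dataset` are assumptions for illustration.

```python
import random


def split_dataset(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle indices once and split samples and labels into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Toy data: ten samples whose label is the parity of the sample value.
X = list(range(10))
y = [value % 2 for value in X]
X_train, X_test, y_train, y_test = split_dataset(X, y)
```

Applied to the 9021 preprocessed samples, this rounding rule would give 1804 test samples and 7217 training samples; the exact counts used in the study are not reported.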
Evaluation metrics
In this study, we used a comprehensive set of evaluation measures to compare the performance of several ML models in the classification of ADHD cases. These metrics serve as quantitative measurements of the models’ ability to reliably identify individuals with ADHD, taking into account both positive and negative instances. These metrics included precision, recall, accuracy, AUC, and F1-score. These metrics were obtained using confusion matrices.
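The relationship between a confusion matrix and these metrics can be sketched as follows. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; AUC is omitted because it requires ranked prediction scores rather than confusion-matrix counts.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, and F1 for the positive class, plus overall accuracy."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def weighted_average(per_class_scores, supports):
    """Average per-class scores, weighted by the number of true samples per class."""
    total = sum(score * n for score, n in zip(per_class_scores, supports))
    return safe_div(total, sum(supports))


# Hypothetical confusion-matrix counts, for illustration only.
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=80)
```

Metrics for the negative class follow by swapping the roles of the two classes, that is, by calling `binary_metrics(tp=tn, fp=fn, fn=fp, tn=tp)`. The weighted averages reported in the result tables correspond to `weighted_average` applied to the two per-class scores.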
Classification
This section offers the testing classification findings for ADHD identification obtained through several experiments based on ML approaches. The primary goal was to assess the effectiveness of several models in properly identifying persons with ADHD using metrics such as precision, recall, F1-score, accuracy, and AUC.
The XGBoost model
The testing results of the XGBoost model for classifying ADHD cases from Reddit post content using TF-IDF characteristics are presented in Table 2.
Table 2. Testing results of the XGBoost model.
| Class | Precision % | Recall % | F1-score % | Accuracy % | AUC % |
|---|---|---|---|---|---|
| ADHD (negative) | 77 | 75 | 76 | 81 | 80 |
| ADHD (positive) | 83 | 84 | 84 | | |
| Weighted average | 81 | 81 | 81 | | |
Abbreviations: ADHD, attention-deficit/hyperactivity disorder; AUC, area under curve; XGBoost, extreme gradient boosting.
For the negative ADHD class, the model achieved a precision of 77%, a recall of 75%, and an F1-score of 76%. Higher precision (83%), recall (84%), and F1-score (84%) were observed for the positive ADHD class. The weighted average precision, recall, and F1-score of the model were all 81%. The model achieved an accuracy of 81% and an AUC of 80%, indicating that it can differentiate ADHD cases effectively. These findings demonstrate the model’s ability to categorize instances of ADHD based on the content of Reddit posts. The confusion matrix and the receiver operating characteristic (ROC) curve for the XGBoost model are illustrated in Figure 5.
The SVM model
Table 3 displays the results of evaluating the SVM model for ADHD classification using TF-IDF features extracted from Reddit posts.
Table 3. Testing results of the SVM model.
| Class | Precision % | Recall % | F1-score % | Accuracy % | AUC % |
|---|---|---|---|---|---|
| ADHD (negative) | 73 | 68 | 71 | 77 | 75 |
| ADHD (positive) | 79 | 83 | 81 | | |
| Weighted average | 77 | 77 | 77 | | |
Abbreviations: ADHD, attention-deficit/hyperactivity disorder; AUC, area under curve; SVM, support vector machine.
The precision, recall, and F1-score for negative ADHD were 73%, 68%, and 71%, respectively. For positive ADHD, the same metrics were higher, at 79%, 83%, and 81%. The precision, recall, and F1-score had a weighted average of 77%. Furthermore, the model achieved an accuracy of 77% and an AUC of 75%, suggesting its effectiveness in differentiating cases with ADHD. The results emphasize the efficacy of the SVM model in categorizing ADHD cases based on Reddit post content. Figure 6 shows the confusion matrix and ROC curve for the SVM model.
The RF model
Table 4 displays the results of evaluating the RF model for ADHD classification using TF-IDF characteristics obtained from Reddit posts. The precision, recall, and F1-score for negative ADHD were found to be 77%, 77%, and 77%, respectively.
Table 4. Testing results of the RF model.
| Class | Precision % | Recall % | F1-score % | Accuracy % | AUC % |
|---|---|---|---|---|---|
| ADHD (negative) | 77 | 77 | 77 | 81 | 81 |
| ADHD (positive) | 84 | 84 | 84 | | |
| Weighted average | 81 | 81 | 81 | | |
Abbreviations: ADHD, attention-deficit/hyperactivity disorder; AUC, area under curve; RF, random forest.
These results suggest that the performance in recognizing negative instances was well-balanced. In contrast, the positive ADHD class displayed higher precision (84%), recall (84%), and F1-score (84%), indicating that positive cases were identified more reliably. The weighted precision, recall, and F1-score were all 81%, indicating a consistent level of performance across both classes. Furthermore, the model achieved an accuracy of 81% and an AUC of 81%, underscoring its efficacy in distinguishing ADHD instances. These results highlight the effectiveness of the RF model in categorizing cases of ADHD based on the content of Reddit posts. Figure 7 shows the confusion matrix and ROC curve for the RF model.
The AB model
The results of testing of the AB model for the classification of ADHD cases, utilizing TF-IDF characteristics derived from Reddit posts, are displayed in Table 5. The precision, recall, and F1-score for the negative ADHD class were 74%, 75%, and 74%, respectively. These values indicate a balanced performance in correctly detecting negative cases.
Table 5. Testing results of the AB model.
| Class | Precision % | Recall % | F1-score % | Accuracy % | AUC % |
|---|---|---|---|---|---|
| ADHD (negative) | 74 | 75 | 74 | 79 | 78 |
| ADHD (positive) | 82 | 82 | 82 | | |
| Weighted average | 79 | 79 | 79 | | |
Abbreviations: ADHD, attention-deficit/hyperactivity disorder; AUC, area under curve.
In contrast, the positive ADHD class showed higher precision (82%), recall (82%), and F1-score (82%), indicating that positive instances were identified more reliably. The weighted precision, recall, and F1-score were all 79%, indicating a consistent level of performance in both classes. Furthermore, the model achieved an accuracy of 79% and an AUC of 78%, highlighting its efficacy in differentiating ADHD instances. These results highlight the effectiveness of the AB model in categorizing cases of ADHD based on the content of Reddit posts. Figure 8 shows the confusion matrix and ROC curve for the AB model.
The Voting model
The results of the evaluation of the Voting model for ADHD classification, utilizing TF-IDF characteristics derived from Reddit posts, are displayed in Table 6. The precision, recall, and F1-score for the negative ADHD class were 77%, 75%, and 76%, respectively.
Testing results of the Voting model.
Class | Precision % | Recall % | F1-score % | Accuracy % | AUC % |
---|---|---|---|---|---|
ADHD (negative) | 77 | 75 | 76 | 81 | 80 |
ADHD (positive) | 83 | 85 | 84 | | |
Weighted average | 81 | 81 | 81 | | |
Abbreviations: ADHD, attention-deficit/hyperactivity disorder; AUC, area under curve.
These scores indicate a balanced performance in correctly recognizing negative cases. In contrast, the positive ADHD class had higher precision (83%), recall (85%), and F1-score (84%), indicating that the model identified positive cases more reliably. The weighted-average precision, recall, and F1-score were all 81%, indicating a consistent level of performance in both classes. Moreover, the model achieved an accuracy of 81% and an AUC of 80%, showcasing its efficacy in differentiating instances of ADHD. These results highlight the effectiveness of the Voting model in categorizing cases of ADHD based on the content of Reddit posts. Figure 9 shows the confusion matrix and ROC curve for the Voting model.
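As a rough illustration of how a voting ensemble combines its base classifiers, the sketch below implements hard (majority) voting over predicted labels. This is an assumption made for illustration: the text here does not state whether hard or soft voting was used, nor which base models were combined, and the function name and predictions are hypothetical.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label lists by majority vote.

    predictions: list of equal-length label lists, one per base model.
    Ties are broken in favour of the label from the earliest model.
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # First label (in model order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical outputs of three base classifiers on five posts (1 = ADHD).
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]
ensemble = hard_vote([model_a, model_b, model_c])
```

Each post receives the label chosen by at least two of the three models, so an error made by a single base model is outvoted by the other two.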
The GRU model
Table 7 presents the testing results of the GRU model for ADHD classification using TF-IDF features from Reddit posts. As a DL model, GRU is proficient in analyzing sequential data, making it particularly suitable for tasks such as text classification. For the negative ADHD class, precision, recall, and F1-score were 77%, 68%, and 72%, respectively; the lower recall indicates that a sizable share of negative cases were misclassified as positive. Conversely, for the positive ADHD class, higher precision (79%), recall (85%), and F1-score (82%) were observed, indicating that positive cases were detected more reliably. The weighted-average precision, recall, and F1-score were all 78%. Furthermore, the GRU model achieved an accuracy of 78% and an AUC of 77%, underscoring its efficacy in differentiating ADHD instances. These results emphasize the proficiency of the GRU DL model in classifying ADHD cases using Reddit post content. Figure 10 includes three graphical representations for the GRU model: the confusion matrix, the ROC curve, and the training and validation performance plot.
The LSTM model
Table 8 displays the per-class testing results of the LSTM model, a DL architecture, for the classification of ADHD using TF-IDF features derived from Reddit posts. Like the GRU model, the LSTM model is proficient in processing sequential data, which makes it well-suited for applications such as text classification.
Testing results of the LSTM model.
Class | Precision % | Recall % | F1-score % | Accuracy % | AUC % |
---|---|---|---|---|---|
ADHD (negative) | 73 | 73 | 73 | 77 | 77 |
ADHD (positive) | 81 | 81 | 81 | | |
Weighted average | 77 | 77 | 77 | | |
Abbreviations: ADHD, attention-deficit/hyperactivity disorder; AUC, area under curve; LSTM, long short-term memory.
The negative ADHD class achieved a precision, recall, and F1-score of 73% each, indicating a balanced performance in identifying negative instances.
In contrast, the precision, recall, and F1-score for the positive ADHD class were all higher at 81%, indicating that positive instances were identified more reliably. The weighted-average precision, recall, and F1-score were all 77%, indicating consistent performance in both classes. In addition, the LSTM model achieved an accuracy of 77% and an AUC of 77%, showcasing its efficacy in differentiating instances of ADHD. The findings of these experiments highlight the effectiveness of the LSTM DL model in categorizing cases of ADHD based on information found in Reddit posts. Figure 10 includes three graphical representations for the LSTM model: the confusion matrix, the ROC curve, and the training and validation accuracy plot.
It is important to emphasize that the ADHD dataset used in this study is imbalanced between the negative and positive ADHD classes. Such class imbalances can pose problems for ML and DL algorithms, resulting in biased models that favor the majority class.
Table 9 summarizes the testing performance of each model on the two dataset classes. Each model underwent rigorous evaluation to assess its ability to classify ADHD cases accurately.
Testing classification results of the proposed models.
Model name | Precision | Recall | F1-score | Accuracy | AUC |
---|---|---|---|---|---|
SVM | 0.789 | 0.830 | 0.809 | 0.768 | 0.75 |
RF | 0.843 | 0.837 | 0.840 | 0.811 | 0.8 |
XGBoost | 0.840 | 0.832 | 0.836 | 0.805 | 0.80 |
AB | 0.824 | 0.815 | 0.819 | 0.787 | 0.78 |
Voting | 0.832 | 0.849 | 0.840 | 0.809 | 0.80 |
LSTM | 0.807 | 0.806 | 0.807 | 0.77 | 0.77 |
GRU | 0.788 | 0.854 | 0.819 | 0.78 | 0.77 |
Abbreviations: AUC, area under curve; GRU, gated recurrent unit; LSTM, long short-term memory; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.
Among the models tested, the RF model exhibited the most promising performance, achieving a precision of 0.843, recall of 0.837, F1-score of 0.840, accuracy of 0.811, and AUC of 0.81. These metrics collectively highlight the model’s efficacy in striking a balance between precision, recall, and accuracy, making it a compelling choice for ADHD detection.
In comparison, while the other models (SVM, XGBoost, AB, Voting, LSTM, and GRU) performed well, none matched the RF model's precision and accuracy. Notably, the GRU model achieved the highest recall (0.854) but lower precision (0.788), indicating possible areas for refinement or optimization.
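The comparison across models can be checked directly against Table 9. The short sketch below copies the precision, recall, F1-score, and accuracy values from that table and lists the leading model or models for each metric; the variable and function names are illustrative.

```python
# Values copied from Table 9: (precision, recall, F1-score, accuracy).
TABLE_9 = {
    "SVM": (0.789, 0.830, 0.809, 0.768),
    "RF": (0.843, 0.837, 0.840, 0.811),
    "XGBoost": (0.840, 0.832, 0.836, 0.805),
    "AB": (0.824, 0.815, 0.819, 0.787),
    "Voting": (0.832, 0.849, 0.840, 0.809),
    "LSTM": (0.807, 0.806, 0.807, 0.77),
    "GRU": (0.788, 0.854, 0.819, 0.78),
}
METRICS = ("precision", "recall", "f1", "accuracy")


def leaders(table):
    """Return, for each metric, every model that attains the top value."""
    best = {}
    for i, metric in enumerate(METRICS):
        top = max(row[i] for row in table.values())
        best[metric] = [name for name, row in table.items() if row[i] == top]
    return best


best = leaders(TABLE_9)
```

On these values, RF leads on precision and accuracy, GRU leads on recall, and RF and Voting tie on F1-score at 0.840.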
Wordcloud
A wordcloud is a graphical depiction of textual data in which words extracted from the input text are shown in varying sizes according to their frequency or significance. This technique was employed to identify the most salient keywords and phrases within the dataset. Figure 11 shows a wordcloud generated from the dataset.
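The sizing of words in a wordcloud rests on simple frequency counts. The sketch below shows one way such counts could be obtained; it is a minimal illustration, and the stopword list, tokenization rule, and sample posts are assumptions rather than the preprocessing used in the study.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "i", "to", "of", "my", "is", "on", "in"}


def word_frequencies(texts, top_n=5):
    """Count non-stopword tokens across texts and return the most common."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up posts for illustration only; not drawn from the Reddit dataset.
posts = [
    "I lose focus on everything",
    "Focus is hard and my meds help",
    "The meds help my focus",
]
top = word_frequencies(posts, top_n=3)
```

A wordcloud renderer would then draw each word at a size proportional to its count, so "focus" (three occurrences here) would appear largest.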
ADHD is a neurodevelopmental condition characterized by enduring patterns of inattention, hyperactivity, and impulsivity. It frequently presents during childhood and can persist into adulthood, substantially affecting several facets of an individual's life, including academic and professional achievement, interpersonal relationships, and psychological well-being. In this work, we used testing classification data to assess the performance of various ML algorithms for classifying and predicting ADHD cases from user-generated text. It is critical to recognize that ADHD is a complex mental health condition, and a precise diagnosis is required for effective intervention and management. The empirical results revealed that the RF model exhibited the most promising performance among the models evaluated and demonstrated robust capabilities in identifying ADHD-related posts. These findings align with previous research indicating the suitability of ensemble-based methods, such as RF, for handling the high-dimensional and imbalanced datasets characteristic of ADHD classification tasks.
Despite the RF model's excellent performance, it is critical to recognize the complexity involved in ADHD diagnosis and classification. ADHD is normally diagnosed through a comprehensive clinical examination that includes behavioral observations, self-report assessments, and neuropsychological testing. ML models can therefore serve as decision-support aids for clinicians rather than as stand-alone diagnostic tools.
Moreover, the class imbalance present in the ADHD dataset underscores the importance of employing techniques to address this issue, such as oversampling, undersampling, or the use of ensemble methods designed to handle imbalanced data. Future research should explore strategies to mitigate class imbalance effects further and enhance model generalization.
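Of the techniques mentioned above, random oversampling is the simplest to illustrate. The sketch below duplicates randomly chosen minority-class samples until both classes are the same size; it is an illustrative example rather than a step reported in this study, and the function name and sample data are hypothetical. In practice, resampling is normally applied to the training split only, so that the test set keeps the original class distribution.

```python
import random


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples at random until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical training posts: three positive (1) and two negative (0).
train_texts = ["post a", "post b", "post c", "post d", "post e"]
train_labels = [1, 1, 1, 0, 0]
balanced_texts, balanced_labels = random_oversample(train_texts, train_labels)
```

After resampling, each class contributes three samples, and every original post is still present in the balanced set.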
CONCLUSIONS
In this study, we used ML and DL algorithms to develop an ADHD diagnosis system based on data derived from Reddit social media analysis. Using TF-IDF-based feature extraction and model classification, we achieved accurate identification of ADHD-related content, as demonstrated by the performance of the RF model, which reported an F1-score of 84% and an AUC of 81%. These findings have significant implications for both research and clinical practice.
The proposed ADHD detection method serves as a powerful tool for automatically identifying ADHD-related content shared on social media websites, enabling researchers to access large datasets for further analysis and insights. Additionally, in clinical settings, such a system can complement existing diagnostic approaches, thereby enhancing early detection and care for individuals with ADHD.
Future research can focus on refining the developed system by exploring additional features or ML algorithms to further enhance accuracy and robustness. Efforts to validate the system's performance in real-world scenarios and integrate it into clinical practice would be crucial steps toward improving ADHD detection and management strategies. Expanding the dataset, incorporating multimodal data, and employing advanced DL techniques such as bidirectional encoder representations from transformers (BERT) and convolutional neural networks (CNNs) can also contribute to more sophisticated and reliable detection mechanisms. This approach will not only support the advancement of ADHD research but also improve practical diagnostic tools for healthcare providers.