Developing System-based Voice Features for Detecting Parkinson’s Disease Using Machine Learning Algorithms

Al‐nefaie, Abdullah H.; Aldhyani, Theyazn H. H; Koundal, Deepika

doi:10.57197/JDR-2024-0001

INTRODUCTION

Parkinson’s disease (PD) is a neurological condition characterized by the progressive degeneration of neurons in the substantia nigra (Wang et al., 2022). Dopamine is an endogenous catecholamine, structurally derived from phenethylamine. The neurotransmitter dopamine is essential for regulating movement (Yousefvand and Hamidi, 2020) because it carries signals between the substantia nigra and several other brain regions. When a large proportion of the dopamine-producing cells is lost, typically 60 to 80%, the resulting drop in dopamine levels is insufficient for adequate regulation of movement. PD symptoms therefore arise from a deficiency of dopamine (Panda and Bhuyan, 2022): in PD, the depletion of dopamine-producing neurons leads to a decline in motor control (Behl et al., 2022). PD is distinguished by four motor symptoms. The first is tremor, which can appear in many parts of the body, such as the jaw, hands, legs, and arms. The second is stiffness, a restricted range of motion in the joints and muscles. The other two are impaired postural stability and reduced velocity of movement (Alfonsetti et al., 2022). PD is also associated with nonmotor symptoms, including cognitive impairment (dementia), mood disorders (depression), sensory abnormalities (restless legs), sensitivity to temperature changes, and gastrointestinal dysfunction (Wang et al., 2023).
Although PD remains incurable, researchers have made significant progress in developing therapeutic options for both its motor and nonmotor manifestations. The available treatment options span a spectrum, from minimally invasive approaches, such as pharmacological interventions, to more invasive measures, such as surgical procedures.

The etiological processes of the illness are variable and incompletely understood (Bloem et al., 2021). The condition manifests with a diverse range of motor and nonmotor symptoms, some of which directly impair an individual’s capacity to function. PD often presents clinically after a prodromal phase that may span many decades. During this period there are no observable clinical signs, although individuals may have nonspecific symptoms such as constipation, apathy, daytime drowsiness, inattention, low mood, anxiety, olfactory impairment, pain, and motor slowness (Pont-Sunyer et al., 2015; Schrag et al., 2015; Gaenslen et al., 2021; Walter et al., 2012). Motor symptoms usually emerge in the later stages of neurodegeneration, once the compensatory mechanisms that maintain smooth motor control have been exhausted; they do not become apparent until about 50% of the dopaminergic neurons in the substantia nigra of the midbrain have died (Bezard et al., 2003). Nevertheless, these motor signs are the distinguishing features that allow the illness to be identified. The primary manifestations of PD are bradykinesia (slowness of voluntary movement, with reduced speed and a progressively restricted range of motion), muscular rigidity, and tremor (Postuma et al., 2015). Currently, the diagnosis of PD relies mostly on clinical assessment, which presents several obstacles, not least that it becomes possible only late in the neurodegenerative process. Diagnosing the illness requires careful identification of clinical signs, especially when motor symptoms are mild. Moreover, both moderate motor symptoms and nonmotor symptoms are often seen in older people without any underlying disease (Mahlknecht et al., 2020). PD also shares symptoms with several other neurodegenerative, toxic, or vascular disorders, which may have different prognoses and require different treatment strategies.
Furthermore, as noted above, these symptoms may resemble changes that occur with normal aging.

The timely detection of PD is crucial because its debilitating motor symptoms may be misattributed to aging or other factors. When promptly identified, these symptoms can be managed successfully with appropriate treatment. Intervening at an earlier stage may extend the period of optimal functioning and lengthen the part of the patient’s life spent with a high quality of life. The anticipated arrival of neuroprotective therapies makes it necessary to identify, before symptoms appear, those who might benefit from them. Many recent studies of neurodegeneration have therefore focused on identifying biomarkers of the illness (Kouba et al., 2022). Deterioration of brain structures has been identified as a significant factor in speech impairment (Duffy, 2019), and research on PD has consistently revealed a reduction in speech quality (Mostafa et al., 2019; Mohammed et al., 2021). It is therefore unsurprising that scientific interest in the acoustic analysis of speech in PD has surged in recent years, partly because such analysis may provide insight into the intricacies of fine motor control. Because speech is easy and simple to record, it is a promising candidate both as a diagnostic biomarker and as a measure of disease progression in PD.

Machine learning (ML) has recently been widely adopted in healthcare. ML allows computer software to learn and extract relevant information from data with minimal human input. ML models have been used to diagnose PD from motor data such as gait (Alqahtani et al., 2018; Ali et al., 2019a, b), neuroimaging (Amoroso et al., 2018; Anand et al., 2018; Khodatars et al., 2021), speech (Baby et al., 2017; Baggio et al., 2019), cerebrospinal fluid, cardiac scintigraphy (Benba et al., 2016b), serum (Benba et al., 2016c), and optical coherence tomography (Bernad-Elazari et al., 2016). ML can also integrate magnetic resonance imaging and single photon emission computed tomography data for PD diagnosis (Bhati et al., 2019; Buongiorno et al., 2019). To identify preclinical or atypical PD, ML techniques can find informative traits that are not commonly used in clinical diagnosis, so that these alternative measurements can be relied on.

The paper’s contributions may be succinctly described as follows:

Artificial intelligence algorithms were applied to the early identification and diagnosis of PD through the analysis of acoustic signals.
The application aims to help healthcare professionals and physicians make accurate diagnoses and intervene promptly.
Several ML algorithms were compared to identify the most accurate approach for detecting PD.
Pearson’s correlation coefficient was used to measure the relationship between all pairs of features in the dataset.

BACKGROUND

The existing academic literature presents several approaches to the identification, classification, and prognosis of PD severity. Joshi et al. (2022) developed a collection of 12 ML models to distinguish persons diagnosed with PD from healthy individuals. These models can detect indicators of relapse, such as modified ranges. The study used a number of ML algorithms, including naive Bayes (NB), k-nearest neighbors (k-NN), logistic regression (LR), multilayer perceptron, decision tree (DT), support vector machine (SVM) with linear, polynomial, and radial basis function (RBF) kernels, and random forest (RF). The proposed model, which comprises RF, SVM, k-NN, extra trees, and an extreme gradient boosting classifier, was evaluated using the mean squared error and mean absolute error metrics, and principal component analysis and linear discriminant analysis were used to improve accuracy. The testing accuracy was between 90 and 91%, whereas the training accuracy ranged between 98 and 100%. Behroozi and Sami (2016) used a machine learning repository dataset to test their proposed models. Convolutional neural network architectures have been suggested to aid in the identification of PD from the speech feature set; they achieved an F-score (the harmonic mean of precision and recall) of 88% and an accuracy of 82%. Berus et al. (2018) proposed models for remote monitoring in PD. The tunable Q-factor wavelet transform (TQWT) was used to evaluate the vocal symptoms of persons diagnosed with PD; its adjustable Q-factor is tuned to the acoustic properties of the PD speech signal to optimize feature extraction.
Combining the TQWT coefficients with mel frequency cepstral coefficients yields complementary information that increases the reliability of the method. Each participant repeated the same utterances on several occasions. To delimit the frequency bands relevant to perception more precisely, the estimated band boundaries are placed on the tunable Q-factor filter channels. Integrated with a range of remote monitoring technologies, the TQWT approach may be used to assess the current status of individuals diagnosed with PD against joint PD review criteria. The classifier trained on a low-redundancy feature subset achieved an F1-score of 0.84 and a Matthews correlation coefficient (MCC) of 0.59. OkanSakar et al. (2019) applied extreme gradient boosting, RF, SVM, k-NN, and other methods to identify the most prominent attributes of individuals with PD. The study included 80 participants: 40 individuals diagnosed with PD and 40 healthy controls. The top-performing model, based on light gradient boosting, obtained an area under the receiver operating characteristic (ROC) curve of 95.1% in fourfold cross-validation. Gunduz (2019) used a battery of speech tests to identify individuals with PD, applying both linear and recursive regression analyses to determine the first manifestation of PD. Through careful experimentation and data processing, SVM classifiers were trained on selected features to establish the relationship between the speech measures and PD, so that PD scores could be estimated in a generalizable way. Cross-validation was applied in the analysis to reduce overfitting.
The software developed by the researchers achieved an accuracy of up to 92.75% using k-NN, SVM with an RBF kernel, and NB. The observed true-positive rate was 88%, whereas the false-positive rate was 90% and the false-negative rate was 85%. Arroyo et al. (2021) developed a PD diagnosis approach based on the key findings of 26 distinct speech tests. Without feature selection, artificial neural networks (ANNs) proved to be the most effective method for PD analysis; after the network was tuned, the test accuracy was 86.47%. A one-dimensional ANN was used for training and testing, and in addition to adding hidden layers, the number of neurons in the existing layers was increased. Numerous studies have shown that the outcome can vary and that the accuracy of an ANN depends on its specific architecture.

Grover et al. (2018) used a deep neural network to predict the severity of PD; their method achieved a prediction accuracy of 81.66% for the motor Unified PD Rating Scale (UPDRS) score. Khoury et al. (2019) presented supervised diagnostic methods for PD based on gait analysis, including k-NN, NB, SVM, RF, and classification and regression trees; several unsupervised methods were also investigated. The k-NN, RF, and SVM models achieved the highest accuracy. Nilashi et al. (2020) combined deep learning and clustering techniques, using a deep belief network (DBN) and support vector regression (SVR) to predict the UPDRS; clustering with self-organizing maps improved prediction precision. On a real-world dataset, the combination of clustering, DBN, and SVR achieved higher prediction accuracy than other learning techniques. Ghaderyan and Fathi (2021) investigated gait markers in persons diagnosed with PD. Their approach partitions the signal and selects the most relevant segments to measure the discrepancy between limbs within the domain of a single value. Nilashi et al. (2019) used a hybrid methodology combining clustering, singular value decomposition, and adaptive neuro-fuzzy inference system approaches, which outperformed existing state-of-the-art techniques in both detection accuracy and computational efficiency.

METHODOLOGY

Detecting PD is a significant challenge for clinicians, largely because of the inherent complexities of the disease. Classification algorithms from artificial intelligence play an important role in evaluating neurodegenerative conditions. The diagnosis of PD rests on a comprehensive evaluation that includes the patient’s family medical history, a thorough physical examination, and the patient’s response to medication. ML algorithms have been developed as objective methods to assist in the diagnosis of PD.

Framework for the diagnosis of PD

Figure 1 shows the framework of the proposed system, which uses ML methods to detect PD.

Figure 1:

PD system based on ML algorithm. Abbreviations: ML, machine learning; PD, Parkinson’s disease.

Dataset

The speech signal dataset contains 195 voice samples: 147 from individuals diagnosed with PD and 48 from healthy individuals. It includes 23 distinct features. Figure 2 shows the class distribution of the dataset. The dataset is available at https://archive.ics.uci.edu/dataset/174/parkinsons.

Figure 2:

Class distribution of the PD dataset. Abbreviation: PD, Parkinson’s disease.

Preprocessing

Outliers in empirical data can distort the underlying distribution, and many statistical analyses have difficulty with skewed data, because a single outlier can substantially change the statistical measures of a distribution. Such extreme values must therefore be handled with care. Skewness was used to assess the asymmetry of each feature’s distribution, that is, how far it departs from a symmetric bell curve; a positive value indicates a longer right tail. Figure 3 shows that the skewness of the mdvp_fo_hz feature is 0.5917.

Figure 3:

Skewness of the mdvp_fo_hz feature.
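The skewness statistic can be sketched as follows. This is a minimal illustration, assuming the biased Fisher-Pearson coefficient g1 = m3 / m2^1.5 (the default of `scipy.stats.skew`); pandas' `Series.skew()` applies a small-sample bias correction, so the reported 0.5917 may have been computed with either variant. The function name `skewness` and the sample values are hypothetical.

```python
def skewness(values):
    """Biased Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5.

    g1 > 0 means a longer right tail, g1 < 0 a longer left tail,
    and g1 == 0 a symmetric distribution.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant feature: no spread, report as symmetric
    return m3 / m2 ** 1.5


# Hypothetical fundamental-frequency values in Hz (not from the dataset):
# one high value stretches the right tail, so the skewness is positive.
print(skewness([110.0, 115.0, 120.0, 125.0, 200.0]) > 0)  # True
```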

Normalization

Normalization is a feature-engineering step applied to rescale the data and thereby improve the accuracy of the ML approaches. Here, the feature values are scaled to the range 0-1 (min-max normalization): the minimum value of the feature is subtracted from each value, and the result is divided by the range (maximum minus minimum) of the feature.
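A minimal sketch of this min-max scaling in plain Python follows. The helper names are hypothetical, and fitting the minimum and range on the training split only (then reusing them for test data) is an assumed design choice that the paper does not specify; it keeps test-set information out of preprocessing, which is why unseen values can fall outside 0-1. scikit-learn's `MinMaxScaler` performs the same transformation.

```python
def min_max_fit(column):
    """Return the (minimum, range) pair learned from one feature column."""
    lo, hi = min(column), max(column)
    return lo, hi - lo


def min_max_transform(column, lo, span):
    """Scale values with (x - min) / (max - min); a constant feature maps to 0."""
    if span == 0:
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


train = [2.0, 4.0, 6.0]
lo, span = min_max_fit(train)  # statistics come from the training data only
print(min_max_transform(train, lo, span))       # [0.0, 0.5, 1.0]
print(min_max_transform([5.0, 8.0], lo, span))  # [0.75, 1.5]
```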

Machine learning

K-nearest neighbors

k-NN is one of the simplest ML approaches and can be used for both classification and regression. As a nonparametric method, it makes no assumptions about the underlying data distribution. In both cases, the prediction is based on the k training examples closest to the input in the feature space. In classification, the data point is assigned the most frequent class among its k nearest neighbors; in regression, the prediction is derived from the neighbors’ values. k is usually a small positive integer. In this study, k = 5, so each data point is assigned the majority class among its five nearest neighbors.

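(1)

$E = \sqrt{(a_{1} - a_{2})^{2} + (b_{1} - b_{2})^{2}}$

where (a₁, b₁) and (a₂, b₂) are the feature values of two samples, for example a training sample and the sample to be classified; with more than two features, the squared differences of all features are summed.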
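The classification rule can be sketched in plain Python as below, generalizing the Euclidean distance from two features to any number of features. Labels 1 (PD) and 0 (healthy), the toy points, and the function names are assumptions for illustration; in practice the features would be normalized first, since Euclidean distance is sensitive to feature scale.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two feature vectors of any length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_x, train_y, query, k=5):
    """Return the most frequent label among the k training points nearest to query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D toy data: label 0 = healthy, 1 = PD
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(knn_predict(X, y, (0.2, 0.3)))  # 0
print(knn_predict(X, y, (5.5, 5.5)))  # 1 (four PD neighbors outvote one healthy)
```

With k = 5 and two classes, the vote cannot tie, which is one practical reason to prefer an odd k for binary problems.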

Support vector machines

SVMs are widely recognized as very effective ML algorithms for classification, regression, and outlier detection. An SVM classifier builds a model that assigns newly observed data points to one of two predefined categories, which makes it a non-probabilistic binary linear classifier.

For linear classification, the SVM uses a hyperplane: a decision boundary that separates data points with different class labels. The SVM chooses the hyperplane that maximizes the margin, that is, the distance to the nearest data points of each class. This hyperplane is called the maximum margin hyperplane, and the linear classifier it defines is called the maximum margin classifier.
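As a minimal sketch of the maximum margin idea, the code below trains a linear soft-margin SVM with a Pegasos-style subgradient method on the hinge loss, in plain Python. This solver is a swapped-in illustration, not the authors’ implementation (the paper does not state which SVM solver or kernel it used); the regularization strength `lam`, the epoch count, the toy data, and all function names are assumptions. Labels are -1 (healthy) and +1 (PD).

```python
def train_linear_svm(xs, ys, lam=0.01, epochs=100):
    """Pegasos-style subgradient descent for a linear soft-margin SVM.

    xs: feature vectors; ys: labels in {-1, +1}. A constant 1.0 is
    appended to each vector so the last weight acts as the bias (a
    simplification: the bias is regularized along with the weights).
    """
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: regularization
            if margin < 1:  # inside the margin or misclassified: hinge-loss step
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical linearly separable toy data
X = [(2, 2), (3, 3), (-2, -2), (-3, -1)]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])  # [1, 1, -1, -1]
```

The `margin < 1` test is what makes this a maximum margin method: points that are correctly classified but too close to the hyperplane still push the weights, whereas a plain perceptron would ignore them.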

Random forest algorithm

RF is one of the most widely used supervised ML algorithms for classification and regression tasks. An RF classifier improves predictive accuracy by aggregating many DTs, each trained on a different subset of the dataset. Using more trees generally makes the forest more stable and robust and improves its accuracy. RF is an instance of ensemble learning, a technique that combines several classifiers to solve complex problems and improve model performance.

(2)

$\mathrm{Entropy}(S) = -\sum_{i=1}^{C} p_{i} \log_{2} p_{i}$

(3)

$\mathrm{Entropy}(S \mid B) = \sum_{j=1}^{k} \frac{|S_{j}|}{|S|} \, \mathrm{Entropy}(S_{j})$

(4)

$\mathrm{Gain}(S \mid B) = \mathrm{Entropy}(S) - \mathrm{Entropy}(S \mid B)$

where S is the PD dataset (or the subset of it at a tree node), C is the number of class labels, p_i is the proportion of samples in S that belong to class i, and B is the attribute used for the split, which partitions S into the k subsets S_1, ..., S_k.
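The entropy and information gain equations can be checked with the plain-Python sketch below; the function names and toy labels are hypothetical. For reference, scikit-learn’s `RandomForestClassifier` uses the Gini impurity by default and switches to this entropy-based criterion with `criterion="entropy"`; the paper does not state which setting was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, partitions):
    """Gain(S | B) = Entropy(S) - sum_j |S_j| / |S| * Entropy(S_j).

    `partitions` is the list of label subsets S_j produced by splitting S on B.
    """
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in partitions)
    return entropy(labels) - conditional


labels = ["PD", "PD", "healthy", "healthy"]
print(entropy(labels))  # 1.0: two equally frequent classes
print(information_gain(labels, [["PD", "PD"], ["healthy", "healthy"]]))  # 1.0: a perfect split
```

A decision tree grown with this criterion picks, at each node, the split B with the largest gain.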

Logistic regression

LR is a supervised ML method used to estimate the probability that an instance belongs to a certain class, which makes it suitable for classification. A linear regression function produces a continuous, unbounded output; LR passes this output through a sigmoid function to obtain a probability of class membership between 0 and 1, and the instance is assigned to the class with the higher estimated probability.
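A minimal sketch of this sigmoid-of-a-linear-function model follows, trained by batch gradient descent on the log-loss with a single hypothetical feature. The learning rate, epoch count, function names, and data are assumptions; library implementations such as scikit-learn’s `LogisticRegression` use other solvers and apply regularization by default.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.5, epochs=1000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log-loss.

    A single feature keeps the sketch short; ys are 0 or 1.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss: (prediction - label) times the input
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train_logistic_regression(xs, ys)
print(sigmoid(w * 1.5 + b) > 0.5)  # True: x = 1.5 is assigned to class 1
```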

AdaBoost boosting

AdaBoost (adaptive boosting) is an iterative boosting ensemble classifier that combines many weak classifiers into a strong classifier with improved accuracy. At each iteration, AdaBoost assigns the current weak classifier a weight based on its accuracy and increases the weights of the training samples it misclassified, so that subsequent classifiers focus on hard-to-classify cases such as rare events.
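The reweighting loop can be sketched with one-feature decision stumps as the weak classifiers, a common choice assumed here (the paper does not state its base learner). Labels are -1 and +1; the function names, the number of rounds, and the toy data are hypothetical.

```python
import math


def fit_stump(xs, ys, weights):
    """Best one-feature threshold classifier under the current sample weights.

    The stump predicts `polarity` when x >= threshold and -polarity otherwise.
    Returns (threshold, polarity, weighted_error).
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if (polarity if x >= threshold else -polarity) != y)
            if best is None or error < best[2]:
                best = (threshold, polarity, error)
    return best


def stump_predict(x, threshold, polarity):
    return polarity if x >= threshold else -polarity


def adaboost(xs, ys, rounds=10):
    """Return a list of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        threshold, polarity, error = fit_stump(xs, ys, weights)
        if error >= 0.5:  # no better than chance: stop
            break
        error = max(error, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)  # weight of this classifier
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, correctly classified ones lose it
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys)
print([adaboost_predict(model, x) for x in xs])  # [-1, -1, -1, 1, 1, 1]
```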

EXPERIMENTS

This paper proposes the use of supervised ML techniques, namely k-NN, SVM, RF, LR, and AdaBoost, to diagnose PD. Because diagnosis requires only a limited number of clinical test results, this approach has the potential to reduce both the time and the cost of screening for PD.

Performance metrics

The proposed methods for detecting PD are evaluated using accuracy, sensitivity (recall), precision, and F1-score.

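(5)

$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%$

(6)

$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100\%$

(7)

$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%$

(8)

$F\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. Sensitivity is also called recall, and because the F-score is the harmonic mean of precision and sensitivity, it is expressed on the same percentage scale.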
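The performance metrics translate directly into code; the sketch below is a plain-Python illustration with hypothetical counts (they are not the paper’s results) and hypothetical function names. The F-score is computed as the harmonic mean of precision and sensitivity, so it is already on the percentage scale when its inputs are.

```python
def percent(numerator, denominator):
    """Return 100 * numerator / denominator, or 0.0 when the denominator is 0."""
    return 100 * numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, precision, and F-score, all in percent."""
    accuracy = percent(tp + tn, tp + fp + fn + tn)
    sensitivity = percent(tp, tp + fn)
    precision = percent(tp, tp + fp)
    if precision + sensitivity == 0:
        f_score = 0.0  # no true positives: define F-score as 0
    else:
        f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, precision, f_score


# Hypothetical counts for illustration only
print(classification_metrics(tp=8, tn=8, fp=2, fn=2))  # (80.0, 80.0, 80.0, 80.0)
```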

Experiment setting

The study was implemented in Python 3.8 in the Jupyter notebook environment. This section describes the experimental configuration and the results obtained with the five ML classification techniques.

Results

Table 1 presents the results of the logistic regression method: LR achieved an accuracy of 86%. For the Parkinson’s class, LR achieved 91% on precision, recall, and F1-score.

Table 1:

Results of the LR approach.

Class	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)
Normal	69	69	69	86
Parkinson’s	91	91	91
Weighted average	86	80	80

Table 2 shows the results of the k-NN algorithm, which achieved an accuracy of 92%. k-NN achieved 100% precision for the normal class, and for the Parkinson’s class it achieved 90% precision, 100% recall, and a 95% F1-score.

Table 2:

Results of the k-NN approach.

Class	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)
Normal	100	62	76	92
Parkinson’s	90	100	95
Weighted average	95	81	86

Abbreviation: k-NN, k-nearest neighbors.

Table 3 summarizes the results of the SVM in detecting PD. SVM, a robust ML technique, is well known for its effectiveness in binary classification. It achieved an accuracy of 95%, matching RF and exceeding the other ML algorithms. For the normal class, SVM achieved 100% precision, and for the PD class it scored at least 94% on all performance measures.

Table 3:

Results of the SVM approach.

Class	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)
Normal	100	77	87	95
Parkinson’s	94	100	97
Weighted average	95	95	95

Abbreviation: SVM, support vector machine.

Table 4 shows the results of the RF approach, which achieved an accuracy of 95%. The RF results are identical to those of the SVM on every metric.

Table 4:

Results of the RF approach.

Class	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)
Normal	100	77	87	95
Parkinson’s	94	100	97
Weighted average	95	95	95

Abbreviation: RF, random forest.

Table 5 shows the results of the AdaBoost approach for detecting PD. Its accuracy is 93%, and its weighted average is 93% for all evaluation metrics.

Table 5:

Results of the AdaBoost boosting approach.

Class	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)
Normal	91	77	83	93
Parkinson’s	94	98	96
Weighted average	93	93	93

Confusion matrices

A confusion matrix tabulates a classifier’s predicted labels against the true labels, giving a visual summary of the outcomes of the k-NN, SVM, RF, LR, and AdaBoost classifiers. Figure 4 shows the confusion matrices of the logistic regression (LR) and k-NN models. The LR approach produced 5 misclassifications and 4 false negatives, whereas the k-NN approach produced 5 false negatives.

Figure 4:

Confusion matrices of the LR and k-NN approaches: (a) LR and (b) k-NN. Abbreviations: k-NN, k-nearest neighbors; LR, logistic regression.
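Because the results are reported as counts of misclassifications and false negatives, it helps to be explicit about how such counts are tallied. The sketch below is a hypothetical helper; which class counts as positive is an assumption (the paper does not say), and the choice determines whether an error is reported as a false positive or a false negative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem.

    `positive` is the label treated as the positive class (assumed here to be
    1 = PD); swapping it exchanges FP with FN and TP with TN.
    """
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # PD correctly detected
            else:
                fp += 1  # healthy sample flagged as PD
        elif actual == positive:
            fn += 1  # PD sample missed
        else:
            tn += 1  # healthy sample correctly cleared
    return tp, tn, fp, fn


# Hypothetical labels: 1 = PD, 0 = healthy
print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
```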

Figure 5 shows the outputs of the SVM and RF algorithms. Both methods have 0 misclassifications, but both produce 5 false negatives. The confusion matrix of the AdaBoost approach is presented in Figure 6.


Figure 5:

Confusion matrices of the SVM and RF approaches: (a) SVM and (b) RF. Abbreviations: RF, random forest; SVM, support vector machine.

Figure 6:

Confusion matrix of the AdaBoost approach.

DISCUSSION

PD affects around 6 million people worldwide. In the early phases of PD, when symptoms appear intermittently and are poorly defined, nonspecialist practitioners still find it difficult to reach a definitive diagnosis. This lack of specialized knowledge results in a misdiagnosis rate of 25% among nonspecialists, leaving affected individuals untreated for prolonged periods. A more reliable and objective early detection approach that is suitable for home use is therefore needed, which makes an ML-based system for the early detection of PD valuable.

The results in Table 6 show that all models achieve an accuracy of at least 86%, with SVM and RF standing out as the most accurate (95%). Consequently, we recommend using SVM and RF for the diagnosis of PD.

Table 6:

Accuracy of the machine learning models.

Model	Accuracy (%)
LR	86
k-NN	92
SVM	95
RF	95
AdaBoost boosting	93

Abbreviations: k-NN, k-nearest neighbors; LR, logistic regression; RF, random forest; SVM, support vector machine.

The ROC curve is used to evaluate performance in binary classification tasks: it shows how well the classification model performs across all decision thresholds. The area under the curve (AUC) summarizes the ROC curve in a single number that quantifies the discriminative capacity of a binary classifier. Figure 7 depicts the ROC curve of the SVM technique. The SVM method achieved an accuracy rate of 88%.

Figure 7:

ROC of RF approach.
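The AUC has an equivalent probabilistic reading: it is the chance that a randomly chosen positive (PD) sample receives a higher classifier score than a randomly chosen negative (healthy) one, with ties counted as one half. The pairwise sketch below computes AUC that way; the function name and toy scores are hypothetical, and for larger datasets a library routine such as scikit-learn’s `roc_auc_score` would be the usual choice.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Labels are 1 (positive) or 0 (negative); ties count as half a win.
    The pairwise loop is O(n_pos * n_neg), fine for a dataset of this size.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: one positive (0.4) is outranked by a negative (0.6)
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```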

A correlation matrix is a tabular representation of the correlation coefficients between variables; Figure 8 shows the matrix for all pairwise combinations of the 23 PD features. Most of the features are highly correlated with one another.


Figure 8:

Correlation between PD features. Abbreviation: PD, Parkinson’s disease.
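Pearson’s coefficient, used for this correlation matrix, is the covariance of two features divided by the product of their standard deviations. A plain-Python sketch follows; the function names and example columns are hypothetical, and with pandas the same matrix is usually obtained with `DataFrame.corr()`, whose default method is Pearson. A constant feature has zero standard deviation, so its correlation is undefined and the sketch raises an error.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (sd_x * sd_y)  # the 1/n factors cancel


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict mapping feature name -> list of values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical feature columns
features = {"f1": [1.0, 2.0, 3.0, 4.0],
            "f2": [2.0, 4.0, 6.0, 8.0],
            "f3": [4.0, 3.0, 2.0, 1.0]}
matrix = correlation_matrix(features)
print(round(matrix[("f1", "f2")], 6), round(matrix[("f1", "f3")], 6))  # 1.0 -1.0
```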

Table 7 compares the performance of the proposed system with that of existing PD systems that use the same dataset.

Table 7:

Performance of the proposed PD system compared with previous research.

Authors	Dataset	Models	Accuracy (%)
Khan et al. (2018)	Same dataset	Wavelet neural networks	90
Benba et al. (2016a)	Same dataset	Support vector machine	82.50
Behroozi and Sami (2016)	Same dataset	KNN	70
Proposed model	Same dataset	Random forest tree	95

Abbreviation: PD, Parkinson’s disease.

CONCLUSION

Identifying PD helps researchers understand its underlying etiology, and detecting it enables patients to start therapeutic interventions promptly. This research used several ML methods, namely k-NN, SVM, RF, LR, and AdaBoost, to distinguish individuals with PD from healthy individuals. The classifiers achieved the following accuracies: LR 86%, k-NN 92%, SVM 95%, RF 95%, and AdaBoost 93%. All of these approaches were able to distinguish affected individuals from healthy ones. ML classifiers perform strongly on speech data from which many phonetic characteristics are extracted. Early detection of PD has the potential to facilitate accurate diagnosis and to mitigate the progression of debilitating symptoms.

The approach could be extended to other ML techniques and datasets to improve classifier accuracy. In future work, we will use handwriting samples from individuals diagnosed with PD.

[1] Alfonsetti M, Castelli V, d’Angelo M. 2022. Are we what we eat? Impact of diet on the gut–brain axis in Parkinson’s disease. Nutrients. Vol. 14:380

[2] Ali L, Zhu C, Golilarz NA, Javeed A, Zhou M, Liu Y. 2019a. Reliable Parkinson’s disease detection by analyzing handwritten drawings: construction of an unbiased cascaded learning system based on feature selection and adaptive boosting model. IEEE Access. Vol. 7:116480–116489

[3] Ali L, Zhu C, Zhang Z, Liu Y. 2019b. Automated detection of Parkinson’s disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized neural network. IEEE J. Transl. Eng. Health Med. Vol. 7:1–10

[4] Alqahtani EJ, Alshamrani FH, Syed HF, Olatunji SO. 2018. Classification of Parkinson’s disease using NNge classification algorithm. Proceedings of the 2018 21st Saudi Computer Society National Computer Conference (NCC); Riyadh, Saudi Arabia. 25-26 April 2018; p. 1–7

[5] Amoroso N, La Rocca M, Monaco A, Bellotti R, Tangaro S. 2018. Complex networks reveal early MRI markers of Parkinson’s disease. Med. Image Anal. Vol. 48:12–24

[6] Anand A, Haque MA, Alex JSR, Venkatesan N. 2018. Evaluation of machine learning and deep learning algorithms combined with dimentionality reduction techniques for classification of Parkinson’s disease. Proceedings of the 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT); Louisville, KY, USA. 6-8 December 2018; p. 342–347

[7] Arroyo A, Periáñez JA, Ríos-Lago M, Lubrini G, Andreo J, Benito-León J, et al. 2021. Components determining the slowness of information processing in parkinson’s disease. Brain Behav. Vol. 11(3):e02031

[8] Baby MS, Saji AJ, Kumar CS. 2017. Parkinson’s disease classification using wavelet transform based feature extraction of gait data. Proceedings of the 2017 International Conference on Circuit, Power and Computing Technologies (ICCPCT); Kollam, India. 20-21 April 2017; p. 1–6

[9] Baggio HC, Abos A, Segura B, Campabadal A, Uribe C, Giraldo DM, et al. 2019. Cerebellar resting-state functional connectivity in Parkinson’s disease and multiple system atrophy: characterization of abnormalities and potential for differential diagnosis at the single-patient level. Neuroimage Clin. Vol. 22:101720

[10] Behl T, Makkar R, Sehgal A, Sharma N, Singh S, Albratty M, et al. 2022. Insights into the explicit protective activity of herbals in management of neurodegenerative and cerebrovascular disorders. Molecules. Vol. 27:4970

[11] Behroozi M, Sami A. 2016. A multiple-classifier framework for Parkinson’s disease detection based on various vocal tests. Int. J. Telemed. Appl. Vol. 2016:6837498

[12] Benba A, Jilbab A, Hammouch A. 2016a. Analysis of multiple types of voice recordings in cepstral domain using MFCC for discriminating between patients with Parkinson’s disease and healthy people. Int. J. Speech Technol. Vol. 19:449–456

[13] Benba A, Jilbab A, Hammouch A. 2016b. Discriminating between patients with Parkinson’s and neurological diseases using cepstral analysis. IEEE Trans. Neural Syst. Rehabil. Eng. Vol. 24:1100–1108

[14] Benba A, Jilbab A, Hammouch A, Sandabad S. 2016c. Using RASTA-PLP for discriminating between different neurological diseases. Proceedings of the 2016 International Conference on Electrical and Information Technologies (ICEIT); Tangiers, Morocco. 4-7 May 2016; p. 406–409

[15] Bernad-Elazari H, Herman T, Mirelman A, Gazit E, Giladi N, Hausdorff JM. 2016. Objective characterization of daily living transitions in patients with Parkinson’s disease using a single body-fixed sensor. J. Neurol. Vol. 263:1544–1551

[16] Berus L, Klancnik S, Brezocnik M, Ficko M. 2018. Novel discourse signal preparing calculations for high accuracy grouping of Parkinson’s illness. Biomed. Design. IEEE Trans. Vol. 59(5):1264–1271

[17] Bezard E, Gross CE, Brotchie JM. 2003. Presymptomatic compensation in Parkinson’s disease is not dopamine-mediated. Trends Neurosci. Vol. 26:215–221

[18] Bhati S, Velazquez LM, Villalba J, Dehak N. 2019. LSTM siamese network for Parkinson’s disease detection from speech. Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP); Ottawa, ON, Canada. 11-14 November 2019; p. 1–5

[19] Bloem BR, Okun MS, Klein C. 2021. Parkinson’s disease. Lancet. Vol. 397:2284–2303

[20] Buongiorno D, Bortone I, Cascarano GD, Trotta GF, Brunetti A, Bevilacqua V. 2019. A low-cost vision system based on the analysis of motor features for recognition and severity rating of Parkinson’s disease. BMC Med. Inform. Decis. Mak. Vol. 19 Suppl 9:243

[21] Duffy JR. 2019. Motor Speech Disorders. https://www.elsevier.com/books/motor-speech-disorders/duffy/978-0-323-53054-5 (accessed on 1 October 2022)

[22] Gaenslen A, Swid I, Liepelt-Scarfone I, Godau J, Berg D. 2021. The patients’ perception of prodromal symptoms before the initial diagnosis of Parkinson’s disease. Mov. Disord. Vol. 26:653–658

[23] Ghaderyan P, Fathi G. 2021. Inter-limb time-varying singular value: a new gait feature for Parkinson’s disease detection and stage classification. Measurement. Vol. 177:109249

[24] Grover S, Bhartia S, Yadav A, Seeja K. 2018. Predicting severity of Parkinson’s disease using deep learning. Procedia Comput. Sci. Vol. 132:1788–1794

[25] Gunduz H. 2019. Deep learning-based Parkinson’s disease classification using vocal feature sets. IEEE Access. Vol. 7:115540–115551

[26] Joshi DD, Joshi HH, Panchal BY, Goel P, Ganatra A. 2022. A Parkinson disease classification using stacking ensemble machine learning methodology. 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE); Greater Noida, India. IEEE, New York; p. 1335–1341

[27] Khan MM, Mendes A, Chalup SK. 2018. Evolutionary wavelet neural network ensembles for breast cancer and Parkinson’s disease prediction. PLoS One. Vol. 13:e0192192

[28] Khodatars M, Shoeibi A, Sadeghi D, Ghaasemi N, Jafari M, Moridian P, et al. 2021. Deep learning for neuroimaging-based diagnosis and rehabilitation of autism spectrum disorder: a review. Comput. Biol. Med. Vol. 139:104949

[29] Khoury N, Attal F, Amirat Y, Oukhellou L, Mohammed S. 2019. Data-driven based approach to aid Parkinson’s disease diagnosis. Sensors. Vol. 19:242

[30] Kouba T, Illner V, Rusz J. 2022. Study protocol for using a smartphone application to investigate speech biomarkers of Parkinson’s disease and other synucleinopathies: SMARTSPEECH. BMJ Open. Vol. 12:e059871

[31] Mahlknecht P, Stockner H, Marini K, Gasperi A, Djamshidian A, Willeit P, et al. 2020. Midbrain hyperechogenicity, hyposmia, mild parkinsonian signs and risk for incident Parkinson’s disease over 10 years: a prospective population-based study. Parkinsonism Relat. Disord. Vol. 70:51–54

[32] Mohammed MA, Elhoseny M, Abdulkareem KH, Mostafa SA, Maashi MS. 2021. A multi-agent feature selection and hybrid classification model for Parkinson’s disease diagnosis. ACM Trans. Multimed. Comput. Commun. Appl. Vol. 17:1–22

[33] Mostafa SA, Mustapha A, Mohammed MA, Hamed RI, Arunkumar N, Abd Ghani MK, et al. 2019. Examining multiple feature evaluation and classification methods for improving the diagnosis of Parkinson’s disease. Cogn. Syst. Res. Vol. 54:90–99

[34] Nilashi M, Ibrahim O, Samad A, Ahmadi H, Shahmoradi L, Akbari E. 2019. An analytical method for measuring the Parkinson’s disease progression: a case on a Parkinson’s telemonitoring dataset. Vol. 136:545–557

[35] Nilashi M, Ahmadi H, Manaf AA, Rashid TA, Samad S, Shahmoradi L, et al. 2020. Coronary heart disease diagnosis through self-organizing map and fuzzy support vector machine with incremental updates. Int. J. Fuzzy Syst. Vol. 22:1376–1388

[36] OkanSakar C, Serbes G, Gunduz A, Tunc HC, Nizam H, Sakar BE, et al. 2019. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Appl. Soft Comput. Vol. 74:255–263

[37] Panda A, Bhuyan P. 2022. Machine learning-based framework for early detection of distinguishing different stages of Parkinson’s disease. Spec. Ugdym. Vol. 2:30–42

[38] Pont-Sunyer C, Hotter A, Gaig C, Seppi K, Compta Y, Katzenschlager R, et al. 2015. The onset of nonmotor symptoms in Parkinson’s disease (the ONSET PD study). Mov. Disord. Vol. 30:229–237

[39] Postuma RB, Berg D, Stern M, Poewe W, Olanow CW, Oertel W, et al. 2015. MDS clinical diagnostic criteria for Parkinson’s disease. Mov. Disord. Vol. 30:1591–1601

[40] Schrag A, Horsfall L, Walters K, Noyce A, Petersen I. 2015. Prediagnostic presentations of Parkinson’s disease in primary care: a case-control study. Lancet Neurol. Vol. 14:57–64

[41] Walter U, Kleinschmidt S, Rimmele F, Wunderlich C, Gemende I, Benecke R, et al. 2012. Potential impact of self-perceived prodromal symptoms on the early diagnosis of Parkinson’s disease. J. Neurol. Vol. 260:3077–3085

[42] Wang ZL, Yuan L, Li W, Li JY. 2022. Ferroptosis in Parkinson’s disease: glia-neuron crosstalk. Trends Mol. Med. Vol. 28:258–269

[43] Wang L, Gao Z, Chen G, Geng D, Gao D. 2023. Low levels of adenosine and GDNF are potential risk factors for Parkinson’s disease with sleep disorders. Brain Sci. Vol. 13:200

[44] Yousefvand S, Hamidi F. 2020. Role of lateral hypothalamus area in the central regulation of feeding. Int. J. Pept. Res. Ther. Vol. 28:83

Journal of Disability Research

