INTRODUCTION
Emotion can be described as a mental state linked with the nervous system: what an individual feels internally in response to the environment. The emotions of a person can be identified in many ways (Nandwani and Verma, 2021); some can be examined through body gestures, tonal properties, and facial expressions. The classification or computation of emotions from facial or speech expressions forms a significant part of human information processing (Ahire and Borse, 2022). In intelligent learning environments, detecting emotions from images of learners captured during class hours using computer vision and deep learning (DL) methods enables prompt monitoring of their emotional and psychological states. However, emotion detection from facial expression images requires high-quality cameras to capture facial images, resulting in high application costs (Zad et al., 2021). Hence, speech-based human emotion detection has gradually become a principal approach to studying human–computer emotion detection. In expression and communication, human speech carries not only semantic content but also rich information such as the emotions of the speaker (Sailunaz et al., 2018). Thus, the study of emotion detection from images and human speech using computational and intelligent DL methods is of great significance (Vasantharajan et al., 2022). Automated emotion detection is a significant research topic that bridges two subjects: artificial intelligence and human emotion recognition. The emotional state of a person can be obtained from verbal and nonverbal data captured using different sensors, for instance from facial changes, physiological signals, and tone of voice (De and Mishra, 2022).
Facial changes during an interaction are the first signals that convey emotional status, which is why many researchers focus on this modality (Kumar et al., 2022). Abstracting attributes that generalize from one face to another is a sensitive and difficult task for achieving superior classification. Automated facial expression recognition (FER) is the modality most studied by researchers, yet it is not a simple task, as people express their emotions in different ways. Several difficulties must not be neglected, such as gender, background, disparity of head poses, age, and luminosity, in addition to occlusion caused by skin disease, sunglasses, scarves, etc. (Vijayvergia and Kumar, 2021). Numerous conventional approaches exist for the extraction of facial features, such as geometric and texture features, for instance local directional patterns, local binary patterns, Gabor wavelets, and facial action units (FAUs) (Cui et al., 2022). Currently, DL is a very successful and effective method thanks to architectures such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) that perform automatic feature extraction and classification; this has motivated researchers to adopt DL for detecting human emotions (Riza and Charibaldi, 2021). Various studies have been conducted on deep neural network structures that produce very reasonable outcomes in this area.
This study presents an emotion analysis approach using improved cat swarm optimization with machine learning (EA-ICSOML). The EA-ICSOML technique applies the concepts of computer vision (CV) and DL to identify various types of emotions. For feature vector generation, the ShuffleNet model is used. To adjust the hyperparameters related to the ShuffleNet approach, the ICSO algorithm is used. Finally, the recognition and classification of emotions are performed using the transient chaotic neural network (TCNN) approach. The performance of the EA-ICSOML technique is validated on facial emotion databases.
RELATED STUDIES
Modran et al. (2023) focused on forecasting whether music has healing benefits. An ML method was developed, utilizing a multi-class NN to categorize emotions into four categories and forecast the output. The NN has three layers: an input layer with many attributes, a densely connected hidden layer, and an output layer. To assess the estimator, K-fold cross-validation was utilized. Catania and Garzotto (2023) presented Emoty, a speech-based conversational agent devised for individuals with neurodevelopmental disorders (NDD) to train emotional communication skills. A characteristic of this agent is the expressive power of voice: Emoty engages users in small chats in which they are asked to repeat sentences and convey given emotions with a suitable vocal tone.
Garcia-Garcia et al. (2022) presented a new software application, in the form of a serious game, for teaching children with autism spectrum disorder (ASD) to express and identify emotions. The system incorporates cutting-edge technology to support new communication mechanisms based on a tangible user interface and emotion detection from facial expressions; children interact with the system by grasping objects with their hands and using their faces. Aoki et al. (2022) explored the impact of speech balloon shapes on sender–receiver agreement concerning the emotionality of text messages. Based on these outcomes, the authors built a system that generates speech balloons matching the intensity of emotional arousal by means of an auxiliary classifier generative adversarial network (ACGAN).
In Hou (2022), a DL-based human emotion detection framework was devised to assess the feasibility of digitally representing, detecting, and predicting feelings. The presented method analyzed the effect of emotional models on multimodal detection. The study surveys developing works that utilize current approaches such as CNNs for human emotion detection from video, language, image, sound, and physiological signals. While the findings are not conclusive, the evidence gathered indicates that DL can be adequate for categorizing facial emotion. Mridha et al. (2022) developed a landmark-based emotion detection mechanism. This study utilizes a CNN to recognize facial emotions and understand the state of mind of impaired persons; with only wireless button-based transmission, the presented approach can alert the caregiver that a handicapped individual needs something.
THE PROPOSED MODEL
In this study, we have presented a novel EA-ICSOML technique for the emotion analysis process. The EA-ICSOML technique applies the CV and DL concepts to identify various types of emotions. The working process of the EA-ICSOML technique comprises ShuffleNet feature extraction, ICSO-based hyperparameter optimization, and TCNN-based classification. Figure 1 exemplifies the overall process of the EA-ICSOML algorithm.

Overall process of the EA-ICSOML approach. Abbreviation: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning.
Feature extraction: ShuffleNet
For feature vector generation, the ShuffleNet model is used in this work. ShuffleNet, proposed in 2018 by Xiangyu Zhang et al. from Megvii, is a neural network architecture devised for efficient DL on mobile devices (Ullah et al., 2022). It relies on pointwise group convolution, which enables efficient computation of feature maps by reducing the number of operations needed per convolution. The architecture includes a channel shuffle procedure, executed after the pointwise group convolution, that increases the mixing of feature maps across channels. This enhances the capability of the network to capture complicated features and improve its accuracy. ShuffleNet has accomplished remarkable performance on benchmark datasets while being computationally efficient and requiring a low memory footprint.
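To make the channel shuffle operation concrete, the following is a minimal PyTorch sketch (not taken from the original implementation; the function name and example values are illustrative):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """ShuffleNet-style channel shuffle: interleave feature-map channels
    across groups so information can flow between group convolutions."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # (N, C, H, W) -> (N, g, C/g, H, W) -> swap group/sub-channel axes -> flatten
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Example: 8 channels in 2 groups; channels [0..3 | 4..7] become interleaved.
feats = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)
print(channel_shuffle(feats, groups=2).flatten().tolist())
# -> [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
```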
Hyperparameter tuning: ICSO algorithm
To adjust the hyperparameters related to the ShuffleNet approach, the ICSO algorithm is used. The CSO algorithm is inspired by the natural behaviors of felines and is used to resolve complicated optimization problems (Wang and Han, 2023). The model treats each cat's location as a potential solution of the problem to be optimized, obtains an estimated promising region of the solution space from prior knowledge of the problem, abstracts feline actions into search patterns, and searches this region for the optimal solution with these patterns. The search behavior of CSO mainly comprises tracing and seeking modes, where the proportion of the population carrying out each mode is defined by the mixture ratio.
During the seeking mode, several crucial components are determined: counts of dimension to change (CDC), seeking memory pool (SMP), self-position considering (SPC), and seeking range of the selected dimension (SRD). SMP is the number of candidate locations produced by each cat; SRD characterizes the mutation range of every cat in each dimension and usually takes values within [0,1]; CDC gives the number of dimensions to be changed in each cat and also takes values within [0,1]; SPC, a Boolean value (0 or 1), defines whether the existing location of the cat is itself a candidate location for the following iteration. The location updating equation can be given as follows:

$$L_{\mathrm{new}} = L_{\mathrm{old}} \times (1 \pm Sr \times r_1) \tag{1}$$

In Equation (1), $L_{\mathrm{new}}$ denotes the updated location of the cat; $L_{\mathrm{old}}$ indicates the original location of the cat; $Sr$ represents the SRD of the cat, taking values in [0,1]; and $r_1$ denotes a random value in [0,1], with the sign of the perturbation chosen at random. Consider the SPC value of the current location: if SPC is 1, retain the existing place and make SMP − 1 copies of it for the memory pool; otherwise, make SMP copies. Figure 2 presents the flowchart of the CSO algorithm.
For every cat in the memory pool, the seeking mode then proceeds as follows: (i) randomly select, according to the CDC value, the dimensions in which variation takes place, and update the position data accordingly; (ii) evaluate the fitness value of every individual; and (iii) choose the next candidate position through roulette-wheel probability selection. A minimal sketch of this mode is given below.
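The sketch below assumes a list-of-floats encoding of a cat and a fitness function that returns the classifier error rate of Equation (3); names and default parameter values are illustrative, not taken from the paper:

```python
import random

def seeking_mode(cat, fitness, smp=5, srd=0.2, cdc=0.8, spc=True):
    """One seeking-mode step for a single cat (a list of floats).
    With SPC enabled, the current position itself joins the candidate
    pool, so only SMP - 1 mutated copies are generated."""
    pool = []
    for _ in range(smp - 1 if spc else smp):
        copy_ = cat[:]
        # mutate a CDC-fraction of randomly chosen dimensions by +/- SRD * r1
        for d in random.sample(range(len(cat)), max(1, int(cdc * len(cat)))):
            copy_[d] *= 1 + random.choice((-1, 1)) * srd * random.random()
        pool.append(copy_)
    if spc:
        pool.append(cat[:])
    # roulette-wheel selection: lower error rate -> higher selection weight
    weights = [1.0 / (fitness(c) + 1e-12) for c in pool]
    return random.choices(pool, weights=weights, k=1)[0]
```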
The tracing mode, corresponding to the local search of the optimization method, is similar to the location update of particle swarm optimization (PSO): the individual velocity in each dimension is changed to accomplish the location update as follows:

$$V_{k,d}^{\mathrm{new}} = V_{k,d} + r_2 \times c \times \left(L_{best,d} - L_{k,d}^{\mathrm{old}}\right), \qquad L_{k,d}^{\mathrm{new}} = L_{k,d}^{\mathrm{old}} + V_{k,d}^{\mathrm{new}} \tag{2}$$

In Equation (2), $r_2$ indicates a randomly generated value in [0,1]; $L_{best,d}$ represents the location of the best individual of the current iteration in dimension $d$; $V_{k,d}$ indicates the velocity of the $k$th individual in dimension $d$; $L_{k,d}^{\mathrm{new}}$ denotes the updated location of the $k$th individual in dimension $d$; $L_{k,d}^{\mathrm{old}}$ denotes its location before the update; and $c$ represents a constant fixed to 2.05.
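The corresponding tracing-mode update of Equation (2), sketched under the same list-of-floats encoding assumption:

```python
import random

def tracing_mode(position, velocity, best, c=2.05):
    """One tracing-mode step: a PSO-like pull of the cat toward the best
    individual of the current iteration, per Equation (2)."""
    r2 = random.random()
    new_velocity = [v + r2 * c * (b - x)
                    for v, b, x in zip(velocity, best, position)]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity
```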
The ICSO system assesses a fitness function (FF) to obtain better classifier results. It defines a positive value such that smaller values represent better candidate solutions. In this case, the minimized classifier error rate is taken as the FF, as expressed in Equation (3):

$$fitness(x_i) = \mathrm{ClassifierErrorRate}(x_i) = \frac{\text{number of misclassified samples}}{\text{total number of samples}} \times 100 \tag{3}$$
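A direct transcription of Equation (3) might look as follows (how the candidate hyperparameters are wired into ShuffleNet training is not detailed in the text):

```python
def fitness(y_true, y_pred):
    """Equation (3): classification error rate (%) to be minimized by ICSO."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return 100.0 * wrong / len(y_true)

# Example: 2 errors out of 8 validation samples -> fitness of 25.0
print(fitness([0, 1, 2, 0, 1, 2, 0, 1], [0, 1, 2, 0, 2, 2, 1, 1]))  # 25.0
```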
Emotion detection: TCNN model
Finally, the recognition and classification of emotions are performed using the TCNN model. The transient chaotic neural network (TCNN) is obtained by adding a self-feedback term with a simulated annealing scheme to the Hopfield neural network (HNN), and can be given as follows (Chen et al., 2022):

$$x_i(t) = \frac{1}{1 + e^{-y_i(t)/\varepsilon}}$$

$$y_i(t+1) = k\, y_i(t) + \alpha \left( \sum_{j=1,\, j \neq i}^{N} W_{ij}\, x_j(t) + I_i \right) - z_i(t)\left(x_i(t) - I_0\right)$$

$$z_i(t+1) = (1 - \beta)\, z_i(t) \tag{4}$$

In Equation (4), $x_i(t)$ denotes the output of neuron $i$; $\varepsilon$ ($\varepsilon > 0$) is the steepness parameter of the activation function; $y_i(t)$ is the internal state of neuron $i$; $\alpha \in [0, \infty)$ is a positive scaling parameter that controls the effect of the energy function on the chaotic dynamics; $\beta$ is the annealing attenuation factor of $z_i(t)$; $z_i(t)$ is the self-feedback connection weight; $k$ ($0 \leq k \leq 1$) is the damping factor of the nerve membrane; and $I_0$ is a positive parameter.
A positive Lyapunov exponent indicates that the system has chaotic features: the stronger the degree of chaos, the larger the Lyapunov exponent. It is determined by the following expression:

$$\lambda = \lim_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \ln \left| \frac{\mathrm{d}\, y(t+1)}{\mathrm{d}\, y(t)} \right|$$
The parameters are set as follows: k = 1, β = 0.02, I_0 = 0.65, z(0) = 0.8, α = 0.07, and ε = 0.05. In the early phase of the TCNN's evolution, $z_i(t)$ takes a large initial value and the dynamics are chaotic. As $z_i(t)$ continuously decays with time toward 0, the system undergoes an inverse bifurcation transition and degenerates to an HNN with gradient convergence.
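A minimal sketch of one synchronous TCNN update using the parameter values listed above, assuming the standard Chen–Aihara formulation of Equation (4) (list-based for clarity rather than speed; names are illustrative):

```python
import math

def tcnn_step(y, x, z, W, I, k=1.0, alpha=0.07, beta=0.02, i0=0.65, eps=0.05):
    """One synchronous update of the TCNN dynamics in Equation (4).
    y: internal states, x: outputs, z: self-feedback weights (decaying by
    a factor of 1 - beta each step), W: weight matrix, I: thresholds."""
    n = len(y)
    y_new = [k * y[i]
             + alpha * (sum(W[i][j] * x[j] for j in range(n) if j != i) + I[i])
             - z[i] * (x[i] - i0)
             for i in range(n)]
    x_new = [1.0 / (1.0 + math.exp(-yi / eps)) for yi in y_new]
    z_new = [(1.0 - beta) * zi for zi in z]  # annealing: chaos fades to HNN
    return y_new, x_new, z_new
```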
The TCNN model is used by mapping the objective function of the problem onto the energy function of the network and then letting the network dynamics minimize that objective. Once the network converges toward a stable point, the neuron outputs constitute the suboptimal or optimal solution to the problem.
Kwok and Smith developed a modified energy function given by the following equation:

$$E = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij}\, x_i(t)\, x_j(t) - \sum_{i=1}^{N} I_i\, x_i(t) + \sum_{i=1}^{N} \frac{1}{\tau_i} \int_{0}^{x_i(t)} f^{-1}(s)\, \mathrm{d}s + H$$

For $i, j = 1, 2, \ldots, N$, where $N$ is the number of neurons; $H$ denotes the additional energy term contributed by the self-feedback, whose selection determines the variation of the chaotic dynamics; $x_j(t)$ indicates the output of the $j$th neuron at time $t$; $I_i$ is the threshold of the $i$th neuron; $W_{ij}$ represents the connection weight between the $i$th and $j$th neurons; $f^{-1}(\cdot)$ is the inverse of the activation function; and $\tau_i$ signifies the time constant of the $i$th neuron.
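For the sigmoid activation used above, the integral term has a closed form, since $\int_0^x \varepsilon \ln\frac{s}{1-s}\, \mathrm{d}s = \varepsilon\left[x \ln x + (1-x)\ln(1-x)\right]$. The following is a hedged sketch of the energy computation under that assumption (variable names are illustrative):

```python
import math

def tcnn_energy(x, W, I, tau, H, eps=0.05):
    """Modified Hopfield energy with the extra term H: quadratic and linear
    Hopfield terms plus the integral of the inverse sigmoid activation.
    Requires every output x[i] to lie strictly inside (0, 1)."""
    n = len(x)
    quad = -0.5 * sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = -sum(I[i] * x[i] for i in range(n))
    integral = sum(
        (eps / tau[i]) * (x[i] * math.log(x[i]) + (1 - x[i]) * math.log(1 - x[i]))
        for i in range(n))
    return quad + lin + integral + H
```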
RESULTS AND DISCUSSION
In this section, the emotion recognition results of the presented approach are tested on the CK+ database. The proposed technique is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, GeForce GTX 1050Ti 4GB GPU, 16GB RAM, 250GB SSD, and 1TB HDD. The parameter setup is as follows: learning rate 0.01, ReLU activation, 50 epochs, dropout 0.5, and batch size 5.
In Table 1, the overall emotion detection outcomes of the EA-ICSOML technique are demonstrated. Figure 3 represents the results of the EA-ICSOML technique on 80% of the TRP (training phase). The results indicate that the EA-ICSOML technique recognizes all types of emotions. In the anger class, it attains accu_y, prec_n, reca_l, F_score, and AUC_score of 98.93%, 97.30%, 94.74%, 96.00%, and 97.16%, respectively. In the contempt class, it reaches 98.93%, 96.34%, 96.34%, 96.34%, and 97.86%, respectively. In the fear class, it accomplishes 98.93%, 96.30%, 96.30%, 96.30%, and 97.83%, respectively. In the happy class, it achieves 99.82%, 100%, 98.73%, 99.36%, and 99.37%, respectively. Finally, in the sadness class, it attains 98.75%, 96.15%, 94.94%, 95.54%, and 97.16%, respectively.
Table 1: Emotion detection outcome of the EA-ICSOML algorithm on the 80:20 TRP/TSP split.
Class | Accu_y | Prec_n | Reca_l | F_score | AUC_score |
---|---|---|---|---|---|
Training phase (80%) | | | | | |
Anger | 98.93 | 97.30 | 94.74 | 96.00 | 97.16 |
Contempt | 98.93 | 96.34 | 96.34 | 96.34 | 97.86 |
Fear | 98.93 | 96.30 | 96.30 | 96.30 | 97.83 |
Disgust | 99.64 | 98.75 | 98.75 | 98.75 | 99.27 |
Happy | 99.82 | 100.00 | 98.73 | 99.36 | 99.37 |
Surprise | 98.93 | 94.25 | 98.80 | 96.47 | 98.87 |
Sadness | 98.75 | 96.15 | 94.94 | 95.54 | 97.16 |
Average | 99.13 | 97.01 | 96.94 | 96.97 | 98.22 |
Testing phase (20%) | | | | | |
Anger | 99.29 | 100.00 | 95.83 | 97.87 | 97.92 |
Contempt | 99.29 | 100.00 | 94.44 | 97.14 | 97.22 |
Fear | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Disgust | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Happy | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Surprise | 99.29 | 94.44 | 100.00 | 97.14 | 99.59 |
Sadness | 99.29 | 95.45 | 100.00 | 97.67 | 99.58 |
Average | 99.59 | 98.56 | 98.61 | 98.55 | 99.19 |
Abbreviation: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning.
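The text does not state exactly how the per-class scores in Table 1 were obtained; a plausible reading is a one-vs-rest evaluation per emotion class, as in this scikit-learn sketch (function name and inputs are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def per_class_report(y_true, y_pred, y_score, classes):
    """Per-class accu_y / prec_n / reca_l / F_score / AUC_score, treating
    each emotion one-vs-rest. y_score holds per-class probabilities."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    for k, name in enumerate(classes):
        t = (y_true == k).astype(int)   # 1 where the true label is class k
        p = (y_pred == k).astype(int)   # 1 where the prediction is class k
        print(f"{name}: accu_y={np.mean(t == p):.4f}",
              f"prec_n={precision_score(t, p):.4f}",
              f"reca_l={recall_score(t, p):.4f}",
              f"F_score={f1_score(t, p):.4f}",
              f"AUC={roc_auc_score(t, np.asarray(y_score)[:, k]):.4f}")
```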

Emotion detection outcome of the EA-ICSOML algorithm on 80% of TRP. Abbreviation: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning.
Figure 4 shows the outcomes of the EA-ICSOML method on 20% of the TSP (testing phase). The outcomes infer that the EA-ICSOML technique recognizes all types of emotions. In the anger class, it gains accu_y, prec_n, reca_l, F_score, and AUC_score of 99.29%, 100%, 95.83%, 97.87%, and 97.92%, respectively. In the contempt class, it achieves 99.29%, 100%, 94.44%, 97.14%, and 97.22%, respectively. In the fear and happy classes, it achieves 100% on all five measures. Lastly, in the sadness class, it reaches 99.29%, 95.45%, 100%, 97.67%, and 99.58%, respectively.

Emotion detection outcome of the EA-ICSOML system on 20% of TSP. Abbreviation: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning.
Figure 5 illustrates the average outcomes of the EA-ICSOML technique on the 80:20 TRP/TSP split. The outcomes confirm that the EA-ICSOML methodology recognizes all types of emotions. With 80% of the TRP, the EA-ICSOML system reaches average accu_y, prec_n, reca_l, F_score, and AUC_score of 99.13%, 97.01%, 96.94%, 96.97%, and 98.22%, respectively. With 20% of the TSP, the EA-ICSOML method realizes average values of 99.59%, 98.56%, 98.61%, 98.55%, and 99.19%, respectively.

Average outcome of the EA-ICSOML system on 80:20 of TRP/TSP. Abbreviation: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning.
Figure 6 inspects the accuracy of the EA-ICSOML method during the training and validation procedure on the test database. The result implies that the EA-ICSOML approach attains higher accuracy values over successive epochs. Moreover, the validation accuracy tracking above the training accuracy shows that the EA-ICSOML approach learns effectively on the test database.

Accuracy curve of the EA-ICSOML approach. Abbreviation: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning.
The loss curves of the EA-ICSOML system during training and validation on the test database are displayed in Figure 7. The outcomes show that the EA-ICSOML approach attains close training and validation loss values, indicating that it learns capably on the test database.

Loss curve of the EA-ICSOML approach. Abbreviation: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning.
In Table 2 and Figure 8, the overall comparative outcomes of the EA-ICSOML approach are provided. The outcomes infer that the EA-ICSOML method attains higher results under all measures. Concerning accu_y, the EA-ICSOML system reaches a higher accu_y of 99.59%, while the Gaussian NB, QDA, random forest (RF), MLP, support vector machine (SVM), and KNN approaches offer lesser accu_y of 84%, 86%, 89%, 94%, 94%, and 97%, respectively. Similarly, based on prec_n, the EA-ICSOML system attains a maximum prec_n of 98.56%, while the Gaussian NB, QDA, RF, MLP, SVM, and KNN methods provide lesser prec_n of 84%, 85%, 90%, 94%, 94%, and 97%, respectively.
Table 2: Comparative outcome of the EA-ICSOML approach with other systems.
Classifier | Accu_y | Prec_n | Reca_l | F_score |
---|---|---|---|---|
Gaussian NB | 84.00 | 84.00 | 84.00 | 84.00 |
QDA model | 86.00 | 85.00 | 86.00 | 85.00 |
Random forest | 89.00 | 90.00 | 89.00 | 88.00 |
MLP model | 94.00 | 94.00 | 94.00 | 94.00 |
SVM model | 94.00 | 94.00 | 94.00 | 94.00 |
KNN model | 97.00 | 97.00 | 97.00 | 97.00 |
EA-ICSOML | 99.59 | 98.56 | 98.61 | 98.55 |
Abbreviations: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning; SVM, support vector machine.

Comparative outcome of the EA-ICSOML approach with other systems. Abbreviations: EA-ICSOML, emotion analysis approach using improved cat swarm optimization with machine learning; SVM, support vector machine.
Besides, in terms of reca_l, the EA-ICSOML approach attains a superior reca_l of 98.61%, while the Gaussian NB, QDA, RF, MLP, SVM, and KNN systems offer minimal reca_l of 84%, 86%, 89%, 94%, 94%, and 97%, respectively. Finally, based on F_score, the EA-ICSOML approach accomplishes a maximum F_score of 98.55%, while the Gaussian NB, QDA, RF, MLP, SVM, and KNN models offer decreased F_score of 84%, 85%, 88%, 94%, 94%, and 97%, respectively.
CONCLUSION
In this manuscript, we have presented a novel EA-ICSOML system for the emotion analysis process. The EA-ICSOML technique applies the concepts of CV and DL to identify various types of emotions. The working process of the EA-ICSOML technique comprises ShuffleNet feature extraction, ICSO-based hyperparameter optimization, and TCNN-based classification. For feature vector generation, the ShuffleNet model is used. To adjust the hyperparameters related to the ShuffleNet approach, the ICSO algorithm was used. Finally, the recognition and classification of emotions take place using the TCNN model. The simulation results of the EA-ICSOML system were validated on facial emotion databases. The experimental analysis inferred improved emotion recognition results of the EA-ICSOML algorithm compared to other recent models in terms of different evaluation measures.