INTRODUCTION
Pain intensity detection is crucial for rehabilitation facilities for the disabled, as it influences both the rehabilitation process and patient well-being (Bargshady et al., 2020). To maximize the benefits of rehabilitation, it is essential to identify pain intensity and take suitable measures to alleviate it (Hassan et al., 2019; Miao et al., 2019; Thiam et al., 2019). In addition, measuring pain intensity enables prompt identification of potential issues or secondary ailments that may develop during rehabilitation. When a specific part of the body experiences pain, the brain may initiate reflex movements as a defense mechanism, alongside other typical responses such as facial expressions. Pain thresholds, past experiences, and emotions influence the pain response (Mohan et al., 2020). Pain may be acute or chronic: severe illness or injury produces acute, intense pain (Xu et al., 2019), whereas chronic pain persists and diminishes an individual's quality of life (Xu et al., 2019) and may affect mental health and emotional well-being. In the clinic, physicians regularly depend on patient self-reporting to evaluate pain (Ahmed et al., 2021). Pain self-reporting uses numerical ratings and visual analog scales (VASs), which rely on effective patient–doctor communication. Patients may struggle with this procedure because of age, cultural barriers, communication difficulties, or cognitive limitations (Ahmed et al., 2020).
Emotions conveyed by the face are multifaceted, dynamic, and frequently challenging to interpret (Bellamkonda and Gopalan, 2020). The automated identification of pain intensity from facial expressions is a significant area of computer vision research, spanning pattern recognition and image processing (Hu et al., 2019). The research community has access to a large number of facial expression databases, which serve as essential instruments for assessing facial expression recognition algorithms.
Accurately recognizing and addressing the level of pain benefits the psychological and social components of rehabilitation, reducing tension and anxiety (Zhang et al., 2020). Pain intensity evaluation in rehabilitation supports a patient-centric methodology, encouraging patients to engage actively in treatment decisions (Taati et al., 2019). It promotes cooperation between healthcare practitioners and patients in attaining rehabilitation objectives. Rehabilitation centers enhance the overall effectiveness of the rehabilitation process by controlling pain intensity (Kola and Samayamantula, 2021). Better rehabilitation outcomes improve the ability of impaired persons to function independently and boost their overall quality of life.
To determine the efficacy of medical therapies and to avoid the development of chronic pain, it is essential to evaluate and monitor the pain levels experienced by individuals. Traditional pain assessment approaches, such as patient self-reporting or medical staff observations, are subjective and inaccurate, making it hard to monitor pain levels, especially in individuals with communication difficulties (Fan et al., 2020; Liu et al., 2020; Chen and Joo, 2021). In addition, the shortage of trained observers renders pain assessments highly subjective and makes it difficult to maintain a consistent monitoring schedule. There is an increasing demand for automated pain monitoring systems that precisely identify pain levels; these systems are essential in hospitals and homes for elderly and disabled patients. A reliable and unbiased pain assessment system can help patients receive appropriate pain treatment and minimize excessive pain and discomfort.
Convolutional neural networks (CNNs) have emerged as highly efficient methods for analyzing images, making them especially appropriate for facial expression identification (Park et al., 2021; Gkikas and Tsiknakis, 2023). Deep learning (DL)-based pain recognition models employing facial images have shown promising results; however, several challenges and limitations remain. Training DL models for pain recognition demands large, well-annotated datasets (Park et al., 2021; Gkikas and Tsiknakis, 2023), and acquiring such datasets can be resource-intensive, especially across a variety of pain conditions. Facial image resolution and light exposure affect model performance, and low-quality images may lead to erroneous predictions. The nature and severity of pain are ever-changing; for chronic or acute pain, static image models may fail to capture temporal dynamics (Lopez-Martinez et al., 2019; Werner et al., 2019; Al-Qerem, 2020). Prediction algorithms make it possible to categorize degrees of pain intensity, and existing pain intensity detection models have produced promising results on real-time images (Al-Qerem, 2020). However, the scarcity of data adversely affects prediction performance, given the highly sensitive nature of medical conditions. To diagnose the source of pain, design an effective treatment plan, and calculate drug doses, physicians must assess the severity of the patient's pain (Lucey et al., 2011; Li et al., 2021; Vu et al., 2021). Physicians typically combine specialized scales with the patient's self-report to assess pain severity. The visual analog scale (VAS) is the most popular measure of pain intensity; it rates pain from 0 to 10 (Mavadati et al., 2013), where 0 indicates the complete absence of pain and 10 corresponds to the worst imaginable pain.
The perception of pain is highly subjective, and conventional approaches to evaluating pain typically depend on self-reporting. For clinical decision-making, including the selection of suitable treatment plans and interventions, an accurate evaluation of the degree of pain is crucial (Mavadati et al., 2013). Human pain interpretation may be biased, and healthcare experts may rate pain severity using idiosyncratic approaches (Hasani et al., 2021). DL models trained on varied datasets can reduce this bias and generalize pain assessments (Hasani et al., 2021). Addressing the limitations of traditional pain assessment methods, providing objective measures, supporting clinical decision-making, and improving healthcare pain management motivate this study to develop a DL-based pain intensity identification model. The study's contributions are:
A hybrid feature engineering technique for extracting the crucial pain features from the facial images.
A pain identification model (PIM) using the fine-tuned LightGBM model.
The remainder of this study is organized as follows: the Introduction describes the application of DL-based PIMs. The features and limitations of existing studies are discussed in the Literature Review. The Materials and Methods section presents the proposed methodology. The experimental results are discussed in the Results and Discussion section. Lastly, the Conclusion summarizes the study and its contributions to healthcare and rehabilitation.
LITERATURE REVIEW
The development of computer vision algorithms has led to the exploration of visual cues, including facial expressions, body movements, and physiological signs, to assess pain. Computer vision algorithms can extract significant characteristics from these data, and patient emotions and reflexes may be used to automatically recognize pain severity from physiological signals or facial expressions. Lopez-Martinez et al. (2019) offered a two-stage learning approach for automated estimation of pain on a 10-level scale. In the initial phase, a recurrent neural network estimates the Prkachin and Solomon Pain Intensity (PSPI) score. The outcomes of this phase are then fed into the next phase, in which each individual's PSPI scores are used to estimate the VAS through hidden conditional random fields. The model is personalized by including a unique score for each person based on facial expression. The dataset comprises 25 patients with shoulder discomfort whose faces were captured during movements of both the afflicted and unaffected arms.
Al-Qerem (2020) proposed a machine learning model for pain assessment; data augmentation techniques helped the model achieve an exceptional outcome on the Denver Intensity of Spontaneous Facial Action (DISFA) dataset. Similarly, Li et al. (2021) utilized the DISFA dataset to train a CNN model for facial action unit (AU) detection. An attention map based on facial landmarks improved the CNN model's performance, and transfer learning, attention coding, and regions of interest were used to detect facial position and orientation.
Vu et al. (2021) employed a CNN-long short-term memory (LSTM) network for identifying pain from facial expressions. They discussed the significance of computational complexity in PIM development, applied Inception and ResNet models to generate the key features, and used the LSTM model to capture temporal relations across video frames. The model was evaluated using the University of Northern British Columbia McMaster Shoulder Pain Expression Archive (UNBC-MSPE) (Lucey et al., 2011) and DISFA (Mavadati et al., 2013) datasets.
Wongpatikaseree et al. (2021) developed a model for recognizing facial AUs. They employed feature engineering and bottleneck features to improve PIM performance. A support vector machine classified the features, using histograms of oriented gradients, facial landmarks, distances, and angles.
Zhi et al. (2021) built a PIM using a CNN. They used a temporal-dynamics-based 3D convolutional neural network with a grid of neurons for video analysis and employed an evolutionary framework to optimize the deep neural network. Discriminative power coefficients derived the correlation between the AUs of videos and emotions, and an adaptive subsequence matching algorithm evaluated the similarity between AU sequences. The performance evaluation was performed on the DISFA dataset.
Nerella et al. (2022) collected an annotated pain-related AU dataset containing 55,085 images of critically ill individuals. An open-source facial behavior analysis tool was generalized using the dataset; however, the experimental findings showed that the tool obtained only an average F1-score (F1) of 0.77.
Barua et al. (2022) developed a PIM using transfer learning and a shutter blinds-based model. They used the weights of the DarkNet-19 model for feature extraction, and an iterative neighborhood component analysis selected the intricate features. A 10-fold cross-validation improved the K-nearest neighbor classifier. The UNBC-MSPE and DISFA datasets were used for model generalization, and the experimental findings revealed the importance of feature engineering in pain identification.
Çelik (2023) built a CNN-based Znet model for pain assessment and compared it with six pretrained models. The experimental outcomes highlighted that Znet obtained an accuracy, precision, and recall of 99.54%, 64.4%, and 63.4%, respectively.
DL models may show promise in controlled contexts, but their deployment in clinical situations remains the crucial test. Adapting models to a wide variety of clinical settings and patient groups is an ongoing challenge, and there is a lack of comprehensive datasets encompassing a variety of pain levels, intensities, and expressions. To generalize models, datasets should cover diverse ages, ethnicities, and socioeconomic statuses.
MATERIALS AND METHODS
The authors propose a hybrid feature engineering technique and a LightGBM-based image classification approach. A liquid neural network (LNN)-based image extractor extracts facial images from video files, and the hybrid feature engineering technique generates the critical features of the facial expressions. The proposed image classifier detects pain intensity from facial images. DenseNet's dense connections and feature reuse improve feature extraction by optimizing information flow and gradient propagation; its parameter efficiency comes from sharing features across layers, so the number of parameters is substantially lower than in conventional models, yielding much greater computational efficiency. MobileNet efficiently extracts features using depthwise separable convolutions, a simplified model architecture, and feature capture at various scales. Both architectures have proven effective in various computer vision applications; they address the challenges of feature learning and improve the efficiency of image classification models. The authors therefore employ DenseNet and MobileNet for feature extraction.
Data acquisition
The DISFA dataset has been extensively utilized in academic research on facial expressions, emotion identification, and affective computing, and it has helped create new algorithms and methods for facial dynamics analysis and emotion inference from facial expressions. It contains video sequences of persons of various ages, genders, and ethnic groups; this variety makes the dataset more generalizable and lets researchers analyze cross-cultural and demographic facial expressions. DISFA is ideal for benchmarking and assessing facial expression analysis algorithms because of its ground truth annotations: researchers can train and test DL models for facial AU detection, emotion identification, and affective computing using the dataset. The authors generalize the proposed model using the DISFA dataset (Mavadati et al., 2013). It consists of 27 videos with 130,788 images, with each video frame coded from 0 (no pain) to 5 (maximum pain intensity). The dataset owners recruited 27 adults (12 women and 15 men) aged 18 to 50. In this study, the authors follow Barua et al. (2022) in determining pain intensity, employing the PSPI scale to classify the facial expressions. The authors applied image augmentation techniques to produce diverse images. Figure 1 presents the proposed PIM based on the PSPI scale.
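As an illustration of the labeling and augmentation steps, the following minimal Python sketch (not the authors' released code) bins per-frame PSPI scores into the four classes reported in Table 1 and applies a basic torchvision augmentation pipeline; the specific transform parameters are assumptions.

```python
import torchvision.transforms as T

def pspi_to_class(pspi: int) -> int:
    """Bin a PSPI score into the four classes used in Table 1."""
    if pspi == 0:
        return 0          # PSPI = 0 (no pain)
    if pspi == 1:
        return 1          # PSPI = 1
    if pspi <= 3:
        return 2          # 2 <= PSPI <= 3
    return 3              # PSPI > 3

# Illustrative augmentation to diversify the DISFA frames (parameters assumed).
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.RandomRotation(degrees=10),
    T.Resize((224, 224)),
    T.ToTensor(),
])
```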
Image extraction
LNNs are recurrent neural network architectures inspired by biological neural networks (Hasani et al., 2021). Their reservoir computing method uses a large dynamical recurrent network (the "liquid") as a fixed feature extractor and a readout layer for classification or prediction. Because reservoirs operate in parallel, LNNs can process input data across numerous reservoir neurons or units (Hasani et al., 2018); this parallel processing lowers computation time compared to sequential designs, including feedforward neural networks. LNNs are effective for object identification, image classification, and video analysis, where spatial and temporal patterns in the input data are crucial (Wongpatikaseree et al., 2021; Yin et al., 2021; Bidollahkhani et al., 2023). With LNN parallel processing, distributed representations, and temporal dynamics, researchers can develop efficient image processing solutions with reduced computation time and better performance than traditional architectures. The authors applied the LNN model to extract unique facial images: the dynamic connectivity patterns of the LNN extract meaningful images, and the connectivity matrix reveals network strength and pattern. Liquid dynamics help LNNs handle nonstationary data, resist noise, and explore varied solution spaces. By allowing structural flexibility, LNNs promote exploration of the solution space; the dynamic connection patterns let the network explore several paths, potentially uncovering innovative techniques for image extraction. The fluid-like behavior suppresses irrelevant information and improves the generalization of the proposed PIM in real-time environments. The authors use a sigmoid activation function with a liquid time-constant (LTC) network value of 6.00 ± 4.16, a neural ordinary differential equation value of 0.24 ± 0.01, and a continuous-time recurrent neural network (CT-RNN) value of 4.12 ± 2.71 for the image extraction. Equation 1 shows the computational form of the LNN implementation.
where images denotes the set of an individual's facial images, N is the video size (number of frames), and O is the output image size.
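Equation 1 is not reproduced here. As a rough, hypothetical illustration of the continuous-time dynamics an LNN reservoir relies on, the sketch below implements a single Euler-integration step of a CT-RNN-style unit with a sigmoid activation; the weights, state sizes, and step size are assumptions, and the default time constant merely echoes the CT-RNN value quoted above.

```python
import torch

def ct_rnn_step(x, u, W, U, b, tau=4.12, dt=0.1):
    """One Euler step of dx/dt = (-x + sigmoid(W x + U u + b)) / tau.

    x: (batch, hidden) hidden state; u: (batch, input) frame features.
    All weights and the step size dt are hypothetical.
    """
    target = torch.sigmoid(x @ W.T + u @ U.T + b)
    return x + dt * (-x + target) / tau

# Hypothetical sizes: 64 hidden units reading 128-dimensional frame features.
H, D = 64, 128
x = torch.zeros(1, H)
W, U, b = torch.randn(H, H) * 0.1, torch.randn(H, D) * 0.1, torch.zeros(H)
for frame in torch.randn(10, 1, D):      # iterate over 10 video frames
    x = ct_rnn_step(x, frame, W, U, b)
```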
Feature engineering
The authors build a CNN model with four convolution layers and use the DenseNet 201 model's weights to extract the features. The images are resized to 224 × 224. The enhanced depth enables the DenseNet 201 model to capture intricate hierarchical features of the facial images. Each convolutional layer is accompanied by batch normalization and a rectified linear unit (ReLU) activation, which help stabilize the network and introduce nonlinearity. The number of feature maps added in each dense block layer depends on the growth rate: a higher growth rate lets the network generate more feature maps, boosting its expressive ability. Transition blocks decrease the spatial dimensions of the feature maps between dense blocks, using a composite of a 1 × 1 convolutional layer with average pooling; reducing the spatial dimensions keeps parameter growth manageable. The model employs a global average pooling layer to consolidate the feature maps into a single vector (one value per channel). Additionally, class activation maps and a quantization-aware training (QAT) strategy are employed to detect meaningful features with limited computational resources. Figure 2 highlights the feature extraction using the DenseNet 201 model.

Figure 2. The suggested feature extraction based on the DenseNet 201 model. Abbreviation: QAT, quantization-aware training.
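The class activation map mentioned above can be sketched under the standard global-average-pooling-plus-linear-classifier formulation; the snippet below uses torchvision's pretrained densenet201 as a stand-in for the authors' fine-tuned backbone, so the specific layers and class index are assumptions.

```python
import torch
import torchvision.models as models

model = models.densenet201(weights="DEFAULT").eval()
x = torch.randn(1, 3, 224, 224)                  # a preprocessed face image
with torch.no_grad():
    fmaps = model.features(x)                    # (1, 1920, 7, 7) feature maps
    weights = model.classifier.weight            # (1000, 1920) linear weights
    cls = int(model.classifier(fmaps.mean(dim=(2, 3))).argmax())
    cam = torch.einsum("c,bchw->bhw", weights[cls], fmaps)  # weighted sum
    cam = torch.relu(cam)                        # keep positive evidence only
```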
A flatten layer is required when fully connected layers follow convolutional layers; it facilitates the seamless transfer of data across levels, guaranteeing that the network can efficiently acquire hierarchical characteristics from the images. A flatten layer with a reshape function converts the features into a two-dimensional (2D) array, and a fully connected network (FCN) layer with a softmax function classifies the features. Equation 2 shows the mathematical form of the DenseNet 201-based feature extraction.
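A minimal sketch of this DenseNet 201 feature path, assuming torchvision's pretrained backbone and simplifying away the four additional convolution layers and the softmax head, is:

```python
import torch
import torchvision.models as models

backbone = models.densenet201(weights="DEFAULT").features
backbone.eval()

def extract_densenet_features(batch):           # batch: (B, 3, 224, 224)
    """Backbone -> global average pooling -> flattened 2D feature array."""
    with torch.no_grad():
        fmaps = backbone(batch)                  # (B, 1920, 7, 7)
        pooled = fmaps.mean(dim=(2, 3))          # global average pooling
    return pooled.reshape(batch.size(0), -1)     # 2D array: (B, 1920)
```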
Furthermore, the MobileNet V3 model is used for feature generation. A CNN model is developed with five convolution layers, batch normalization, and ReLU. The final set of layers is trained with the MobileNet V3-Small model. To enhance the network's information flow, MobileNet V3 uses inverted residuals. Figure 3 highlights the suggested MobileNet V3-based feature generation.
Each inverted residual block expands the number of channels, applies a lightweight depthwise separable convolution, and projects back down through a linear bottleneck layer. The inverted residuals aid in effectively capturing and preserving crucial information. The design integrates efficient building blocks, such as squeeze-and-excitation blocks, which enhance the network's ability to concentrate on significant channels and improve feature representation. A global average pooling layer reduces the spatial dimensions of the feature maps to a vector, and a flatten layer with a reshape function converts the features into a 2D array. The authors also applied the QAT strategy to improve the performance of the MobileNet V3 model. Equation 3 presents the computational form of the MobileNet V3 model-based feature extraction.
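A corresponding sketch of the MobileNet V3-Small path, together with a simple concatenation of both feature sets into one 2D array, follows; the fusion rule is an assumption, and `extract_densenet_features` refers to the earlier DenseNet sketch.

```python
import torch
import torchvision.models as models

mnet_features = models.mobilenet_v3_small(weights="DEFAULT").features
mnet_features.eval()

def extract_mobilenet_features(batch):          # batch: (B, 3, 224, 224)
    with torch.no_grad():
        fmaps = mnet_features(batch)             # (B, 576, 7, 7)
        return fmaps.mean(dim=(2, 3))            # (B, 576) after GAP

def fuse(batch):
    """Assumed fusion: concatenate both extractors' 2D feature arrays."""
    dense = extract_densenet_features(batch)     # from the previous sketch
    mobile = extract_mobilenet_features(batch)
    return torch.cat([dense, mobile], dim=1)     # (B, 1920 + 576)
```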
LightGBM-based image classification
The FCN layers are replaced with the LightGBM model. Because of its speed and efficiency, LightGBM is employed to identify pain from the key features. LightGBM uses a leaf-wise growth approach: growing the tree by splitting the leaf with the highest delta loss yields a more effective and faster training procedure. LightGBM utilizes gradient-based one-side sampling (GOSS) to reduce the number of data points used during training; GOSS preserves instances with large gradients and randomly subsamples instances with small gradients, minimizing computing time by concentrating on the most informative data points. To avoid overfitting, LightGBM employs regularization methods such as L1 and L2 regularization, and it uses shrinkage, multiplying each tree's predictions by a small learning rate, to enhance model resilience and generalization to new data. To fine-tune the LightGBM model, the authors employed a random search algorithm, which iteratively optimizes the learning rate, tree growth, and regularization strength.
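A hedged sketch of this tuning setup follows, with illustrative (assumed) search ranges and hypothetical variable names for the fused features and labels:

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import RandomizedSearchCV

# Assumed search ranges for the three quantities named above: shrinkage
# (learning rate), tree growth (number of leaves), and L1/L2 strength.
param_space = {
    "learning_rate": np.linspace(0.01, 0.3, 30),
    "num_leaves": [15, 31, 63, 127],
    "reg_alpha": [0.0, 0.1, 1.0],      # L1 regularization
    "reg_lambda": [0.0, 0.1, 1.0],     # L2 regularization
}

search = RandomizedSearchCV(
    LGBMClassifier(objective="multiclass"),
    param_distributions=param_space,
    n_iter=25, cv=3, scoring="f1_macro", random_state=42,
)
# search.fit(train_features, train_labels)  # hypothetical fused features/labels
```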
Performance evaluation
Accuracy, precision, recall, and F1 are used to evaluate the PIM and the baseline DL models. Accuracy (Acc) is the percentage of correctly predicted occurrences relative to the total instances; it provides an aggregated evaluation of the model's performance across all classes. Precision (Pre) is the ratio of correctly predicted positives to all predicted positives; it measures how often predicted positives are true. Recall (Rec) is the proportion of correctly predicted positives to actual positives; it quantifies how well positive cases are identified. F1 is the harmonic mean of precision and recall; it balances the two, which is particularly useful when class distributions are imbalanced. Cohen's kappa (κ) and the Matthews correlation coefficient (MCC) are also common metrics for assessing classification models, including DL-based models. Cohen's kappa quantifies the degree of agreement between predicted and actual labels while accounting for the likelihood of coincidental agreement; it is particularly advantageous for datasets with unequal class distributions. MCC evaluates classification performance by considering the numbers of true positives, true negatives, false positives, and false negatives, and is likewise well suited to imbalanced datasets.
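These metrics can be computed with scikit-learn as sketched below; macro averaging over the four PSPI classes is an assumption.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             matthews_corrcoef, precision_score, recall_score)

def evaluate(y_true, y_pred):
    """Return the six metrics reported in this study for one prediction set."""
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "Pre": precision_score(y_true, y_pred, average="macro"),
        "Rec": recall_score(y_true, y_pred, average="macro"),
        "F1": f1_score(y_true, y_pred, average="macro"),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
    }
```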
RESULTS AND DISCUSSION
In this section, the authors present the experimental analysis of the proposed model. The model was implemented in a Windows 10 Professional environment with an i7 processor, 8 GB of RAM, and an NVIDIA GeForce RTX 3050 GPU. The dataset is divided into a train set (70%) and a test set (30%). The PyTorch, Keras, and TensorFlow libraries were used for model development, with 12 batches and 14 epochs used for training. The performance analysis outcome is presented in Table 1: the PIM obtained an exceptional outcome for each class. The feature engineering process helped the PIM produce better results, and the image augmentation technique helped the model overcome the challenges of identifying pain from facial expressions. Figure 4 highlights the performance of the PIM.

Figure 4. Performance analysis outcomes. Abbreviations: DISFA, Denver Intensity of Spontaneous Facial Action; MCC, Matthews correlation coefficient; PSPI, Prkachin and Solomon Pain Intensity.
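The 70/30 train/test split described above can be reproduced with scikit-learn as sketched below; the variable names and the stratification choice are assumptions.

```python
from sklearn.model_selection import train_test_split

# `features` and `labels` are hypothetical names for the fused 2D feature
# array and the four-class PSPI labels; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.30, random_state=42, stratify=labels)
```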
Table 1. Performance analysis—multiclass classification.

| Classes | Acc (%) | Pre (%) | Rec (%) | F1 (%) | κ | MCC |
|---|---|---|---|---|---|---|
| PSPI = 0 | 97.2 | 96.9 | 97.2 | 97.0 | 95.8 | 93.4 |
| PSPI = 1 | 98.5 | 97.1 | 96.8 | 96.9 | 94.7 | 94.6 |
| 2 ≤ PSPI ≤ 3 | 97.6 | 96.8 | 97.2 | 97.0 | 97.7 | 94.3 |
| PSPI > 3 | 98.1 | 97.5 | 96.8 | 97.1 | 96.7 | 95.1 |
| Average | 97.8 | 97.0 | 97.0 | 97.0 | 96.2 | 94.3 |

Abbreviations: MCC, Matthews correlation coefficient; PSPI, Prkachin and Solomon Pain Intensity.
Table 2 outlines the batch-wise performance analysis, showing fine-grained batch variances and global patterns over epochs. The batch-wise and epoch-wise analysis reveals the proposed PIM's training behavior. Such monitoring supports real-time progress tracking, hyperparameter optimization, early detection of difficulties, and informed decision-making for model refinement and selection. It is evident that the proposed model addresses the existing challenges in detecting pain intensity.
Table 2. Performance analysis based on epochs and batches.

| Batches/Epochs | Acc (%) | Pre (%) | Rec (%) | F1 (%) | κ | MCC |
|---|---|---|---|---|---|---|
| 4/3 | 92.3 | 90.1 | 90.5 | 90.3 | 94.1 | 93.2 |
| 6/5 | 94.1 | 92.7 | 92.1 | 92.4 | 94.5 | 91.8 |
| 8/7 | 94.8 | 93.1 | 93.5 | 93.3 | 93.7 | 92.4 |
| 10/8 | 95.1 | 94.5 | 93.8 | 94.1 | 94.2 | 95.1 |
| 12/9 | 97.8 | 97.0 | 97.0 | 97.0 | 96.2 | 94.3 |

Abbreviation: MCC, Matthews correlation coefficient.
Table 3 outlines the findings of the experimental analysis using the DISFA dataset. The recommended model outperformed the recent models, obtaining exceptional results. The outcomes suggest that the PIM can be deployed in healthcare and rehabilitation centers with limited computational resources. Figure 5 presents the findings of the comparative analysis.
Table 3. Comparative analysis.

| Models | Acc (%) | Pre (%) | Rec (%) | F1 (%) | κ | MCC |
|---|---|---|---|---|---|---|
| PIM | 97.8 | 97.0 | 97.0 | 97.0 | 96.2 | 94.3 |
| Barua et al. (2022) | 96.7 | 96.1 | 96.4 | 96.2 | 93.2 | 91.8 |
| Vu et al. (2021) | 91.4 | 89.4 | 90.4 | 89.9 | 88.7 | 81.3 |
| Zhi et al. (2021) | 93.4 | 91.3 | 86.4 | 88.7 | 89.4 | 85.4 |
| Li et al. (2021) | 92.1 | 91.8 | 90.8 | 91.3 | 85.4 | 93.8 |
| EfficientNet B7 | 95.3 | 94.5 | 95.1 | 94.8 | 91.7 | 90.6 |

Abbreviations: MCC, Matthews correlation coefficient; PIM, pain identification model.
Understanding prediction uncertainty improves decision-making. The results of the uncertainty analysis are listed in Table 4. The PIM produced minimal loss along with an effective confidence interval (CI) and standard deviation. Decisions may be fine-tuned for critical applications depending on the width of the CI. The findings indicate that the recommended model is more interpretable, trustworthy, and applicable in situations demanding reliable predictions.
Table 4. Findings of uncertainty analysis.

| Models | Loss | SD | CI |
|---|---|---|---|
| PIM | 0.3 | 0.0003 | 97.5–98.3 |
| Barua et al. (2022) | 1.2 | 0.0005 | 95.4–96.1 |
| Vu et al. (2021) | 2.4 | 0.0003 | 97.5–98.5 |
| Zhi et al. (2021) | 2.6 | 0.0007 | 96.1–97.2 |
| Li et al. (2021) | 1.8 | 0.0003 | 95.3–95.8 |
| EfficientNet B7 | 0.7 | 0.0005 | 95.1–96.3 |

Abbreviations: CI, confidence interval; PIM, pain identification model; SD, standard deviation.
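For reference, a normal-approximation 95% CI like those in Table 4 can be derived from repeated runs as sketched below; the run accuracies are placeholders, not the study's measurements.

```python
import numpy as np

# Hypothetical accuracies from five repeated runs (placeholders only).
run_accuracies = np.array([97.6, 97.9, 97.8, 98.0, 97.7])
mean = run_accuracies.mean()
sd = run_accuracies.std(ddof=1)                     # sample standard deviation
half_width = 1.96 * sd / np.sqrt(len(run_accuracies))
print(f"95% CI: {mean - half_width:.1f}-{mean + half_width:.1f}")
```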
A machine learning model's performance may be evaluated on several critical metrics and factors, including testing time, parameter count, computational techniques, and floating-point operations (FLOPs). Table 5 provides information on the models' efficacy, computational cost, and overall performance. The number of parameters influences a model's ability to learn from data: while more parameters can make the model better at capturing intricate patterns, they also increase the risk of overfitting. A lower number of FLOPs suggests that the recommended model is more efficient in terms of computing power, which is particularly valuable when deploying the model on resource-limited devices or when computational efficiency is paramount. A well-balanced model is both accurate and efficient, making it deployable in terms of computing cost and resource utilization.
Table 5. Computational strategies.

| Models | Parameters (millions) | FLOPs (giga) | Testing time (seconds) |
|---|---|---|---|
| PIM | 5 | 9 | 1.2 |
| Barua et al. (2022) | 12 | 17 | 1.8 |
| Vu et al. (2021) | 18 | 21 | 1.4 |
| Zhi et al. (2021) | 23 | 29 | 1.7 |
| Li et al. (2021) | 27 | 19 | 2.1 |
| EfficientNet B7 | 25 | 15 | 1.2 |

Abbreviations: FLOPs, floating-point operations; PIM, pain identification model.
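The parameter counts and testing times in Table 5 can be measured as sketched below; FLOPs additionally require a profiler such as fvcore or thop (not shown), and the run count is an assumption.

```python
import time
import torch

def count_parameters(model):
    """Trainable parameter count, in millions (cf. Table 5)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

def mean_testing_time(model, batch, runs=10):
    """Average seconds per forward pass over `runs` repetitions."""
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            model(batch)
    return (time.perf_counter() - start) / runs
```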
Individuals with impairments may have difficulty communicating their pain, and early diagnosis is essential to prevent pain from worsening. The proposed model can analyze nonverbal cues, such as facial expressions or body movements, as an alternative approach to assessing pain intensity. It may aid in the early identification of pain, facilitating prompt intervention and enhancing pain control. The model can determine pain severity, enabling physicians to analyze patient discomfort; this may complement subjective assessments and improve the understanding of patient pain. An identification model can assess pain levels in real time during rehabilitation activities or treatments, and this continual feedback enables physicians to customize rehabilitation treatment according to the pain patients experience. The model may also alert physicians to significant pain intensity; early identification permits timely treatment to prevent additional problems.
As telemedicine and remote healthcare monitoring expand, the proposed model can remotely evaluate and monitor pain intensity. Remote monitoring will benefit individuals who lack convenient access to healthcare services. Proper assessment and treatment can improve quality of life, and improving pain treatment strategies using the proposed model may lead to superior outcomes and better mental health. The PIM can strengthen pain-related studies by offering objective assessments of pain severity, contributing to better findings in pain treatment by boosting assessment reliability and consistency.
CONCLUSION
The authors built an effective PIM using the DenseNet 201 and MobileNet V3 models and improved the performance of the LightGBM model to detect pain from facial images. Dedicated image extraction was conducted using the LNN model. The authors improved the DenseNet 201 model using class activation map visualization and performed additional feature extraction using the enhanced MobileNet V3 model. The hyperparameters of the DenseNet 201 and MobileNet V3 models were tuned to generate diverse features with limited resources. The features were converted into 2D arrays to enable the LightGBM model to make the final prediction. The PIM produced an Acc, Pre, Rec, and F1 of 97.8%, 97.0%, 97.0%, and 97.0%, respectively. The experimental findings suggest that the PIM can be deployed in healthcare centers for identifying pain from facial images, and the small number of parameters and FLOPs indicates that it can be implemented in resource-constrained environments. However, extended training is required to improve the generalizability and interpretability of the PIM. Graph convolutional networks and transformers could replace the recommended feature engineering and image classification to further improve the effectiveness of PIMs.