INTRODUCTION
Human activity recognition (HAR) is a popular research domain owing to its high applicability in fields such as the medical industry (Dokania and Chattaraj, 2024). Its capability to be embedded in medical systems has become more familiar, supporting medical specialists in making better decisions and permitting better allocation of medical resources based on automatic monitoring. The scope of HAR has also extended beyond the medical industry: it is employed by people to monitor their physical condition or to identify anomalous activities in older people, such as falls (Ramanujam et al., 2024). Additionally, physical exercise is important for people of every age; however, particularly for older persons, there is a need for a technique that can monitor such events continuously, including fitness and functional abilities, and recognize anomalous activities (Yazici et al., 2023). The need for monitoring techniques for older people has grown because such systems save money and time and allow earlier identification of nearby risks. Most existing technology is designed to monitor activities at home, yet people also spend time outdoors. Protection is the main challenge for older people and the visually impaired (Duhayyim, 2023). These individuals are frequently more vulnerable to falls, accidents, and other risks because of impaired vision and restricted mobility. Monitoring their activities helps avoid accidents and provides urgent assistance when required. It permits earlier identification of health problems, indications of distress, and behavioral changes, allowing rapid medical intervention or adjustments to care strategies (Akilandeswari et al., 2022). By supporting everyday activities and overcoming possible safety difficulties, activity monitoring increases the overall standard of living for the visually impaired and older people, allowing them to enjoy more comfort and fulfillment. A monitoring system also helps decrease medical expenses by avoiding hospitalization or nursing home placement, and it enables people to live independently, which is often more cost-effective than formal care (Alzahrani et al., 2023).
Recently, HAR has become a prominent research domain (Hayat et al., 2022). Because of lower energy consumption and reduced cost, the accessibility of accelerometers and other sensors, and advancements in computer vision (CV), the Internet of Things (IoT), machine learning (ML), and artificial intelligence (AI), several applications have been developed that employ human-centered monitoring to categorize, find, and identify human activities, and researchers have developed various techniques in this domain. HAR is a crucial tool for monitoring people's activity and is typically achieved through ML methods (Tarik et al., 2023). HAR automatically identifies and analyzes human activities based on data obtained from multiple smartphone sensors and wearable devices, namely accelerometers and gyroscopes, position and time signals, and other diverse environmental sensors. When combined with new technologies such as the IoT, it can be employed in an extensive range of application fields, including industry, sports, and the medical field (Walle et al., 2022). AI is increasingly prevalent in HAR owing to its self-learning nature and robust classification methods. Currently, numerous research works address HAR using ML and deep learning (DL) approaches; however, more attention is needed to develop HAR models for older persons (Anitha and Priya, 2022). Recently, DL approaches such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been applied proficiently, and strong outcomes have been accomplished by automatically learning features from raw sensor data.
This article introduces the Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with DL (IAM-CDMODL) technique for elderly and visually impaired people. In the initial stage, the IAM-CDMODL technique follows a bilateral filtering (BF) approach for image preprocessing. In addition, the IAM-CDMODL technique exploits the MobileNetV2 (MN-V2) model for learning complex and intrinsic patterns from the preprocessed images. Moreover, the CDMO method is applied to select optimal hyperparameters for the MN-V2 model. In the final stage, the deep convolutional neural network bidirectional long short-term memory (DCNNBiLSTM) method is applied to identify indoor activities. A wide range of simulations is executed on benchmark databases to verify the enhanced detection performance of the IAM-CDMODL method.
The proposed IAM-CDMODL technique integrates BF for preprocessing, which effectively mitigates noise while preserving significant edges and details in the input data. This step ensures that high-quality features are passed to the model, improving the accuracy of the subsequent phases.
Implementing MN-V2 for learning intrinsic and complex patterns allows the technique to extract robust features from the data. The structure of MN-V2, which comprises depthwise separable convolutions (Conv), ensures computational efficiency and is particularly appropriate for handling dense connections within the data.
The incorporation of the CDMO approach for hyperparameter selection ensures that the technique's hyperparameters are finely tuned. This optimization model improves the method's performance by choosing optimal parameters, paving the way for faster convergence and greater accuracy.
The novelty of the presented technique lies in its unique integration of advanced models, namely BF, MN-V2, CDMO, and DCNNBiLSTM, for preprocessing, feature extraction, hyperparameter tuning, and recognition of various indoor activities. This integration addresses several challenges in indoor activity recognition, giving an overall and innovative outcome that improves both accuracy and efficiency.
LITERATURE WORKS
Deepa et al. (2023) present Wi-Sense, a human activity classification system that uses an environment-independent fingerprint produced from the Wi-Fi channel state information and deep hybrid CNNs. t-distributed stochastic neighbor embedding is employed to further remove redundant data. Yanbo et al. (2021) proposed an integrated DL-based Smart Video Monitoring Module (DLSVMM) technique. The DLSVMM transforms higher-dimensional features into lower-dimensional features; the collected data are trained in sub-spaces and combined to encode early retrieval, with error-clearance performance improved through digital communication codes for further enhancement. Busaeed et al. (2022) proposed a method that uses LiDAR with an ultrasonic sensor and a servo motor to gather data and detect objects using DL. The technique was validated with a pair of smart glasses named LidSonic V2.0, in which an Arduino collects information, identifies obstacles through simple data processing, operates the sensors on the smart glasses, and provides buzzer feedback to visually impaired users.
Nagarajan and Gopinath (2023) developed a model utilizing the proposed Honey Adam African Vultures Optimizer (HAAVO) methodology. GAN and DCNN techniques were deployed for detection, and a DCNN with a deep residual network classifier was employed to evaluate distance using the developed HAAVO, which is obtained by combining the Honey Badger Algorithm with AVO and the Adam Optimizer. Nagarajan and Gopinath (Anitha and Priya, 2022) proposed a method utilizing the DL (VEFED-DL) technique. Once preprocessing is complete, the MobileNet model is used for feature extraction, and the extracted spatial features are fed into a GRU. Finally, a GTOA with SAE was deployed for the two-class classification. Alzahrani et al. (2023) proposed an improved beluga whale optimization algorithm with fuzzy-based Indoor Activity Monitoring (IBWOA-FIMS). This method uses an ANFIS model for the indoor monitoring procedure, and the IBWOA is employed to adjust the parameters of the ANFIS model to enhance the monitoring outcomes.
Yang et al. (2021) proposed a framework based on impulse radio ultra-wideband (IR-UWB) and frequency-modulated continuous wave radar models. First, features were extracted and integrated with wavelet packet transform features from the IR-UWB radar to observe movement, and a cascaded CNN module and an LSTM module are considered. In the study by Zhang et al. (2022), a Wi-Fi-based behavior sensing method is presented. Since human actions cause path variations, a path decomposition scheme is designed that uses the path data as the action feature, ultimately enriching the feature data to enhance detection precision; the method offers a BI-AT-GRU model in which bidirectional and attention mechanisms are inserted into the gated recurrent unit to achieve action detection.
THE PROPOSED METHOD
This article introduces an IAM-CDMODL technique for elderly and visually impaired people. The method mainly intends to detect various indoor activities to ensure their safety. The IAM-CDMODL method contains distinct processes such as BF-based preprocessing, MN-V2-based feature extraction, CDMO-based hyperparameter tuning, and DCNNBiLSTM-based classification process. Figure 1 exemplifies the entire flow of the IAM-CDMODL method.

Overall flow of the IAM-CDMODL technique. Abbreviation: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning.
Image preprocessing
Initially, the IAM-CDMODL technique follows the BF approach for image preprocessing. BF stands out as a powerful image preprocessing approach, mainly valuable for noise reduction while maintaining significant edges and structures (Radhika and Mahajan, 2023). Unlike typical smoothing methods that treat every pixel equally, BF considers both spatial and intensity information. By computing a weighted average of adjacent pixels depending on their spatial closeness and intensity similarity, BF efficiently smoothens images while preserving sharpness at edges. This makes BF particularly suitable in applications such as image denoising, HDR tone mapping, and improving visual quality in CV tasks, offering a versatile tool for enhancing image quality without compromising essential details.
The steps involved in BF are as follows; a minimal code sketch is given after the list:
Spatial proximity: It takes into account the neighboring pixels within a particular spatial domain (defined by a kernel size or radius).
Intensity similarity: It weighs the contribution of every neighboring pixel depending on how similar its intensity (brightness) is to the central pixel. This resemblance is determined by utilizing a Gaussian kernel centered around the central pixel.
Filtering process: For every pixel in the image, the BF evaluates a weighted average of its neighbor’s intensities, with weights determined by both intensity resemblance and spatial distance. Pixels closer in spatial distance and with identical intensity values contribute more to the average, conserving edges and fine information while smoothing noise.
Parameters: Key parameters comprise the spatial extent (kernel size), the intensity standard deviation (determines how much intensity similarity affects weighting), and the spatial standard deviation (determines the spatial extent over which neighboring pixels are considered).
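As a concrete illustration of these steps, the following minimal sketch applies OpenCV's built-in bilateral filter to a single frame; the file names and the kernel/σ values are illustrative assumptions, not the settings used in this work.

```python
import cv2

# Hypothetical input frame; d is the neighbourhood diameter (spatial extent),
# sigmaColor the intensity standard deviation, sigmaSpace the spatial one.
image = cv2.imread("frame.png")
smoothed = cv2.bilateralFilter(image, d=9, sigmaColor=75, sigmaSpace=75)
cv2.imwrite("frame_bf.png", smoothed)
```

Larger σ values smooth more aggressively; smaller values preserve finer detail, which is the trade-off the parameters above control.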
Feature extraction
The IAM-CDMODL technique exploits the MN-V2 model for learning complex and intrinsic patterns from the preprocessed images. The MN-V2 technique is presented in the study by Xu and Mohammadi (2024), where it was developed as a solution to a colposcopy picture categorization issue. Several motivations shaped the structure of MN-V2. Network training was prone to overfitting on visual classification because the datasets used were relatively small; by employing a powerful yet compact network, MN-V2 counteracts this effect. MN-V2 is an efficient structure that improves execution speed and memory usage at minimal cost in accuracy. Its configuration incorporates two crucial building blocks: separable depth-wise convolution (SD-WC) and the inverted residual (IR). These concepts are described as follows.
The SD-WC is used in other efficient models, such as Xception, MN-V2, and ShuffleNet. The standard Conv is converted into an SD-WC by two operations. The first operator is a feature map-wise (depthwise) Conv, a separate Conv applied to every feature map. The resulting feature maps are then stacked and handled by a point-wise Conv, the second operator, which processes all feature maps at the same time with a 1 × 1 kernel. A conventional Conv processes the picture through its channel, width, and height dimensions simultaneously, whereas the SD-WC procedure first handles the picture through its width and height and then processes the channel dimension in the second step. The costs of the standard Conv and the SD-WC are defined in Equations (1) and (2) as shown below:
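In the usual depthwise-separable cost analysis, and consistent with the symbol definitions given next, these take the form (a reconstruction, not necessarily the authors' exact expressions):

$$C_{Nor} = k^{2}\, d_i\, d_j\, w_i\, h_i \qquad (1)$$

$$C_{Sep} = \left(k^{2}\, d_i + d_i\, d_j\right) w_i\, h_i \qquad (2)$$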
Here, $i$ and $j$ refer to the indices of the input and output layers, respectively; $C_{Sep}$ and $C_{Nor}$ denote the costs of the SD-WC and the standard Conv, respectively; $d_j$ and $d_i$ refer to the numbers of output and input channels; $w_i$ denotes the width and $h_i$ the height; and $k$ is the filter size.
The benefit of using SD-WC is given by Equation (3):
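In the same notation, the ratio of the two costs reduces to the familiar depthwise-separable saving (again a reconstruction from the symbol definitions above):

$$\frac{C_{Sep}}{C_{Nor}} = \frac{1}{d_j} + \frac{1}{k^{2}} \qquad (3)$$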
Residual blocks are vital modules in the ResNet model, and the inverted residual (IR) block is their MN-V2 counterpart. Three operators are used in these blocks: Conv, bottlenecks, and residual links. A 1 × 1 filter is applied in the first and last operators, mapping data from the input to an intermediate layer and from the intermediate layer to the output layer, while a 3 × 3 filter in the middle layer performs the main processing. In the residual block, many channels exist in the first and last Conv; in the IR block, by contrast, fewer channels are employed in the first and final Conv than in the interior Conv. The residual connection links the first and final channels, whose number is lower in the MN-V2 method than in ResNet. When several such units are stacked in both methods, the lower and higher layers therefore differ between the two designs.
The MN-V2 network is a chain of IR blocks. They are stacked between two Conv layers that act as connectors, one mapping the input to an intermediate representation and the other mapping it to the output. The outcome then passes through two further stages, global mean pooling and inference. The final Conv layer is wide, with 1280 channels and a 1 × 1 filter; it is altered here to limit the representation size to 64 or 32 output channels and becomes the last layer used within the set. The resulting feature maps are concatenated with those from other models and handled by the combined system; this vector then progresses through a fully connected layer to compute the result. To preserve the pretrained variables, the other layers remained unchanged; the original MN-V2 was pretrained on the ImageNet dataset, and the complete network is fine-tuned throughout the training process for the reasons discussed.
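The following minimal Keras sketch illustrates this reuse of an ImageNet-pretrained MN-V2 backbone as a fine-tunable feature extractor with global mean pooling and a reduced 64-channel representation; the input size, the 64-unit projection, and the framework choice are illustrative assumptions rather than the exact configuration used here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# ImageNet-pretrained MN-V2 backbone without the classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = True  # the complete network is fine-tuned during training

feature_extractor = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),      # global mean pooling over the 1280-channel map
    layers.Dense(64, activation="relu"),  # reduced 64-channel representation
])
```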
Hyperparameter tuning using the CDMO model
The CDMO methodology is applied at this stage to determine the optimal choice of hyperparameters for the MN-V2 method. The DMO is a metaheuristic model that simulates the foraging behavior of the dwarf mongoose (DM) and exploits its apparent behavioral adaptations (Abdelrazek et al., 2024). The mongoose has two foremost apparent behavioral adaptations, which are:
Generally, large prey items could deliver food for the entire group, but they cannot be taken by DMs because the DMs lack a sufficiently strong killing bite. The DM has therefore developed a social organization that permits individuals to live semi-independently and to travel from one position to another. The DM leads a semi-nomadic way of life over a region large enough to support the whole colony. This traveling lifestyle prevents any single area from being fully searched out and averts over-exploitation of any one region.
Population initialization
The candidate population of mongooses (X) is initialized utilizing Equation (4). The population is generated between the lower bound (LB) and upper bound (UB) as follows:
where $d$ is the dimension, $x_{i,j}$ refers to the position of the jth dimension of the ith individual, X denotes the population generated randomly by Equation (5), and $n$ represents the population size.
Here, rand is a random number between 0 and 1, and VarMin and VarMax are the LB and UB of the problem, respectively. The best solution found so far is taken as the current optimum.
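In the standard DMO notation, and consistent with the symbols just defined, Equations (4) and (5) are commonly written as (a reconstruction of the usual form):

$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,d} \end{bmatrix} \qquad (4)$$

$$x_{i,j} = VarMin + rand \times (VarMax - VarMin) \qquad (5)$$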
The fitness of every solution is computed after the population is initialized. Equation (6) computes the probability value for every individual's fitness, and the alpha female (α) is then selected based on this probability.
Here, n − bs equals the number of mongooses in the alpha group, peep denotes the alpha female's vocalization, and bs signifies the number of babysitters (nannies).
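In the usual DMO formulation, the selection probability in Equation (6) takes the form (a reconstruction consistent with the symbols above):

$$\alpha = \frac{fit_i}{\sum_{i=1}^{n} fit_i} \qquad (6)$$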
The DMO uses Equation (7) to generate the candidate food position.
Here, phi is a uniformly distributed random number in [−1, 1]. After every iteration, the sleeping mound is evaluated using Equation (8).
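In the standard DMO notation, Equations (7) and (8) are commonly written as (a reconstruction of the usual form):

$$X_{i+1} = X_i + phi \times peep \qquad (7)$$

$$sm_i = \frac{fit_{i+1} - fit_i}{\max\left\{\left|fit_{i+1}\right|, \left|fit_i\right|\right\}} \qquad (8)$$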
The average value of the sleeping mound is given by Equation (9).
The mongooses are known to avoid returning to a previously used sleeping mound; hence, the scouts search for the next one, ensuring exploration. This is modeled by Equation (10).
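In the standard DMO notation, Equations (9) and (10) are commonly written as follows (the sign of the second term in Equation (10) switches between exploitation and exploration in the full algorithm; this is a reconstruction of the usual form):

$$\varphi = \frac{\sum_{i=1}^{n} sm_i}{n} \qquad (9)$$

$$X_{i+1} = X_i - CF \times phi \times rand \times \left[X_i - \vec{M}\right] \qquad (10)$$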
Here, $\vec{M}=\sum_{i=1}^{n}\frac{x_i \times sm_i}{x_i}$ is the vector that drives the mongoose's movement toward its new sleeping mound, and $CF=\left(1-\frac{iter}{Max_{iter}}\right)^{\left(2\,\frac{iter}{Max_{iter}}\right)}$ is the control factor, which decreases with every iteration.
Chaos is a phenomenon in which even a slight change in the initial state produces non-linear variations in future behavior; it can also be described as quasi-random behavior generated by non-linear deterministic systems. One of the foremost exploration techniques is the chaos optimization strategy, which maps parameters and variables from the chaotic domain to the solution space. The model utilizes 10 well-known one-dimensional (1D) maps, often employed in the literature, to generate the chaotic sets.
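The sketch below illustrates how one such 1D map (the logistic map, a common choice among the 10) can replace the uniform random numbers in the population initialization, together with an Equation (7)-style candidate move; the map constants, bounds, and peep value are illustrative assumptions rather than the exact settings of the CDMO.

```python
import numpy as np

def logistic_map(n, x0=0.7, mu=4.0):
    """Generate a chaotic sequence in (0, 1) using the 1D logistic map."""
    seq, x = np.empty(n), x0
    for k in range(n):
        x = mu * x * (1.0 - x)
        seq[k] = x
    return seq

def init_population(n_pop, dim, var_min, var_max):
    # Chaotic values replace the uniform rand term of the Equation (5)-style init.
    chaos = logistic_map(n_pop * dim).reshape(n_pop, dim)
    return var_min + chaos * (var_max - var_min)

def candidate_food_position(x_i, peep=2.0):
    # Equation (7)-style move: X_{i+1} = X_i + phi * peep, with phi ~ U[-1, 1].
    phi = np.random.uniform(-1.0, 1.0, size=x_i.shape)
    return x_i + phi * peep

population = init_population(n_pop=10, dim=4, var_min=0.0, var_max=1.0)
trial = candidate_food_position(population[0])
```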
Fitness selection is a significant factor controlling the performance of the CDMO approach. Each hyperparameter choice is encoded as a candidate solution whose quality is measured by the fitness function. In this case, the CDMO methodology adopts accuracy as the primary criterion to design the fitness function, which is expressed as follows:
FP and TP stand for the false and true positive values, respectively.
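Since only TP and FP are defined here, one plausible reading of this criterion is a precision-style measure to be maximized; the exact expression used by the authors may differ:

$$fitness = \max(P), \qquad P = \frac{TP}{TP + FP}$$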
DCNNBiLSTM-based classification
In the last stage, the DCNNBiLSTM is applied to identify indoor activities. Several stacked layers are employed to process data in the CNN (Hnamte and Hussain, 2023). The purpose of the convolution (Conv) layer is to extract features while preserving spatial relationships. The CNN has produced outstanding results for tasks such as image recognition. Each hidden neuron is connected to only a small region of input neurons; this region is known as the local receptive field. The neuron examines the incoming data inside this defined area without being affected by variations arising outside its boundary.
LSTM is an artificial neural network employed in DL and AI models. Unlike the conventional feed-forward NN, LSTM features feedback connections, which allow it to process not only individual data points but also entire data sequences. The LSTM method was developed to address the long-term dependency problem of RNNs: RNNs can deliver accurate forecasts based on recent input, but as the gap to the relevant past information grows, their performance degrades, whereas the LSTM can retain information over long periods. It is employed for classifying, estimating, and processing time-series data.
In Equations (13)-(18), Φ represents the sigmoid function; $h_t$ refers to the output of the hidden layer; and $x_t$ denotes the network input. $C_t$ represents the cell state and $\tilde{C}_t$ the candidate cell state; $\hat{W}_i$, $\hat{W}_O$, $\hat{W}_f$, and $\hat{W}_C$ refer to the weights applied to the input, output, drop (forget), and memory (cell) states, while $B_f$, $B_i$, and $B_C$ indicate the corresponding biases. The drop gate determines whether information is discarded, the cell tracks the processing state, the input gate decides whether the input data are preserved, and the output gate provides the result.
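The standard LSTM gate equations consistent with these symbols are (a reconstruction of the usual form of Equations (13)-(18)):

$$f_t = \Phi\!\left(\hat{W}_f\,[h_{t-1}, x_t] + B_f\right) \qquad (13)$$

$$i_t = \Phi\!\left(\hat{W}_i\,[h_{t-1}, x_t] + B_i\right) \qquad (14)$$

$$\tilde{C}_t = \tanh\!\left(\hat{W}_C\,[h_{t-1}, x_t] + B_C\right) \qquad (15)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (16)$$

$$o_t = \Phi\!\left(\hat{W}_O\,[h_{t-1}, x_t] + B_O\right) \qquad (17)$$

$$h_t = o_t \odot \tanh(C_t) \qquad (18)$$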
The DCNNBiLSTM structure uses CNN layers to extract features from the input data, BiLSTM for sequence prediction to obtain the preferred outcomes, and a deep neural network (DNN) to optimize the loss and error. The hybrid BiLSTM and CNN technique was originally known as the CNNBiLSTM structure, which stood for the long-term recurrent convolutional network method. A DCNNBiLSTM is typically built by first placing CNN layers at the front end, followed by a BiLSTM layer and, lastly, a DNN followed by the output layer. Figure 2 depicts the infrastructure of DCNNBiLSTM.

Structure of the DCNNBiLSTM model. Abbreviation: DCNNBiLSTM, deep convolutional neural network bidirectional long short-term memory.
The CNN is activated with ReLU, as formulated in Equation (19). Batch normalization is expressed in Equation (20). The softmax function is used in the output layer, as given in Equation (21).
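In their standard forms, these three operations are (a reconstruction of the usual expressions):

$$ReLU(x) = \max(0, x) \qquad (19)$$

$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}}, \qquad y = \gamma\,\hat{x} + \beta \qquad (20)$$

$$softmax(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \qquad (21)$$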
BiLSTM combines two independent LSTMs. This structure allows the model to exploit both forward and backward sequence information at every position in the input series.
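A minimal Keras sketch of this CNN → BiLSTM → DNN → softmax arrangement is given below; the sequence length, feature dimension, filter counts, and unit sizes are illustrative assumptions rather than the exact architecture used in this work.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dcnn_bilstm(seq_len=16, feat_dim=64, n_classes=2):
    """CNN front end -> BiLSTM -> dense (DNN) head -> softmax output."""
    inputs = layers.Input(shape=(seq_len, feat_dim))
    x = layers.Conv1D(64, 3, padding="same", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Conv1D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Bidirectional(layers.LSTM(64))(x)   # forward + backward sequence context
    x = layers.Dense(64, activation="relu")(x)     # DNN head
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```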
PERFORMANCE VALIDATION
The fall detection results of the IAM-CDMODL technique are tested using the multiple cameras fall (MCF) database and the UR Fall Detection (URFD) database. The MCF database (Auvinet et al., 2010) includes 192 samples under 2 classes, as defined in Table 1. The proposed technique is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, a GeForce 1050Ti 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. The following parameter settings are used: a learning rate of 0.01, ReLU activation, an epoch count of 50, a dropout of 0.5, and a batch size of 5.
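A minimal sketch of how these stated settings could be applied to the DCNNBiLSTM sketch of the previous section is shown here; the Adam optimizer choice and the random placeholder data are assumptions made only for illustration (dropout of 0.5 is already part of the model sketch).

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for extracted feature sequences and one-hot labels.
x_train = np.random.rand(100, 16, 64).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 2, size=100), 2)

model = build_dcnn_bilstm()  # builder defined in the sketch of the previous section
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=50, batch_size=5)
```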
Details of the MCF database.
Multiple cameras fall database

Class | No. of videos
---|---
Fall events | 96 |
Non-fall events | 96 |
Total videos | 192 |
Abbreviation: MCF, multiple cameras fall.
Figure 3 illustrates the confusion matrices produced by the IAM-CDMODL methodology with 80:20 and 70:30 of TRAPH/TESPH on the MCF database. The outcomes indicate the effective detection of the fall and non-fall event samples under each class label.

Confusion matrices on the MCF database: (a and b) 80:20 of TRAPH/TESPH and (c and d) 70:30 of TRAPH/TESPH. Abbreviation: MCF, multiple cameras fall.
The fall detection outcome of the IAM-CDMODL technique on the MCF database is stated in Table 2 and Figure 4. The outcome highlighted that the IAM-CDMODL technique accurately categorized fall and non-fall event classes. On 80% of TRAPH, the IAM-CDMODL technique provides an average accuy of 99.35%, precn of 99.38%, sensy of 99.32%, specy of 99.32%, and F score of 99.35%. Also, on 20% of TESPH, the IAM-CDMODL technique provides an average accuy of 97.44%, precn of 97.22%, sensy of 97.73%, specy of 97.73%, and F score of 97.41%. Besides, on 70% of TRAPH, the IAM-CDMODL technique provides an average accuy of 98.51%, precn of 98.59%, sensy of 98.46%, specy of 98.46%, and F score of 98.50%. Moreover, on 30% of TESPH, the IAM-CDMODL technique provides an average accuy of 98.28%, precn of 98.21%, sensy of 98.39%, specy of 98.39%, and F score of 98.27%.
Fall detection outcomes of the IAM-CDMODL approach on the MCF database.
Classes | Accuy (%) | Precn (%) | Sensy (%) | Specy (%) | F score (%) |
---|---|---|---|---|---|
TRAPH (80%) | |||||
Fall events | 99.35 | 98.75 | 100.00 | 98.65 | 99.37 |
Non-fall events | 99.35 | 100.00 | 98.65 | 100.00 | 99.32 |
Average | 99.35 | 99.38 | 99.32 | 99.32 | 99.35 |
TESPH (20%) | |||||
Fall events | 97.44 | 94.44 | 100.00 | 95.45 | 97.14 |
Non-fall events | 97.44 | 100.00 | 95.45 | 100.00 | 97.67 |
Average | 97.44 | 97.22 | 97.73 | 97.73 | 97.41 |
TRAPH (70%) | |||||
Fall events | 98.51 | 97.18 | 100.00 | 96.92 | 98.57 |
Non-fall events | 98.51 | 100.00 | 96.92 | 100.00 | 98.44 |
Average | 98.51 | 98.59 | 98.46 | 98.46 | 98.50 |
TESPH (30%) | |||||
Fall events | 98.28 | 96.43 | 100.00 | 96.77 | 98.18 |
Non-fall events | 98.28 | 100.00 | 96.77 | 100.00 | 98.36 |
Average | 98.28 | 98.21 | 98.39 | 98.39 | 98.27 |
Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; MCF, multiple cameras fall.

Average of the IAM-CDMODL approach on the MCF database. Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; MCF, multiple cameras fall.
The performance of the IAM-CDMODL technique on the MCF database (80:20) is graphically represented in Figure 5 in the form of training accuracy (TRAA) and validation accuracy (VALA) curves. The figure offers valuable insight into the behavior of the IAM-CDMODL method over successive epochs, highlighting its learning process and generalization abilities. Notably, the figure shows a continuous improvement in TRAA and VALA with increasing epoch count, confirming the adaptive nature of the IAM-CDMODL technique in the pattern recognition process on both TRA and TES data. The rising trend in VALA outlines the ability of the IAM-CDMODL method to adapt to the TRA data while also excelling at accurately classifying unseen data, demonstrating strong generalizability.

Accuy curve of the IAM-CDMODL approach on the MCF database (80:20). Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; MCF, multiple cameras fall.
A complete representation of the training loss (TRLA) and validation loss (VALL) results of the IAM-CDMODL technique on the MCF database (80:20) over distinct epochs is given in Figure 6. The progressive reduction in TRLA shows that the IAM-CDMODL technique steadily improves its weights and reduces the classification error on the TRA and TES data. The figure also illustrates how well the IAM-CDMODL model fits the TRA data, emphasizing its ability to capture patterns within both databases. Notably, the IAM-CDMODL method continually refines its parameters to decrease the differences between the predicted and real TRA class labels.

Loss curve of the IAM-CDMODL technique on the MCF database (80:20). Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; MCF, multiple cameras fall.
In Table 3 and Figure 7, the comparison study of the IAM-CDMODL technique on the MCF database is reported (Vaiyapuri et al., 2021). The results highlighted that the 1D-ConvNN, 2D-ConvNN, and ResNet50 models have revealed reduced accuy values of 94.36%, 95.57%, and 96.15%, respectively. Next, the ResNet101, VGG16, VGG19, and IMEFDO-DCNN models have reported closer accuy values of 96.55%, 98.08%, 98.08%, and 99.21%, respectively. Nevertheless, the IAM-CDMODL technique provided better results with a maximum accuy of 99.35%.
Accuy outcomes of the IAM-CDMODL technique with other models under the MCF database.
MCF database

Methods | Accuracy (%)
---|---
VGG16 | 98.08 |
VGG19 | 98.08 |
1D-ConvNN | 94.36 |
2D-ConvNN | 95.57 |
ResNet50 | 96.15 |
ResNet101 | 96.55 |
IMEFDO-DCNN | 99.21 |
IAM-CDMODL | 99.35 |
Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; MCF, multiple cameras fall.

Accuy outcome of the IAM-CDMODL technique under the MCF database. Abbreviations: Conv, convolution; IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; MCF, multiple cameras fall.
Table 4 and Figure 8 show the comparison between the training time (TRAT) and testing time (TEST) study of the IAM-CDMODL approach on the MCF database. The results highlighted that the IAM-CDMODL approach performs more efficiently than other approaches. Based on TRAT, the IAM-CDMODL technique obtains a lesser TRAT of 1019.01 s, while the VGG16, VGG19, 1D-ConvNN, 2D-ConvNN, ResNet50, ResNet101, and IMEFDO-DCNN approaches achieve higher TRAT values of 3627.54 s, 3189.12 s, 2341.52 s, 1903.60 s, 1163.50 s, 1274.23 s, and 1137.10 s, respectively. Moreover, based on TEST, the IAM-CDMODL technique obtains a lesser TEST of 411.89 s, while the VGG16, VGG19, 1D-ConvNN, 2D-ConvNN, ResNet50, ResNet101, and IMEFDO-DCNN approaches achieve higher TEST values of 1758.41 s, 1482.53 s, 924.73 s, 845.73 s, 946.28 s, 932.51 s, and 735.56 s, respectively.
TRAT and TEST outcomes of the IAM-CDMODL technique with other models under the MCF database.
Multiple cameras fall database

Methods | TRAT (s) | TEST (s)
---|---|---
VGG16 | 3627.54 | 1758.41 |
VGG19 | 3189.12 | 1482.53 |
1D-ConvNN | 2341.52 | 924.73 |
2D-ConvNN | 1903.60 | 845.73 |
ResNet50 | 1163.50 | 946.28 |
ResNet101 | 1274.23 | 932.51 |
IMEFDO-DCNN | 1137.10 | 735.56 |
IAM-CDMODL | 1019.01 | 411.89 |
Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; MCF, multiple cameras fall; TEST, testing time; TRAT, training time.

TRAT and TEST outcomes of the IAM-CDMODL technique under the MCF database. Abbreviations: Conv, convolution; IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; MCF, multiple cameras fall; NN, neural network; TEST, testing time; TRAT, training time.
The URFD database (http://fenix.ur.edu.pl/~mkepski/ds/uf.html) includes 314 frames under 2 classes, as demonstrated in Table 5.
Details of the URFD database.
URFD database

Classes | No. of frames
---|---
Fall events | 74 |
Non-fall events | 240 |
Total frames | 314 |
Abbreviation: URFD, UR Fall Detection.
Figure 9 depicts the confusion matrices achieved by the IAM-CDMODL technique with 80:20 and 70:30 of TRAPH/TESPH under the URFD database. The results indicate the effectual recognition of the fall and non-fall event samples under all classes.

Confusion matrices on the URFD database: (a and b) 80:20 of TRAPH/TESPH and (c and d) 70:30 of TRAPH/TESPH. Abbreviation: URFD, UR Fall Detection.
The fall detection outcome of the IAM-CDMODL methodology on the URFD database is depicted in Table 6 and Figure 10. The experimental values inferred that the IAM-CDMODL methodology accurately classified fall and non-fall event classes. On 80% of TRAPH, the IAM-CDMODL methodology offers an average accuy of 99.74%, precn of 99.12%, sensy of 99.74%, specy of 99.74%, and F score of 99.43%. Also, on 20% of TESPH, the IAM-CDMODL methodology offers an average accuy of 98.89%, precn of 97.37%, sensy of 98.89%, specy of 98.89%, and F score of 98.09%. Besides, on 70% of TRAPH, the IAM-CDMODL methodology offers an average accuy of 93.55%, precn of 96.94%, sensy of 93.55%, specy of 93.55%, and F score of 95.08%. Moreover, on 30% of TESPH, the IAM-CDMODL approach provides an average accuy of 97.06%, precn of 99.37%, sensy of 97.06%, specy of 97.06%, and F score of 98.17%.
Fall detection outcomes of the IAM-CDMODL technique on the URFD database.
Classes | Accuy (%) | Precn (%) | Sensy (%) | Specy (%) | F score (%) |
---|---|---|---|---|---|
TRAPH (80%) | |||||
Fall events | 100.00 | 98.25 | 100.00 | 99.49 | 99.12 |
Non-fall events | 99.49 | 100.00 | 99.49 | 100.00 | 99.74 |
Average | 99.74 | 99.12 | 99.74 | 99.74 | 99.43 |
TESPH (20%) | |||||
Fall events | 100.00 | 94.74 | 100.00 | 97.78 | 97.30 |
Non-fall events | 97.78 | 100.00 | 97.78 | 100.00 | 98.88 |
Average | 98.89 | 97.37 | 98.89 | 98.89 | 98.09 |
TRAPH (70%) | |||||
Fall events | 87.72 | 98.04 | 87.72 | 99.38 | 92.59 |
Non-fall events | 99.38 | 95.83 | 99.38 | 87.72 | 97.58 |
Average | 93.55 | 96.94 | 93.55 | 93.55 | 95.08 |
TESPH (30%) | |||||
Fall events | 94.12 | 100.00 | 94.12 | 100.00 | 96.97 |
Non-fall events | 100.00 | 98.73 | 100.00 | 94.12 | 99.36 |
Average | 97.06 | 99.37 | 97.06 | 97.06 | 98.17 |
Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; URFD, UR Fall Detection.

Average of the IAM-CDMODL technique on the URFD database. Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; URFD, UR Fall Detection.
The performance of the IAM-CDMODL technique on the URFD database (80:20) is graphically represented in Figure 11 in the form of TRAA and VALA curves. The figure offers valuable insight into the behavior of the IAM-CDMODL technique over successive epochs, demonstrating its learning process and generalization abilities. Notably, the figure shows a steady improvement in TRAA and VALA with increasing epoch count, confirming the adaptive nature of the IAM-CDMODL technique in the pattern recognition method on both TRA and TES data. The rising trend in VALA outlines the ability of the IAM-CDMODL method to adapt to the TRA data while also excelling at accurately classifying unseen data, illustrating robust generalization abilities.

Accuy curve of the IAM-CDMODL methodology on the URFD database (80:20). Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; URFD, UR Fall Detection.
Figure 12 illustrates a complete representation of the TRLA and VALL outcomes of the IAM-CDMODL technique on the URFD database (80:20) over different epochs. The progressive decrease in TRLA shows that the IAM-CDMODL technique improves its weights and minimizes the classification error on the TRA and TES data. The figure also illustrates how well the IAM-CDMODL model fits the TRA data, highlighting its proficiency in capturing patterns within both databases. The IAM-CDMODL technique continually refines its parameters to minimize the differences between the predicted and real TRA class labels.

Loss curve of the IAM-CDMODL methodology on the URFD database (80:20). Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; URFD, UR Fall Detection.
In Table 7 and Figure 13, the comparison outcome of the IAM-CDMODL technique on the URFD database is reported. The experimental values emphasized that the 1D-ConvNN, 2D-ConvNN, and ResNet50 techniques have shown decreased accuy values of 92.96%, 95.24%, and 95.56%, respectively. Next, the ResNet101, VGG16, VGG19, and IMEFDO-DCNN models have described closer accuy values of 96.44%, 97.88%, 98.28%, and 99.59%, correspondingly. Nevertheless, the IAM-CDMODL method obtains the best outcomes with the highest accuy of 99.74%.
Accuy outcomes of the IAM-CDMODL technique with other models under the URFD database (Vaiyapuri et al., 2021).
URFD database

Methods | Accuracy (%)
---|---
VGG16 | 97.88 |
VGG19 | 98.28 |
1D-ConvNN | 92.96 |
2D-ConvNN | 95.24 |
ResNet50 | 95.56 |
ResNet101 | 96.44 |
IMEFDO-DCNN | 99.59 |
IAM-CDMODL | 99.74 |
Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; URFD, UR Fall Detection.

Accuy outcome of the IAM-CDMODL technique under the URFD database. Abbreviations: Conv, convolution; IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; URFD, UR Fall Detection.
The comparison of the TRAT and TEST results of the IAM-CDMODL methodology on the URFD database is given in Table 8 and Figure 14. The outcomes indicate that the IAM-CDMODL methodology performs more efficiently than the other techniques. Based on TRAT, the IAM-CDMODL method attains a lesser TRAT of 980.87 s, while the VGG16, VGG19, 1D-ConvNN, 2D-ConvNN, ResNet50, ResNet101, and IMEFDO-DCNN methods obtain higher TRAT values of 2352.72 s, 2778.66 s, 1173.68 s, 1228.89 s, 1420.94 s, 1545.76 s, and 1014.18 s, respectively. Furthermore, based on TEST, the IAM-CDMODL technique attains a minimum TEST of 401.11 s, while the VGG16, VGG19, 1D-ConvNN, 2D-ConvNN, ResNet50, ResNet101, and IMEFDO-DCNN techniques attain higher TEST values of 1108.88 s, 1372.27 s, 828.06 s, 780.07 s, 879.10 s, 925.89 s, and 677.47 s, respectively.
TRAT and TEST outcomes of the IAM-CDMODL technique with other models under the URFD database.
URFD database

Methods | TRAT (s) | TEST (s)
---|---|---
VGG16 | 2352.72 | 1108.88 |
VGG19 | 2778.66 | 1372.27 |
1D-ConvNN | 1173.68 | 828.06 |
2D-ConvNN | 1228.89 | 780.07 |
ResNet50 | 1420.94 | 879.10 |
ResNet101 | 1545.76 | 925.89 |
IMEFDO-DCNN | 1014.18 | 677.47 |
IAM-CDMODL | 980.87 | 401.11 |
Abbreviations: IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; TEST, testing time; TRAT, training time; URFD, UR Fall Detection.

TRAT and TEST outcomes of the IAM-CDMODL technique under the URFD database. Abbreviations: Conv, convolution; IAM-CDMODL, Indoor Activity Monitoring using the Chaotic Dwarf Mongoose Optimization with Deep Learning; NN, neural network; URFD, UR Fall Detection; TEST, testing time; TRAT, training time.
CONCLUSION
This article introduces an IAM-CDMODL technique for elderly and visually impaired people. The IAM-CDMODL technique aims to recognize several indoor activities to confirm the safety of the elderly and visually impaired people. The IAM-CDMODL technique utilized the following diverse processes: BF-based preprocessing, MN-V2-based feature extraction, CDMO-based hyperparameter tuning, and DCNNBiLSTM-based classification. First, the IAM-CDMODL technique employs the BF approach for image preprocessing; then it implements the MN-V2 model for learning complex patterns from preprocessed images. Moreover, the CDMO model chooses the hyperparameters related to the MN-V2 model. Finally, the DCNNBiLSTM approach is applied to identify various indoor activities. To ensure the enhanced detection performance of the IAM-CDMODL methodology, a wide range of simulations was performed on the MCF and URFD datasets. The experimental values implied that the IAM-CDMODL methodology provides a superior solution compared with recent models. The limitations of the IAM-CDMODL approach comprise potential challenges in handling large-scale datasets due to computational constraints introduced by the MN-V2 and DCNNBiLSTM models. The efficiency of the CDMO method for hyperparameter selection may also vary across activity detection tasks and diverse datasets. Future studies may focus on optimizing the computational efficiency of the technique, improving its scalability to various environmental conditions and activities, and exploring other DL or optimization models to further enhance accuracy and generalization.