Abnormal Activity Recognition in Private Places Using Deep Learning

Abstract: applying computer and machine vision


I. INTRODUCTION
The automated teller machine (ATM) is now one of the most important tools used by customers all over the world to withdraw cash or conduct other transactions. Yet ATMs are also where major crimes are committed. Every day, ATMs are robbed in several locations, creating a security issue. A watchman is assigned to each ATM to avoid this issue. Every day, numerous videos are captured by the CCTV cameras installed within the ATM. The recorded videos are very long, and automated video analysis techniques [2] have not yet produced the expected outcomes. Because the videos are so long, watching them all is difficult and tedious, so a way to extract the most important information from a lengthy video should exist. The main information in surveillance videos is any suspicious activity, such as robberies and murders, and it is this crucial information that must be extracted from lengthy videos. It is impossible to manually monitor every incident captured on the CCTV camera. Even after an incident has occurred, manually searching for it in the recorded video is a time-consuming process. Existing systems are also not very effective at detecting behavior and activity, for a number of reasons.
The goal of this project is to develop an algorithm that enables the authorities to identify suspicious frames in a lengthy surveillance video and provides them with priority information. A Convolutional Neural Network (CNN), a deep learning technique, was used in this study to sample the important data from the surveillance videos. The most important information concerned any suspicious activity, such as a robbery, murder, or theft, that occurred inside an ATM. The CNN model successfully extracted suspicious-activity frames from a lengthy video, allowing users to first identify the features before extracting the suspicious frames.
What is needed are intelligent solutions that can automatically provide accurate warning feedback in real time. The proposed system monitors the ATM for unusual behaviors. It calculates positional relations and extracts features that can be used to study a person's behavior efficiently. When the system notices an odd behavior, it notifies the ATM monitoring staff, sends a warning message, and activates an alarm in the ATM.
In this research work, object detection is implemented using the YOLOv5 algorithm. A Convolutional Neural Network (CNN) was designed and trained from scratch on the datasets in order to evaluate its performance. The performance of the models is evaluated using metrics such as accuracy, loss, precision, recall, and F1-score. A confusion matrix is used to evaluate the model on a test data set.

II. LITERATURE REVIEW
In this section, we present the related work on video-based security systems. One study suggested a deep network architecture based on residual bidirectional long short-term memory (LSTM). With an improvement in recognition rate, the new network was capable of avoiding vanishing gradients in the temporal and spatial dimensions. To understand the complexity of activity recognition and classification, two LSTM models, a basic model and the proposed model, were compared on the classification of images of five human activities: abuse, arrest, arson, assault, and fighting. The suggested model is used to categorize [1] the five distinct human activities, and its performance is excellent. The training and testing accuracies were 99.68%. The training and classification losses were both very low, at zero and 0.016%, respectively. The findings showed that the suggested LSTM [3] model was extremely effective at learning human actions and performed well in categorization. Further research will focus on constructing new LSTM-based recurrent neural network models capable of recognizing human actions even in large-scale videos.

III. PROPOSED SYSTEM
The literature review revealed that deep learning models have been widely used for human activity recognition and yield better accuracy. The first data set, ATM Image [6] (ATM), comprises 1491 images that cover most of the angles from which an ATM box can be viewed in an ATM vestibule. Images in the data set are augmented with blur (up to 2.25 px) and noise (up to 6% of pixels) effects. Augmentation is done to expand the data set and improve model performance. Each image in the data set is annotated with bounding boxes for the ATM and person classes. The second freely available data set is the ATM Anomaly Video (ATMA-V) [9] data set from Kaggle. It comprises 65 videos that contain both anomalous and normal video segments. As part of our abnormal behavior classification [11], a data set was prepared covering activities such as fighting, activity with a knife, normal videos, property damage, robbery, peeping to check the password, snatching the withdrawn money, and a covered face, and each activity was classified [7] as normal or abnormal.
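
The noise part of the augmentation described above can be illustrated with a minimal sketch. It assumes salt-and-pepper noise on a grayscale image stored as a list of rows; the function name `add_pixel_noise` and the choice of noise type are assumptions for illustration, not details from the data set's actual pipeline, and the blur step is omitted.

```python
import random

def add_pixel_noise(image, max_fraction=0.06, rng=None):
    """Return a copy of a grayscale image with salt-and-pepper noise.

    image: list of rows, each a list of integer intensities in 0..255.
    max_fraction: upper bound on the share of pixels replaced
    (6% here, following the figure quoted in the text).
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    total = height * width
    # Draw how many pixels to corrupt, anywhere from none up to the cap.
    n_noisy = rng.randint(0, int(total * max_fraction))
    noisy = [row[:] for row in image]
    # Pick distinct pixel positions and set each one to black or white.
    for index in rng.sample(range(total), n_noisy):
        y, x = divmod(index, width)
        noisy[y][x] = rng.choice((0, 255))
    return noisy

# Hypothetical 10 x 20 mid-gray image used only for demonstration.
image = [[128] * 20 for _ in range(10)]
augmented = add_pixel_noise(image, rng=random.Random(0))
changed = sum(a != b for ra, rb in zip(image, augmented) for a, b in zip(ra, rb))
```

Because the original is copied before modification, each call yields a new training sample while the source image stays intact.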

CNN Architecture
The CNN model was defined as having two CNN hidden layers. These are followed by dropout layers, and a dense fully connected layer is then used to interpret the features extracted by the CNN hidden layers. Finally, a dense layer with the softmax activation function was added as the final layer to make predictions (Table I). The sparse categorical cross-entropy loss function was used, and the efficient Adam version of stochastic gradient descent was used to optimize the network with a learning rate of 0.001. The CNN model was trained for 50 epochs with a batch size of 64 samples. After the model was fit, it was evaluated on the test data set and the accuracy of the CNN [14] model was obtained.
Table 2. The dimensional structure of the adopted model.
The models were implemented in the Python programming language. The four model architectures described in the previous section were all compiled with the sparse categorical cross-entropy loss function and the Adam optimizer with a learning rate of 0.001. All the NN models were fitted to the training data and test data with a batch size of 64 and run for 50 epochs. The training accuracy was then plotted against the number of iterations for performance evaluation of the models. For the CNN model, a training accuracy of 0.995 was achieved.
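
To make the loss function named above concrete, the following is a minimal sketch of softmax followed by sparse categorical cross-entropy, written in plain Python. The example scores and the two-class labelling (0 = normal, 1 = abnormal) are hypothetical; a deep learning framework would normally supply this computation.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_cross_entropy(logits_batch, labels):
    """Mean negative log-probability assigned to the true class.

    labels are integer class indices (the "sparse" form), not one-hot vectors.
    """
    losses = []
    for logits, label in zip(logits_batch, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)

# Hypothetical scores for two frames over two classes (0 = normal, 1 = abnormal).
logits_batch = [[2.0, 0.0], [0.0, 0.0]]
labels = [0, 1]
loss = sparse_categorical_cross_entropy(logits_batch, labels)
```

The "sparse" variant is convenient when the activity labels are stored as integers, since no one-hot conversion is needed.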

Object detection and Tracking
The frames are given as input to YOLOv5 (considered the best version of YOLO for detection). The bounding box output of YOLOv5 is passed as input to the object tracking phase. Track identities are assigned to the detected bounding boxes whose trajectories need to be found. The bounding boxes from the object detection phase are used as a reference to analyze the performance metrics. Metrics such as false positives, false negatives, true positives, true negatives, mean average precision, MOTA (Multi Object Tracking Accuracy), and MOTP (Multi Object Tracking Precision) are analyzed to assess the accuracy of the detector and tracker.
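
The tracking metrics listed above can be sketched as follows. This assumes the common CLEAR MOT definitions, with MOTA computed from false negatives, false positives, and identity switches, and MOTP taken as the mean bounding-box overlap (IoU) of matched pairs; the function names and the sample numbers are illustrative, not taken from the experiments.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mota(false_negatives, false_positives, id_switches, ground_truth_objects):
    """MOTA = 1 - (FN + FP + ID switches) / total ground-truth objects."""
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth_objects

def motp(matched_overlaps):
    """MOTP as the mean overlap over all matched detection/track pairs."""
    return sum(matched_overlaps) / len(matched_overlaps)

# Hypothetical example: two half-overlapping boxes and made-up error counts.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
accuracy = mota(false_negatives=5, false_positives=3, id_switches=2,
                ground_truth_objects=100)
precision = motp([0.8, 0.6])
```

Under this overlap-based convention a higher MOTP is better; some references define MOTP as a distance instead, in which case lower is better.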

IV. RESULT ANALYSIS

Architecture
The LSTM model was defined as having a single LSTM [12] hidden layer. A dropout layer with a rate of 0.5 follows this. Then a dense fully connected layer is used to interpret the features extracted by the single LSTM hidden layer. Finally, a dense layer was added as the final layer to make predictions.

Performance Metrics
The choice of performance metrics [13] influences the analysis of the algorithms. It helps in identifying the reasons for misclassifications so that they can be corrected by taking the necessary measures.
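
As an illustration of how such metrics relate to one another, the sketch below derives accuracy, precision, recall, and F1-score from a confusion matrix. The matrix values are hypothetical and the helper name `per_class_metrics` is an assumption for this example.

```python
def per_class_metrics(confusion):
    """Compute accuracy plus precision, recall and F1 for each class.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    report = {}
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true class differs
        fn = sum(confusion[c]) - tp                       # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report

# Hypothetical two-class matrix: row/column 0 = normal, 1 = abnormal.
confusion = [[90, 10], [5, 95]]
accuracy, report = per_class_metrics(confusion)
```

Inspecting the off-diagonal cells shows which classes are confused with each other, which is the information needed to trace misclassifications back to their cause.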

Results of object detection and Tracking
Object classification is performed with a state-of-the-art CNN. The network consists of 3,697,188 tunable parameters. The accuracy of the network gradually increases and the loss gradually decreases as the number of epochs increases.

Confusion matrix
The performance of the classification model is measured using a confusion matrix.

Results of Object Detection and Classifier: CNN Results and Analysis [16]
Better accuracy and loss values are achieved on large data sets, and the model is more generalized when trained on a large data set. Precision and recall scores are comparable in both cases.
Table 6. CNN output details.
The activity column, which is a categorical variable in the data set, was then converted into numerical format.
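
Converting a categorical activity column into numbers can be sketched in plain Python as below. The sample labels are hypothetical; the mapping assigns integers to the sorted unique labels, which is also how scikit-learn's LabelEncoder orders its classes.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    return {name: index for index, name in enumerate(classes)}

# Hypothetical activity column with three distinct classes.
activities = ["robbery", "normal", "fight", "normal", "robbery"]
mapping = fit_label_encoder(activities)
encoded = [mapping[a] for a in activities]

# Inverse mapping, useful for turning predictions back into activity names.
inverse = {index: name for name, index in mapping.items()}
decoded = [inverse[i] for i in encoded]
```

Integer labels of this form are what a sparse categorical loss expects as targets.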

CNN Architecture
The CNN model, with two CNN hidden layers followed by dropout layers, a dense fully connected layer, and a final dense layer with the softmax activation function (Table I), was trained with the sparse categorical cross-entropy loss function and the Adam optimizer [15] at a learning rate of 0.001, for 50 epochs with a batch size of 64 samples. After the model was fit, it was evaluated on the test data set and the accuracy of the CNN model was obtained. For compiling and training, the same values for the loss function, optimizer, batch size, and number of epochs that were used for the CNN model were reused. After the model was fit, it was evaluated on the test data set and the accuracy was obtained.
To convert the categorical activity labels into numerical form, the LabelEncoder function from the sklearn library was used for preprocessing. In the process of feature scaling, all the features were scaled to a comparable range, so that every feature contributes comparably and the prediction model weighs the features according to their actual relevance. Here sklearn's StandardScaler function, which standardizes each feature to zero mean and unit variance, was used for the scaling.
The system runs in real time. Bounding boxes, which function as classes in this case, are used to detect tagged items. These are then used to categorize labels in the video and to predict whether the occurrences are normal or abnormal. That result is calculated using the motion representation. Depth data is derived from the classes' bounding boxes. Multi-stream CNNs are then used to distinguish constituents and actions. Choosing an appropriate algorithm for a given job always involves a trade-off between speed and precision. The classifier trained on the indigenous data set has a validation accuracy of 99.5%.
It would be ideal if this could be achieved, with the use of appropriate sensors and applications, for a defined number of frequent activities that people perform in their day-to-day lives. This research area appears to have multiple advanced applications with deep learning [16] in the near future. In the future, the proposed approach can be evaluated in other real-world outdoor scenarios such as railway platforms and shopping malls. Also, for the detection of unwanted objects, deep learning-based object detection models can be combined with the proposed framework for further improvement.