Improved Classification Accuracy for Identification of Cervical Cancer

The main purpose of this research is to predict cervical cancer, compare the performance of several algorithms, and choose the best algorithm for predicting cervical cancer at an early stage. Cervical cancer classification can be automated using a machine learning system. This study evaluates multiple machine learning techniques for cervical cancer classification: Decision Tree, Naive Bayes, KNN, SVM, and MLP. The cervical cancer dataset, retrieved from the UCI machine learning data repository, was used to test these methods. The algorithms' results were compared in terms of accuracy, sensitivity, and specificity using Scikit-learn, a free Python-based machine learning package. Finally, the best model for predicting cervical cancer is developed.

Symptoms:
• Vaginal bleeding during or after a sexual encounter, between periods, or after menopause.
• Vaginal discharge that is watery or bloody and has a foul odour.
• Pelvic pain or discomfort during sexual activity.

II. PROPOSED SYSTEM
According to the World Health Organization, cervical cancer is the fourth most common cancer in women, accounting for 7.9% of all malignancies in women. Kurnianingsih et al., [9] have expressed the view that Athinarayanan et al., [10] used an automated detection system to find the cells in the input images produced when the patient was tested.
Texture features are used in the decision-making system along with SVM. The experimental results are compared with those of the KNN and ANN classifiers, showing that SVM is the better classifier. The proposed system helps the physician decide more quickly on further treatment of the patient because the cells are classified accurately.
Ghoneim et al., [11] have concluded from previous cases that computer-aided systems are more accurate and that deep learning based systems are more efficient. Here, cervical cancer prediction is handled by a Convolutional Neural Network (CNN) model followed by an Extreme Learning Machine (ELM) based classifier. The CNN extracts features from the raw input. Two CNN models, VGG-16 Net and CaffeNet, are used for efficient feature extraction, and their output is fed into the ELM classifier, which detects cervical cancer cells.
Taha et al., [12] have focused mainly on the

IV. METHODOLOGY
Algorithms for Classification
Classification builds a model (function) that describes and distinguishes classes of data or concepts, with the goal of predicting the class of an object whose class label is unknown. Classification is a task within data mining, the process of discovering knowledge in databases. Data mining extracts meaningful information and relevant knowledge from massive databases using statistical approaches, mathematics, artificial intelligence, and machine learning.
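As an illustration of the comparison described in this study, the following minimal sketch trains the five classifiers with Scikit-learn and reports accuracy, sensitivity, and specificity from the confusion matrix. It rests on several assumptions that are not taken from the paper: the data is synthetic (generated with make_classification, not the UCI cervical cancer dataset), the 75/25 split is arbitrary, and all hyperparameters are library defaults or illustrative values.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the UCI cervical cancer dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters are illustrative assumptions, not the paper's settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}


def evaluate(model):
    """Fit on the training split and score on the held-out test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

The scores printed by this sketch describe only the synthetic data and say nothing about the results reported in this study.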

Classifier based on Decision Trees
The decision tree is a classification approach in which the splitting criteria are used to classify the data. The decision tree is a tree-like flow chart that sorts instances based on attribute (feature)

P (Ci |S) = P (S|Ci) P (Ci) /P(S)
Bayesian classification provides a useful framework for analysing and understanding various learning methods. It computes explicit probabilities for hypotheses and is robust to noise in the input data.
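The rule P(Ci|S) = P(S|Ci) P(Ci) / P(S) can be worked through numerically. The sketch below is only an illustration: the class names, priors, and likelihoods are hypothetical values chosen for the arithmetic, not figures from the dataset, and the helper name posterior is an assumption. P(S) is obtained by summing P(S|Ci) P(Ci) over all classes.

```python
def posterior(priors, likelihoods):
    """Return P(Ci | S) for every class Ci using Bayes' rule."""
    # Evidence P(S) = sum over classes of P(S | Ci) * P(Ci).
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}


# Hypothetical numbers for illustration only.
priors = {"cancer": 0.1, "healthy": 0.9}       # P(Ci)
likelihoods = {"cancer": 0.8, "healthy": 0.2}  # P(S | Ci)

post = posterior(priors, likelihoods)
predicted = max(post, key=post.get)  # class with the highest posterior

print({c: round(p, 4) for c, p in post.items()}, predicted)
```

With these numbers the evidence is 0.08 + 0.18 = 0.26, so the posterior for "cancer" is 0.08 / 0.26 and the posterior for "healthy" is 0.18 / 0.26. The sample is assigned to "healthy" even though the observation is four times more likely under "cancer", because the prior for "healthy" is nine times larger.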

Support Vector Machine classifier
Two classes of SVM training

Pre-Processing of Data
The extraction of valuable information and results depends primarily on the quality of the data, and medical data is affected by missing values, noise, inconsistencies, and outliers. The data must therefore be processed before the machine learning process begins, and data pre-processing is an important step in improving data quality. Data pre-processing includes data preparation and dataset transformation, which improve the effectiveness of knowledge discovery. The following steps were used to pre-process data in this project. Missing values were recorded in the dataset as "?" and imputed in one of two ways. Imputation based on mean values is the most commonly used imputation technique. This method calculates the mean of the non-missing data in a given column and then replaces the missing values in that column with that mean.
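Mean imputation as described above can be sketched in a few lines of plain Python. The function name impute_mean and the small table of values are hypothetical. The sketch assumes every column is numeric and has at least one non-missing entry.

```python
def impute_mean(rows, missing="?"):
    """Replace `missing` markers with the mean of the observed values
    in the same column. Assumes numeric columns with at least one
    observed value each."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [float(r[j]) for r in rows if r[j] != missing]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if r[j] == missing else float(r[j]) for j in range(n_cols)]
        for r in rows
    ]


# Hypothetical records; "?" marks a missing value as in the dataset.
raw = [
    ["18", "4", "?"],
    ["?", "1", "0"],
    ["34", "?", "1"],
    ["52", "1", "1"],
]
clean = impute_mean(raw)
for row in clean:
    print(row)
```

In the second column, for example, the observed values 4, 1, and 1 have a mean of 2.0, which replaces the "?" in the third record. In a train/test setting, the column means would normally be computed on the training portion only and reused for the test portion.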

Data transformation:
During the data transformation step, the data is converted or consolidated so that processing is more efficient and existing patterns are easier to understand. The data is then ready for processing and the application of machine learning algorithms. Normalisation is a data transformation strategy that scales feature values to fall within a small specified or common range, such as [0,1] or [-1,1]. Normalisation methods include min-max, decimal scaling, and z-score normalisation.
This dataset was normalised using the min-max method.
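Min-max normalisation maps each value x of a feature to new_min + (x - min) / (max - min) * (new_max - new_min), where min and max are the smallest and largest values of that feature. The following minimal sketch applies this to a single column. The function name min_max_scale and the sample values are hypothetical, and mapping a constant column to new_min is a choice made here for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Scale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to scale, so map everything to new_min.
        return [new_min for _ in values]
    return [
        new_min + (v - lo) / (hi - lo) * (new_max - new_min)
        for v in values
    ]


ages = [18.0, 34.0, 52.0, 26.0]          # hypothetical feature column
scaled = min_max_scale(ages)              # range [0, 1]
symmetric = min_max_scale(ages, -1.0, 1.0)  # range [-1, 1]

print([round(v, 4) for v in scaled])
print([round(v, 4) for v in symmetric])
```

The smallest value (18) maps to the lower bound and the largest (52) to the upper bound. Because the minimum and maximum define the scale, a single extreme value compresses all other values into a narrow band, which is one reason outliers are examined as part of pre-processing.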

5. Outliers:
Finding outliers in a dataset is a difficult and time-

VI. FUTURE WORK
In the future, we will improve the dataset used to detect cervical cancer and increase the efficiency of future prediction models by including several essential attributes that aid in the early detection of cervical cancer. Additional data could be gathered and added to the dataset, and multiple classification algorithms can be applied to find the best classification method.
Because this is a medical problem, and cervical cancer is a sensitive and personal matter for women, data authenticity is always a concern. The emphasis should be on obtaining data that is accurate and reliable.