A Study on the Journey of Natural Language Processing Models: From Symbolic Natural Language Processing to Bidirectional Encoder Representations from Transformers

In this digital era, Natural Language Processing (NLP) is not just a computational process; it is a way to communicate with machines in a human-like manner. It is used in fields ranging from smart virtual assistants to health and emotion analysis. A digital era without NLP is something we cannot even imagine. An NLP system first reads the given input and then begins making sense of it; once the data has been properly processed, the machine acts by producing a response or completing a task. In this paper, I review the journey of natural language processing from the late 1940s to the present. The paper also surveys several salient works along this timeline that lead to where the field currently stands. The review separates the history of NLP into four eras, marked respectively by a focus on machine translation, the influence of artificial intelligence, the adoption of a logico-grammatical style, and the exploitation of huge volumes of linguistic data. This paper helps readers understand the historical development of NLP and aims to inspire further work and research in this domain.

Machine translation was among the first NLP studies. Machine translation aims to create automatic systems that can analyse text or speech and convert it into a different language. We speak, read, and write using language; we also think about the world, make plans, dream, and take decisions in terms of words. The impact of AI is not limited to a single field but extends to almost any field imaginable, and the same is true of NLP. NLP is being used in a variety of sectors to make systems more robust and automated in order to meet future requirements. In 2011, Apple introduced Siri, a voice assistant that showcased the technology and elegance of NLP. After Siri, Google integrated "Google Assistant", a voice assistant, into the Android operating system, another groundbreaking achievement of NLP. These voice assistants are nearly as capable as a human: if we say, "Hey Google, how is the weather?", Google may reply, "You may need sunglasses; it is 22°C (Sunny)." It can even predict next week's weather. All of this is possible because of advanced research in NLP, AI, and deep learning. Recently, deep learning has been used to enhance NLP methods. Earlier, deep learning algorithms failed to produce satisfactory results because of the significant processing power their implementation requires. NLP research has improved greatly thanks to high-performance computers that can perform many complex calculations, a steady increase in available data, and better algorithms. The foundations of many NLP models were proposed before the 1990s. Several modern NLP researchers investigate ways to improve the deep learning methodologies used in NLP, such as using recurrent neural networks (RNNs) to predict an article's theme or to select the next word in a phrase.
The primary goal of this paper is to give a simple understanding of how NLP began its journey, how researchers developed modern NLP algorithms, and how these NLP models evolved. The first section covers the basics of NLP and the phases of its journey.
In the second section, I discuss NLP models ranging from basic ones to advanced current ones. The last section presents conclusions on NLP research and possible future refinements.

II. NATURAL LANGUAGE PROCESSING
A. Classification of Natural Language Processing
Natural Language Processing (NLP) is an optimized computational approach, a branch of Artificial Intelligence and Linguistics, for evaluating and understanding statements or words and analysing them in order to achieve human-like language processing. NLP was created so that people can communicate with computers the way humans interact with each other.
Not everyone knows the programming languages through which they could communicate with computers, so researchers set out to create algorithms that let computers understand human language. This is the background from which NLP emerged.
Language consists of discrete symbols. The base elements of language are characters, which are representations of symbols. Words are formed from characters and convey the meaning of events, objects, actions, ideas, and so on, and all of these are governed by rules. NLP can be classified into the types shown in the figure.
Phonology originates from the Greek root 'phono', which refers to sound, and the suffix 'logy', which simply means word or study. The Russian linguist Nikolai Trubetzkoy described phonology in 1939 as the study of sound as it relates to a linguistic system. Lass argued in 1998 that phonology is properly concerned with the function, behaviour, and organization of sounds as linguistic objects. In short, phonology is the semantic use of sound to encode meaning in any human language.
Syntax defines the format of a sentence: it explains how words and phrases must be combined to generate a correct sentence. A sentence is not correct or meaningful until it is syntactically correct, yet syntactic correctness alone does not guarantee meaning. Noam Chomsky gave a famous example: 'Colourless green ideas sleep furiously' conveys no coherent meaning even though every word has a meaning of its own.
Morphology is the analysis of how words are represented and built from their smallest meaningful units, known as morphemes.
Take the word "preoccupation": the prefix is 'pre', the root is 'occupa', and the suffix is 'tion'. We understand the meaning of a word by breaking it into morphemes. A morpheme that carries meaning on its own is called a lexical morpheme; lexical morphemes combine with grammatical morphemes such as 'ed' and 'ing'.
In the first phase of NLP's history, researchers developed first-stage tools.
Although by the last quarter of the 1960s MT production systems were in use, delivering output to their clients, the research of this era did not generate technologies of lasting scope.
We cannot say that this era developed NLP, but in this era the computer began to be used for language study. It was the beginning phase of NLP, though there were many misconceptions and a lack of concern. In the next era, researchers understood that building a real-world NLP model accessible through a heavy application or system was not easy, but that focusing on utilitarian MT could give them hope for the future of NLP. Before this era, researchers had been trying different models to enhance the power of NLP systems, but in this era they became more interested in the grammatical-logical approach.

During this period, European and Japanese researchers became interested in MT.
They also started a project named the "Eurotra" research project. Japanese multinational companies were interested in working in this field, so they provided financial support.
In the 1990s, researchers became interested in statistical models for NLP. These models can learn from small amounts of information and still predict an outstanding output. Reinforcement learning also helped these models train and select data.

IV. TEXT REPRESENTATION TECHNIQUES

B. TF-IDF
In Bag of Words we cannot distinguish the relative importance of two words: 'Which word carries more value than the other?' is a question a bag of words cannot answer. In semantic analysis the value of a word matters; for example, in 'He is an intelligent boy.', the word 'intelligent' carries more value than the word 'boy'.
But in Bag of Words both words receive the same weight of (1, 1).
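A minimal sketch of this limitation (the helper function and example sentences below are illustrative, not from any particular library):

```python
from collections import Counter

sentences = ["he is an intelligent boy", "she is a good girl"]
# shared vocabulary over both sentences
vocab = sorted({w for s in sentences for w in s.split()})

def bag_of_words(sentence):
    # each sentence becomes a count vector over the shared vocabulary
    counts = Counter(sentence.split())
    return [counts[w] for w in vocab]

vectors = [bag_of_words(s) for s in sentences]
# 'intelligent' and 'boy' both get a count of 1 -- the representation
# cannot tell which word matters more.
```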
Term frequency (TF) measures how often a word appears in a given text. Because the lengths of the texts in a corpus vary, a word may appear more frequently in a longer text than in a shorter one, so we divide the number of occurrences by the document's length to normalize the values.
The term's relevance across a corpus is measured using IDF (inverse document frequency); when computing TF alone, all terms are given equal weight.
Stop words such as 'is', 'are', and 'am', on the other hand, are well known to be unimportant despite their widespread use. To adjust for this, IDF weights terms that are common across a corpus down and rare terms up.
The TF-IDF score of each word is the product of its TF and IDF.
Consider three sentences: "He is good boy.", "She is good girl.", "Boy and girl are good." The document frequencies of the words are Good (3), Boy (2), Girl (2).
TABLE III: TF values
TABLE IV: IDF values
TABLE V: TF-IDF values
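The TF-IDF computation on these three sentences can be sketched in a few lines of Python (tokenization is simplified to lowercase whitespace splitting):

```python
import math

# the three example sentences, lowercased and tokenized by whitespace
docs = ["he is good boy", "she is good girl", "boy and girl are good"]
tokenized = [d.split() for d in docs]

def tf(word, doc):
    # term frequency, normalized by document length
    return doc.count(word) / len(doc)

def idf(word):
    # inverse document frequency: log(N / number of docs containing the word)
    df = sum(1 for doc in tokenized if word in doc)
    return math.log(len(tokenized) / df)

def tfidf(word, doc):
    return tf(word, doc) * idf(word)
```

Note that 'good', despite being the most frequent word, receives an IDF of log(3/3) = 0 and hence a TF-IDF score of 0 in every document: terms common across the whole corpus are weighted down automatically, just as with stop words.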

C. WORD2VEC
Word2Vec is one of the most widely used word-embedding models. It is essentially a mathematical approach for capturing relationships between similar words.
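A toy sketch of the idea, using hand-picked 2-D vectors (real Word2Vec embeddings are learned from data and have many more dimensions; the numbers here are only illustrative, with the queen vector filled in so the analogy is exact):

```python
# toy 2-D word vectors (hand-picked for illustration; real embeddings are learned)
vectors = {
    "man":   (3.0, 6.0),
    "woman": (3.2, 6.2),
    "king":  (4.0, 5.0),
    "queen": (4.2, 5.2),
    "play":  (6.0, 4.0),
}

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def distance(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

# king - man + woman lands (approximately) on queen's position
result = add(sub(vectors["king"], vectors["man"]), vectors["woman"])

# the nearest word to the result, excluding the inputs, is "queen"
nearest = min(
    (w for w in vectors if w not in {"king", "man", "woman"}),
    key=lambda w: distance(vectors[w], result),
)
```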
Instead of a single number, each word is represented as a vector of 32 or more dimensions, so that relations such as King − Man + Woman ≈ Queen hold. Consider a 2-D example where Man is at (3, 6), Woman at (3.2, 6.2), and the word Play at (6, 4). The positions of Man and Woman are very close to each other compared with Play, which implies that 'man' and 'woman' are more similar to each other than either is to 'play'. If King is at (4, 5), the analogy places Queen near (4.2, 5.2).
The main problem with a plain RNN is that an output can depend on an input from the very initial state. During backpropagation, when a weight becomes very small, its derivative tends toward 0, which leads to the vanishing-gradient problem. To solve this, the LSTM, or long short-term memory network, is used.
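The vanishing-gradient effect can be sketched numerically (a hypothetical, heavily simplified illustration: backpropagation through an unrolled RNN multiplies the gradient by one recurrent-weight factor per time step):

```python
def gradient_after_steps(w, steps):
    """Gradient magnitude after backpropagating through `steps` time steps,
    assuming a constant per-step chain-rule factor w (toy model)."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # one chain-rule factor per unrolled step
    return grad

short_range = gradient_after_steps(0.5, 5)    # still noticeable
long_range = gradient_after_steps(0.5, 50)    # effectively zero
```

With a factor below 1, the gradient from early inputs shrinks geometrically with distance, which is why a plain RNN struggles to learn long-range dependencies.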
An LSTM is a type of RNN augmented with some advanced features, such as gates that control what information is kept or forgotten.
In the Transformer architecture, the first encoder receives the input sequence's word embeddings. The data is then transformed and sent to the next encoder, and all of the decoders in the decoder stack receive the result from the last encoder in the encoder stack.
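At the core of each encoder and decoder layer is self-attention. A minimal sketch of scaled dot-product attention over toy vectors (pure Python, with simple lists standing in for tensors):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # similarity of the query with every key, scaled by sqrt(d_k)
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # weights sum to 1
        # output is the weight-blended mixture of the value vectors
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs
```

In self-attention the queries, keys, and values all come from the same sequence; in the encoder-decoder attention described next, the queries come from the decoder while the keys and values come from the encoder stack's output.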
It is worth noting that, in addition to the self-attention and feed-forward layers, the decoders contain an additional layer called encoder-decoder attention. This allows the decoder to concentrate on the relevant parts of the input sequence.
Natural language processing is used by most AI systems today, including Google Assistant, Netflix, Apple's Siri, and Grammarly. There is a theory that as Natural Language Processing and Biometrics improve, computers such as humanoid robots will be able to read facial expressions, body language, and speech, much as a person does when speaking face to face.
Because they can serve as the physical body for a programmed artificial soul, humanoid robots are required for this form of communication. As their popularity and accuracy grow, NLP and Biometrics will be able to take humanoid-robot research to a whole new level, allowing robots to express themselves through movement, posture, and expression. As a result, despite current limitations, we can expect many of these barriers to be dismantled in the coming years as new techniques and technologies emerge.