A Review on Handwritten Character Recognition Using Advanced Techniques

Owing to its wide range of applications, handwriting recognition has attracted considerable interest in pattern recognition and machine learning. Optical character recognition (OCR) and handwritten character recognition (HCR) each have specific application domains. Several strategies have been proposed for the character recognition stage of a handwriting recognition system, and a substantial number of studies describe methods for transforming the text of a paper document into a machine-readable format. Character recognition (CR) technology may become crucial in the near future for processing and digitizing existing paper documents in order to establish a paperless environment. This paper offers a thorough review of the handwritten character recognition field.


Introduction
Pattern recognition has many practical applications, and character recognition is one of its most fundamental and difficult subfields. Because it offers a natural way for humans and computers to interact, character recognition has been a highly active area of research since the early days of computer science. More specifically, character recognition is the process of identifying and extracting characters from an input image and converting them into an editable machine-readable format, such as ASCII.
A handwriting recognition system enables a computer to read handwritten characters and other symbols. Handwriting recognition is divided into two categories, online and offline [6]. In offline handwriting recognition, the handwriting is scanned and then interpreted by the computer. In online handwriting recognition, the handwriting is identified while it is being written on a touch pad with a stylus. From the classifier's perspective, character recognition systems fall into two primary groups: segmentation-free (global) and segmentation-based (analytic). Segmentation-free recognition is the holistic approach, in which the input is recognized without first being segmented into subunits or characters. Machine learning and deep learning algorithms have been used extensively in the literature. Feature extraction is equally important; common strategies include graph-based features, histograms, mathematical transforms, and moment-based features. The essential phases of handwritten character recognition include pre-processing, segmentation, representation, training, recognition, and post-processing. In terms of practical applications, numerous mobile and web applications offer character recognition features, since end users continue to demand better services, which in technical terms means higher accuracy. Character recognition presents a number of difficulties, and our goal is to examine both existing and novel techniques for addressing them. The contribution of this study is a comparison of machine learning and deep learning methods for recognizing handwritten characters, organized by the dataset and method employed.
The paper is organized as follows. Section II gives a detailed overview of the evolution of CR. Section III covers the methodologies adopted in character recognition systems, followed by a discussion of various character recognition systems in Section IV. Section V presents the conclusion and future work.

Evolution of Character Recognition
Writing has historically been the most natural means of collecting, storing, and distributing information, and it is now used for both human-to-human and human-to-machine communication. The first step towards CR was an attempt to create a tool to assist the blind [7]. The first character recognizers appeared around the 1940s. Early work focused either on machine-printed text or on a small set of clearly distinct handwritten characters or symbols. Template matching was typically employed for machine-printed CR, while for handwritten text low-level image processing methods were applied to the binary image to extract feature vectors, which were then passed to statistical classifiers [8]. A useful overview of the CR strategies in use up to the 1980s is given in [9]. Driven by the rapid development of information technology, work on CR systems increased between 1980 and 1990 [10]. In various works, structural approaches were introduced alongside statistical methods [6]. Syntactic and structural techniques depend on effective extraction of primitives [7]. Chan et al. [8] described how to recognize handwriting online using a structural method: the user provides a sequence of points from which the structural primitives are extracted, and recognition proceeds from these primitives, which include several kinds of line segments and curves. However, recognition rates were capped because CR research concentrated mostly on shape recognition algorithms and did not use any semantic information. A historical review of CR research and development from 1980 to 1990, covering both online and offline techniques, can be found in [9]. After 1990, image processing and pattern recognition were combined with artificial intelligence.
Along with the development of powerful computers and more precise electronic devices such as scanners, cameras, and tablets, effective approaches such as neural networks (NNs), hidden Markov models (HMMs), fuzzy set reasoning, and natural language processing (NLP) came into use. The systems of the 1990s were satisfactory only for restricted applications: off-line machine-printed characters [10] and limited-vocabulary, user-dependent on-line handwritten characters [11]. Although research on recognizing isolated handwritten characters has proved extremely successful, recognizing off-line cursive handwriting has been shown to be a difficult task.

Methodologies adopted in Character Recognition System
This section discusses the approaches that can be used to construct the different stages of a character recognition system. CR systems differ greatly depending on the methodologies adopted in state-of-the-art work. According to a survey of the CR literature, the hierarchical tasks of a CR system are grouped into stages: dataset development, pre-processing, segmentation, feature extraction, and classification. Some approaches employ a feedback mechanism to update the outcome of each stage, while others combine or omit some of the stages. The standard methodology followed in a CR system is shown in Fig. 1, followed by a detailed discussion.
Figure 1: Bird's-eye overview of the character recognition system
Volume 4, Issue 5, September-October 2022

Dataset Development
According to the method of data collection, automatic character recognition systems are divided into two categories: online character recognition systems and offline character recognition systems. Offline CR systems use optical scanners or cameras to collect data from documents, whereas online character recognition systems use digitizers that record the pattern directly, capturing the strokes, pen-up and pen-down events, and writing speed.
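To make the difference between the two kinds of data concrete, the following Python sketch models an online sample as a list of pen-down to pen-up strokes with timestamps, and rasterizes it into the kind of binary image an offline system would receive. The names `Stroke`, `mean_speed`, and `rasterize` are hypothetical and chosen only for illustration; real digitizers expose richer data (pressure, tilt) that is ignored here.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class Stroke:
    """One pen-down to pen-up trace: (x, y, t) samples (hypothetical format)."""
    points: List[Tuple[float, float, float]]


def mean_speed(stroke: Stroke) -> float:
    """Average pen speed along a stroke (path length / duration)."""
    pts = np.asarray(stroke.points, dtype=float)
    if len(pts) < 2:
        return 0.0
    length = np.hypot(np.diff(pts[:, 0]), np.diff(pts[:, 1])).sum()
    duration = pts[-1, 2] - pts[0, 2]
    return float(length / duration) if duration > 0 else 0.0


def rasterize(strokes: List[Stroke], height: int, width: int) -> np.ndarray:
    """Draw strokes into a binary image (1 = ink), discarding order and timing."""
    img = np.zeros((height, width), dtype=np.uint8)
    for stroke in strokes:
        pts = stroke.points
        if len(pts) == 1:  # a single dot
            x, y, _ = pts[0]
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < width and 0 <= yi < height:
                img[yi, xi] = 1
        for (x0, y0, _), (x1, y1, _) in zip(pts, pts[1:]):
            # Sample densely enough that consecutive pixels are connected.
            n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
            xs = np.rint(np.linspace(x0, x1, n + 1)).astype(int)
            ys = np.rint(np.linspace(y0, y1, n + 1)).astype(int)
            keep = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
            img[ys[keep], xs[keep]] = 1
    return img
```

The rasterized image no longer contains stroke order, pen-up and pen-down events, or speed, which is exactly the information an offline system has to do without.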

Pre-processing
Pre-processing improves the image so that it can be used in later processing [2], makes the input data more consistent, and makes it better suited to the next stage of the recognition system. This step involves a variety of techniques, including grayscale conversion, binarization, noise removal, and normalization. The input data are first converted to grayscale, then binarized, and finally passed through a noise reduction step. After the grayscale and binary conversion, the data can be segmented using edge detection, following the results in [3]. Thresholding, and Otsu's method in particular, is frequently used to convert a grayscale image to a binary image.
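As an illustration of the binarization step, the sketch below implements Otsu's method with NumPy: it picks the threshold that maximizes the between-class variance of the gray-level histogram. It assumes an 8-bit grayscale image with dark ink on a light background; the function names `otsu_threshold` and `binarize` are our own and do not come from the reviewed works.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return t maximizing between-class variance; class 0 is pixels <= t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    levels = np.arange(256, dtype=float)

    n0 = np.cumsum(hist)               # pixel count with value <= t
    n1 = hist.sum() - n0               # pixel count with value > t
    s0 = np.cumsum(hist * levels)      # sum of gray levels in class 0
    s1 = (hist * levels).sum() - s0    # sum of gray levels in class 1

    valid = (n0 > 0) & (n1 > 0)        # both classes must be non-empty
    between = np.zeros(256)
    m0 = s0[valid] / n0[valid]
    m1 = s1[valid] / n1[valid]
    between[valid] = n0[valid] * n1[valid] * (m0 - m1) ** 2
    return int(np.argmax(between))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Binary image with 1 = ink, assuming dark ink on light paper."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)
```

For an image with light ink on a dark background, the comparison in `binarize` would need to be inverted.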

Segmentation
Segmentation divides an input text image into lines and individual characters. Noise and outlier regions of the image are removed. There are two types of segmentation: external and internal. External segmentation divides the text into paragraphs, lines, and words. Internal segmentation, on the other hand, divides the text into individual characters [1]. Various segmentation algorithms are available; projection (histogram) profiles and connected component analysis are the fundamental techniques for line segmentation.
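The profile-based approach to segmentation can be sketched in a few lines of Python. The example sums the ink pixels along each row to find text lines, then along each column of a line to find characters. It assumes a binary image with 1 = ink and that lines and characters are separated by at least one empty row or column, so it will not split touching or cursive characters; `ink_runs`, `segment_lines`, and `segment_characters` are illustrative names.

```python
import numpy as np


def ink_runs(profile: np.ndarray) -> list:
    """Return (start, end) index pairs, end exclusive, of runs where profile > 0."""
    has_ink = profile > 0
    padded = np.concatenate(([False], has_ink, [False]))
    # Positions where the profile switches between empty and non-empty.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]


def segment_lines(binary: np.ndarray) -> list:
    """Row ranges of text lines, from the horizontal projection profile."""
    return ink_runs(binary.sum(axis=1))


def segment_characters(line: np.ndarray) -> list:
    """Column ranges of characters in one line, from the vertical profile."""
    return ink_runs(line.sum(axis=0))
```

A typical use is to call `segment_lines` on the page, slice each returned row range out of the image, and pass each slice to `segment_characters`.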

Feature Extraction
Image representation is one of the most crucial functions of a recognition system. The most straightforward approach is to feed binary or grayscale images directly to a recognizer. In most recognition systems, however, a more compact and characteristic representation is needed to avoid unnecessary complexity and to increase the accuracy of the algorithm. For this reason, a set of features is extracted for each class that helps to distinguish it from other classes while remaining invariant to characteristic variations within the class [5].
The hundreds of document image representation techniques can be grouped into three main categories.
Feature extraction gathers the most relevant information about an object or a collection of objects so that the objects can be analysed further using that information. A feature is a robust representation of the raw data. Important feature extraction techniques include zone-based, structural, mathematical, sliding window, chain code histogram, gradient feature, and hybrid methods [4]. Chain codes use one of two main connectivity schemes, the 4-neighbourhood and the 8-neighbourhood.
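As a small example of one of these features, the sketch below computes an 8-neighbourhood chain code and its normalized histogram from an ordered list of contour pixels. The direction numbering (0 = east, increasing counter-clockwise, with rows growing downward) is one common convention and is an assumption here, as is the requirement that the contour points are already ordered and 8-connected; contour tracing itself is not shown.

```python
# (row step, column step) -> direction code; rows grow downward in images.
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(points):
    """Encode an ordered, 8-connected list of (row, col) pixels as direction codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(r0, c0)} and {(r1, c1)} are not neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def chain_code_histogram(codes):
    """Normalized frequency of each of the 8 directions (an 8-dimensional feature)."""
    hist = [0.0] * 8
    for code in codes:
        hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

A contour that only ever moves horizontally or vertically uses just the even codes 0, 2, 4, and 6, which corresponds to the 4-neighbourhood case.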

Classification
In classification, an unknown sample is assigned to one of the predefined classes. Once the features have been extracted, the characters are classified and identified accordingly. A new character is assigned to a class because it matches that class or closely resembles its members, which supports the decision-making process. In other words, labelling takes place during the classification stage. Effective extraction and selection of key features is a constant requirement for classification. Various classification schemes are available, each based primarily on image processing and artificial intelligence techniques. Template matching, statistical techniques, structural techniques, neural networks, genetic approaches, and fuzzy logic based on soft computing are examples of classification methods based on image processing. In handwritten character recognition (HCR) systems, a variety of machine learning (ML) techniques have been used, including support vector machines, Naive Bayes, artificial neural networks (ANN), neuro-fuzzy systems, decision trees, and nearest neighbor algorithms. Deep learning, which is inspired by the human brain, uses hierarchical layers of artificial neural networks to carry out machine learning tasks. It has gained momentum as a result of hardware advances and cumulative research on deep learning algorithms, including recurrent neural networks [18], convolutional neural networks [19], auto-encoders [21], deep neural networks, and deep belief networks [20].
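To show what assigning an unknown sample to a predefined class looks like in practice, here is a minimal nearest neighbor classifier in NumPy, one of the ML techniques listed above. It assumes each character has already been reduced to a fixed-length feature vector; `knn_predict` is an illustrative name, and Euclidean distance with majority voting is only one of several reasonable choices.

```python
import numpy as np


def knn_predict(train_x: np.ndarray, train_y: np.ndarray, query: np.ndarray, k: int = 3):
    """Label a query feature vector by majority vote among its k nearest training samples."""
    distances = np.linalg.norm(train_x - query, axis=1)  # Euclidean distance to each sample
    nearest = np.argsort(distances)[:k]                  # indices of the k closest samples
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    # On a tied vote, np.argmax returns the first (lowest-sorted) label.
    return labels[np.argmax(counts)]
```

With `k=1` this reduces to matching the query against its single closest training sample, which is close in spirit to template matching on feature vectors.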

Character Recognition System
A handwritten character recognition system faces a number of difficulties. Fig. 2 depicts the stages of the handwritten recognition system. Character recognition can be divided into two categories: online and offline. Online character recognition uses a digital pen and tablet, whereas both handwritten and printed characters can be recognized offline. Optical character recognition can in turn be divided into two broad categories: handwritten character recognition (HCR) and printed character recognition (PCR). PCR is less complex than HCR. HCR arises in both the offline and the online recognition groups. The hierarchy of the optical character recognition system is depicted in Figure 2.
Online recognition is a real-time method in which characters are recognized as the user writes. For offline recognition, characters from previously written or printed documents are captured and stored, typically with optical scanners or cameras. Because pen-stroke information is unavailable, offline recognition is more difficult and achieves a lower recognition rate. An offline handwritten character recognition system has been built with a number of tools, including Python, Android, OpenCV, and TensorFlow [17]. By contrast, because the user's pen strokes are captured directly, online recognition is easier and achieves a higher recognition rate. Handwritten characters appear in several forms. For written phrases, both segmentation-based and segmentation-free approaches are relevant. The next step is to select features, and a classifier that operates on these features is therefore required. The classification process can be accelerated through optimization. Finally, the trained model is employed for the intended tasks.

Conclusion and Future Work
This study reviews the primary methodologies employed in the character recognition field during the past ten years. It also discusses various pre-processing, segmentation, and classification techniques with diverse features. A complicated pattern cannot be represented by structural or statistical information alone, so semantic information must be combined with statistical and structural information. For many pattern recognition problems, NNs and HMMs combine statistical and structural information quite successfully. Although they are fairly robust to deformation, their behavior in the classification stage can still result in significant mismatches. Template matching methods treat a character as a whole, in the sense that an input plane is matched against a template constrained to an X-Y plane. In future, the design of the training set should be approached methodically rather than by simply adding whatever data are available. Training sets should be large and contain random examples, including poorly written ones. To improve recognition results, deep learning may eventually replace conventional handwritten character recognition methods. Recognition accuracy can be raised by using several feature extraction techniques. Finally, models trained on larger datasets tend to perform better and deliver the required accuracy.