Structured Detail Extraction from Government Documents
DOI: https://doi.org/10.32628/CSEIT12173177
Keywords: GUI, JSON
Abstract
Text recognition has been one of the most active and challenging research areas in image processing and pattern recognition. This paper presents a method to extract important details from government documents, store them in a JSON file, and link the different documents that belong to the same person so that their data can be retrieved when required. In our approach, the user first uploads a scanned copy of the document through a GUI, and the text is then extracted from the image. We capture all of the text in the image, including text arranged in a circular shape. From the captured text we select the important fields and store them in a JSON file.
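The pipeline described in the abstract (extract text, keep the important fields, store them as JSON, and link the documents of one person) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the OCR step has already produced plain text lines (for example via `pytesseract.image_to_string(image).splitlines()`), and the field names, regular expressions, document types, and the name-plus-date-of-birth linking key are all hypothetical choices made for this example.

```python
import json
import re
from typing import Dict, List

# Hypothetical field patterns; real documents would need per-template rules.
FIELD_PATTERNS = {
    "name": re.compile(r"^Name\s*:\s*(.+)$", re.IGNORECASE),
    "dob": re.compile(r"\b(\d{2}/\d{2}/\d{4})\b"),
    "id_number": re.compile(r"\bID\s*(?:No\.?)?\s*:\s*([A-Z0-9]+)\b", re.IGNORECASE),
}


def extract_fields(ocr_lines: List[str]) -> Dict[str, str]:
    """Keep only the 'important' fields from raw OCR output lines."""
    fields: Dict[str, str] = {}
    for line in ocr_lines:
        text = line.strip()
        for key, pattern in FIELD_PATTERNS.items():
            if key in fields:
                continue  # first match for each field wins
            match = pattern.search(text)
            if match:
                fields[key] = match.group(1).strip()
    return fields


def person_key(fields: Dict[str, str]) -> str:
    """Hypothetical linking key: whitespace-normalized upper-case name plus date of birth."""
    name = " ".join(fields.get("name", "").upper().split())
    return f"{name}|{fields.get('dob', '')}"


def link_documents(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Group the fields extracted from several documents under one key per person."""
    people: Dict[str, Dict[str, Dict[str, str]]] = {}
    for doc_type, ocr_lines in docs.items():
        fields = extract_fields(ocr_lines)
        people.setdefault(person_key(fields), {})[doc_type] = fields
    return people


def save_json(people: Dict[str, Dict[str, Dict[str, str]]], path: str) -> None:
    """Store the linked records in a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(people, fh, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical OCR output for two documents of the same person.
    docs = {
        "id_card": ["SPECIMEN DOCUMENT", "Name: Asha Rao", "DOB: 04/11/1990", "ID No: AB12345"],
        "tax_card": ["Name : ASHA  RAO", "Date of Birth 04/11/1990"],
    }
    print(json.dumps(link_documents(docs), indent=2))
```

Both sample documents end up under the same key because the name is normalized (case and repeated spaces) before linking. A real system would likely need a more robust key, such as a document number shared across records, since names and OCR output can vary between documents.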
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.