Structured Detail Extraction from Government Documents

Shishir Kallapur; Shourya Sinha; Vinay Kumar; Shashank Singh

doi:10.32628/CSEIT12173177

Authors

Shishir Kallapur, Department of Computer Science and Engineering, National Institute of Engineering, Mysore, India
Shourya Sinha, Department of Computer Science and Engineering, National Institute of Engineering, Mysore, India
Vinay Kumar, Department of Computer Science and Engineering, National Institute of Engineering, Mysore, India
Shashank Singh, Department of Computer Science and Engineering, National Institute of Engineering, Mysore, India

DOI:

https://doi.org/10.32628/CSEIT12173177

Keywords:

GUI, JSON

Abstract

Text recognition is one of the most active and challenging research areas in image processing and pattern recognition. This paper presents a method for extracting important data from government documents, storing it in a JSON file, and linking the different documents that belong to one person so that the data can be retrieved when required. The user first uploads a scanned copy of the document through the GUI, and the software extracts the data from the image. The extraction captures all of the text in the image, including text arranged in a circular shape; the important fields are then selected from the captured text and stored in a JSON file.
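
The selection and storage steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the OCR stage has already produced plain text, and the field labels, the regular expressions, the `extract_fields` and `link_document` helpers, and the name-plus-date-of-birth key used to link documents are all hypothetical.

```python
import json
import re

# Hypothetical field patterns: the paper does not list which fields it selects,
# so these labels and formats are assumptions for illustration only.
FIELD_PATTERNS = {
    "name": re.compile(r"Name\s*:\s*(.+)", re.IGNORECASE),
    "date_of_birth": re.compile(
        r"(?:DOB|Date of Birth)\s*:\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE
    ),
    "document_number": re.compile(r"Document No\s*:\s*([A-Z0-9]+)", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Select the important fields from raw OCR text; skip fields not found."""
    fields = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            fields[field] = match.group(1).strip()
    return fields


def link_document(store, doc_type, fields):
    """File one document's fields under its owner so a person's documents share a record.

    The linking key (lower-cased name plus date of birth) is an assumption;
    the paper does not say how documents are matched to a person.
    """
    if "name" not in fields or "date_of_birth" not in fields:
        raise ValueError("cannot link a document without name and date of birth")
    key = f"{fields['name'].lower()}|{fields['date_of_birth']}"
    person = store.setdefault(key, {"documents": {}})
    person["documents"][doc_type] = fields
    return key


if __name__ == "__main__":
    # Made-up OCR output standing in for two scanned documents of one person.
    identity_card = "Name: Asha Rao\nDOB: 01/02/1990\nDocument No: AB1234567\n"
    licence = "Name: ASHA RAO\nDate of Birth: 01/02/1990\nDocument No: XYZ987\n"

    store = {}
    link_document(store, "identity_card", extract_fields(identity_card))
    link_document(store, "driving_licence", extract_fields(licence))
    print(json.dumps(store, indent=2))
```

Keying the store by person rather than by document is what lets fields from one document be looked up alongside those from another; a real system would need a more robust key than name and date of birth.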

