Structured Detail Extraction from Government Documents

Authors

  • Shishir Kallapur  Department of Computer Science and Engineering, National Institute of Engineering, Mysore, India
  • Shourya Sinha  Department of Computer Science and Engineering, National Institute of Engineering, Mysore, India
  • Vinay Kumar  Department of Computer Science and Engineering, National Institute of Engineering, Mysore, India
  • Shashank Singh  Department of Computer Science and Engineering, National Institute of Engineering, Mysore, India

DOI:

https://doi.org/10.32628/CSEIT12173177

Keywords:

GUI, JSON

Abstract

Text recognition remains one of the most active and challenging research areas in image processing and pattern recognition. This paper presents a method that extracts important data from government documents, stores it in a JSON file, and links the different documents belonging to one person so that the data can be retrieved when required. The user first uploads a scanned copy of the document through the GUI, and the data is then extracted from the image. The software captures all text in the image, including text laid out in a circular shape; from the captured text, the important fields are selected and stored in a JSON file.
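The selection, storage, and linking steps described in the abstract can be illustrated with a minimal sketch. It assumes the text-recognition step has already turned the scanned image into plain text. The field names, the regular expressions, the name-plus-date-of-birth linking key, and the sample person are all hypothetical choices made for illustration; the abstract does not state which fields or linking rule the authors use.

```python
import json
import re

# Hypothetical field patterns; the paper does not specify which fields are kept.
FIELD_PATTERNS = {
    "name": re.compile(r"Name\s*[:\-]\s*(.+)", re.IGNORECASE),
    "date_of_birth": re.compile(
        r"(?:DOB|Date of Birth)\s*[:\-]\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE
    ),
    "document_number": re.compile(
        r"(?:ID|Document) No\.?\s*[:\-]\s*([A-Z0-9]+)", re.IGNORECASE
    ),
}


def select_fields(ocr_text):
    """Keep only the fields matched by FIELD_PATTERNS from raw recognized text."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            record[field] = match.group(1).strip()
    return record


def link_document(store, doc_type, record):
    """File a record under a person key so several documents can be looked up together."""
    # Assumption: lower-cased name plus date of birth identifies one person.
    key = f"{record.get('name', '').lower()}|{record.get('date_of_birth', '')}"
    store.setdefault(key, {})[doc_type] = record
    return key


# Made-up recognized text for two documents of the same (fictional) person.
id_text = "Name: Asha Rao\nDOB: 01/02/1990\nID No: AB12345"
licence_text = "NAME - Asha Rao\nDate of Birth: 01/02/1990\nDocument No: DL9988"

store = {}
link_document(store, "identity_card", select_fields(id_text))
link_document(store, "driving_licence", select_fields(licence_text))

# Serialize the linked records as JSON, as the abstract describes.
print(json.dumps(store, indent=2))
```

Keying the store by person rather than by document means a later lookup for one person returns every document on file for them, which is the linking behaviour the abstract mentions. A real system would need a more robust identifier than name and date of birth.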

Published

2021-06-30

Section

Research Articles

How to Cite

[1]
Shishir Kallapur, Shourya Sinha, Vinay Kumar, Shashank Singh, "Structured Detail Extraction from Government Documents", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 7, Issue 3, pp. 610-613, May-June 2021. Available at doi: https://doi.org/10.32628/CSEIT12173177