The Deep Learning-Based OCR System for Korean Word with Web Search Engine 


Vol. 48, No. 9, pp. 1169-1174, Sep. 2023
10.7840/kics.2023.48.9.1169


  Abstract

Optical character recognition (OCR) is a technology that recognizes text in an image and converts it into text data. In other countries, OCR enables automated document processing. Because the recognition rate for Hangul is lower than that for English letters and numbers, OCR is not widely used in Korea. If the OCR accuracy for Hangul improves, we expect OCR to increase work efficiency in Korea as well. In this paper, we built an OCR system based on a convolutional neural network (CNN) trained on Hangul, English, and numbers. We then implemented a process that separates complex words into complete Hangul characters, recognizes each complete Hangul character, and converts it into text data. To further improve the accuracy of the OCR system, the recognized text data is searched for in a web search engine to check whether a modified word exists. If a modified word is found in the web search results, it is taken as the correct recognition result and included in the final text data. In our recognition rate measurements, the OCR system correctly recognized up to 90.1% of characters in documents containing Hangul, English, and numbers.
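
The web search verification step can be illustrated with a minimal sketch. It reflects one reading of the abstract, not the authors' implementation: the function `verify_words`, the `suggest` callback, and the sample data are all hypothetical, and the search engine is replaced by a small in-memory table so that the example runs offline.

```python
from typing import Callable, Optional

def verify_words(
    words: list[str],
    suggest: Callable[[str], Optional[str]],
) -> list[str]:
    """Replace each recognized word with the modified word returned by
    the search step, if there is one; otherwise keep the OCR output."""
    verified = []
    for word in words:
        modified = suggest(word)  # hypothetical stand-in for a web search query
        verified.append(modified if modified else word)
    return verified

# Stand-in for search results (assumption): maps a misrecognized word
# to the modified word that a search engine might return for it.
FAKE_RESULTS = {"대한민굴": "대한민국"}

def fake_suggest(word: str) -> Optional[str]:
    """Return the modified word for a known misrecognition, else None."""
    return FAKE_RESULTS.get(word)

result = verify_words(["대한민굴", "OCR", "2023"], fake_suggest)
print(result)
```

In this sketch, a word with no modified form in the search results passes through unchanged, so English words and numbers are left as recognized.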

  Cite this article

[IEEE Style]

H. Jang, S. Goh, J. Lee, S. Park, "The Deep Learning-Based OCR System for Korean Word with Web Search Engine," The Journal of Korean Institute of Communications and Information Sciences, vol. 48, no. 9, pp. 1169-1174, 2023. DOI: 10.7840/kics.2023.48.9.1169.

[ACM Style]

Hyuksoo Jang, Sangho Goh, Jaehyun Lee, and Sungkwon Park. 2023. The Deep Learning-Based OCR System for Korean Word with Web Search Engine. The Journal of Korean Institute of Communications and Information Sciences, 48, 9, (2023), 1169-1174. DOI: 10.7840/kics.2023.48.9.1169.

[KICS Style]

Hyuksoo Jang, Sangho Goh, Jaehyun Lee, Sungkwon Park, "The Deep Learning-Based OCR System for Korean Word with Web Search Engine," The Journal of Korean Institute of Communications and Information Sciences, vol. 48, no. 9, pp. 1169-1174, 9. 2023. (https://doi.org/10.7840/kics.2023.48.9.1169)