tiendv / MCOCR2021

UIT AI Club Team At RIVF2021 MC-OCR Competition

🏆 News: our team won Task 1 and placed in the top 7 in Task 2 at MC-OCR 2021.

Mobile-Captured Image Document Recognition for Vietnamese Receipts 2021 (MC-OCR 2021)

This repository contains our source code for both Task 1 and Task 2 of the RIVF2021 MC-OCR Competition.

Introduction

Mobile-captured receipt OCR (MC-OCR) is the process of recognizing text from structured and semi-structured receipts, and from invoices in general, captured by mobile devices. This process plays a critical role in streamlining document-intensive processes and office automation in many financial, accounting, and taxation areas. However, MC-OCR faces major challenges due to the complexity of mobile-captured images. First, receipts may be crumpled or their content blurred. Second, unlike scanned images, photos taken with mobile devices vary widely in quality because of the lighting conditions and the dynamic environment (e.g., indoor, outdoor, complex background) in which the receipts were captured. These factors lower the quality of the recognized information. To address them, this challenge targets two tasks: (1) evaluating the quality of the captured receipt, and (2) recognizing the required fields of the receipt. The challenge is therefore a multi-modal analysis task that can take advantage of both computer vision and natural language processing, two of the main interests of the RIVF community.

Detailed information of MC-OCR 2021 can be found here.

Languages

Python 88.0%, C++ 8.7%, Cuda 2.9%, Shell 0.3%, Dockerfile 0.1%, Makefile 0.0%, CMake 0.0%, Objective-C 0.0%