Projects

Text Understanding in Visual Data

Text recognition (traditionally called OCR) is crucial for visual understanding and reasoning in many application scenarios.

We focus on arbitrary-shaped text detection and recognition, and on related natural language processing tasks for understanding this text.

MMOCR: A comprehensive toolbox for text detection, recognition and understanding. ACM MM 2021 Open Source Software Competition.
[code]

An open-source toolbox for text detection, recognition and understanding.

Text Detection

Fourier contour embedding for arbitrary-shaped text detection. CVPR 2021.
[code in MMOCR]

Geometry normalization networks for accurate scene text detection. ICCV 2019.
[github] [blog (in Chinese)]

A general framework for handling large geometric variances (scale and orientation) in scene text detection. Its effectiveness is demonstrated on two state-of-the-art methods, EAST and PSENet.

Boosting up scene text detectors with guided CNN. BMVC 2018. (Oral)
[paper]

A general framework for speeding up scene text detection. Its effectiveness is demonstrated on two state-of-the-art methods, CTPN and EAST.

Text Recognition

Our team was the first runner-up in the Scene Text Recognition (Chinese and Latin) task of the ICDAR 2019 Robust Reading Challenge on Arbitrary-Shaped Text (ICDAR 2019-ArT). [results]

RobustScanner: Dynamically enhancing positional clues for robust text recognition. ECCV 2020.
[paper] [blog (in Chinese)] [code in MMOCR]

RobustScanner has been used in the first-place solutions of several challenges, such as the CVPR 2021 RetailVision Product Pricing in the Wild Challenge (see report) and the ECCV 2022 Out of Vocabulary Scene Text Understanding challenge (see report).
