OCR challenge for index cards

The goal of this task is to post-process the output of the Tesseract OCR engine. Alternatively, since the card images are also available, it can be treated as a full OCR task. [ver. 1.0.2]

Git repo URL: git://gonito.net/fiszki-ocr / Branch: master
Run git clone --single-branch git://gonito.net/fiszki-ocr -b master to get the challenge data
Browse at https://gonito.net/gitlist/fiszki-ocr.git/master

Leaderboard

# | submitter | when             | ver.  | description                              | tags                 | test-A CER | test-A WER | test-A CharMatch | ×
--+-----------+------------------+-------+------------------------------------------+----------------------+------------+------------+------------------+--
1 | s444415   | 2022-12-22 13:58 | 1.0.2 | Donut fine tuned                         | fine-tuned donut     | 0.459      | 0.694      | 0.622            | 6
2 | s444415   | 2023-01-05 15:02 | 1.0.2 | Donut fine tune with data from challange | fine-tuned donut ocr | 0.664      | 0.915      | 0.508            | 6
3 | p/tlen    | 2021-04-09 16:01 | 1.0.1 | Baseline - just rewrite Tesseract output | baseline tesseract   | 0.463      | 0.786      | 0.425            | 1
4 | s444415   | 2022-12-22 13:46 | 1.0.2 | Donut proto model                        | donut proto          | 1.079      | 1.089      | 0.377            | 6
5 | s444415   | 2022-12-22 13:51 | 1.0.2 | Donut base model                         | base donut           | 1.995      | 2.459      | 0.206            | 6
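The CER and WER columns are edit-distance error rates: the number of substitutions, deletions and insertions needed to turn the reference into the hypothesis, divided by the reference length in characters (CER) or whitespace-separated words (WER). Because insertions are counted, a system that emits much more text than the reference can score above 1, as the Donut proto and base model rows do. The sketch below is a minimal per-string illustration under that standard definition; it is an assumption, not the challenge's official scorer. The helper names (levenshtein, cer, wer) are hypothetical, and the exact text normalization and corpus-level aggregation used on the leaderboard may differ. CharMatch is not covered here.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp.

    Works on strings (character level) and on lists of words (word level).
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete r from ref
                curr[j - 1] + 1,          # insert h into ref
                prev[j - 1] + (r != h),   # substitute (cost 0 if characters/words match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; max(..., 1) is an assumed guard against an empty reference."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens (assumed tokenization)."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

For example, `cer("ab", "abcdef")` is 2.0 (four insertions against a two-character reference), which shows how a verbose output pushes the rate past 1, while `wer("ala ma kota", "ala ma psa")` is one substitution over three words.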