OCR challenge for index cards
The goal of this task is to post-process the output of the Tesseract OCR engine. Alternatively, since the images are also available, it can be treated as a full OCR task. [ver. 1.0.2]
Git repo URL: git://gonito.net/fiszki-ocr / Branch: master
Run `git clone --single-branch git://gonito.net/fiszki-ocr -b master` to get the challenge data.
Browse at https://gonito.net/gitlist/fiszki-ocr.git/master
Leaderboard
# | submitter | when | ver. | description | tags | test-A CER | test-A WER | test-A CharMatch | × |
---|---|---|---|---|---|---|---|---|---|
1 | s444415 | 2022-12-22 13:58 | 1.0.2 | Donut fine tuned | fine-tuned donut | 0.459 | 0.694 | 0.622 | 6 |
2 | s444415 | 2023-01-05 15:02 | 1.0.2 | Donut fine tune with data from challenge | fine-tuned donut ocr | 0.664 | 0.915 | 0.508 | 6 |
3 | p/tlen | 2021-04-09 16:01 | 1.0.1 | Baseline - just rewrite Tesseract output | baseline tesseract | 0.463 | 0.786 | 0.425 | 1 |
4 | s444415 | 2022-12-22 13:46 | 1.0.2 | Donut proto model | donut proto | 1.079 | 1.089 | 0.377 | 6 |
5 | s444415 | 2022-12-22 13:51 | 1.0.2 | Donut base model | base donut | 1.995 | 2.459 | 0.206 | 6 |
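
The CER and WER columns are presumably character and word error rates. As a minimal sketch, assuming the usual definition (Levenshtein edit distance divided by the length of the reference), they could be computed as shown here. The function names `edit_distance`, `cer`, and `wer` are illustrative, and the challenge's own evaluator may normalize or aggregate differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits per reference word (whitespace tokens)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("index card", "indcx card"))  # one substitution over 10 characters
    print(wer("index card", "indcx card"))  # one wrong word out of 2
    print(cer("ab", "abcdef"))              # 4 insertions over 2 characters
```

Under this definition an error rate is not bounded by 1: a hypothesis much longer than the reference needs more edits than the reference has characters or words, which would be consistent with the values above 1 in the leaderboard.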