OCR challenge for index cards

The goal of this task is to post-process the output of the Tesseract OCR engine. Alternatively, it can be treated as an OCR task, since the card images are also available. [ver. 1.0.2]

This is the full list of all submissions. To see only the best ones, go to the leaderboard.

#	submitter	when	ver.	description	dev-0 CER	dev-0 WER	dev-0 CharMatch	test-A CER	test-A WER	test-A CharMatch
4	s444415	2023-03-23 11:49	1.0.2	Donut trained on wikisource fine-tuned donut ocr	1.038	0.991	0.370	1.000	1.000	0.401
6	s444415	2023-03-21 08:35	1.0.2	Donut fine tuned wikisource yellow fine-tuned donut ocr	1.021	1.010	0.367	1.066	1.122	0.370
2	s444415	2023-01-05 15:02	1.0.2	Donut fine tune with data from challenge fine-tuned donut ocr	0.557	0.741	0.537	0.664	0.915	0.508
1	s444415	2022-12-22 13:58	1.0.2	Donut fine tuned fine-tuned donut	0.486	0.813	0.585	0.459	0.694	0.622
7	s444415	2022-12-22 13:51	1.0.2	Donut base model base donut	1.624	1.973	0.231	1.995	2.459	0.206
5	s444415	2022-12-22 13:46	1.0.2	Donut proto model donut proto	1.054	1.080	0.359	1.079	1.089	0.377
3	p/tlen	2021-04-09 16:01	1.0.1	Baseline - just rewrite Tesseract output baseline tesseract	0.432	0.745	0.422	0.463	0.786	0.425
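
The CER and WER columns are not defined on this page. Assuming they are the usual character error rate and word error rate (Levenshtein edit distance divided by the length of the reference), a minimal Python sketch could look like the one below. The function names are illustrative and this is not the challenge's own evaluator; CharMatch is not covered here. Under this assumed definition a score can exceed 1 when the output contains many more characters or words than the reference, which would be consistent with values such as 1.624 in the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits per reference character (assumed definition)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits per reference word, splitting on whitespace (assumed definition)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, `cer("ab", "abcdef")` needs four insertions against a two-character reference, so the rate is above 1 even though every reference character was recognized correctly.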