Diachronic normalisation of Polish texts

Transform old Polish texts into modern spelling. [ver. 1.0.0]
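
To make the task concrete, here is a minimal sketch of what a rule-based normaliser might look like. The three rewrite rules and the `normalise` helper are hypothetical illustrations of well-known historical spelling patterns; they are not taken from any submission listed on this page, and a real system would need far more rules plus a lexicon to avoid over-correcting modern words.

```python
import re

# Hypothetical rewrite rules, for illustration only (not from any submission).
# Each rule is applied to the whole text, in order.
RULES = [
    (re.compile(r"x"), "ks"),           # xiążka   -> książka
    (re.compile(r"y(?=[aeou])"), "i"),  # historya -> historia
    (re.compile(r"ll"), "l"),           # kollega  -> kolega
]


def normalise(text: str) -> str:
    """Apply every rewrite rule to a lowercase old-spelling text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(normalise("historya kollegi"))
```

Rules this blunt also rewrite words that are already modern (for example, any word containing "ll"), which is one reason practical systems tend to combine such rules with word lists.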

| # | submitter | when | ver. | description | dev-0 CharMatch | dev-1 CharMatch | test-A CharMatch |
|---|---|---|---|---|---|---|---|
| 26 | ked | 2023-10-12 10:01 | 1.0.0 | plt5-base_normalizer_test_pruned, no finetuning | N/A | N/A | 0.0021 |
| 14 | p/tlen | 2022-07-07 06:18 | 1.0.0 | Lucene Transducers ver. 0.25-SNAPSHOT extended=yes rule-based | 1.0000 | 0.5508 | 0.5968 |
| 7 | p/tlen | 2022-07-07 06:18 | 1.0.0 | Lucene Transducers ver. 0.25-SNAPSHOT extended=no rule-based | 1.0000 | 0.6633 | 0.6662 |
| 17 | p/tlen | 2022-07-06 18:45 | 1.0.0 | Lucene Transducers ver. 0.24 extended=yes rule-based | 1.0000 | 0.5375 | 0.5839 |
| 6 | p/tlen | 2022-07-06 18:45 | 1.0.0 | Lucene Transducers ver. 0.24 extended=no rule-based | 1.0000 | 0.6633 | 0.6662 |
| 16 | p/tlen | 2022-07-06 18:35 | 1.0.0 | Lucene Transducers ver. 0.24 extended=yes rule-based | 1.0000 | 0.5375 | 0.5839 |
| 5 | p/tlen | 2022-07-06 18:35 | 1.0.0 | Lucene Transducers ver. 0.24 extended=no rule-based | 1.0000 | 0.6633 | 0.6662 |
| 15 | p/tlen | 2022-07-06 18:23 | 1.0.0 | Lucene Transducers ver. 0.24-SNAPSHOT extended=yes rule-based | 1.0000 | 0.5375 | 0.5839 |
| 4 | p/tlen | 2022-07-06 18:23 | 1.0.0 | Lucene Transducers ver. 0.24-SNAPSHOT extended=no rule-based | 1.0000 | 0.6633 | 0.6662 |
| 22 | p/tlen | 2022-07-06 18:21 | 1.0.0 | Lucene Transducers ver. 0.24-SNAPSHOT extended=yes rule-based | 1.0000 | 0.3570 | 0.4012 |
| 3 | p/tlen | 2022-07-06 18:21 | 1.0.0 | Lucene Transducers ver. 0.24-SNAPSHOT extended=no rule-based | 1.0000 | 0.6633 | 0.6662 |
| 19 | p/tlen | 2022-07-06 14:41 | 1.0.0 | Lucene Transducers ver. 0.24-SNAPSHOT rule-based | 1.0000 | 0.5328 | 0.5820 |
| 1 | p/tlen | 2022-02-24 19:47 | 1.0.0 | Lucene Transducers ver. 0.23-SNAPSHOT rule-based | 1.0000 | 0.6618 | 0.6732 |
| 2 | p/tlen | 2021-10-20 14:00 | 1.0.0 | Lucene Transducers ver. 0.23-SNAPSHOT rule-based | 1.0000 | 0.6628 | 0.6663 |
| 9 | p/tlen | 2021-10-20 11:02 | 1.0.0 | Lucene Transducers ver. 0.22-SNAPSHOT rule-based | 1.0000 | 0.6724 | 0.6580 |
| 8 | [anonymized] | 2021-08-15 13:19 | 1.0.0 | 0.22 use nosecondary option | 1.0000 | 0.6724 | 0.6580 |
| 21 | [anonymized] | 2021-08-03 20:03 | 1.0.0 | Lucene transducers 0.22 - move pairs to a separate file | 1.0000 | 0.4064 | 0.4101 |
| 23 | p/tlen | 2020-04-22 19:16 | 1.0.0 | PSI-Toolkit Diachroniser 2020 | 1.0000 | 0.2045 | 0.2934 |
| 10 | p/tlen | 2019-10-26 19:38 | 1.0.0 | Lucene Transducers 0.21 | 1.0000 | 0.6724 | 0.6580 |
| 11 | p/tlen | 2019-10-19 20:13 | 1.0.0 | Lucene Transducers 20 | 1.0000 | 0.6143 | 0.6189 |
| 24 | p/tlen | 2018-03-30 12:49 | 1.0.0 | PSI-Toolkit better-diachronizer | 1.0000 | 0.1375 | 0.1951 |
| 12 | p/tlen | 2018-03-17 11:07 | 1.0.0 | use Lucene token filter with sub-word variants (v. 0.15) | 1.0000 | 0.6110 | 0.6093 |
| 25 | [anonymized] | 2018-03-16 20:38 | 1.0.0 | Raw normalization | N/A | 0.0150 | 0.0284 |
| 13 | p/tlen | 2018-03-16 13:25 | 1.0.0 | use Lucene filter with words mined using word2vec (v. 0.14) | 1.0000 | 0.6061 | 0.6031 |
| 18 | p/tlen | 2018-03-16 11:16 | 1.0.0 | use Lucene filter without OCR fixes (v. 0.13) | 1.0000 | 0.6122 | 0.5833 |
| 20 | p/tlen | 2018-03-15 20:55 | 1.0.0 | use Lucene token filter (v. 0.12) | 1.0000 | 0.5181 | 0.4656 |
| 27 | p/tlen | 2018-03-15 20:47 | 1.0.0 | do nothing stupid | 0.0000 | 0.0000 | 0.0000 |