Diachronic normalisation of Polish texts

Transform old Polish texts into modern spelling.

The evaluation metric is CharMatch: an F-score computed over the expected corrections, that is, the changes between the input text and the expected output.
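The idea of scoring corrections rather than whole strings can be sketched as follows. This is a simplified illustration, not the evaluation tool's actual CharMatch implementation: it assumes corrections are the non-equal character spans reported by Python's `difflib`, and the function names and the `beta` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def corrections(source, target):
    """Return the set of character-level edits that turn source into target.

    Each edit is (start, end, replacement) in source coordinates.
    Assumption: difflib's alignment is a good enough stand-in for the
    alignment used by the real metric.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return {
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def correction_f_score(source, expected, actual, beta=1.0):
    """F-score of the edits a system made against the edits it should have made."""
    expected_edits = corrections(source, expected)
    actual_edits = corrections(source, actual)
    if not expected_edits and not actual_edits:
        return 1.0  # nothing to correct and nothing was changed
    matched = len(expected_edits & actual_edits)
    precision = matched / len(actual_edits) if actual_edits else 0.0
    recall = matched / len(expected_edits) if expected_edits else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Two corrections are expected, the system made only one of them.
    print(round(correction_f_score("a-b-c", "A-B-c", "A-b-c"), 3))  # 0.667
```

A system that leaves the input unchanged scores 0 whenever at least one correction was expected, which is why plain copying of the input is a weak baseline under this kind of metric.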

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • dev-0/ — directory with dev (test) data
  • dev-0/in.tsv — input text for the dev set
  • dev-0/expected.tsv — reference text for the dev set
  • dev-1/ — directory with a second dev (test) set
  • dev-1/in.tsv — input text for the second dev set
  • dev-1/expected.tsv — reference text for the second dev set
  • test-A/ — directory with test data
  • test-A/in.tsv — input text for the test set
  • test-A/expected.tsv — reference text for the test set
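
A minimal sketch of reading one of the directories listed above into (input, reference) pairs. It assumes that the files are UTF-8, that each line holds one item, and that line N of `in.tsv` corresponds to line N of `expected.tsv`; none of this is stated in this file, and the `read_pairs` helper is hypothetical.

```python
from pathlib import Path


def read_pairs(split_dir):
    """Read in.tsv and expected.tsv from a split directory such as dev-0/.

    Assumption: both files are line-aligned, one item per line.
    """
    split = Path(split_dir)
    with open(split / "in.tsv", encoding="utf-8") as f_in:
        inputs = [line.rstrip("\n") for line in f_in]
    with open(split / "expected.tsv", encoding="utf-8") as f_exp:
        expected = [line.rstrip("\n") for line in f_exp]
    if len(inputs) != len(expected):
        raise ValueError(
            f"{split}: {len(inputs)} input lines but {len(expected)} reference lines"
        )
    return list(zip(inputs, expected))
```

For example, `read_pairs("dev-0")` would return a list of tuples, one per line of the dev set, provided the script is run from the repository root.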