For a given Polish word, as used in a given year, give its diachronic equivalent (also known as a temporal word analogy) for a target year.
For instance, an item might ask for the 1910 equivalent of samolot ("airplane") as used in 2010; the correct answer is aeroplan.
The temporal word analogy challenge is defined as a list of independent items, each specified as a triple: a word, the year it is used in, and the target year. The expected output for an item is an unranked set of words (in many cases just one word). The hypothesis (the output of a system that guesses diachronic equivalents) should be a ranked list of words: the higher a correct word appears in the list, the higher the evaluation score.
Only single words occur in the data sets; there are no multi-word entities.
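A single item can be represented as a small record holding the triple. The sketch below is illustrative only: the names Item and parse_line are hypothetical, and it assumes that each input line holds exactly three tab-separated fields in the order word, its year, target year, and that years are integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One challenge item (hypothetical representation)."""
    word: str         # the word as used in source_year
    source_year: int  # the year the word is used in
    target_year: int  # the year an equivalent is wanted for


def parse_line(line: str) -> Item:
    """Parse one input line, assuming three tab-separated fields."""
    word, source_year, target_year = line.rstrip("\n").split("\t")
    return Item(word, int(source_year), int(target_year))


# The example item from this README: 2010 "samolot" -> 1910.
item = parse_line("samolot\t2010\t1910\n")
# A system's hypothesis is a ranked list, best guess first.
hypothesis = ["aeroplan", "samolot"]
```

Using a frozen dataclass keeps items hashable, so they can be used as dictionary keys when caching system outputs.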
The data set was compiled using:
- Doroszewski's dictionary of Polish,
- Słownik Wyrazów Zapomnianych,
- manually entered words.
MAP (Mean Average Precision) is used as the evaluation metric; a perfect system scores 1.0.
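To illustrate how the metric rewards ranking correct words higher, here is a minimal sketch of the textbook MAP definition: for each item, sum precision@k over the ranks k at which a correct word appears, divide by the number of expected words, then average over items. It is not necessarily identical to the official evaluation tool; in particular, counting a repeated correct word only once and scoring an empty reference set as 0.0 are assumptions of this sketch.

```python
def average_precision(ranked: list, expected: set) -> float:
    """Average precision of one ranked hypothesis against a set of correct words."""
    if not expected:
        return 0.0  # assumption: items without references score 0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, word in enumerate(ranked, start=1):
        # Count each correct word only once, at its first (highest) rank.
        if word in expected and word not in seen:
            hits += 1
            precision_sum += hits / rank  # precision@rank
        seen.add(word)
    return precision_sum / len(expected)


def mean_average_precision(hypotheses: list, references: list) -> float:
    """Mean of per-item average precision over all items."""
    if len(hypotheses) != len(references):
        raise ValueError("one hypothesis is needed per item")
    scores = [average_precision(h, r) for h, r in zip(hypotheses, references)]
    return sum(scores) / len(scores)


# Correct word at rank 1 scores 1.0; at rank 2 it scores 0.5.
score = mean_average_precision(
    [["aeroplan"], ["samolot", "aeroplan"]],
    [{"aeroplan"}, {"aeroplan"}],
)  # (1.0 + 0.5) / 2 = 0.75
```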
- README.md: this file
- config.txt: configuration file
- dev-0/: directory with the development (dev) set
- dev-0/in.tsv: input data for the dev set (word, its year and the target year in each line)
- dev-0/expected.tsv: expected (reference) data for the dev set
- test-A/: directory with the test set
- test-A/in.tsv: input data for the test set (word, its year and the target year in each line)
- test-A/expected.tsv: expected (reference) data for the test set
No training set is supplied.