Diachronic equivalents

For a given Polish word, as used in a given year, give its diachronic equivalent (a.k.a. temporal word analogy) for another given (target) year.

For instance, a task might be to give a 1910 equivalent of samolot, as used in 2010 (the right answer is aeroplan).

The challenge of temporal word analogies is defined as a list of independent items, where each item is specified as a triple: the word, its year, and the target year. The expected output is an (unranked) set of words (in many cases just one word). The hypothesis (the output of a system for guessing diachronic equivalents) should be a ranked list of words: the nearer the top of the list the right word is returned, the higher the evaluation score.

Only single words occur in the data sets; there are no multi-word entities.

Source

The data set was compiled using:

Evaluation metric

MAP (Mean Average Precision) is used as the evaluation metric. If the system is perfect, the score is 1.0.
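
As a rough illustration of how such a score can be computed, the sketch below implements the textbook definition of MAP in Python. The function names are hypothetical, and the details (skipping repeated words, dividing by the size of the reference set, scoring an empty reference as 0.0) are assumptions; the official evaluation tool may differ in such edge cases.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked hypothesis against an unranked reference set."""
    if not relevant:
        return 0.0  # assumption: an item with no reference words scores zero
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, word in enumerate(ranked, start=1):
        if word in seen:
            continue  # assumption: a repeated word earns no extra credit
        seen.add(word)
        if word in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    hypotheses: Sequence[Sequence[str]], references: Sequence[Set[str]]
) -> float:
    """Mean of the per-item average precision over all items."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not references:
        return 0.0
    scores = [average_precision(h, r) for h, r in zip(hypotheses, references)]
    return sum(scores) / len(scores)


# Illustrative data only: "balon" is a made-up wrong guess.
hypotheses = [["aeroplan", "balon"], ["balon", "aeroplan"]]
references = [{"aeroplan"}, {"aeroplan"}]
score = mean_average_precision(hypotheses, references)  # 0.75
```

In this example the first item scores 1.0 (the right word is ranked first) and the second scores 0.5 (the right word is ranked second), so the mean is 0.75.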

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • dev-0/ — directory with dev (test) data
  • dev-0/in.tsv — input data for the dev set (word, its year and the target year in each line)
  • dev-0/expected.tsv — expected (reference) data for the dev set
  • test-A/ — directory with test data
  • test-A/in.tsv — input data for the test set (word, its year and the target year in each line)
  • test-A/expected.tsv — expected (reference) data for the test set

No training set is supplied.
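
The input files described above can be read with a few lines of Python. This is a minimal sketch that assumes the three columns are tab-separated, appear in the order word, its year, target year, and that both years are written as integers; `Item` and `read_items` are hypothetical names, not part of the challenge.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class Item:
    word: str
    source_year: int  # year in which the word is used
    target_year: int  # year for which an equivalent is requested


def read_items(handle: TextIO) -> Iterator[Item]:
    """Yield one Item per non-empty line of an in.tsv-style stream."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        word, source_year, target_year = row[:3]
        # Assumption: years are plain integers such as 2010.
        yield Item(word, int(source_year), int(target_year))


# In-memory sample built from the README's own example.
sample = io.StringIO("samolot\t2010\t1910\n")
items = list(read_items(sample))
```

To read a real file, the same function could be given a handle opened with `open("dev-0/in.tsv", encoding="utf-8", newline="")`.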