Clipping Obituaries

Clip an obituary in a Polish newspaper. (This is only a sample challenge!)

The metric is ClippEU, i.e. the F2-score (an F-measure that weights recall higher than precision).
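The F2 part of the metric can be illustrated with a small sketch. It assumes that true positives, false positives and false negatives have already been counted per clipping. How ClippEU decides whether an output clipping matches an expected one is not described here, so that step is left out. The function name f_beta is hypothetical.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from match counts; beta=2 weights recall more than precision."""
    if tp == 0:
        # No correct clippings: precision and recall are both 0 (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 2, a system that finds every obituary but also returns as many spurious clippings (precision 0.5, recall 1.0) scores about 0.83. The reverse case (precision 1.0, recall 0.5) scores only about 0.56.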

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • dev-0/ — directory with dev (test) data
  • dev-0/in.tsv — list of input DjVu files for the dev set
  • dev-0/expected.tsv — expected (reference) data for the dev set
  • dev-0/*.djvu — DjVu files listed in dev-0/in.tsv
  • test-A/ — directory with test data
  • test-A/in.tsv — list of input DjVu files for the test set
  • test-A/expected.tsv — expected (reference) data for the test set (hidden)
  • test-A/*.djvu — DjVu files listed in test-A/in.tsv

Reference format

(For expected.tsv files.)

Each line describes expected clippings (obituaries) to be found in a corresponding DjVu file. Each expected clipping is specified as P/X0,Y0,X1,Y1/M, where:

  • P — DjVu page number (starting from 1)
  • X0, Y0, X1, Y1 — clipping coordinates (in pixels)
  • M — margin of error for each direction (in pixels)
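A line of expected.tsv could be read with a parser along the following lines. This is a sketch under two assumptions not stated above: several clippings on one line are separated by whitespace, and all values are non-negative integers. The names ExpectedClipping and parse_expected_line are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExpectedClipping:
    page: int    # DjVu page number, starting from 1
    x0: int
    y0: int
    x1: int
    y1: int
    margin: int  # margin of error for each direction, in pixels


# P/X0,Y0,X1,Y1/M
_EXPECTED_RE = re.compile(r"(\d+)/(\d+),(\d+),(\d+),(\d+)/(\d+)")


def parse_expected_line(line: str) -> List[ExpectedClipping]:
    """Parse all expected clippings for one DjVu file (one line of expected.tsv)."""
    clippings = []
    # Assumption: clippings on one line are separated by whitespace
    for token in line.split():
        m = _EXPECTED_RE.fullmatch(token)
        if m is None:
            raise ValueError(f"malformed clipping: {token!r}")
        clippings.append(ExpectedClipping(*map(int, m.groups())))
    return clippings
```

An empty line yields an empty list, i.e. no obituary is expected in that file.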

Output format

(For out.tsv files.)

Similar to the reference format, each line describes clippings found in a corresponding DjVu file. Each clipping should be given as P/X0,Y0,X1,Y1, where:

  • P — DjVu page number (starting from 1)
  • X0, Y0, X1, Y1 — clipping coordinates (in pixels)
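Producing out.tsv is the mirror image: one line per input DjVu file, in the same order as in in.tsv, with no margin field. The sketch below assumes clippings are joined with a single space and represents each clipping as a plain (page, x0, y0, x1, y1) tuple; both choices and the function names are hypothetical.

```python
from typing import Iterable, Tuple

# (page, x0, y0, x1, y1); a hypothetical layout chosen for this sketch
Box = Tuple[int, int, int, int, int]


def format_output_line(clippings: Iterable[Box]) -> str:
    """Render the clippings found in one DjVu file as P/X0,Y0,X1,Y1 items."""
    parts = []
    for page, x0, y0, x1, y1 in clippings:
        if page < 1:
            raise ValueError("page numbers start from 1")
        parts.append(f"{page}/{x0},{y0},{x1},{y1}")
    # Assumption: clippings are separated by a single space
    return " ".join(parts)


def write_out_tsv(path: str, per_file: Iterable[Iterable[Box]]) -> None:
    """Write one line per input DjVu file, in the order of in.tsv."""
    with open(path, "w", encoding="utf-8") as f:
        for clippings in per_file:
            f.write(format_output_line(clippings) + "\n")
```

A file in which no obituary was found still gets a line (an empty one), so that line numbers in out.tsv stay aligned with in.tsv.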