Clipping death notices

Clip death notices in Polish newspapers.

The evaluation metric is ClippEU, i.e. an F2-score (an F-measure that weights recall more heavily than precision).
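For reference, the F-beta score combines precision P and recall R as F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); with beta = 2, recall counts four times as much as precision in the weighting. The Python sketch below computes it from match counts. How ClippEU decides that a clipping is a match, and whether counts are pooled over the whole set or averaged per line, is not stated here, so the (hypothetical) function `f_beta` simply takes the counts as given; returning 0.0 for empty inputs is also an assumption.

```python
def f_beta(true_positives: int, n_predicted: int, n_expected: int, beta: float = 2.0) -> float:
    """F-beta score from match counts; beta=2 (F2) favours recall over precision."""
    # Precision: share of predicted clippings that matched a reference clipping.
    precision = true_positives / n_predicted if n_predicted else 0.0
    # Recall: share of reference clippings that were found.
    recall = true_positives / n_expected if n_expected else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # assumed convention when nothing matches
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# 3 correct clippings out of 4 predicted, 6 expected: P = 0.75, R = 0.5
print(f_beta(3, 4, 6))
```

Because recall dominates, missing an obituary costs more than returning a spurious extra clipping.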

Prepared by Łukasz Borchmann and Filip Graliński.

Directory structure

  • README.md — this file
  • config.txt — configuration file
  • dev-0/ — directory with dev (test) data
  • dev-0/in.tsv — input DjVu files for the dev set
  • dev-0/expected.tsv — expected (reference) data for the dev set
  • dev-0/dev-0-djvus.tar.gz - DjVu files listed in dev-0/in.tsv
  • test-A — directory with test data
  • test-A/in.tsv — input DjVu files for the test set
  • test-A/expected.tsv — expected (reference) data for the test set (hidden)
  • test-A/test-A-djvus.tar.gz - DjVu files listed in test-A/in.tsv
  • train/train.tsv - training data
  • train/train-djvus.tar.gz - DjVus for training data

Packages with DjVus (*-djvus.tar.gz) are available via git-annex or at http://ijbox.pl/nekrologi/DjVu_all.zip

Reference format

(For expected.tsv files.)

Each line lists the expected clippings (obituaries) to be found in the corresponding DjVu file, i.e. the one given in the same line of in.tsv. Each expected clipping is specified as P/X0,Y0,X1,Y1/M, where:

  • P — DjVu page number (starting from 1)
  • X0, Y0, X1, Y1 — clipping coordinates (in pixels)
  • M — margin of error for each direction (in pixels)
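A minimal Python sketch of reading one expected.tsv line into structured records. It assumes, beyond what is stated above, that the clippings on a line are separated by whitespace and that all values are integers; `ExpectedClipping` and `parse_expected_line` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExpectedClipping:
    page: int    # P: DjVu page number, starting from 1
    x0: int      # X0, Y0, X1, Y1: clipping coordinates in pixels
    y0: int
    x1: int
    y1: int
    margin: int  # M: margin of error for each direction, in pixels


def parse_expected_line(line: str) -> List[ExpectedClipping]:
    """Parse one expected.tsv line of 'P/X0,Y0,X1,Y1/M' items (whitespace-separated, assumed)."""
    clippings = []
    for token in line.split():
        page, coords, margin = token.split("/")
        x0, y0, x1, y1 = (int(c) for c in coords.split(","))
        clippings.append(ExpectedClipping(int(page), x0, y0, x1, y1, int(margin)))
    return clippings


print(parse_expected_line("2/10,20,110,220/5 3/0,0,50,50/10"))
```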

Output format

(For out.tsv files.)

As in the reference format, each line lists the clippings found in the corresponding DjVu file. Each clipping should be given as P/X0,Y0,X1,Y1 (i.e. without the margin M), where:

  • P — DjVu page number (starting from 1)
  • X0, Y0, X1, Y1 — clipping coordinates (in pixels)
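The sketch below illustrates one plausible reading of how an output clipping could be compared with a reference one: it counts as a hit if it is on the same page and each of its four coordinates lies within M pixels of the reference value. This matching rule, the greedy one-to-one pairing, and whitespace-separated clippings are assumptions for illustration, not a description of ClippEU's actual implementation; all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (X0, Y0, X1, Y1) in pixels


def parse_clipping(token: str) -> Tuple[int, Box, Optional[int]]:
    """Parse 'P/X0,Y0,X1,Y1' (output) or 'P/X0,Y0,X1,Y1/M' (reference)."""
    parts = token.split("/")
    page = int(parts[0])
    x0, y0, x1, y1 = (int(c) for c in parts[1].split(","))
    margin = int(parts[2]) if len(parts) > 2 else None
    return page, (x0, y0, x1, y1), margin


def matches(predicted: str, expected: str) -> bool:
    """Assumed rule: same page and every coordinate within +/- M pixels."""
    p_page, p_box, _ = parse_clipping(predicted)
    e_page, e_box, margin = parse_clipping(expected)
    if p_page != e_page or margin is None:
        return False
    return all(abs(p - e) <= margin for p, e in zip(p_box, e_box))


def count_true_positives(out_line: str, expected_line: str) -> int:
    """Greedily pair each predicted clipping with at most one unused reference clipping."""
    unmatched: List[str] = expected_line.split()
    hits = 0
    for pred in out_line.split():
        for i, exp in enumerate(unmatched):
            if matches(pred, exp):
                hits += 1
                del unmatched[i]  # a reference clipping can be matched only once
                break
    return hits


print(count_true_positives("1/12,18,108,222 1/300,300,400,400",
                           "1/10,20,110,220/5 2/0,0,50,50/10"))
```

The per-line hit counts, together with the numbers of predicted and expected clippings, are what an F2 computation needs.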