Searching for Legal Clauses by Analogy. Few-shot Contract Discovery Shared Task

The aim of this task is to return the substrings of a requested document that represent clauses analogous (semantically and formally equivalent) to the provided examples from other documents.

Subsets of Corporate Bond and Non-disclosure Agreement documents from US EDGAR and of Charity Annual Reports from the UK Charity Register were annotated so that clauses of the same type are selected (e.g. clauses determining the governing law; the clause types depend on the type of legal act). A clause can consist of a single sentence, multiple sentences, or parts of sentences. The exact type of clause is not important during evaluation, since no full-featured training is allowed and one has to rely solely on a set of a few sample clauses during execution.

The input file consists of up to 6 tab-separated fields, e.g.:

| ID of document to search in | Entity considered | Example #1          | ...               | Example #N          |
|-----------------------------|-------------------|---------------------|-------------------|---------------------|
| NDA_057                     | governing-law     | NDA_059 15215-15453 | NDA_033 7890-8032 | NDA_009 12797-13364 |

Each example consists of a document ID (NDA_059, NDA_033, NDA_009) and a character range (15215-15453 and so on). Ranges can be discontinuous. In such a case their parts are separated with a comma, e.g. 4103-4882,12127-12971.
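As an illustration of the input format, here is a minimal Python sketch that parses one input line. The function names `parse_ranges` and `parse_input_line` are hypothetical, and the sketch assumes that the document ID and the range inside an example field are separated by a single space and that the parts of a discontinuous range are separated by commas, as in the example above.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def parse_ranges(text: str) -> List[Span]:
    """Turn '4103-4882,12127-12971' into [(4103, 4882), (12127, 12971)]."""
    spans = []
    for part in text.split(","):
        start, end = part.split("-")
        spans.append((int(start), int(end)))
    return spans


def parse_input_line(line: str):
    """Split one tab-separated input line into its components.

    Assumption: each example field looks like '<doc id> <ranges>'.
    """
    fields = line.rstrip("\n").split("\t")
    doc_id, entity = fields[0], fields[1]
    examples = []
    for field in fields[2:]:
        if not field:
            continue  # tolerate empty trailing fields
        example_doc, ranges = field.split(" ", 1)
        examples.append((example_doc, parse_ranges(ranges)))
    return doc_id, entity, examples
```

Each parsed example is a pair of a document ID and a list of `(start, end)` integer pairs, so discontinuous ranges are handled in the same way as contiguous ones.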

The expected file contains one answer per line, consisting of the entity name (to be copied from the input) and a character range in the same format as described above.
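A possible way to produce one answer line is sketched below. The helper `format_answer` is hypothetical, and the single space between the entity name and the range is an assumption modelled on the example fields of the input file; check `dev-0/expected.tsv` for the exact separator before relying on it.

```python
from typing import List, Tuple


def format_answer(entity: str, spans: List[Tuple[int, int]]) -> str:
    """Render an answer line such as 'governing-law 4103-4882,12127-12971'.

    Assumption: entity name and ranges are separated by one space.
    """
    # Sort the spans so that discontinuous parts appear in document order.
    ranges = ",".join(f"{start}-{end}" for start, end in sorted(spans))
    return f"{entity} {ranges}"
```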

The reference file contains 2 tab-separated fields: the document ID and its content.
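The reference file can be read with Python's standard `lzma` module, as in the sketch below. The function names are hypothetical, and treating a range as a half-open Python slice `[start, end)` is an assumption; whether the end offset is inclusive should be verified against the dev set.

```python
import lzma
from typing import Dict, Iterable, List, Tuple


def parse_reference(lines: Iterable[str]) -> Dict[str, str]:
    """Map document ID to content from 'id<TAB>content' lines."""
    documents = {}
    for line in lines:
        doc_id, content = line.rstrip("\n").split("\t", 1)
        documents[doc_id] = content
    return documents


def load_reference(path: str) -> Dict[str, str]:
    """Read an xz-compressed reference file such as dev-0/reference.tsv.xz."""
    # newline="\n" keeps other characters untouched so offsets stay stable.
    with lzma.open(path, "rt", encoding="utf-8", newline="\n") as handle:
        return parse_reference(handle)


def extract(content: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Return the text of each span (assumed half-open: [start, end))."""
    return [content[start:end] for start, end in spans]
```

With `documents = load_reference("dev-0/reference.tsv.xz")`, the text of an example clause would be `extract(documents["NDA_059"], [(15215, 15453)])` under the stated slicing assumption.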

The metric used is Soft-F1.
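This file does not define Soft-F1, so the following is only a rough sketch for local experiments, assuming a character-overlap reading of the metric: precision is the share of predicted characters that are also in the gold ranges, recall is the share of gold characters that were predicted, and the score is their harmonic mean. The function names and the half-open treatment of ranges are assumptions, and the official evaluator may compute the metric differently.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]


def covered_chars(spans: List[Span]) -> Set[int]:
    """Collect every character offset covered by the spans ([start, end))."""
    chars: Set[int] = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def soft_f1(predicted: List[Span], gold: List[Span]) -> float:
    """Character-overlap F1 between predicted and gold spans (an assumption)."""
    predicted_chars = covered_chars(predicted)
    gold_chars = covered_chars(gold)
    overlap = len(predicted_chars & gold_chars)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)
```

Under this reading a partially overlapping answer still earns credit: `soft_f1([(0, 10)], [(5, 15)])` gives 0.5, whereas an exact-match metric would give 0.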

Directory structure

  • — this file
  • config.txt — configuration file
  • dev-0/ — directory with dev (test) data
  • dev-0/in.tsv — input data for the dev set
  • dev-0/expected.tsv — expected (reference) data for the dev set
  • dev-0/reference.tsv.xz — file with documents considered in dev set
  • test-A — directory with test data
  • test-A/in.tsv — input data for the test set
  • test-A/reference.tsv.xz — file with documents considered in test set