Sentiment by emoticons challenge

Give the probability of a positive sentiment for a short Polish text.

The corpus was created using emoticons: a positive emoticon (e.g. :-)) was assumed to indicate positive sentiment, and a negative emoticon (e.g. :-() negative sentiment. The emoticons were then replaced by <EMOTICON>, so the actual task is to guess the sentiment at the place where an emoticon was used.

The data sets were prepared using the Common Crawl corpus. The classes are balanced (50%/50%).

Log loss is used as the metric.
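
As a rough illustration of the metric, the sketch below computes the usual binary log loss (natural logarithm, averaged over examples). The function name `log_loss` and the clipping constant `eps` are assumptions made for this example; the challenge's official evaluator may differ in such details.

```python
import math

def log_loss(expected, probs, eps=1e-15):
    """Mean binary log loss.

    expected: list of gold labels (1 = positive, 0 = negative)
    probs:    list of predicted probabilities of the positive class
    eps:      clipping constant (an assumption of this sketch) that keeps
              log() finite when a prediction is exactly 0 or 1
    """
    total = 0.0
    for y, p in zip(expected, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip into (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(expected)

# Always answering 0.5 on a balanced set gives ln(2), roughly 0.693.
uninformed = log_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])

# Confident and correct predictions score lower (better).
confident = log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
```

Because the classes are balanced, a constant prediction of 0.5 is a natural baseline: a useful system should score below ln(2).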

Classes

1 — positive sentiment
0 — negative sentiment

Directory structure

README.md — this file
config.txt — configuration file
train/ — directory with training data
train/in.tsv.xz — train set - input (compressed using xz)
train/expected.tsv — train set - expected output
train/meta.tsv.xz — metadata (for reference only; do not use during training)
dev-0/ — directory with dev (test) data
dev-0/in.tsv — input data for the dev set (text fragments)
dev-0/expected.tsv — expected (reference) data for the dev set
dev-0/meta.tsv — metadata (not used during testing)
test-A/ — directory with test data
test-A/in.tsv — input data for the test set (text fragments)
test-A/expected.tsv — expected (reference) data for the test set (hidden)
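
The files listed above could be read along the following lines. This is a minimal sketch under two assumptions not stated in this file: each line holds one example, and each line of an expected file starts with the label 0 or 1. The helper names `read_lines` and `read_labels` are made up for this example.

```python
import lzma
from pathlib import Path

def read_lines(path):
    """Return the lines of a text file, decompressing it if it ends in .xz."""
    path = Path(path)
    # Pick the opener from the file extension (xz-compressed or plain text).
    opener = lzma.open if path.suffix == ".xz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def read_labels(path):
    """Return integer labels, assuming the first tab-separated field is 0 or 1."""
    return [int(line.split("\t")[0]) for line in read_lines(path) if line]
```

For example, `read_lines("train/in.tsv.xz")` would return the training texts and `read_labels("dev-0/expected.tsv")` the dev-set labels. Since the opener is chosen by extension, the same helpers work for both compressed and uncompressed files.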