Sentiment by emoticons challenge
Give the probability of a positive sentiment for a short Polish text.
The corpus was created using emoticons: it was assumed that a positive
emoticon (e.g. `:-)`) entails positive sentiment, whereas a negative
emoticon (e.g. `:-(`) entails negative sentiment. The emoticons were
replaced by `<EMOTICON>` (so, actually, the challenge is to guess the
sentiment at the place where an emoticon was used).
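
The labelling idea described above can be sketched roughly as follows. This is only an illustration, not the procedure actually used to build the corpus: the emoticon lists, the function name `label_fragment`, and the decision to skip fragments with no emoticon or with mixed emoticons are all assumptions.

```python
import re

# Hypothetical emoticon inventories; the lists actually used are not given here.
POSITIVE = [":-)", ":)", ";-)", ":-D"]
NEGATIVE = [":-(", ":("]


def _pattern(emoticons):
    # Try longer emoticons first so that a longer form wins over a shorter one.
    alternatives = sorted(emoticons, key=len, reverse=True)
    return re.compile("|".join(re.escape(e) for e in alternatives))


POS_RE = _pattern(POSITIVE)
NEG_RE = _pattern(NEGATIVE)


def label_fragment(text):
    """Return (label, masked_text), or None if the fragment is unusable.

    label is 1 for a positive emoticon and 0 for a negative one.
    Fragments with no emoticon, or with both kinds, are skipped (an assumption).
    """
    has_pos = POS_RE.search(text) is not None
    has_neg = NEG_RE.search(text) is not None
    if has_pos == has_neg:
        return None
    regex = POS_RE if has_pos else NEG_RE
    return (1 if has_pos else 0), regex.sub("<EMOTICON>", text)
```

For instance, `label_fragment("Super film :-)")` gives the label `1` together with the text `Super film <EMOTICON>`.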
The data sets were prepared using the Common Crawl corpus. The classes are balanced (50%/50%).
Log loss is used as the metric.
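
For orientation, binary log loss can be computed along the following lines. This is a minimal sketch, not the official evaluator: the natural logarithm and the clipping constant `eps` are assumptions, and the function name `log_loss` is chosen here only for illustration.

```python
import math


def log_loss(expected, predicted, eps=1e-15):
    """Mean binary log loss.

    expected  - gold labels (0 or 1)
    predicted - predicted probabilities of the positive class
    eps       - clipping constant to avoid log(0); an assumption of this sketch
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("at least one item is required")
    total = 0.0
    for y, p in zip(expected, predicted):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(expected)
```

With balanced classes, always answering `0.5` yields a log loss of ln 2 (about 0.693), so a useful system should score below that value.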
Classes
* `1` - positive sentiment
* `0` - negative sentiment
Directory structure
* `README.md` - this file
* `config.txt` - configuration file
* `train/` - directory with training data
* `train/in.tsv.xz` - train set, input (compressed using `xz`)
* `train/expected.tsv` - train set, expected output (compressed using `xz`)
* `train/meta.tsv.xz` - metadata (do not use during training; this is just for reference)
* `dev-0/` - directory with dev (test) data
* `dev-0/in.tsv` - input data for the dev set (text fragments)
* `dev-0/expected.tsv` - expected (reference) data for the dev set
* `dev-0/meta.tsv` - metadata (not used during testing)
* `test-A/` - directory with test data
* `test-A/in.tsv` - input data for the test set (text fragments)
* `test-A/expected.tsv` - expected (reference) data for the test set (hidden)
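
The files listed above could be read with a small helper like the one below. It is a sketch under the assumptions that each file is UTF-8 encoded with one item per line and that compressed files carry the `.xz` suffix; the helper name `read_lines` is hypothetical.

```python
import lzma


def read_lines(path):
    """Read a file into a list of lines, decompressing it if it ends in .xz."""
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

For example, `read_lines("train/in.tsv.xz")` would return the training inputs and `read_lines("dev-0/expected.tsv")` the reference labels for the dev set as strings.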