"He Said She Said" classification challenge

Guess whether a text in Polish was written by a man or woman.

submitter when description dev-0/Accuracy test-A/Accuracy
[anonymised] 2017-02-27 17:02 my solution 1 N/A N/A
[anonymised] 2016-06-23 09:01 3gram model KenLM + stemming 0.668890955905151 0.656475669042337
nozdi 2016-06-19 20:27 New 3 best voting 0.719267737021609 0.70309623825931
nozdi 2016-06-18 15:05 LSTM + ReLU + Softmax 0.699990563621413 0.693225798838911
nozdi 2016-06-18 09:25 RNN + more layers + ReLU 0.702632749625915 0.696541511304102
[anonymised] 2016-06-17 21:37 KenLM + stemming + 2grams 0.667212628570658 0.656139377920423
nozdi 2016-06-17 15:47 RNN + more layers 0.706798236745258 0.70309623825931
nozdi 2016-06-17 09:52 3 best voting 0.714279936911069 0.704700995893708
[anonymised] 2016-06-17 09:05 3gram model test-A KenLM + stemming 0.667219368841078 0.656475669042337
[anonymised] 2016-06-17 09:03 2gram model KenLM + stemming testA 0.667219368841078 0.656080379477982
nozdi 2016-06-17 09:00 RNN - LSTM - 2 layers - 3 epochs 0.692940240762459 0.687296455373578
[anonymised] 2016-06-17 08:59 2gram model KenLM + stemming 0.667219368841078 N/A
[anonymised] 2016-06-16 09:56 KenLM + stemming + 3grams 0.671674687588466 N/A
nozdi 2016-06-13 13:36 Both directions LSTM 0.680457259945269 0.673384622645962
nozdi 2016-06-09 12:42 1M words + CONVOLUTION + LSTM 0.654979037758995 0.644109595506679
nozdi 2016-06-06 22:21 LSTM + 1M words 0.694578126474434 0.68615188559022
nozdi 2016-06-04 15:54 RNN - GRU & 5 epoch & 50k words 0.619821787250105 0.610386085807335
p/tlen 2016-05-30 20:51 simple NN trained on all (3 passes) with logistic regression 0.65064504387916 0.645265964978525
p/tlen 2016-05-30 19:40 simple NN trained on all (3 passes) 0.647173804613041 0.64280572992873
p/tlen 2016-05-30 10:41 simple NN train on all 0.646250387565549 0.640817482418464
p/tlen 2016-05-30 07:48 simple NN trained on 1M utterances 0.641019937719901 0.634840940199179
p/tlen 2016-05-30 07:46 simple NN with 1M utterances 0.641019937719901 N/A
p/tlen 2016-05-29 20:09 skeleton for NN solutions 0.4042949003114 0.408269221692547
nozdi 2016-05-25 06:51 Doc2vec + 50k words + LR 0.667259810463596 0.659702883843867
nozdi 2016-05-24 19:09 Doc2vec + LR 0.603854086625955 0.584462170198707
asdf 2016-05-24 17:13 lemma + nozdi naive bayes 0.660438656798911 0.639885307027894
Marek 2016-05-24 11:55 clone of Mateusz's solution + RandomForestClassifier 0.67275987112603 0.652923962807382
nozdi 2016-05-23 19:11 Logistic Regression + Hashing Vectorizer - in memory 0.680012402097572 0.670818190399773
nozdi 2016-05-23 18:19 ANN + 3gram + hashing vectorizer 0.660236448686321 0.656705762967858
Jacek 2016-05-23 16:04 Another 100k sample with RandomForestClassifier 0.553261616856068 0.553499787605607
Katarzyna 2016-05-23 13:20 more iterations + randomized start 0.621466413232499 0.613518903100958
Marta 2016-05-23 04:15 nozdi Naive Bayes + Tfidf + swear words + emoticons 0.67639961715264 0.6583164204465
Marta 2016-05-22 13:15 nozdi Naive Bayes + Tfidf + swear words + emoticons v2 0.676372656070962 0.658304620758012
Jacek 2016-05-18 20:07 100k sample TFIDF + RFC 0.557582130195063 0.558815547269552
Jacek 2016-05-18 18:21 100k sample CV + RFC 0.559354821315431 0.558089866427526
Jacek 2016-05-18 14:16 1mln sample with RandomForestClassifier 0.580188997182567 0.583683390758484
Marta 2016-05-17 05:14 nozdi Naive Bayes + Tfidf + swear words + emoticons 0.676372656070962 0.658304620758012
Jacek 2016-05-16 21:54 400k sample RandomTreeClassifier 0.571756918887586 0.571382215509511
Jacek 2016-05-16 19:44 200k sample size with RandomForestClassifier 0.565400843881857 0.565045782791334
Jacek 2016-05-16 16:08 100k sample with RandomForestClassifier 0.635937773823486 0.619176853731061
p/tlen 2016-05-15 18:31 VW tokens + 3-gram LM (+fix for the latest grep) 0.72310295089039 0.710777835465144
Maxi 2016-05-14 14:55 Added naive_bayes.py 0.620145320230248 0.600698541558503
Maxi 2016-05-14 14:44 char ngrams, sample train on 100k 0.620145320230248 0.600698541558503
Katarzyna 2016-05-09 16:36 Logistic regression (partial fit, iter_n=1, alpha=0.00005) 0.621311387012847 0.612911219143815
Katarzyna 2016-05-09 16:06 Logistic regression (partial fit, iter_n=5) 0.608491392674674 0.602203001840751
Katarzyna 2016-05-08 21:35 Logistic regression (partial fit) 0.608808185384398 0.602309199037145
Katarzyna 2016-05-08 21:28 Logistic regression (partial fit) 0.608808185384398 0.602309199037145
Marta 2016-05-07 21:24 nozdi Naive Bayes + Tfidf 0.676372656070962 0.658304620758012
nozdi 2016-05-04 19:04 Ensemble Multinomial NB+ BernoulliNB 0.66736765479031 0.647348609996696
nozdi 2016-04-23 12:29 Naive bayes 0.672766611396449 0.652923962807382
nozdi 2016-04-23 09:10 Naive bayes 0.672766611396449 0.652923962807382
nozdi 2016-04-23 08:43 Naive bayes 0.672766611396449 0.631784820880729
nozdi 2016-04-22 09:21 Naive bayes with stop words 0.672382415982529 0.62931868598669
nozdi 2016-04-20 06:55 Naive bayes 0.664051441743843 0.646322037098221
nozdi 2016-04-19 17:57 Naive bayes 0.664051441743843 0.646322037098221
p/tlen 2016-03-24 22:01 Vowpal Wabbit -nn 6 on morphosyntactic tags 0.595024332376215 0.591742577995941
p/tlen 2016-03-24 21:40 6-gram LM on morphosyntactic tags 0.598711260295763 0.605825506206636
p/tlen 2016-03-24 21:21 6-gram LM on morphosyntactic tags 0.689570105552635 0.605825506206636
p/tlen 2016-02-20 12:48 VW tokens + 3-gram LM 0.723204054946684 0.710665738424506
p/tlen 2016-02-19 21:30 3-gram language model 0.689570105552635 0.679862651625997
p/tlen 2016-02-19 20:28 VW tokens + 300 V2W classes 0.691086666397056 0.68508991362628
p/tlen 2016-02-19 13:15 VW tokens + 5-suffixes + NN 0.69100578315202 0.684299334497569
p/tlen 2016-02-19 07:51 VW tokens + 5-prefixes + NN 0.684407058411183 0.679290366734318
p/tlen 2016-02-18 20:05 Vowpal Wabbit on tokens only + small NN 0.688383817958776 0.683272761599094
p/tlen 2016-02-18 19:53 Vowpal Wabbit on tokens only 0.679351855596447 0.67549086704111
Veal 2016-02-15 18:27 Fixed source code, added makefile. 0.640291988514579 0.634050361070468
Veal 2016-02-15 18:16 Fixed source code, added makefile. 0.613735323061161 0.609223816491245
Veal 2016-02-15 16:47 Logistic regression on words, punctuation n-grams, and suffixes. 0.613735323061161 0.609223816491245
Veal 2016-02-13 23:09 Logistic regression on words and punctuation n-grams. N/A 0.634050361070468
Veal 2016-02-13 23:00 Logistic regression on words and punctuation n-grams. N/A N/A
Veal 2016-02-12 09:24 Logistic regression, where features are unique, lowercased words longer than one character, truncated after the 5th if necessary. N/A 0.615743144380988
R.J. 2016-02-12 07:55 men only baseline 0.5 0.5
p/tlen 2016-02-11 22:10 use "leaks" 0.5099823404915 0.510389625713881
R.J. 2016-01-08 09:25 man only baseline 0.5 0.5
[anonymised] 2015-12-17 08:38 naive bayes 0.577344603065475 N/A
[anonymised] 2015-12-17 07:34 source files in the appropriate folders 0.626905811461156 0.628840798602917
[anonymised] 2015-12-16 21:17 Added source code 0.585277901349402 0.571913201491481
[anonymised] 2015-12-16 21:09 uhhh, test number 3 0.585277901349402 0.571913201491481
[anonymised] 2015-12-16 18:56 dominik szczeszynski - attempt 2 0.500990819751688 0.499150422428848
[anonymised] 2015-12-10 09:08 Piotr Mizerka kobieta_czy_mezczyzna Naive Bayes 0.626905811461156 0.499504413083495
Przemysław Nowaczyk 2015-12-10 09:05 naive bayes by Przemysław Nowaczyk, source code and resources 0.666248769900648 0.646864822768679
[anonymised] 2015-12-10 09:05 Piotr Mizerka kobieta_czy_mezczyzna Naive Bayes 0.626905811461156 N/A
Przemysław Nowaczyk 2015-12-10 08:19 naive bayes by Przemysław Nowaczyk 0.666248769900648 0.646864822768679
[anonymised] 2015-12-10 07:48 dominik szczeszynski - solution 0.497782451031935 0.49843654127531
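Many entries on the board are Naive Bayes baselines. The sketch below shows the core of that approach in plain Python; the corpus, labels, and function names are illustrative toy stand-ins, not any submitter's actual code or the challenge data:

```python
import math
from collections import Counter

# Toy stand-in corpus; the real challenge uses Polish utterances
# labeled with the author's gender ("F"/"M" here are placeholders).
texts  = ["ala ma kota", "kot jest moj", "gram w pilke", "mecz byl super"]
labels = ["F", "F", "M", "M"]

def train(texts, labels):
    """Collect per-class word counts and class priors."""
    counts = {}  # label -> Counter of word frequencies
    for text, label in zip(texts, labels):
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts, Counter(labels)

def predict(text, counts, priors):
    """Pick the label maximizing log P(label) + sum of log P(word | label),
    with add-one (Laplace) smoothing over the shared vocabulary."""
    vocab = {w for c in counts.values() for w in c}
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label, c in counts.items():
        lp = math.log(priors[label] / total)
        denom = sum(c.values()) + len(vocab)
        for word in text.lower().split():
            lp += math.log((c[word] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

counts, priors = train(texts, labels)
print(predict("ala ma kota", counts, priors))  # → F on this toy corpus
```

The stronger submissions above layer refinements onto this idea: TF-IDF weighting instead of raw counts, character or word n-grams, stemming/lemmatization for Polish morphology, or swap the classifier entirely for a language-model comparison (KenLM), logistic regression (Vowpal Wabbit), or recurrent networks (LSTM/GRU).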