Home
sobota edited this page Nov 5, 2017 · 19 revisions
Results of different publicly available embeddings calculated using this script.
- Rows are sorted by each embedding's rank summed over all benchmarks.
- If a word is missing from an embedding, a random vector is used instead. A more principled approach would be to compute the intersection of the vocabularies beforehand.
- The embeddings were trained on different corpora (most of them on some version of a Wikipedia dump, with various preprocessing), so this page does not claim to be a serious benchmark of word embeddings. See, for instance, this paper by O. Levy et al. for a thorough exploratory analysis.
- There are no good skip-gram or CBOW embeddings available online, so I excluded them from this table for now.
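
The two ways of handling missing words mentioned in the notes above can be sketched roughly as follows. This is only an illustration, not the evaluation script itself: the function names `lookup` and `shared_vocabulary`, the toy embeddings, and the choice of a Gaussian random vector are all assumptions made for the example.

```python
import random


def lookup(embedding, word, dim, rng):
    """Return the vector for `word`, or a random vector if the word is missing."""
    if word in embedding:
        return embedding[word]
    # Fallback: a random vector (the distribution here is an assumption).
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def shared_vocabulary(embeddings):
    """Return the set of words that every embedding contains."""
    vocabularies = [set(e) for e in embeddings]
    return set.intersection(*vocabularies)


# Toy embeddings, invented for illustration only.
emb_a = {"cat": [0.1, 0.2], "dog": [0.3, 0.1], "car": [0.9, 0.8]}
emb_b = {"cat": [0.5, 0.4], "car": [0.2, 0.7]}

rng = random.Random(0)

# Option 1: "dog" is missing from emb_b, so a random vector is substituted.
fallback = lookup(emb_b, "dog", dim=2, rng=rng)

# Option 2: keep only the benchmark pairs that every embedding covers.
common = shared_vocabulary([emb_a, emb_b])
pairs = [("cat", "dog"), ("cat", "car")]
comparable_pairs = [p for p in pairs if p[0] in common and p[1] in common]
```

With the intersection approach every embedding is scored on the same items, at the cost of discarding benchmark entries that any one embedding lacks.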
Sources of embeddings:
| | MEN | MTurk | RG65 | RW | SimLex999 | WS353 | WS353R | WS353S | Google | MSR | SemEval2012_2 | AP | BLESS | Battig | ESSLI_1a | ESSLI_2b | ESSLI_2c |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LexVec which="commoncrawl-W+C" | 0.809 | 0.712 | 0.765 | 0.478 | 0.419 | 0.647 | 0.571 | 0.756 | 0.710 | 0.601 | 0.187 | 0.612 | 0.795 | 0.438 | 0.818 | 0.750 | 0.667 |
PDC dim=300 | 0.773 | 0.672 | 0.790 | 0.455 | 0.427 | 0.721 | 0.641 | 0.789 | 0.748 | 0.596 | 0.290 | 0.639 | 0.805 | 0.431 | 0.773 | 0.725 | 0.644 |
HDC dim=300 | 0.760 | 0.655 | 0.806 | 0.438 | 0.407 | 0.677 | 0.581 | 0.787 | 0.731 | 0.564 | 0.293 | 0.632 | 0.815 | 0.432 | 0.773 | 0.750 | 0.644 |
SG GoogleNews (word2vec) | 0.741 | 0.670 | 0.761 | 0.471 | 0.442 | 0.700 | 0.635 | 0.772 | 0.402 | 0.712 | 0.335 | 0.649 | 0.795 | 0.406 | 0.750 | 0.800 | 0.644 |
PDC dim=100 | 0.755 | 0.710 | 0.774 | 0.421 | 0.361 | 0.690 | 0.606 | 0.779 | 0.704 | 0.543 | 0.280 | 0.632 | 0.760 | 0.431 | 0.727 | 0.750 | 0.622 |
GloVe dim=300 corpus=common-crawl-42B | 0.736 | 0.645 | 0.817 | 0.376 | 0.374 | 0.553 | 0.473 | 0.669 | 0.750 | 0.702 | 0.306 | 0.622 | 0.785 | 0.451 | 0.795 | 0.750 | 0.578 |
GloVe dim=300 corpus=wiki-6B | 0.737 | 0.633 | 0.770 | 0.359 | 0.371 | 0.522 | 0.446 | 0.653 | 0.718 | 0.616 | 0.280 | 0.637 | 0.820 | 0.410 | 0.773 | 0.825 | 0.644 |
HDC dim=100 | 0.738 | 0.648 | 0.804 | 0.388 | 0.324 | 0.617 | 0.523 | 0.753 | 0.667 | 0.497 | 0.260 | 0.619 | 0.825 | 0.432 | 0.773 | 0.750 | 0.622 |
GloVe dim=200 corpus=wiki-6B | 0.710 | 0.620 | 0.713 | 0.331 | 0.340 | 0.489 | 0.418 | 0.615 | 0.698 | 0.596 | 0.274 | 0.634 | 0.810 | 0.423 | 0.773 | 0.725 | 0.622 |
PDC dim=50 | 0.720 | 0.700 | 0.763 | 0.390 | 0.309 | 0.637 | 0.543 | 0.741 | 0.579 | 0.369 | 0.241 | 0.617 | 0.760 | 0.426 | 0.682 | 0.750 | 0.556 |
GloVe dim=100 corpus=wiki-6B | 0.681 | 0.619 | 0.676 | 0.310 | 0.298 | 0.451 | 0.380 | 0.587 | 0.632 | 0.551 | 0.279 | 0.644 | 0.780 | 0.435 | 0.705 | 0.750 | 0.644 |
HDC dim=50 | 0.708 | 0.649 | 0.723 | 0.361 | 0.281 | 0.575 | 0.472 | 0.713 | 0.534 | 0.347 | 0.243 | 0.555 | 0.730 | 0.429 | 0.705 | 0.775 | 0.578 |
GloVe dim=50 corpus=wiki-6B | 0.652 | 0.619 | 0.595 | 0.285 | 0.265 | 0.419 | 0.348 | 0.554 | 0.462 | 0.356 | 0.251 | 0.634 | 0.725 | 0.391 | 0.773 | 0.750 | 0.600 |
GloVe dim=200 corpus=twitter-27B | 0.594 | 0.555 | 0.698 | 0.197 | 0.130 | 0.451 | 0.373 | 0.590 | 0.534 | 0.503 | 0.246 | 0.515 | 0.690 | 0.326 | 0.773 | 0.700 | 0.578 |
NMT which=FR | 0.492 | 0.464 | 0.590 | 0.301 | 0.460 | 0.488 | 0.444 | 0.572 | 0.212 | 0.434 | 0.251 | 0.420 | 0.445 | 0.165 | 0.568 | 0.700 | 0.644 |
GloVe dim=100 corpus=twitter-27B | 0.577 | 0.559 | 0.677 | 0.210 | 0.122 | 0.442 | 0.364 | 0.592 | 0.429 | 0.428 | 0.250 | 0.500 | 0.675 | 0.315 | 0.727 | 0.675 | 0.600 |
NMT which=DE | 0.492 | 0.464 | 0.590 | 0.301 | 0.460 | 0.488 | 0.444 | 0.572 | 0.212 | 0.434 | 0.251 | 0.415 | 0.445 | 0.165 | 0.568 | 0.700 | 0.622 |
GloVe dim=50 corpus=twitter-27B | 0.531 | 0.515 | 0.574 | 0.196 | 0.098 | 0.392 | 0.325 | 0.540 | 0.260 | 0.271 | 0.223 | 0.458 | 0.665 | 0.308 | 0.705 | 0.675 | 0.511 |
GloVe dim=25 corpus=twitter-27B | 0.444 | 0.481 | 0.503 | 0.173 | 0.073 | 0.307 | 0.235 | 0.458 | 0.111 | 0.116 | 0.209 | 0.453 | 0.545 | 0.267 | 0.659 | 0.700 | 0.489 |
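
As a rough illustration of sorting by summed rank, the sketch below ranks embeddings on each benchmark (1 = best score) and orders them by the total. It is not the code used to build this page: the function name `rank_sum_order` is hypothetical, ties are broken by input order for simplicity, and only three rows and three columns (MEN, SimLex999, WS353) are copied from the table, so the resulting order need not match the full table.

```python
def rank_sum_order(scores):
    """Sort embedding names by their summed per-benchmark rank (1 = best)."""
    names = list(scores)
    n_benchmarks = len(next(iter(scores.values())))
    totals = {name: 0 for name in names}
    for j in range(n_benchmarks):
        # Higher score is better, so sort descending and assign ranks 1, 2, ...
        ordered = sorted(names, key=lambda name: scores[name][j], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    # A lower summed rank means a better overall position.
    return sorted(names, key=lambda name: totals[name]), totals


# Excerpt of the table: MEN, SimLex999, WS353 scores for three rows.
scores = {
    "LexVec commoncrawl-W+C": [0.809, 0.419, 0.647],
    "PDC dim=300": [0.773, 0.427, 0.721],
    "GloVe dim=25 twitter-27B": [0.444, 0.073, 0.307],
}

order, totals = rank_sum_order(scores)
print(order)
# ['PDC dim=300', 'LexVec commoncrawl-W+C', 'GloVe dim=25 twitter-27B']
```

On this three-column excerpt PDC dim=300 comes first (summed rank 4 against 5 for LexVec), which shows how sensitive the ordering is to the set of benchmarks included.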