Release date: May 27, 2020
- Implemented pseudo-relevance classifier reranking technique.
- Added `TfidfVectorizer` to obtain vector representations of arbitrary documents from index. Verified that class works as expected by replicating classification demo on 20 Newsgroups with scikit-learn.
- Added bindings to TREC COVID round 3 topics.
- Added script for CORD-19 length outlier detection.
- Added `__main__` to `pyserini.search` to perform TREC runs from the command line.
- Fixed issues with computing BM25 term weights and query-document scores.
- Exposed access to basic index statistics in `IndexReaderUtils`.
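
For readers unfamiliar with the quantities mentioned in the BM25 fix, the following is a minimal sketch of how a BM25 term weight and a query-document score can be computed from basic index statistics. It is not the code in this release: the function names `bm25_term_weight` and `bm25_score` are hypothetical, the formula is the textbook form with a Lucene-style non-negative IDF, and the defaults `k1=0.9` and `b=0.4` are assumed here for illustration.

```python
import math
from collections import Counter


def bm25_term_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=0.9, b=0.4):
    """Weight of one term in one document (textbook BM25, assumed form)."""
    # Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def bm25_score(query_terms, doc_terms, doc_freqs, avg_doc_len, num_docs):
    """Query-document score: sum of weights of query terms found in the document."""
    tf = Counter(doc_terms)
    return sum(
        bm25_term_weight(tf[t], doc_freqs[t], len(doc_terms), avg_doc_len, num_docs)
        for t in set(query_terms)
        if tf[t] > 0
    )


# Toy corpus (hypothetical data) standing in for an index.
docs = [
    ["covid", "vaccine", "trial"],
    ["covid", "transmission", "model"],
    ["protein", "structure"],
]
doc_freqs = Counter(t for d in docs for t in set(d))  # document frequency per term
avg_doc_len = sum(len(d) for d in docs) / len(docs)

scores = [
    bm25_score(["covid", "vaccine"], d, doc_freqs, avg_doc_len, len(docs))
    for d in docs
]
```

In this toy example the first document matches both query terms, the second matches one, and the third matches none, so the scores decrease in that order. A real index would supply the document frequencies, document lengths, and collection size instead of the in-memory lists used here.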
Sorted by number of commits:
- Johnson Han (x65han)
- Jimmy Lin (lintool)
- Yuqi Liu (yuki617)
- Pepijn Boers (PepijnBoers)
- Tim Hatch (thatch)
- Stephanie Hu (stephaniewhoo)
Sorted by number of commits, according to GitHub:
- Jimmy Lin (lintool)
- Johnson Han (x65han)
- Zeynep Akkalyoncu Yilmaz (zeynepakkalyoncu)
- Yuqi Liu (yuki617)
- Chris Kamphuis (Chriskamphuis)
- Tommaso Teofili (tteofili)
- Pepijn Boers (PepijnBoers)
- Stephanie Hu (stephaniewhoo)
- Tim Hatch (thatch)
- Rodrigo Nogueira (rodrigonogueira4)
- Alireza Mirzaeiyan (amirzaeiyan)