DHQ Recommendation Systems

This repository contains scripts for creating three separate paper recommendation systems for the journal Digital Humanities Quarterly (DHQ):

Keyword-based Recommendations: Recommendations are based on the DHQ Classification Scheme, an editor-assigned controlled vocabulary comprising 88 terms, such as #gender and #machine_learning.
BM25 Recommendations: Recommendations are based on the full text (i.e., a concatenation of title, abstract, and body text without references) using the BM25 algorithm.
SPECTER2-based Recommendations: Recommendations are generated using the hidden states from SPECTER2, based on the paper's title and abstract.
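
To make the BM25 option more concrete, here is a minimal, dependency-free sketch of Okapi BM25 scoring over tokenized documents. It is an illustration only, not the code in run_bm25_recs.py: the function name bm25_scores, the parameters k1=1.5 and b=0.75, the non-negative IDF variant, and the use of unique query terms are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score each tokenized document in corpus_tokens against query_tokens.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values taken from this repository.
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        # Simplification: each distinct query term is counted once.
        for term in set(query_tokens):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Non-negative IDF variant (assumption for this sketch).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

To recommend papers for a given article under this sketch, its own tokens would serve as the query and the remaining papers would be ranked by score.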

Use

To regenerate the recommendations after changes to the dhq-journal repository, click Run workflow in GitHub Actions. The workflow also runs automatically at midnight on the 1st and 15th of each month and updates the TSV files if the dhq-journal repository has changed.

The ten most similar article IDs for each of the systems are documented in

Workflow

Both manual and automatic runs work as follows:

Initialize the official DHQ repository as a submodule.
Extract the relevant elements from DHQ papers in TEI format. The keyword-based recommendation system relies primarily on dhq_keywords, while the full text-based recommendation system also extracts the title, abstract, and body text. Papers still in the editorial process are excluded.
Construct a similarity matrix for generating recommendations.
Retrieve the most similar papers from the similarity matrix, using a random seed to break ties.
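
The final retrieval step can be sketched as follows. This is an assumption-laden illustration rather than the repository's implementation: the function name top_k_similar, the seed value, and the zero-padded IDs in the usage note are hypothetical, and the similarity matrix is taken to be a plain list of lists. The idea is to shuffle candidates with a seeded generator and then sort stably by similarity, so equal scores are ordered randomly but reproducibly.

```python
import random


def top_k_similar(sim_matrix, ids, k=10, seed=42):
    """Return the k most similar paper IDs for every paper.

    Hypothetical helper: sim_matrix[i][j] is the similarity between
    papers i and j, and ids[i] is the ID of paper i.
    """
    rng = random.Random(seed)  # seeded generator keeps tie-breaking reproducible
    recs = {}
    for i, row in enumerate(sim_matrix):
        # Exclude the paper itself from its own recommendations.
        candidates = [j for j in range(len(row)) if j != i]
        # Shuffle first; the stable sort below keeps this order among ties.
        rng.shuffle(candidates)
        candidates.sort(key=lambda j: row[j], reverse=True)
        recs[ids[i]] = [ids[j] for j in candidates[:k]]
    return recs
```

Calling top_k_similar(sim, ["000001", "000002", "000003"], k=2) on a 3 x 3 matrix returns a dictionary mapping each ID to its two nearest neighbours, and repeated calls with the same seed give identical results.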

Reproduction

To reproduce the recommendations on your own machine (not recommended for production), please use the following commands:

# clone the repository and navigate into the directory
git clone https://github.com/Wang-Haining/DHQ-similar-papers.git
cd DHQ-similar-papers

# initialize and update submodules (dhq-journal)
git submodule update --init --remote

# set up a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate

# install dependencies
python -m pip install -r requirements.txt

# execute the keyword-based recommendation
python -m run_kwd_recs

# execute the BM25 recommendation
python -m run_bm25_recs

# execute the SPECTER2 recommendation
python -m run_spctr_recs

License

The code and recommendation files are dedicated to the public domain under the CC0 1.0 Universal Public Domain Dedication, which allows unrestricted use, modification, and distribution.

Author

The Digital Humanities Quarterly Data Analytics Team

Contribution

Please open a ticket for any issues or suggestions. Thank you!

History

v0.0.5
- Streamlined utilities.
- Added a pipeline for recs recalculation regardless of submodule updates.
v0.0.4
- Ignored remembrance pieces in recommendations.
- Added unit tests.
- Dumped annoy for spctr method.
- Added a rule to Actions to run tests before commit.
v0.0.3
- Merged Ben's SPECTER method.
- Added CI pipeline with Actions.
- Improved module/var naming.
- Updated data files.
v0.0.2
- Implemented the full text-based recommendation system.
- Included logic for removing papers in the editorial process.
- Refactored the keyword-based recommendation system.
- Updated data files for both systems.
v0.0.1
- Implemented the keyword-based recommendation system.

