
aae-recommender


Adversarial Autoencoders for Recommendation Tasks

Dependencies

  • torch
  • numpy
  • scipy
  • sklearn
  • gensim
  • pandas
  • joblib

If possible, numpy and scipy should be installed as system packages. The dependencies gensim and sklearn can be installed via pip. For pytorch, please refer to the official installation instructions, which depend on the Python/CUDA setup you are working in.

To use pretrained word embeddings, the word2vec Google News corpus should be downloaded.
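As a minimal sketch (not part of this repository), the downloaded vectors could be loaded with gensim as follows; the file name below is the usual name of the Google News download and may need to be adapted to your setup.

from gensim.models import KeyedVectors

# Load the pretrained Google News word2vec vectors (binary format).
# Adjust the path/file name to wherever the download is stored.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True)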

Installation

You can install this package and all necessary dependencies via pip.

pip install -e .

Running

The main.py file is an executable to run an evaluation of the specified models on the PubMed or EconBiz dataset (see the Concrete datasets section below). The dataset and year are mandatory arguments. The dataset argument is expected to be the path to a tsv-file, whose format is described below.

The eval/aminer.py file is an executable to run an evaluation of the specified models on the AMiner datasets (see the Concrete datasets section below). The dataset and year are mandatory arguments. The dataset argument is expected to be either dblp or acm, and the DATA_PATH constant in the script needs to be set to the path to a folder which contains both datasets.

The eval/rcv.py file is an executable to run an evaluation of the specified models on the Reuters RCV1 dataset (see the Concrete datasets section below). The DATA_PATH constant in the script needs to be set to the path of a tsv-file, whose format is described below.
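For orientation, a rough sketch of how these evaluations might be invoked is shown below; the positional argument order, the placeholder paths, and the placeholder year are assumptions, so check each script's argument parser before running it.

python3 main.py /path/to/dataset.tsv 2016
python3 eval/aminer.py dblp 2016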

Further scripts in the eval folder were used to perform experiments for other datasets which we are not allowed to redistribute (see the Concrete datasets section below).

Dataset Format

The expected dataset format is a tab-separated file with the following columns:

  • owner: id of the document
  • set: comma-separated list of items
  • year: year of the document
  • title: title of the document

The columns 'owner' and 'set' are expected to be the first two, since they are mandatory. An arbitrary number of supplementary columns can follow. The current implementation, however, uses the year column to split the data into training and test sets, and title-enhanced recommendation models rely on the title column being present.
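As a minimal sketch (assuming the file has no header row; the column names, file name, cut-off year, and split rule below are illustrative only, not the repository's actual loading code), such a dataset could be read with pandas and split by year like this:

import pandas as pd

COLUMNS = ["owner", "set", "year", "title"]

def load_dataset(path):
    # Read the tab-separated file; the 'set' column holds a
    # comma-separated list of items.
    df = pd.read_csv(path, sep="\t", header=None, names=COLUMNS)
    df["set"] = df["set"].str.split(",")
    return df

def split_by_year(df, year):
    # Documents up to the given year form the training set,
    # later documents the test set (illustrative split rule).
    return df[df["year"] <= year], df[df["year"] > year]

df = load_dataset("pubmed.tsv")          # hypothetical file name
train, test = split_by_year(df, 2016)    # hypothetical cut-off year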

The format of the ACM and DBLP datasets is described here.

Concrete datasets

We worked with the PubMed citations dataset from CITREC. We converted the provided SQL dumps into the dataset format above. The references in the CITREC TREC Genomics dataset are not disambiguated, so we use only the PubMed dataset for citation recommendation. For subject label recommendation, we used the economics dataset EconBiz, provided by ZBW. The PubMed and EconBiz datasets are available here. For EconBiz, only titles are available; we are currently ascertaining whether copyright issues prevent us from publishing the further metadata of the documents that we have used.

Further public datasets used were the DBLP-Citation-network V10 and ACM-Citation-network V9 datasets from the AMiner project, and the Reuters RCV1 corpus. We converted the provided XML dumps into the dataset format above using the parse_reuters.py script.

We also ran experiments with the Million Playlist Dataset (MPD), provided by Spotify, and IREON, provided by FIV, but we are not allowed to redistribute them. The MPD dataset was used only to participate in the RecSys Challenge 2018 (see more information here).

References and how to cite

Please see our papers for additional information on the models implemented and the experiments conducted.

If you use our code in your own work, please cite one of these papers:

@article{Vagliano:2022,
    author    = {Iacopo Vagliano and
                 Lukas Galke and
                 Ansgar Scherp},
    title     = {Recommendations for Item Set Completion: On the Semantics of Item
                 Co-Occurrence With Data Sparsity, Input Size, and Input Modalities},
    journal   = {Inf Retrieval J},
    year      = {2022},
    publisher = {Springer Nature},
    url       = {https://doi.org/10.1007/s10791-022-09408-9},
    doi       = {10.1007/s10791-022-09408-9}
}

@inproceedings{Vagliano:2018,
     author = {Vagliano, Iacopo and Galke, Lukas and Mai, Florian and Scherp, Ansgar},
     title = {Using Adversarial Autoencoders for Multi-Modal Automatic Playlist Continuation},
     booktitle = {Proceedings of the ACM Recommender Systems Challenge 2018},
     series = {RecSys Challenge '18},
     year = {2018},
     isbn = {978-1-4503-6586-4},
     location = {Vancouver, BC, Canada},
     pages = {5:1--5:6},
     articleno = {5},
     numpages = {6},
     url = {http://doi.acm.org/10.1145/3267471.3267476},
     doi = {10.1145/3267471.3267476},
     acmid = {3267476},
     publisher = {ACM},
     address = {New York, NY, USA},
     keywords = {adversarial autoencoders, automatic playlist continuation, multi-modal recommender, music recommender systems, neural networks},
}

@inproceedings{Galke:2018,
     author = {Galke, Lukas and Mai, Florian and Vagliano, Iacopo and Scherp, Ansgar},
     title = {Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels},
     booktitle = {Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization},
     series = {UMAP '18},
     year = {2018},
     isbn = {978-1-4503-5589-6},
     location = {Singapore, Singapore},
     pages = {197--205},
     numpages = {9},
     url = {http://doi.acm.org/10.1145/3209219.3209236},
     doi = {10.1145/3209219.3209236},
     acmid = {3209236},
     publisher = {ACM},
     address = {New York, NY, USA},
     keywords = {adversarial autoencoders, multi-modal, neural networks, recommender systems, sparsity},
}
