A Benchmark for Anomaly Detection in Natural Language Processing
To reproduce our experiments, you must clone the repository, download the data and then run the provided Python scripts, as explained below.
To download the preprocessed data, run the download_data.sh script. It pulls the data from Google Drive and unzips it into the current directory. Alternatively, you can download the data manually from the following links:
- We will provide the links to the datasets after the anonymity period for the conference ends.
Once the data is in place, run the following Python scripts to reproduce our experiments:
- run_baselines.py, which trains and tests both OCSVM and Isolation Forest.
- run_cvdd.py, which trains and tests CVDD.
- some_script.py, to be added by Andrei.
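
If you prefer to launch the experiments in one go, the two scripts that already exist can be run in sequence from a small driver. The following is only a minimal sketch, not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes that both scripts run without command-line arguments and expect the data in the current directory.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the list above. That they can be run without
# any command-line arguments is an assumption of this sketch.
SCRIPTS = ["run_baselines.py", "run_cvdd.py"]


def build_commands(scripts=SCRIPTS, python=sys.executable):
    """Return one [interpreter, script] command per experiment script."""
    return [[python, script] for script in scripts]


def run_all(repo_dir=".", scripts=SCRIPTS):
    """Run each experiment script in order, stopping at the first failure."""
    repo = Path(repo_dir)
    # Check up front that every script exists, so no run starts partially.
    missing = [s for s in scripts if not (repo / s).is_file()]
    if missing:
        raise FileNotFoundError(f"missing scripts in {repo}: {missing}")
    for cmd in build_commands(scripts):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=repo, check=True)
```

Calling `run_all()` from the repository root, after the data has been downloaded, runs the baselines first and CVDD second. If either script does require arguments, extend the command lists returned by `build_commands` accordingly.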