chore: open-sourcing rxn-neb.
Signed-off-by: Matteo Manica <[email protected]>
drugilsberg committed Feb 29, 2024
1 parent 1026584 commit 688218b
Showing 19 changed files with 6,804 additions and 0 deletions.
20 changes: 20 additions & 0 deletions .github/workflows/tests.yaml
@@ -0,0 +1,20 @@
name: "Running tests: ruff styling"

on: [push, pull_request]

jobs:
  tests:
    runs-on: ubuntu-latest
    name: Style, mypy, pytest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.8
        uses: actions/setup-python@v3
        with:
          python-version: 3.8
      - name: Install poetry
        run: pip install poetry==1.7.1
      - name: Install Dependencies
        run: poetry install
      - name: Check style
        run: poetry run ruff check . && poetry run ruff format --check .
127 changes: 127 additions & 0 deletions .gitignore
@@ -0,0 +1,127 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
src/rxn/neb/.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
../.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
conda_env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# PyCharm
.idea/

# VSCode
.vscode/

# Pre-commit configuration
# .pre-commit-config.yaml

# Apple macOS
.DS_Store

# ruff
.ruff_cache

# custom
check_test.py
retrosynthesis_check_test_ref.json
test_retro.json
sandbox
rxnfp
51 changes: 51 additions & 0 deletions README.md
@@ -0,0 +1,51 @@
# rxn-neb

## Setup

To install the package, run:

```console
poetry install
```

## Pre-process data for running rxn-neb

We assume you start from a JSON file of synthesis trees, each described by its reaction SMILES listed in pre-order traversal. A sample is provided [here](./sample-data/reaction_trees.json).
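
As a rough illustration of what pre-order traversal means for a synthesis tree, the sketch below visits the root reaction first and then each precursor subtree in turn. The `ReactionNode` class and the example reactions are hypothetical and only serve the illustration; the actual schema of `reaction_trees.json` is the one in the sample file, not the one assumed here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ReactionNode:
    """Hypothetical node: one reaction plus the reactions producing its precursors."""

    reaction_smiles: str
    children: List["ReactionNode"] = field(default_factory=list)


def pre_order(node: ReactionNode) -> Iterator[str]:
    """Yield the node's reaction first, then recurse into each child subtree."""
    yield node.reaction_smiles
    for child in node.children:
        yield from pre_order(child)


# Illustrative tree: the root makes the target, the children make its precursors.
tree = ReactionNode(
    "CC(=O)O.OCC>>CC(=O)OCC",
    children=[
        ReactionNode("CC=O>>CC(=O)O", children=[ReactionNode("CCO>>CC=O")]),
        ReactionNode("C=C.O>>OCC"),
    ],
)

order = list(pre_order(tree))
for reaction in order:
    print(reaction)
```

The root appears first, the whole first subtree follows, and the second subtree comes last, so a flat list in this order can be read back as a tree when the branching structure is known.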

Additionally, we assume that a reaction fingerprint model compatible with [`rxnfp`](https://github.com/rxn4chemistry/rxnfp) is available (see that repo for instructions on how to train your own on public or proprietary data).

To get the default model used in RXN for Chemistry, simply clone the repo:

```console
git clone https://github.com/rxn4chemistry/rxnfp.git
```

You can directly use the default model available at `./rxnfp/rxnfp/models/transformers/bert_ft`.

Prepare the fingerprints from available synthesis trees:

```console
generate-fingerprints --reaction_trees_path "./sample-data/reaction_trees.json" --fingerprints_model_path "./rxnfp/rxnfp/models/transformers/bert_ft" --generated_fingerprints_path "./sandbox/generated_fingerprints.csv"
```

Prepare the PCA model for fingerprint compression and related indexes:

```console
generate-pca-compression-and-indices --reaction_trees_path "./sample-data/reaction_trees.json" --fingerprints_path "./sandbox/generated_fingerprints.csv" --pca_model_filename "./sandbox/pca.pkl" --tree_data_dict_pca_filename "./sandbox/tree_data_dict_pca.pkl"
```
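
For intuition about the compression step, here is a minimal sketch of PCA on a matrix of fingerprints using plain NumPy. It is not the implementation behind `generate-pca-compression-and-indices`: the random data, the dimensions (20 fingerprints of length 256, compressed to 8 components), and the helper names `fit_pca` and `compress` are all assumptions made for illustration.

```python
import pickle

import numpy as np

# Stand-in data: random vectors in place of real reaction fingerprints.
rng = np.random.default_rng(0)
fingerprints = rng.normal(size=(20, 256))


def fit_pca(x: np.ndarray, n_components: int):
    """Return the feature mean and the top principal axes (as rows)."""
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by explained variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def compress(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project centered fingerprints onto the principal axes."""
    return (x - mean) @ components.T


mean, components = fit_pca(fingerprints, n_components=8)
compressed = compress(fingerprints, mean, components)

# A fitted model can be serialized and reloaded with pickle.
restored_mean, restored_components = pickle.loads(pickle.dumps((mean, components)))

print(compressed.shape)
```

Storing the fitted mean and axes means that new fingerprints can later be projected into the same low-dimensional space without refitting.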

NOTE: these examples create a `sandbox` folder where all outputs are stored.

## Usage

We assume you have a pair of single-step forward and backward models trained using [`rxn-onmt-models`](https://github.com/rxn4chemistry/rxn-onmt-models) (see that repo for a detailed guide on how to train them on public or proprietary data).

```console
run-neb-retrosynthesis --product "NS(=O)(=O)c1nn(-c2ccccn2)cc1Br" \
--forward_model_path "/path/to/forward_model.pt" \
--backward_model_path "/path/to/backward_model.pt" \
--fingerprints_model_path "./rxnfp/rxnfp/models/transformers/bert_ft" \
--pca_model_filename "./sandbox/pca.pkl" \
--tree_data_dict_pca_filename "./sandbox/tree_data_dict_pca.pkl" \
--output_path ./test_retro.json
```