Merge pull request #2 from learningmatter-mit/refactoring
Merge `refactoring` branch
xiaochendu authored Jul 18, 2024
2 parents 0c22c69 + cdb65fa commit 4efaf92
Showing 106 changed files with 23,031 additions and 8,485 deletions.
95 changes: 85 additions & 10 deletions .gitignore
@@ -1,18 +1,93 @@
# Compiled source #
###################
*.com
*.class
*.dll
*.exe
*.o
*.pyc
*.so

# Packages #
############
# it's better to unpack these files and commit the raw source
# git has its own built in compression methods
*.7z
*.dmg
*.gz
*.iso
*.jar
*.rar
#*.tar
*.zip

# Logs and databases #
######################
*.log
*.sql
*.sqlite

# OS generated files #
######################
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# python/jupyter generated files
.ipynb_checkpoints
__pycache__
*.egg-info
.vscode
.pytest_cache
*runs*
*__pycache__*
*.pyc

# data files #
*.dat
*.data
*.xyz
*.pdb
*.csv
*.pkl
*.txt
*.mpg
*.traj
*.pickle
*.cif
*.in
*.out
*.data
*.png
*.lammps
__pycache__/
*tmp_files/*
*.png

# slurm output files
slurm*.out

# directory
log/
debug/
sandbox*/
backup
dist/
sandbox_excited/
build/
*runs*
tmp_files/

# test files should still be committed
!tests/data/*/*.pkl
!tests/data/*/*.cif

# tutorial files should still be committed
!tutorials/data/*/*.pkl
!tutorials/data/*/*.cif
!tutorials/data/*/*.txt
!tutorials/*/*.txt
!tutorials/data/*/**/*.csv

# static files should still be committed
!site/static/**.png

# test file should still be committed
!tests/resources/*.pkl
!tests/resources/*.cif
# util files should still be committed
!mcmc/utils/data/colors.txt
2 changes: 1 addition & 1 deletion .isort.cfg
@@ -1,2 +1,2 @@
[settings]
known_third_party =ase,catkit,lammps,matplotlib,nff,numpy,pytest,scipy
known_third_party =ase,catkit,lammps,matplotlib,monty,nff,numpy,pandas,pymatgen,pytest,scipy,seaborn,sklearn,torch,tqdm,typing_extensions
26 changes: 9 additions & 17 deletions .pre-commit-config.yaml
@@ -8,21 +8,13 @@ repos:
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- repo: https://github.com/asottile/seed-isort-config
rev: v2.2.0
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.3.2
hooks:
- id: seed-isort-config
- repo: https://github.com/pre-commit/mirrors-isort
rev: v5.10.1
hooks:
- id: isort
args: ["--profile", "black"]
- repo: https://github.com/ambv/black
rev: 22.3.0
hooks:
- id: black
language_version: python3.8
# - repo: https://github.com/PyCQA/flake8
# rev: 4.0.1
# hooks:
# - id: flake8
- id: ruff
types_or: [ python, pyi, jupyter ]
args: [ --fix ]
exclude: migrations/
- id: ruff-format
types_or: [ python, pyi, jupyter ]
exclude: migrations/
159 changes: 109 additions & 50 deletions README.md
@@ -1,89 +1,148 @@
# Virtual Surface Site Relaxation-Monte Carlo (VSSR-MC)
[![Tests](https://github.com/learningmatter-mit/surface-sampling/actions/workflows/tests.yml/badge.svg)](https://github.com/learningmatter-mit/surface-sampling/actions/workflows/tests.yml)
[![arXiv](https://img.shields.io/badge/arXiv-2305.07251-blue?logo=arXiv&logoColor=white&logoSize=auto)](https://arxiv.org/abs/2305.07251)
[![Zenodo](https://img.shields.io/badge/data-10.5281/zenodo.7758174-14b8a6?logo=zenodo&logoColor=white&logoSize=auto)](https://zenodo.org/doi/10.5281/zenodo.7758174)

## Contents
- [Overview](#overview)
- [System requirements](#system-requirements)
- [Setup](#setup)
- [Demo](#demo)
- [Scripts](#scripts)
- [Citation](#citation)
- [Development & Bugs](#development--bugs)


# Overview
This is the VSSR-MC algorithm for sampling surface reconstructions. VSSR-MC samples across both compositional and configurational spaces. It can interface with either a neural network potential (through [ASE](https://wiki.fysik.dtu.dk/ase/)) or a classical potential (through ASE or [LAMMPS](https://www.lammps.org/)). It is a key component of the Automatic Surface Reconstruction (AutoSurfRecon) pipeline described in the following work: [Machine-learning-accelerated simulations to enable automatic surface reconstruction](https://doi.org/10.1038/s43588-023-00571-7).
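
The README does not spell out how proposed moves are accepted or rejected. As a rough illustration only, the sketch below shows a generic Metropolis criterion of the kind commonly used in Monte Carlo sampling. The function name, its arguments, and the choice of a plain energy difference are assumptions made for this example and are not taken from the repository; the actual VSSR-MC implementation may differ.

```python
import math
import random


def metropolis_accept(delta_energy: float, kT: float, rng: random.Random) -> bool:
    """Decide whether to accept a proposed move (hypothetical helper).

    delta_energy: energy of the proposed state minus the current state.
    kT: Boltzmann constant times temperature, in the same units; must be > 0.
    rng: random number generator, passed in so that runs are reproducible.
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    if delta_energy <= 0:
        # Downhill (or equal-energy) moves are always accepted.
        return True
    # Uphill moves are accepted with probability exp(-delta_energy / kT).
    return rng.random() < math.exp(-delta_energy / kT)


if __name__ == "__main__":
    rng = random.Random(0)
    # Count how many of 1000 uphill proposals (0.1 units at kT = 0.1) pass.
    accepted = sum(metropolis_accept(0.1, 0.1, rng) for _ in range(1000))
    print(f"accepted {accepted} of 1000 uphill proposals")
```

Passing the random number generator explicitly keeps the sketch deterministic under a fixed seed, which is convenient when comparing runs.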

This is the VSSR-MC algorithm for sampling surface reconstructions. VSSR-MC samples across both compositional and configurational spaces. It can interface with both a neural network potential (through ASE) or a classical potential (through ASE or LAMMPS). It is a key component of the Automatic Surface Reconstruction (AutoSurfRecon) pipeline described in the following work:

"Machine-learning-accelerated simulations to enable automatic surface reconstruction", by X. Du, J.K. Damewood, J.R. Lunger, R. Millan, B. Yildiz, L. Li, and R. Gómez-Bombarelli. https://doi.org/10.1038/s43588-023-00571-7

Please cite us if you find this work useful. Let us know in `issues` if you encounter any problems or have any questions.

To start, run `git clone git@github.com:learningmatter-mit/surface-sampling.git` to your local directory or a workstation.

Read through the following in order before running our code.
![Cover image](site/static/vssr_cover_image.png)

# System requirements

## Hardware requirements
We recommend a computer with the following specs:

- RAM: 16+ GB
- CPU: 4+ cores, 3 GHz/core

We tested out the code on machines with 6+ CPU cores @ 3.0+ GHz/core with 64+ GB of RAM.
To run with a neural network force field, a GPU is recommended. We ran on a single NVIDIA GeForce RTX 2080 Ti 11 GB GPU. The code has been tested on *Linux* Ubuntu 20.04.6 LTS but we expect it to work on other *Linux* distributions.

# Setup
To start, run `git clone git@github.com:learningmatter-mit/surface-sampling.git` to your local directory or a workstation.

## Conda environment
We recommend creating a new [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html) environment. Following that, the Python dependencies for the code can be installed. In the `surface-sampling` directory, run the following commands:
```bash
conda create -n vssr-mc python=3.11
conda activate vssr-mc
conda install -c conda-forge kimpy lammps openkim-models
pip install -e .
```
> If you're intending to contribute to the code, you can `pip install -e '.[dev]'` instead to also install the development dependencies.
To run with a neural network force field, a GPU is recommended. We ran on a single NVIDIA GeForce RTX 2080 Ti 11 GB GPU.
To run with LAMMPS, add the following to `~/.bashrc` or equivalent with appropriate paths and then `source ~/.bashrc`. `conda` would have installed LAMMPS as a dependency.
```bash
export LAMMPS_COMMAND="/path/to/lammps/src/lmp"
export LAMMPS_POTENTIALS="/path/to/lammps/potentials/"
export ASE_LAMMPSRUN_COMMAND="$LAMMPS_COMMAND"
```

## Software requirements
The code has been tested up to commit `02820d339eed6291b6af6ccb809f154ad6244110` on the `master` branch.
The `LAMMPS_COMMAND` should point to the LAMMPS executable, which can be found here: `/path/to/[vssr-mc-env]/bin/lmp`.
The `LAMMPS_POTENTIALS` directory should contain the LAMMPS potential files, which can be found here: `/path/to/[surface-sampling-repo]/mcmc/potentials/`.
The `ASE_LAMMPSRUN_COMMAND` should point to the same LAMMPS executable. More information can be found here: [ASE LAMMPS](https://wiki.fysik.dtu.dk/ase/ase/calculators/lammpsrun.html).
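
Before launching a LAMMPS-backed run, it can help to confirm that the three variables above are set and point to existing paths. The following is a minimal sketch, not part of the repository: the helper name `check_lammps_env` is hypothetical, and it assumes each variable holds a bare path as in the README snippet (a command with extra arguments, such as an MPI launcher prefix, would be reported as missing).

```python
import os
from pathlib import Path

# Variable names taken from the README's LAMMPS setup instructions.
REQUIRED_VARS = ("LAMMPS_COMMAND", "LAMMPS_POTENTIALS", "ASE_LAMMPSRUN_COMMAND")


def check_lammps_env(env=None):
    """Return a list of problems found; an empty list means nothing was flagged.

    env: mapping of variable names to values; defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).exists():
            # Assumes the value is a plain filesystem path.
            problems.append(f"{name} points to a missing path: {value}")
    return problems


if __name__ == "__main__":
    issues = check_lammps_env()
    for issue in issues:
        print(issue)
    if not issues:
        print("LAMMPS environment variables look fine")
```

Accepting the environment as an argument makes the check easy to exercise without modifying the real shell environment.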

### Operating system
This package has been tested on *Linux* Ubuntu 20.04.6 LTS but we expect it to be agnostic to the *Linux* system version.
If the `conda` installed LAMMPS does not work, you might have to install LAMMPS from source. More information can be found here: [LAMMPS](https://lammps.sandia.gov/doc/Build.html).

### Conda environment
[Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html) is required. Either Miniconda or Anaconda should be installed.
You might have to re-open/re-login to your terminal shell for the new settings to take effect.

Following that, the Python dependencies for the code can be installed with the following command
# Demo
A toy demo and other examples can be found in the `tutorials/` folder.
```
conda env create -f environment.yml
tutorials/
├── example.ipynb
├── GaN_0001.ipynb
├── Si_111_5x5.ipynb
├── SrTiO3_001.ipynb
├── latent_space_clustering.ipynb
└── prepare_surface.ipynb
```
More data/examples can be found in our [Zenodo dataset](https://doi.org/10.5281/zenodo.7758174).

Installation might take 10-20 minutes to resolve dependencies.
## Toy example of Cu(100)
A toy example to illustrate the use of VSSR-MC. It should take only a few seconds to run. Refer to `tutorials/example.ipynb`.

### Additional software
1. [LAMMPS](https://docs.lammps.org/Install.html) for classical force field optimization
2. [NFF](https://github.com/learningmatter-mit/NeuralForceField) for neural network force field
## GaN(0001) surface sampling with Tersoff potential
This example could take a few minutes to run. Refer to `tutorials/GaN_0001.ipynb`.

# Setup
Assuming you have cloned our `surface-sampling` repo to `/path/to/surface-sampling`.
## Si(111) 5x5 surface sampling with modified Stillinger–Weber potential
This example could take a few minutes to run. Refer to `tutorials/Si_111_5x5.ipynb`.

Add the following to `~/.bashrc` or equivalent with appropriate paths and then `source ~/.bashrc`.
```
export SURFSAMPLINGDIR="/path/to/surface-sampling"
export PYTHONPATH="$SURFSAMPLINGDIR:$PYTHONPATH"
## SrTiO3(001) surface sampling with machine learning potential
Demonstrates the integration of VSSR-MC with a neural network force field. This example could take a few minutes to run. Refer to `tutorials/SrTiO3_001.ipynb`.

export LAMMPS_COMMAND="/path/to/lammps/src/lmp_serial"
export LAMMPS_POTENTIALS="/path/to/lammps/potentials/"
export ASE_LAMMPSRUN_COMMAND="$LAMMPS_COMMAND"
## Clustering MC-sampled surfaces in the latent space
Retrieves the neural network embeddings of VSSR-MC structures and performs clustering. This example should only take a minute to run. Refer to `tutorials/latent_space_clustering.ipynb`.

export NFFDIR="/path/to/NeuralForceField"
export PYTHONPATH=$NFFDIR:$PYTHONPATH
```
## Preparing surface from a bulk structure
This example demonstrates how to cut a surface from a bulk structure. Refer to `tutorials/prepare_surface.ipynb`.

You might have to re-open/re-login to your shell for the new settings to take effect.

# Demo
# Scripts
Scripts can be found in the `scripts/` folder, including:
```
scripts/
├── sample_surface.py
└── clustering.py
```

A toy demo and other examples can be found in the `tutorials/` folder. More data/examples can be found in our Zenodo dataset (https://doi.org/10.5281/zenodo.7758174).
The arguments for the scripts can be found by running `python scripts/sample_surface.py -h` or `python scripts/clustering.py -h`.

## Example usage:
### Original VSSR-MC with PaiNN model trained on SrTiO3(001) surfaces
```bash
python scripts/sample_surface.py --run_name "SrTiO3_001_painn" \
--starting_structure_path "tutorials/data/SrTiO3_001/SrTiO3_001_2x2_pristine_slab.pkl" \
--model_type "PaiNN" --model_paths "tutorials/data/SrTiO3_001/nff/model01/best_model" \
"tutorials/data/SrTiO3_001/nff/model02/best_model" \
"tutorials/data/SrTiO3_001/nff/model03/best_model" \
--settings_path "scripts/configs/sample_config_painn.json"
```

### Toy example of Cu(100)
A toy example to illustrate the use of VSSR-MC. It should only take about a minute to run. Refer to `tutorials/example.ipynb`.
### Pre-trained "foundational" CHGNet model on SrTiO3(001) surfaces
```bash
python scripts/sample_surface.py --run_name "SrTiO3_001_chgnet" \
--starting_structure_path "tutorials/data/SrTiO3_001/SrTiO3_001_2x2_pristine_slab.pkl" \
--model_type "CHGNetNFF" --settings_path "scripts/configs/sample_config_chgnet.json"
```
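
When several sampling runs need to be launched with different settings, the command lines above can be assembled programmatically. The sketch below is an illustration under stated assumptions: `build_sample_command` is a hypothetical helper, not something shipped in `scripts/`, and it uses only the flags that appear in the README examples. Run `python scripts/sample_surface.py -h` for the authoritative argument list.

```python
import shlex
import sys


def build_sample_command(run_name, starting_structure_path, model_type,
                         settings_path, model_paths=()):
    """Assemble the argument list for scripts/sample_surface.py (hypothetical helper)."""
    cmd = [
        sys.executable, "scripts/sample_surface.py",
        "--run_name", run_name,
        "--starting_structure_path", starting_structure_path,
        "--model_type", model_type,
        "--settings_path", settings_path,
    ]
    if model_paths:
        # --model_paths takes one or more paths, as in the PaiNN example.
        cmd += ["--model_paths", *model_paths]
    return cmd


if __name__ == "__main__":
    cmd = build_sample_command(
        run_name="SrTiO3_001_chgnet",
        starting_structure_path="tutorials/data/SrTiO3_001/SrTiO3_001_2x2_pristine_slab.pkl",
        model_type="CHGNetNFF",
        settings_path="scripts/configs/sample_config_chgnet.json",
    )
    # Print the command for inspection. To execute it from the repository
    # root, pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.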

### GaN(0001) surface sampling with Tersoff potential
We explicitly generate surface sites using `pymatgen`. This example could take 5 minutes or more to run. Refer to `tutorials/GaN_0001.ipynb`.
### Latent space clustering
```bash
python scripts/clustering.py --file_paths "tutorials/data/SrTiO3_001/SrTiO3_001_2x2_mcmc_structures_100.pkl" \
--save_folder "SrTiO3_001/clustering" --nff_model_type "PaiNN" \
--nff_paths "tutorials/data/SrTiO3_001/nff/model01/best_model" \
"tutorials/data/SrTiO3_001/nff/model02/best_model" \
"tutorials/data/SrTiO3_001/nff/model03/best_model" \
--clustering_metric "force_std" --cutoff_criterion "distance" \
--clustering_cutoff 0.2 --nff_device "cuda"
```

### Si(111) 5x5 surface sampling with modified Stillinger–Weber potential
We explicitly generate surface sites using `pymatgen`. This example could take 5 minutes or more to run. Refer to `tutorials/Si_111_5x5.ipynb`.

### SrTiO3(001) surface sampling with machine learning potential
Demonstrates the integration of VSSR-MC with a neural network force field. This example could take 10 minutes or more to run. Refer to `tutorials/SrTiO3_001.ipynb`.
# Citation
```bib
@article{duMachinelearningacceleratedSimulationsEnable2023,
title = {Machine-Learning-Accelerated Simulations to Enable Automatic Surface Reconstruction},
author = {Du, Xiaochen and Damewood, James K. and Lunger, Jaclyn R. and Millan, Reisel and Yildiz, Bilge and Li, Lin and {G{\'o}mez-Bombarelli}, Rafael},
year = {2023},
month = dec,
journal = {Nature Computational Science},
pages = {1--11},
publisher = {Nature Publishing Group},
issn = {2662-8457},
doi = {10.1038/s43588-023-00571-7},
urldate = {2023-12-07},
keywords = {Computational methods,Computational science,Software,Surface chemistry}
}
```

### Clustering MC-sampled surfaces in the latent space
Retrieving the neural network embeddings of VSSR-MC structures and performing clustering. This example should only take a minute to run. Refer to `tutorials/latent_space_clustering.ipynb`.
# Development & Bugs
VSSR-MC is under active development. If you encounter any bugs during installation or usage,
please open an [issue](https://github.com/learningmatter-mit/surface-sampling/issues). We appreciate your contributions!
27 changes: 27 additions & 0 deletions citation.cff
@@ -0,0 +1,27 @@
cff-version: 1.2.0
message: If you use this software, please cite it as below.
title: Machine-Learning-Accelerated Simulations to Enable Automatic Surface Reconstruction
authors:
- family-names: Du
given-names: Xiaochen
- family-names: Damewood
given-names: James K.
- family-names: Lunger
given-names: Jaclyn R.
- family-names: Millan
given-names: Reisel
- family-names: Yildiz
given-names: Bilge
- family-names: Li
given-names: Lin
- family-names: Gómez-Bombarelli
given-names: Rafael
date-released: 2023-11-08
repository-code: https://github.com/learningmatter-mit/surface-sampling
arxiv: https://arxiv.org/abs/2305.07251
doi: 10.1038/s43588-023-00571-7
type: software
keywords:
[monte carlo, neural network, force field, active learning]
version: 0.1.0 # replace with the version you use
journal: Nature Computational Science
26 changes: 3 additions & 23 deletions environment.yml
@@ -1,28 +1,8 @@
name: surface_sampling
name:
- vssr-mc
channels:
- conda-forge
- pytorch
- nvidia
- defaults
dependencies:
- flake8
- python=3.8
- pytorch=2.0
- pytorch-cuda=11.7
- matplotlib
- numpy>=1.21.6,<=1.22.4
- pandas
- pre-commit
- pylint
- ipykernel
- notebook
- ase
- pymatgen=2023.5.10
- rdkit
- e3fp
- scikit-learn
- lammps
- kimpy
- openkim-models
- pip
- pip:
- git+https://github.com/SUNCAT-Center/CatKit.git