Commit

deploy: dda32e7
zulissimeta committed Apr 14, 2024
1 parent 7f773ef commit 71b71e9
Showing 32 changed files with 1,771 additions and 2,723 deletions.
85 changes: 45 additions & 40 deletions _downloads/5fdddbed2260616231dbf7b0d94bb665/train.txt
@@ -1,17 +1,17 @@
2024-04-14 19:19:53 (INFO): Project root: /home/runner/work/ocp/ocp
2024-04-14 21:25:50 (INFO): Project root: /home/runner/work/ocp/ocp
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
2024-04-14 19:19:54 (WARNING): Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
2024-04-14 19:19:54 (INFO): amp: true
2024-04-14 21:25:51 (WARNING): Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
2024-04-14 21:25:51 (INFO): amp: true
cmd:
checkpoint_dir: fine-tuning/checkpoints/2024-04-14-19-20-32-ft-oxides
commit: 87e111f
checkpoint_dir: fine-tuning/checkpoints/2024-04-14-21-26-24-ft-oxides
commit: dda32e7
identifier: ft-oxides
logs_dir: fine-tuning/logs/wandb/2024-04-14-19-20-32-ft-oxides
logs_dir: fine-tuning/logs/tensorboard/2024-04-14-21-26-24-ft-oxides
print_every: 10
results_dir: fine-tuning/results/2024-04-14-19-20-32-ft-oxides
results_dir: fine-tuning/results/2024-04-14-21-26-24-ft-oxides
seed: 0
timestamp_id: 2024-04-14-19-20-32-ft-oxides
timestamp_id: 2024-04-14-21-26-24-ft-oxides
dataset:
a2g_args:
r_energy: true
@@ -35,7 +35,7 @@ eval_metrics:
misc:
- energy_forces_within_threshold
gpus: 0
logger: wandb
logger: tensorboard
loss_fns:
- energy:
coefficient: 1
@@ -142,37 +142,42 @@ val_dataset:
r_forces: true
src: val.db

wandb: ERROR api_key not configured (no-tty). call wandb.login(key=[your_api_key])
2024-04-14 21:25:51 (INFO): Loading dataset: ase_db
2024-04-14 21:25:51 (INFO): rank: 0: Sampler created...
2024-04-14 21:25:51 (INFO): Batch balancing is disabled for single GPU training.
2024-04-14 21:25:51 (INFO): rank: 0: Sampler created...
2024-04-14 21:25:51 (INFO): Batch balancing is disabled for single GPU training.
2024-04-14 21:25:51 (INFO): rank: 0: Sampler created...
2024-04-14 21:25:51 (INFO): Batch balancing is disabled for single GPU training.
2024-04-14 21:25:51 (INFO): Loading model: gemnet_oc
2024-04-14 21:25:51 (WARNING): Unrecognized arguments: ['symmetric_edge_symmetrization']
2024-04-14 21:25:54 (INFO): Loaded GemNetOC with 38864438 parameters.
2024-04-14 21:25:54 (WARNING): Model gradient logging to tensorboard not yet supported.
2024-04-14 21:25:54 (WARNING): Using `weight_decay` from `optim` instead of `optim.optimizer_params`.Please update your config to use `optim.optimizer_params.weight_decay`.`optim.weight_decay` will soon be deprecated.
2024-04-14 21:25:54 (INFO): Loading checkpoint from: /tmp/ocp_checkpoints/gnoc_oc22_oc20_all_s2ef.pt
2024-04-14 21:25:54 (INFO): Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch_geometric/data/collate.py:145: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = elem.storage()._new_shared(numel)
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch_geometric/data/collate.py:145: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = elem.storage()._new_shared(numel)
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch/amp/autocast_mode.py:250: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
2024-04-14 21:26:06 (INFO): Evaluating on val.
device 0: 0%| | 0/2 [00:00<?, ?it/s]/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch_geometric/data/collate.py:145: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = elem.storage()._new_shared(numel)
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch_geometric/data/collate.py:145: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = elem.storage()._new_shared(numel)
device 0: 50%|█████ | 1/2 [00:04<00:04, 4.27s/it]device 0: 100%|██████████| 2/2 [00:06<00:00, 3.21s/it]device 0: 100%|██████████| 2/2 [00:06<00:00, 3.43s/it]
2024-04-14 21:26:13 (INFO): energy_forces_within_threshold: 0.0000, energy_mae: 2.8244, forcesx_mae: 0.0080, forcesy_mae: 0.0105, forcesz_mae: 0.0081, forces_mae: 0.0089, forces_cosine_similarity: 0.1907, forces_magnitude_error: 0.0127, loss: 2.8302, epoch: 0.0667
Traceback (most recent call last):
File "/home/runner/work/ocp/ocp/main.py", line 89, in <module>
Runner()(config)
File "/home/runner/work/ocp/ocp/main.py", line 34, in __call__
with new_trainer_context(args=args, config=config) as ctx:
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/home/runner/work/ocp/ocp/ocpmodels/common/utils.py", line 977, in new_trainer_context
trainer = trainer_cls(
^^^^^^^^^^^^
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/ocp_trainer.py", line 95, in __init__
super().__init__(
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/base_trainer.py", line 176, in __init__
self.load()
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/base_trainer.py", line 197, in load
self.load_logger()
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/base_trainer.py", line 229, in load_logger
self.logger = registry.get_logger_class(logger_name)(self.config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/ocp/ocp/ocpmodels/common/logger.py", line 65, in __init__
wandb.init(
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/wandb/sdk/wandb_init.py", line 1200, in init
raise e
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/wandb/sdk/wandb_init.py", line 1177, in init
wi.setup(kwargs)
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/wandb/sdk/wandb_init.py", line 301, in setup
wandb_login._login(
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/wandb/sdk/wandb_login.py", line 334, in _login
wlogin.prompt_api_key()
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/wandb/sdk/wandb_login.py", line 263, in prompt_api_key
raise UsageError("api_key not configured (no-tty). call " + directive)
wandb.errors.UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])
File "/home/runner/work/ocp/ocp/main.py", line 40, in __call__
self.task.run()
File "/home/runner/work/ocp/ocp/ocpmodels/tasks/task.py", line 51, in run
self.trainer.train(
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/ocp_trainer.py", line 200, in train
self.update_best(
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/base_trainer.py", line 667, in update_best
"mae" in primary_metric
TypeError: argument of type 'NoneType' is not iterable
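
With the logger switched from wandb to tensorboard, this run clears the old log's `UsageError` but now stops in `update_best`, where `"mae" in primary_metric` raises because `primary_metric` is `None`. A minimal sketch of the general fix, written as a hypothetical helper (it is not part of ocpmodels, and the config key it reads is an assumption):

```python
def resolve_primary_metric(eval_metrics: dict, default: str = "energy_mae") -> str:
    """Return an explicit primary metric so that membership checks such as
    '"mae" in primary_metric' never operate on None. Hypothetical helper;
    the 'primary_metric' key name is assumed, not confirmed by this log."""
    metric = eval_metrics.get("primary_metric")
    return metric if isinstance(metric, str) else default


print(resolve_primary_metric({}))  # -> energy_mae
```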
148 changes: 148 additions & 0 deletions _downloads/819e10305ddd6839cd7da05935b17060/mass-inference.txt
@@ -0,0 +1,148 @@
2024-04-14 21:28:11 (INFO): Project root: /home/runner/work/ocp/ocp
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
2024-04-14 21:28:13 (WARNING): Detected old config, converting to new format. Consider updating to avoid potential incompatibilities.
2024-04-14 21:28:13 (INFO): amp: true
cmd:
checkpoint_dir: ./checkpoints/2024-04-14-21-28-32
commit: dda32e7
identifier: ''
logs_dir: ./logs/tensorboard/2024-04-14-21-28-32
print_every: 10
results_dir: ./results/2024-04-14-21-28-32
seed: 0
timestamp_id: 2024-04-14-21-28-32
dataset:
a2g_args:
r_energy: false
r_forces: false
format: ase_db
key_mapping:
force: forces
y: energy
select_args:
selection: natoms>5,xc=PBE
src: data.db
eval_metrics:
metrics:
energy:
- mae
forces:
- forcesx_mae
- forcesy_mae
- forcesz_mae
- mae
- cosine_similarity
- magnitude_error
misc:
- energy_forces_within_threshold
gpus: 0
logger: tensorboard
loss_fns:
- energy:
coefficient: 1
fn: mae
- forces:
coefficient: 1
fn: l2mae
model: gemnet_t
model_attributes:
activation: silu
cbf:
name: spherical_harmonics
cutoff: 6.0
direct_forces: true
emb_size_atom: 512
emb_size_bil_trip: 64
emb_size_cbf: 16
emb_size_edge: 512
emb_size_rbf: 16
emb_size_trip: 64
envelope:
exponent: 5
name: polynomial
extensive: true
max_neighbors: 50
num_after_skip: 2
num_atom: 3
num_before_skip: 1
num_blocks: 3
num_concat: 1
num_radial: 128
num_spherical: 7
otf_graph: true
output_init: HeOrthogonal
rbf:
name: gaussian
regress_forces: true
noddp: false
optim:
batch_size: 16
clip_grad_norm: 10
ema_decay: 0.999
energy_coefficient: 1
eval_batch_size: 16
eval_every: 5000
force_coefficient: 1
loss_energy: mae
loss_force: atomwisel2
lr_gamma: 0.8
lr_initial: 0.0005
lr_milestones:
- 64000
- 96000
- 128000
- 160000
- 192000
max_epochs: 80
num_workers: 2
optimizer: AdamW
optimizer_params:
amsgrad: true
warmup_steps: -1
outputs:
energy:
level: system
forces:
eval_on_free_atoms: true
level: atom
train_on_free_atoms: false
slurm: {}
task:
dataset: ase_db
prediction_dtype: float32
test_dataset:
a2g_args:
r_energy: false
r_forces: false
select_args:
selection: natoms>5,xc=PBE
src: data.db
trainer: ocp
val_dataset: null

2024-04-14 21:28:13 (INFO): Loading dataset: ase_db
Traceback (most recent call last):
File "/home/runner/work/ocp/ocp/main.py", line 89, in <module>
Runner()(config)
File "/home/runner/work/ocp/ocp/main.py", line 34, in __call__
with new_trainer_context(args=args, config=config) as ctx:
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/home/runner/work/ocp/ocp/ocpmodels/common/utils.py", line 977, in new_trainer_context
trainer = trainer_cls(
^^^^^^^^^^^^
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/ocp_trainer.py", line 95, in __init__
super().__init__(
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/base_trainer.py", line 176, in __init__
self.load()
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/base_trainer.py", line 198, in load
self.load_datasets()
File "/home/runner/work/ocp/ocp/ocpmodels/trainers/base_trainer.py", line 281, in load_datasets
self.train_dataset = registry.get_dataset_class(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/ocp/ocp/ocpmodels/datasets/ase_datasets.py", line 114, in __init__
raise ValueError(
ValueError: No valid ase data found!Double check that the src path and/or glob search pattern gives ASE compatible data: data.db
Elapsed time = 3.8 seconds
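
The failure above is a data problem, not a model problem: the `ase_db` dataset loader found nothing usable at `data.db`. A small pre-flight check along these lines (a sketch using the same `ase.db` API the notebooks already rely on) surfaces that before the trainer is even constructed:

```python
from pathlib import Path
import ase.db

db_path = Path("data.db")
if not db_path.exists():
    raise FileNotFoundError("data.db is missing - re-run the download/subsetting cells first")

with ase.db.connect(str(db_path)) as db:
    n_rows = db.count()
    print(f"{n_rows} rows in {db_path}")

if n_rows == 0:
    raise ValueError("data.db exists but holds no rows, so the ase_db dataset cannot load it")
```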
1 change: 1 addition & 0 deletions _sources/core/fine-tuning/fine-tuning-oxides.md
@@ -210,6 +210,7 @@ yml = generate_yml_config(checkpoint_path, 'config.yml',
'task.dataset': 'ase_db',
'optim.eval_every': 1,
'optim.max_epochs': 10,
'logger':'tensorboard', # don't use wandb!
# Train data
'dataset.train.src': 'train.db',
'dataset.train.a2g_args.r_energy': True,
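
For context, a sketch of how the logger override fits into the surrounding call (the two positional arguments come from the hunk header above; the import path and the `update=` keyword are assumptions based on how this tutorial uses `generate_yml_config`, and `checkpoint_path` is defined earlier in the notebook):

```python
from ocpmodels.common.tutorial_utils import generate_yml_config  # import path assumed

yml = generate_yml_config(
    checkpoint_path, 'config.yml',
    update={
        'task.dataset': 'ase_db',
        'optim.eval_every': 1,
        'optim.max_epochs': 10,
        'logger': 'tensorboard',  # avoid wandb: CI runners have no API key or TTY
        'dataset.train.src': 'train.db',
        'dataset.train.a2g_args.r_energy': True,
    },
)
```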
13 changes: 8 additions & 5 deletions _sources/core/inference.md
@@ -26,25 +26,28 @@ You can retrieve the dataset below. In this notebook we learn how to do "mass inference
! [ ! -f data.db ] && wget https://figshare.com/ndownloader/files/11948267 -O data.db
```

```{code-cell} ipython3
! ase db data.db
```


Inference on this file will be fast if we have a GPU, but could take a while without one. To keep things fast for the automated builds, we select just the first few structures so the example is still approachable with only a CPU. A quick sanity check on the resulting subset follows the `ase db` cell below.
Comment out or skip this block to use the whole dataset!

```{code-cell} ipython3
! cp data.db full_data.db
! mv data.db full_data.db
import ase.db
import numpy as np
with ase.db.connect('full_data.db') as full_db:
    with ase.db.connect('data.db') as subset_db:
    with ase.db.connect('data.db', append=False) as subset_db:
        for i in range(1, 10):
            subset_db.write(full_db.get_atoms(i))
```

```{code-cell} ipython3
! ase db data.db
```
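
One thing worth checking after subsetting (a hedged aside, not a diagnosis confirmed by this commit): the mass-inference config filters the database with `select_args: natoms>5,xc=PBE`, so a very small subset can end up with zero matching rows and later trigger the "No valid ase data found" error seen in mass-inference.txt above. A quick preview:

```python
import ase.db

# Preview how many subset rows survive the selection the trainer will apply.
with ase.db.connect('data.db') as db:
    n_match = db.count('natoms>5,xc=PBE')
    print(f"{n_match} of {db.count()} rows match natoms>5,xc=PBE")
```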

You have to choose a checkpoint to start with. The newer checkpoints may require too much memory for this environment.

```{code-cell} ipython3
11 changes: 6 additions & 5 deletions _sources/tutorials/NRR/NRR_example.md
@@ -172,21 +172,22 @@ These steps are embarrassingly parallel, and can be launched that way to speed t

The goal here is to relax each candidate adsorption geometry and save the results in a trajectory file we will analyze later. Each trajectory file will have the geometry and final energy of the relaxed structure.

It is somewhat time consuming to run this, so in this cell we only run one example.
It is somewhat time-consuming to run this, so in this cell we run only one example, using just the first 4 configurations for each adsorbate.

```{code-cell} ipython3
import time
from tqdm import tqdm
tinit = time.time()
for bulk_src_id in tqdm(bulk_ids[1:2]):
# Note we're just doing the first bulk_id!
for bulk_src_id in tqdm(bulk_ids[:1]):
    # Enumerate slabs and establish adsorbates
    bulk = Bulk(bulk_src_id_from_db=bulk_src_id, bulk_db_path="NRR_example_bulks.pkl")
    slab = Slab.from_bulk_get_specific_millers(bulk=bulk, specific_millers=(1, 1, 1))
    # Perform heuristic placements
    heuristic_adslabs_H = AdsorbateSlabConfig(slab[0], adsorbate_H, mode="heuristic")
    heuristic_adslabs_NNH = AdsorbateSlabConfig(slab[0], adsorbate_NNH, mode="heuristic")
    # Perform heuristic placements, note just 4 configs!
    heuristic_adslabs_H = AdsorbateSlabConfig(slab[0], adsorbate_H, mode="heuristic")[:4]
    heuristic_adslabs_NNH = AdsorbateSlabConfig(slab[0], adsorbate_NNH, mode="heuristic")[:4]
    # Run relaxations
    os.makedirs(f"data/{bulk_src_id}_H", exist_ok=True)
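    # Hedged aside: slicing AdsorbateSlabConfig(...)[:4] above assumes the config
    # object supports indexing. If your installed ocp/fairchem version does not,
    # an alternative (also an assumption worth checking) is to slice the generated
    # structures, which are typically exposed as an .atoms_list attribute:
    #   heuristic_adslabs_H = AdsorbateSlabConfig(slab[0], adsorbate_H, mode="heuristic").atoms_list[:4]
    #   heuristic_adslabs_NNH = AdsorbateSlabConfig(slab[0], adsorbate_NNH, mode="heuristic").atoms_list[:4]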
2 changes: 1 addition & 1 deletion _sources/tutorials/advanced/embeddings.md
@@ -320,5 +320,5 @@ found.get_distance(0, 2), found.get_distance(1, 2)

```{code-cell} ipython3
from ase.visualize.plot import plot_atoms
plot_atoms(found);
plot_atoms(found)
```
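
As a follow-up to the cell above (a sketch: it assumes the `found` Atoms object from the earlier cells and that matplotlib is installed, which `plot_atoms` requires), the call returns a Matplotlib Axes, so it can be captured to tweak or save the rendering instead of relying on inline display.

```python
import matplotlib.pyplot as plt
from ase.visualize.plot import plot_atoms

ax = plot_atoms(found)                   # returns the Matplotlib Axes it drew on
ax.set_title("matched structure")        # illustrative only
ax.figure.savefig("found.png", dpi=150)  # save the rendering to a file
plt.close(ax.figure)
```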