Commit

deploy: da93cfe
lbluque committed Oct 23, 2024
1 parent 6b79537 commit 68092e6
Showing 317 changed files with 15,118 additions and 16,261 deletions.
137 changes: 67 additions & 70 deletions _downloads/5fdddbed2260616231dbf7b0d94bb665/train.txt

Large diffs are not rendered by default.

58 changes: 29 additions & 29 deletions _downloads/819e10305ddd6839cd7da05935b17060/mass-inference.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
2024-09-18 21:19:03 (INFO): Running in local mode without elastic launch (single gpu only)
2024-09-18 21:19:03 (INFO): Setting env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
2024-09-18 21:19:03 (INFO): Project root: /home/runner/work/fairchem/fairchem/src/fairchem
2024-10-23 20:39:14 (INFO): Running in local mode without elastic launch (single gpu only)
2024-10-23 20:39:14 (INFO): Setting env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
2024-10-23 20:39:14 (INFO): Project root: /home/runner/work/fairchem/fairchem/src/fairchem
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/escn/so3.py:23: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
_Jd = torch.load(os.path.join(os.path.dirname(__file__), "Jd.pt"))
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/scn/spherical_harmonics.py:23: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
@@ -15,17 +15,17 @@
@torch.cuda.amp.autocast(enabled=False)
/home/runner/work/fairchem/fairchem/src/fairchem/core/models/equiformer_v2/layer_norm.py:357: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@torch.cuda.amp.autocast(enabled=False)
2024-09-18 21:19:04 (INFO): amp: false
2024-10-23 20:39:15 (INFO): amp: false
cmd:
checkpoint_dir: /home/runner/work/fairchem/fairchem/docs/core/checkpoints/2024-09-18-21-20-00
commit: '8226618'
checkpoint_dir: /home/runner/work/fairchem/fairchem/docs/core/checkpoints/2024-10-23-20-39-28
commit: da93cfe
identifier: ''
logs_dir: /home/runner/work/fairchem/fairchem/docs/core/logs/tensorboard/2024-09-18-21-20-00
logs_dir: /home/runner/work/fairchem/fairchem/docs/core/logs/tensorboard/2024-10-23-20-39-28
print_every: 10
results_dir: /home/runner/work/fairchem/fairchem/docs/core/results/2024-09-18-21-20-00
results_dir: /home/runner/work/fairchem/fairchem/docs/core/results/2024-10-23-20-39-28
seed: 0
timestamp_id: 2024-09-18-21-20-00
version: 0.1.dev1+g8226618
timestamp_id: 2024-10-23-20-39-28
version: 0.1.dev1+gda93cfe
dataset: {}
evaluation_metrics:
metrics:
@@ -127,24 +127,24 @@ test_dataset:
trainer: ocp
val_dataset: {}

2024-09-18 21:19:04 (INFO): Loading model: gemnet_t
2024-09-18 21:19:06 (INFO): Loaded GemNetT with 31671825 parameters.
2024-09-18 21:19:06 (WARNING): log_summary for Tensorboard not supported
2024-09-18 21:19:06 (WARNING): Could not find dataset metadata.npz files in '[PosixPath('data.db')]'
2024-09-18 21:19:06 (WARNING): Disabled BalancedBatchSampler because num_replicas=1.
2024-09-18 21:19:06 (WARNING): Failed to get data sizes, falling back to uniform partitioning. BalancedBatchSampler requires a dataset that has a metadata attributed with number of atoms.
2024-09-18 21:19:06 (INFO): rank: 0: Sampler created...
2024-09-18 21:19:06 (INFO): Created BalancedBatchSampler with sampler=<fairchem.core.common.data_parallel.StatefulDistributedSampler object at 0x7f43e1273dd0>, batch_size=16, drop_last=False
2024-09-18 21:19:06 (INFO): Attemping to load user specified checkpoint at /tmp/fairchem_checkpoints/gndt_oc22_all_s2ef.pt
2024-09-18 21:19:06 (INFO): Loading checkpoint from: /tmp/fairchem_checkpoints/gndt_oc22_all_s2ef.pt
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/base_trainer.py:590: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
2024-10-23 20:39:15 (INFO): Loading model: gemnet_t
2024-10-23 20:39:17 (INFO): Loaded GemNetT with 31671825 parameters.
2024-10-23 20:39:17 (WARNING): log_summary for Tensorboard not supported
2024-10-23 20:39:17 (WARNING): Could not find dataset metadata.npz files in '[PosixPath('data.db')]'
2024-10-23 20:39:17 (WARNING): Disabled BalancedBatchSampler because num_replicas=1.
2024-10-23 20:39:17 (WARNING): Failed to get data sizes, falling back to uniform partitioning. BalancedBatchSampler requires a dataset that has a metadata attributed with number of atoms.
2024-10-23 20:39:17 (INFO): rank: 0: Sampler created...
2024-10-23 20:39:17 (INFO): Created BalancedBatchSampler with sampler=<fairchem.core.common.data_parallel.StatefulDistributedSampler object at 0x7feb013bb710>, batch_size=16, drop_last=False
2024-10-23 20:39:17 (INFO): Attemping to load user specified checkpoint at /tmp/fairchem_checkpoints/gndt_oc22_all_s2ef.pt
2024-10-23 20:39:17 (INFO): Loading checkpoint from: /tmp/fairchem_checkpoints/gndt_oc22_all_s2ef.pt
/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/base_trainer.py:603: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_path, map_location=map_location)
2024-09-18 21:19:06 (INFO): Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
2024-09-18 21:19:06 (WARNING): Scale factor comment not found in model
2024-09-18 21:19:06 (INFO): Predicting on test.
device 0: 0%| | 0/3 [00:00<?, ?it/s]/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:451: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
2024-10-23 20:39:17 (INFO): Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
2024-10-23 20:39:17 (WARNING): Scale factor comment not found in model
2024-10-23 20:39:17 (INFO): Predicting on test.
device 0: 0%| | 0/3 [00:00<?, ?it/s]/home/runner/work/fairchem/fairchem/src/fairchem/core/trainers/ocp_trainer.py:453: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=self.scaler is not None):
device 0: 33%|███████████▋ | 1/3 [00:03<00:06, 3.26s/it]device 0: 67%|███████████████████████▎ | 2/3 [00:05<00:02, 2.64s/it]device 0: 100%|███████████████████████████████████| 3/3 [00:07<00:00, 2.20s/it]device 0: 100%|███████████████████████████████████| 3/3 [00:07<00:00, 2.39s/it]
2024-09-18 21:19:13 (INFO): Writing results to /home/runner/work/fairchem/fairchem/docs/core/results/2024-09-18-21-20-00/ocp_predictions.npz
2024-09-18 21:19:13 (INFO): Total time taken: 7.306535959243774
Elapsed time = 14.0 seconds
device 0: 33%|███████████▋ | 1/3 [00:03<00:06, 3.24s/it]device 0: 67%|███████████████████████▎ | 2/3 [00:06<00:02, 2.99s/it]device 0: 100%|███████████████████████████████████| 3/3 [00:08<00:00, 2.58s/it]device 0: 100%|███████████████████████████████████| 3/3 [00:08<00:00, 2.72s/it]
2024-10-23 20:39:26 (INFO): Writing results to /home/runner/work/fairchem/fairchem/docs/core/results/2024-10-23-20-39-28/ocp_predictions.npz
2024-10-23 20:39:26 (INFO): Total time taken: 8.327327013015747
Elapsed time = 15.7 seconds
Binary file not shown.
81 changes: 81 additions & 0 deletions _sources/autoapi/core/_cli_hydra/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
core._cli_hydra
===============

.. py:module:: core._cli_hydra
.. autoapi-nested-parse::

Copyright (c) Facebook, Inc. and its affiliates.

This source code is licensed under the MIT license found in the
LICENSE file in the root directory of this source tree.



Attributes
----------

.. autoapisummary::

core._cli_hydra.logger


Classes
-------

.. autoapisummary::

core._cli_hydra.Submitit


Functions
---------

.. autoapisummary::

core._cli_hydra.map_cli_args_to_dist_config
core._cli_hydra.get_hydra_config_from_yaml
core._cli_hydra.runner_wrapper
core._cli_hydra.main


Module Contents
---------------

.. py:data:: logger
.. py:class:: Submitit
Bases: :py:obj:`submitit.helpers.Checkpointable`


Derived callable classes are requeued after timeout with their current
state dumped at checkpoint.

__call__ method must be implemented to make your class a callable.

.. note::

The following implementation of the checkpoint method resubmits the full current
state of the callable (self) with the initial argument. You may want to replace the method to
curate the state (dump a neural network to a standard format and remove it from
the state so that not to pickle it) and change/remove the initial parameters.


.. py:method:: __call__(dict_config: omegaconf.DictConfig, cli_args: argparse.Namespace) -> None
.. py:method:: checkpoint(*args, **kwargs)
Resubmits the same callable with the same arguments



.. py:function:: map_cli_args_to_dist_config(cli_args: argparse.Namespace) -> dict
.. py:function:: get_hydra_config_from_yaml(config_yml: str, overrides_args: list[str]) -> omegaconf.DictConfig
.. py:function:: runner_wrapper(config: omegaconf.DictConfig, cli_args: argparse.Namespace)
.. py:function:: main(args: argparse.Namespace | None = None, override_args: list[str] | None = None)
14 changes: 14 additions & 0 deletions _sources/autoapi/core/common/distutils/index.rst
Original file line number Diff line number Diff line change
@@ -19,6 +19,7 @@ Attributes

core.common.distutils.T
core.common.distutils.DISTRIBUTED_PORT
core.common.distutils.CURRENT_DEVICE_STR


Functions
@@ -39,6 +40,9 @@ Functions
core.common.distutils.all_reduce
core.common.distutils.all_gather
core.common.distutils.gather_objects
core.common.distutils.assign_device_for_local_rank
core.common.distutils.get_device_for_local_rank
core.common.distutils.setup_env_local


Module Contents
@@ -50,6 +54,10 @@ Module Contents
:value: 13356


.. py:data:: CURRENT_DEVICE_STR
:value: 'CURRRENT_DEVICE'


.. py:function:: os_environ_get_or_throw(x: str) -> str
.. py:function:: setup(config) -> None
@@ -79,3 +87,9 @@ Module Contents
Gather a list of pickleable objects into rank 0


.. py:function:: assign_device_for_local_rank(cpu: bool, local_rank: int)
.. py:function:: get_device_for_local_rank()
.. py:function:: setup_env_local()
50 changes: 50 additions & 0 deletions _sources/autoapi/core/common/logger/index.rst
Original file line number Diff line number Diff line change
@@ -20,6 +20,7 @@ Classes
core.common.logger.Logger
core.common.logger.WandBLogger
core.common.logger.TensorboardLogger
core.common.logger.WandBSingletonLogger


Module Contents
@@ -149,3 +150,52 @@ Module Contents
.. py:method:: log_artifact(name: str, type: str, file_location: str) -> None
.. py:class:: WandBSingletonLogger
Singleton version of wandb logger, this forces a single instance of the logger to be created and used from anywhere in the code (not just from the trainer).
This will replace the original WandBLogger.

We initialize wandb instance somewhere in the trainer/runner globally:

WandBSingletonLogger.init_wandb(...)

Then from anywhere in the code we can fetch the singleton instance and log to wandb,
note this allows you to log without knowing explicitly which step you are on
see: https://docs.wandb.ai/ref/python/log/#the-wb-step for more details

WandBSingletonLogger.get_instance().log({"some_value": value}, commit=False)


.. py:attribute:: _instance
:value: None



.. py:method:: init_wandb(config: dict, run_id: str, run_name: str, log_dir: str, project: str, entity: str, group: str | None = None) -> None
:classmethod:



.. py:method:: get_instance()
:classmethod:



.. py:method:: watch(model, log_freq: int = 1000) -> None
.. py:method:: log(update_dict: dict, step: int | None = None, commit=False, split: str = '') -> None
.. py:method:: log_plots(plots, caption: str = '') -> None
.. py:method:: log_summary(summary_dict: dict[str, Any])
.. py:method:: mark_preempting() -> None
.. py:method:: log_artifact(name: str, type: str, file_location: str) -> None
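
The docstring added above describes a two-step pattern: initialize once, then fetch the shared instance from anywhere. The sketch below illustrates that pattern only. It is not the fairchem implementation. `SingletonLogger`, `init_logger`, and the in-memory `records` list are hypothetical stand-ins so the example runs without wandb, and the choice to raise on a second initialization or on an early `get_instance()` call is an assumption rather than documented behavior.

```python
from __future__ import annotations


class SingletonLogger:
    """Hypothetical stand-in for a singleton logger (not the fairchem class).

    It appends to an in-memory list instead of sending anything to wandb.
    """

    # Class-level slot holding the single shared instance.
    _instance: SingletonLogger | None = None

    def __init__(self, project: str) -> None:
        self.project = project
        self.records: list[dict] = []

    @classmethod
    def init_logger(cls, project: str) -> None:
        # Assumption: a second initialization is treated as an error.
        if cls._instance is not None:
            raise RuntimeError("logger already initialized")
        cls._instance = cls(project)

    @classmethod
    def get_instance(cls) -> SingletonLogger:
        # Assumption: fetching before initialization is treated as an error.
        if cls._instance is None:
            raise RuntimeError("call init_logger() first")
        return cls._instance

    def log(self, update_dict: dict, step: int | None = None) -> None:
        # Record the values; the step is optional, as in the documented signature.
        self.records.append({"step": step, **update_dict})
```

A caller would run `SingletonLogger.init_logger("demo")` once at start-up and then call `SingletonLogger.get_instance().log({"some_value": 1.0})` from any module, without passing the logger object around.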
Original file line number Diff line number Diff line change
@@ -66,6 +66,8 @@ Module Contents
.. py:attribute:: otf_graph
:value: True



.. py:method:: get_energy_and_forces(apply_constraint: bool = True)
3 changes: 3 additions & 0 deletions _sources/autoapi/core/common/utils/index.rst
Original file line number Diff line number Diff line change
@@ -77,6 +77,7 @@ Functions
core.common.utils.update_config
core.common.utils.get_loss_module
core.common.utils.load_model_and_weights_from_checkpoint
core.common.utils.get_timestamp_uid


Module Contents
@@ -297,3 +298,5 @@ Module Contents
.. py:function:: load_model_and_weights_from_checkpoint(checkpoint_path: str) -> torch.nn.Module
.. py:function:: get_timestamp_uid() -> str
65 changes: 65 additions & 0 deletions _sources/autoapi/core/components/runner/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
core.components.runner
======================

.. py:module:: core.components.runner
Classes
-------

.. autoapisummary::

core.components.runner.Runner
core.components.runner.MockRunner


Module Contents
---------------

.. py:class:: Runner
Represents an abstraction over things that run in a loop and can save/load state.
ie: Trainers, Validators, Relaxation all fall in this category.
This allows us to decouple away from a monolithic trainer class


.. py:method:: run() -> Any
:abstractmethod:



.. py:method:: save_state() -> None
:abstractmethod:



.. py:method:: load_state() -> None
:abstractmethod:



.. py:class:: MockRunner(x: int, y: int)
Bases: :py:obj:`Runner`


Represents an abstraction over things that run in a loop and can save/load state.
ie: Trainers, Validators, Relaxation all fall in this category.
This allows us to decouple away from a monolithic trainer class


.. py:attribute:: x
.. py:attribute:: y
.. py:method:: run() -> Any
.. py:method:: save_state() -> None
.. py:method:: load_state() -> None
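
The `Runner` interface documented above (abstract `run`, `save_state`, and `load_state`) can be sketched with the standard `abc` module. This is a minimal illustration under assumptions, not the fairchem source. `SumRunner` is a hypothetical subclass that only borrows the `(x, y)` constructor shape of `MockRunner`, and what its methods do (adding the two values, snapshotting them in a dict) is invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Any


class Runner(ABC):
    """Something that runs in a loop and can save/load its state."""

    @abstractmethod
    def run(self) -> Any: ...

    @abstractmethod
    def save_state(self) -> None: ...

    @abstractmethod
    def load_state(self) -> None: ...


class SumRunner(Runner):
    """Hypothetical concrete runner; its behavior is made up for illustration."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y
        self._saved = {}  # in-memory snapshot instead of a checkpoint file

    def run(self) -> Any:
        # Stand-in for a training or validation loop.
        return self.x + self.y

    def save_state(self) -> None:
        # Snapshot the fields needed to resume.
        self._saved = {"x": self.x, "y": self.y}

    def load_state(self) -> None:
        # Restore from the snapshot; keep current values if nothing was saved.
        self.x = self._saved.get("x", self.x)
        self.y = self._saved.get("y", self.y)
```

Because all three methods are abstract, instantiating `Runner` directly raises `TypeError`. Only subclasses that implement the full interface can be constructed, which is what lets a launcher treat trainers, validators, and relaxations uniformly.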
1 change: 1 addition & 0 deletions _sources/autoapi/core/index.rst
Original file line number Diff line number Diff line change
@@ -35,5 +35,6 @@ Submodules
:maxdepth: 1

/autoapi/core/_cli/index
/autoapi/core/_cli_hydra/index


7 changes: 5 additions & 2 deletions _sources/autoapi/core/models/escn/escn/index.rst
Original file line number Diff line number Diff line change
@@ -33,7 +33,7 @@ Classes
Module Contents
---------------

.. py:class:: eSCN(use_pbc: bool = True, use_pbc_single: bool = False, regress_forces: bool = True, otf_graph: bool = False, max_neighbors: int = 40, cutoff: float = 8.0, max_num_elements: int = 90, num_layers: int = 8, lmax_list: list[int] | None = None, mmax_list: list[int] | None = None, sphere_channels: int = 128, hidden_channels: int = 256, edge_channels: int = 128, num_sphere_samples: int = 128, distance_function: str = 'gaussian', basis_width_scalar: float = 1.0, distance_resolution: float = 0.02, show_timing_info: bool = False, resolution: int | None = None)
.. py:class:: eSCN(use_pbc: bool = True, use_pbc_single: bool = False, regress_forces: bool = True, otf_graph: bool = False, max_neighbors: int = 40, cutoff: float = 8.0, max_num_elements: int = 90, num_layers: int = 8, lmax_list: list[int] | None = None, mmax_list: list[int] | None = None, sphere_channels: int = 128, hidden_channels: int = 256, edge_channels: int = 128, num_sphere_samples: int = 128, distance_function: str = 'gaussian', basis_width_scalar: float = 1.0, distance_resolution: float = 0.02, show_timing_info: bool = False, resolution: int | None = None, activation_checkpoint: bool | None = False)
Bases: :py:obj:`torch.nn.Module`, :py:obj:`fairchem.core.models.base.GraphModelMixin`

Expand Down Expand Up @@ -80,6 +80,9 @@ Module Contents
:type show_timing_info: bool


.. py:attribute:: activation_checkpoint
.. py:attribute:: regress_forces
@@ -195,7 +198,7 @@ Module Contents



.. py:class:: eSCNBackbone(use_pbc: bool = True, use_pbc_single: bool = False, regress_forces: bool = True, otf_graph: bool = False, max_neighbors: int = 40, cutoff: float = 8.0, max_num_elements: int = 90, num_layers: int = 8, lmax_list: list[int] | None = None, mmax_list: list[int] | None = None, sphere_channels: int = 128, hidden_channels: int = 256, edge_channels: int = 128, num_sphere_samples: int = 128, distance_function: str = 'gaussian', basis_width_scalar: float = 1.0, distance_resolution: float = 0.02, show_timing_info: bool = False, resolution: int | None = None)
.. py:class:: eSCNBackbone(use_pbc: bool = True, use_pbc_single: bool = False, regress_forces: bool = True, otf_graph: bool = False, max_neighbors: int = 40, cutoff: float = 8.0, max_num_elements: int = 90, num_layers: int = 8, lmax_list: list[int] | None = None, mmax_list: list[int] | None = None, sphere_channels: int = 128, hidden_channels: int = 256, edge_channels: int = 128, num_sphere_samples: int = 128, distance_function: str = 'gaussian', basis_width_scalar: float = 1.0, distance_resolution: float = 0.02, show_timing_info: bool = False, resolution: int | None = None, activation_checkpoint: bool | None = False)
Bases: :py:obj:`eSCN`, :py:obj:`fairchem.core.models.base.BackboneInterface`
