Add partition command and SageMaker parameters to SageMaker docs (#514)
*Issue #, if available:*

*Description of changes:*


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Jian Zhang (James) <[email protected]>
2 people authored and Xiang Song committed Sep 29, 2023
1 parent ae6769f commit 708a759
Showing 1 changed file with 83 additions and 0 deletions.
83 changes: 83 additions & 0 deletions docs/source/scale/sagemaker.rst
@@ -202,6 +202,89 @@ Users can use the following commands to check the corresponding outputs:
   aws s3 ls s3://<PATH_TO_SAVE_GENERATED_NODE_EMBEDDING>/
   aws s3 ls s3://<PATH_TO_SAVE_PREDICTION_RESULTS>/

Launch graph partitioning task
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If your data are in the `DGL chunked format
<https://docs.dgl.ai/guide/distributed-preprocessing.html#specification>`_,
you can perform distributed partitioning using SageMaker to prepare your
data for distributed training.

.. code:: bash

   python launch/launch_partition.py \
       --graph-data-s3 ${DATASET_S3_PATH} \
       --num-parts ${NUM_PARTITIONS} \
       --instance-count ${NUM_PARTITIONS} \
       --output-data-s3 ${OUTPUT_PATH} \
       --instance-type ${INSTANCE_TYPE} \
       --image-url ${IMAGE_URI} \
       --region ${REGION} \
       --role ${ROLE} \
       --entry-point "run/partition_entry.py" \
       --metadata-filename ${METADATA_FILE} \
       --log-level INFO \
       --partition-algorithm ${ALGORITHM}

Running the above will take the dataset in chunked format
from ``${DATASET_S3_PATH}`` as input and create a DistDGL graph with
``${NUM_PARTITIONS}`` partitions under the output path, ``${OUTPUT_PATH}``.
Currently we only support ``random`` as the partitioning algorithm.

Passing additional arguments to the SageMaker Estimator/Processor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sometimes you might want to pass additional arguments to the constructor
of the SageMaker Estimator/Processor object we use to launch SageMaker
tasks, e.g., to set a maximum runtime or a VPC configuration. Our launch
scripts support forwarding arguments to the base class object through a
``kwargs`` dictionary.

To pass additional ``kwargs`` directly to the Estimator/Processor
constructor, you can use the ``--sm-estimator-parameters`` argument,
providing a string of space-separated arguments (enclosed in double
quotes ``"`` to ensure correct parsing) and the format
``<argname>=<value>`` for each argument.

``<argname>`` needs to be a valid SageMaker Estimator/Processor argument
name and ``<value>`` a value that can be parsed as a Python literal,
**without spaces**.
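
As an illustration only (this is not the launch scripts' actual parsing code), a string in this format can be interpreted by splitting on whitespace, splitting each token on the first ``=``, and parsing the value as a Python literal:

```python
import ast

def parse_estimator_params(params_str):
    """Parse a space-separated string of <argname>=<value> pairs,
    where each value is a Python literal containing no spaces."""
    kwargs = {}
    for token in params_str.split():
        name, _, raw_value = token.partition("=")
        try:
            # literal_eval safely parses ints, bools, strings, lists, etc.
            kwargs[name] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Fall back to the raw string for unquoted values
            kwargs[name] = raw_value
    return kwargs

params = parse_estimator_params(
    "max_run=3600 volume_size=100 "
    "encrypt_inter_container_traffic=True subnets=['subnet-1234','subnet-4567']"
)
```

Note that because tokens are split on whitespace, a space inside a value such as the subnet list would break it into two tokens, which is why values must contain no spaces.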

For example, to pass a specific maximum runtime, a subnet list, and enable
inter-container traffic encryption for a training, inference, or partition
job, you would use:

.. code:: bash

   python3 launch/launch_[infer|train|partition] \
       <other arguments> \
       --sm-estimator-parameters "max_run=3600 volume_size=100 encrypt_inter_container_traffic=True subnets=['subnet-1234','subnet-4567']"

Notice how we don't include any spaces in
``['subnet-1234','subnet-4567']`` to ensure correct parsing of the list.

The train, inference, and partition scripts launch SageMaker Training
jobs that rely on the ``Estimator`` base class. For a full list of
``Estimator`` parameters, see the `SageMaker Estimator documentation
<https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase>`_.

The GConstruct job will launch a SageMaker Processing job that relies on
the ``Processor`` base class, so its arguments are different,
e.g. ``volume_size_in_gb`` for the ``Processor`` vs. ``volume_size`` for
the ``Estimator``.

For a full list of ``Processor`` parameters, see the `SageMaker Processor documentation
<https://sagemaker.readthedocs.io/en/stable/api/training/processing.html>`_.

Using ``Processor`` arguments, the above example would become:

.. code:: bash

   python3 launch/launch_gconstruct \
       <other arguments> \
       --sm-estimator-parameters "max_runtime_in_seconds=3600 volume_size_in_gb=100"

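
The only renamed parameters documented here are the two shown in the examples above; as a minimal sketch, translating Estimator-style names to their Processor-style equivalents amounts to a key rename:

```python
# Illustration only: covers just the two renamed parameters mentioned in
# this section; consult the SageMaker SDK documentation for the full set.
ESTIMATOR_TO_PROCESSOR = {
    "max_run": "max_runtime_in_seconds",
    "volume_size": "volume_size_in_gb",
}

def to_processor_kwargs(estimator_kwargs):
    """Rename Estimator-style keys to their Processor-style equivalents,
    leaving unknown keys unchanged."""
    return {
        ESTIMATOR_TO_PROCESSOR.get(name, name): value
        for name, value in estimator_kwargs.items()
    }

processor_kwargs = to_processor_kwargs({"max_run": 3600, "volume_size": 100})
```
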
Run GraphStorm SageMaker with Docker Compose
............................................

This section describes how to launch Docker Compose jobs that emulate a SageMaker training execution environment. This can be used to develop and test GraphStorm model training and inference on SageMaker locally.
