From 708a75981f6299aa3b6db1f9b3fb6606b406b2b8 Mon Sep 17 00:00:00 2001
From: Theodore Vasiloudis
Date: Fri, 29 Sep 2023 09:55:11 -0700
Subject: [PATCH] Add partition command and SageMaker parameters to SageMaker
 docs (#514)

*Issue #, if available:*

*Description of changes:*

By submitting this pull request, I confirm that you can use, modify, copy, and
redistribute this contribution, under the terms of your choice.

---------

Co-authored-by: Jian Zhang (James) <6593865@qq.com>
---
 docs/source/scale/sagemaker.rst | 83 +++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/docs/source/scale/sagemaker.rst b/docs/source/scale/sagemaker.rst
index 3718bc7e4b..32ccffb809 100644
--- a/docs/source/scale/sagemaker.rst
+++ b/docs/source/scale/sagemaker.rst
@@ -202,6 +202,89 @@ Users can use the following commands to check the corresponding outputs:
 
     aws s3 ls s3:///
     aws s3 ls s3:///
 
+Launch graph partitioning task
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If your data are in the `DGL chunked
+format `_,
+you can perform distributed partitioning using SageMaker to prepare your
+data for distributed training.
+
+.. code:: bash
+
+    python launch/launch_partition.py \
+        --graph-data-s3 ${DATASET_S3_PATH} \
+        --num-parts ${NUM_PARTITIONS} \
+        --instance-count ${NUM_PARTITIONS} \
+        --output-data-s3 ${OUTPUT_PATH} \
+        --instance-type ${INSTANCE_TYPE} \
+        --image-url ${IMAGE_URI} \
+        --region ${REGION} \
+        --role ${ROLE} \
+        --entry-point "run/partition_entry.py" \
+        --metadata-filename ${METADATA_FILE} \
+        --log-level INFO \
+        --partition-algorithm ${ALGORITHM}
+
+Running the above will take the dataset in chunked format
+from ``${DATASET_S3_PATH}`` as input and create a DistDGL graph with
+``${NUM_PARTITIONS}`` partitions under the output path, ``${OUTPUT_PATH}``.
+Currently we only support ``random`` as the partitioning algorithm.
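When scripting several partition jobs, the invocation above can also be assembled programmatically. The sketch below is illustrative only: the flag names come from the command shown above, but the helper function and all sample values (bucket names, instance type, image URI, IAM role) are hypothetical.

```python
# Sketch: build the launch/launch_partition.py argument list in Python.
# Only the flag names are taken from the documented command; everything
# else here (function name, sample values) is hypothetical.
def partition_launch_args(
    graph_data_s3, num_parts, output_s3, instance_type,
    image_uri, region, role, metadata_filename,
    partition_algorithm="random",  # currently the only supported algorithm
):
    """Return CLI arguments for launch/launch_partition.py."""
    return [
        "--graph-data-s3", graph_data_s3,
        "--num-parts", str(num_parts),
        # One instance per partition, matching the example invocation.
        "--instance-count", str(num_parts),
        "--output-data-s3", output_s3,
        "--instance-type", instance_type,
        "--image-url", image_uri,
        "--region", region,
        "--role", role,
        "--entry-point", "run/partition_entry.py",
        "--metadata-filename", metadata_filename,
        "--log-level", "INFO",
        "--partition-algorithm", partition_algorithm,
    ]

args = partition_launch_args(
    graph_data_s3="s3://example-bucket/chunked-graph/",  # hypothetical
    num_parts=4,
    output_s3="s3://example-bucket/partitioned/",        # hypothetical
    instance_type="ml.r5.4xlarge",                       # hypothetical
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/graphstorm:sm",  # hypothetical
    region="us-east-1",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical
    metadata_filename="metadata.json",
)
# e.g. subprocess.run(["python", "launch/launch_partition.py", *args], check=True)
```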
+
+Passing additional arguments to the SageMaker Estimator/Processor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Sometimes you might want to pass additional arguments to the constructor
+of the SageMaker Estimator/Processor object we use to launch SageMaker
+tasks, e.g. to set a maximum runtime or a VPC configuration. Our launch
+scripts support forwarding arguments to the base class object through a
+``kwargs`` dictionary.
+
+To pass additional ``kwargs`` directly to the Estimator/Processor
+constructor, you can use the ``--sm-estimator-parameters`` argument,
+providing a string of space-separated arguments (enclosed in double
+quotes ``"`` to ensure correct parsing), in the format
+``<parameter-name>=<parameter-value>`` for each argument.
+
+``<parameter-name>`` needs to be a valid SageMaker Estimator/Processor
+argument name and ``<parameter-value>`` a value that can be parsed as a
+Python literal, **without spaces**.
+
+For example, to set a specific maximum runtime and subnet list, and
+enable inter-container traffic encryption for a train, inference, or
+partition job, you would use:
+
+.. code:: bash
+
+    python3 launch/launch_[infer|train|partition] \
+        \
+        --sm-estimator-parameters "max_run=3600 volume_size=100 encrypt_inter_container_traffic=True subnets=['subnet-1234','subnet-4567']"
+
+Notice how we don't include any spaces in
+``['subnet-1234','subnet-4567']``, to ensure correct parsing of the list.
+
+The train, inference, and partition scripts launch SageMaker Training
+jobs that rely on the ``Estimator`` base class. For a full list of
+``Estimator`` parameters, see the `SageMaker Estimator documentation
+`_.
+
+The GConstruct job launches a SageMaker Processing job that relies on
+the ``Processor`` base class, so its arguments differ,
+e.g. ``volume_size_in_gb`` for the ``Processor`` vs. ``volume_size`` for
+the ``Estimator``.
+
+For a full list of ``Processor`` parameters, see the `SageMaker Processor
+documentation `_.
+
+Using ``Processor`` arguments, the above example would become:
+
+.. code:: bash
+
+    python3 launch/launch_gconstruct \
+        \
+        --sm-estimator-parameters "max_runtime_in_seconds=3600 volume_size_in_gb=100"
+
 Run GraphStorm SageMaker with Docker Compose
 ..............................................
 
 This section describes how to launch Docker compose jobs that emulate a
 SageMaker training execution environment. This can be used to develop
 and test GraphStorm model training and inference on SageMaker locally.
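To make the ``<parameter-name>=<parameter-value>`` format and the no-spaces rule concrete, the sketch below shows one way such a string can be turned into a kwargs dictionary using Python literal evaluation. This is an illustration of the parsing behavior described above, not the actual implementation used by the launch scripts, and the function name is hypothetical.

```python
import ast

def parse_estimator_params(params_str):
    """Parse a space-separated 'name=value' string into a kwargs dict.

    Values are interpreted as Python literals where possible. This is why
    values must contain no spaces: a space would split one argument in two.
    """
    kwargs = {}
    for token in params_str.split(" "):
        if not token:
            continue
        name, _, value = token.partition("=")
        try:
            kwargs[name] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Fall back to the raw string for values that are not literals.
            kwargs[name] = value
    return kwargs

params = parse_estimator_params(
    "max_run=3600 volume_size=100 "
    "encrypt_inter_container_traffic=True "
    "subnets=['subnet-1234','subnet-4567']"
)
# params == {'max_run': 3600, 'volume_size': 100,
#            'encrypt_inter_container_traffic': True,
#            'subnets': ['subnet-1234', 'subnet-4567']}
```

Note how the integer, boolean, and list values all come out with their proper Python types, which is what lets them be forwarded directly to the Estimator/Processor constructor as ``kwargs``.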