Model conversion process failed when deploying Mixtral 8x22B AWQ with djl-tensorrtllm to SageMaker #3343
Comments
@ydm-amazon Please take a look.
It seems that djl-tensorrtllm cannot convert a quantized model, though I'm not sure that is the issue. Hence I tried mistralai/Mixtral-8x7B-Instruct-v0.1 and the conversion failed again with the message below:
| 1721291393048 | [INFO ] LmiUtils - convert_py: Generating train split:  71%|███████   | 203409/287113 [00:02<00:01, 77139.42 examples/s] |
Thanks for the detailed information; I will look into it more today!
Description
The model conversion process failed with djl-tensorrtllm and the serving.properties below:
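(The exact file was not included in the report; the entries below are reconstructed from the per-model settings line in the error log and may differ from the original.)

```properties
engine=MPI
option.model_id=MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ
option.rolling_batch=trtllm
option.tensor_parallel_degree=4
option.max_rolling_batch_size=8
option.quantize=awq
option.max_num_tokens=8192
```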
Expected Behavior
The AWQ checkpoint is converted to TensorRT-LLM artifacts and the model server starts and serves requests successfully.
Error Message
| 1721194930489 | [INFO ] LmiUtils - Detected mpi_mode: true, rolling_batch: trtllm, tensor_parallel_degree 4, for modelType: mixtral |
| 1721194930489 | [INFO ] ModelInfo - M-0001: Apply per model settings: job_queue_size: 1000 max_dynamic_batch_size: 1 max_batch_delay: 100 max_idle_time: 60 load_on_devices: * engine: MPI mpi_mode: true option.entryPoint: null option.tensor_parallel_degree: 4 option.max_rolling_batch_size: 8 option.quantize: awq option.mpi_mode: true option.max_num_tokens: 8192 option.model_id: MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ option.rolling_batch: trtllm |
| 1721194933027 | [INFO ] LmiUtils - Converting model to TensorRT-LLM artifacts |
| 1721194933027 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] PyTorch version 2.2.1 available. |
| 1721194933493 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] JAX version 0.4.30 available. |
| 1721194933493 | [INFO ] LmiUtils - convert_py: [TensorRT-LLM] TensorRT-LLM version: 0.9.0 |
| 1721194933493 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] Received kwargs for tensorrt_llm_toolkit.create_model_repo: dict_items([('engine', 'MPI'), ('model_id', 'MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ'), ('tensor_parallel_degree', 4), ('quantize', 'awq'), ('max_num_tokens', '8192'), ('max_rolling_batch_size', '8'), ('trt_llm_model_repo', '/tmp/.djl.ai/trtllm/c1e40db56ea23fb1ec359dff353cdb9a752a827c')]) |
| 1721194933493 | [INFO ] LmiUtils - convert_py: /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. |
| 1721194933743 | [INFO ] LmiUtils - convert_py: warnings.warn( |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] Selecting ModelBuilder |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] Configuring model (will download if not available locally): MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] Using llama scripts for model type: mixtral |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] Compiling HuggingFace model into TensorRT engine... |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] Updating TRT config... |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][WARNING] The following overrides are final. Some of them are specifically set by LMI to provide the best compilation experience. |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][WARNING] Model Config Override: qformat=int4_awq |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][WARNING] Model Config Override: calib_size=512 |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][WARNING] Model Config Override: kv_cache_dtype=int8 |
| 1721194933743 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] Quantizing HF checkpoint to TRT checkpoint... |
| 1721194938596 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][INFO] Running command: python3 /usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/quantization/quantize.py --model_dir MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ --dtype float16 --output_dir /tmp/trtllm_llama_ckpt/ --qformat int4_awq --kv_cache_dtype int8 --calib_size 512 --batch_size 32 --tp_size 4 --awq_block_size 64 |
| 1721194939003 | [INFO ] LmiUtils - convert_py: [LMI TRTLLM Toolkit][152][ERROR] Exit code: 1 for command: python3 /usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/quantization/quantize.py --model_dir MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ --dtype float16 --output_dir /tmp/trtllm_llama_ckpt/ --qformat int4_awq --kv_cache_dtype int8 --calib_size 512 --batch_size 32 --tp_size 4 --awq_block_size 64 |
| 1721194939003 | [INFO ] LmiUtils - convert_py: [TensorRT-LLM] TensorRT-LLM version: 0.9.0 |
| 1721194939003 | [INFO ] LmiUtils - convert_py: /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. |
| 1721194939003 | [INFO ] LmiUtils - convert_py: warnings.warn( |
| 1721194939003 | [INFO ] LmiUtils - convert_py: Initializing model from MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ |
| 1721194939003 | [INFO ] LmiUtils - convert_py: Traceback (most recent call last): |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/quantization/quantize.py", line 52, in <module> |
| 1721194939003 | [INFO ] LmiUtils - convert_py: quantize_and_export(model_dir=args.model_dir, |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_ammo.py", line 268, in quantize_and_export |
| 1721194939003 | [INFO ] LmiUtils - convert_py: model = get_model(model_dir, dtype, device) |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_ammo.py", line 163, in get_model |
| 1721194939003 | [INFO ] LmiUtils - convert_py: model = AutoModelForCausalLM.from_pretrained(ckpt_path, |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained |
| 1721194939003 | [INFO ] LmiUtils - convert_py: return model_class.from_pretrained( |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3155, in from_pretrained |
| 1721194939003 | [INFO ] LmiUtils - convert_py: config.quantization_config = AutoHfQuantizer.merge_quantization_configs( |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/auto.py", line 149, in merge_quantization_configs |
| 1721194939003 | [INFO ] LmiUtils - convert_py: quantization_config = AutoQuantizationConfig.from_dict(quantization_config) |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/auto.py", line 79, in from_dict |
| 1721194939003 | [INFO ] LmiUtils - convert_py: return target_cls.from_dict(quantization_config_dict) |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/quantization_config.py", line 94, in from_dict |
| 1721194939003 | [INFO ] LmiUtils - convert_py: config = cls(**config_dict) |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/quantization_config.py", line 693, in __init__ |
| 1721194939003 | [INFO ] LmiUtils - convert_py: self.post_init() |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/transformers/utils/quantization_config.py", line 746, in post_init |
| 1721194939003 | [INFO ] LmiUtils - convert_py: raise ValueError( |
| 1721194939003 | [INFO ] LmiUtils - convert_py: ValueError: You current version of `autoawq` does not support module quantization skipping, please upgrade `autoawq` package to at least 0.1.8. |
| 1721194939003 | [INFO ] LmiUtils - convert_py: Traceback (most recent call last): |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 69, in <module> |
| 1721194939003 | [INFO ] LmiUtils - convert_py: main() |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 65, in main |
| 1721194939003 | [INFO ] LmiUtils - convert_py: create_trt_llm_repo(properties, args) |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/opt/djl/partition/trt_llm_partition.py", line 33, in create_trt_llm_repo |
| 1721194939003 | [INFO ] LmiUtils - convert_py: create_model_repo(model_id_or_path, **kwargs) |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/__init__.py", line 61, in create_model_repo |
| 1721194939003 | [INFO ] LmiUtils - convert_py: model.compile_model() |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/trtllmmodel/modelbuilder.py", line 128, in compile_model |
| 1721194939003 | [INFO ] LmiUtils - convert_py: self.quantize_checkpoint_from_lmi_config() |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/trtllmcheckpoint/checkpointbuilder.py", line 382, in quantize_checkpoint_from_lmi_config |
| 1721194939003 | [INFO ] LmiUtils - convert_py: self.quantize_checkpoint(lmi_args) |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/trtllmcheckpoint/checkpointbuilder.py", line 340, in quantize_checkpoint |
| 1721194939003 | [INFO ] LmiUtils - convert_py: exec_command(quantize_checkpoint_cmd) |
| 1721194939003 | [INFO ] LmiUtils - convert_py: File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/utils/utils.py", line 168, in exec_command |
| 1721194939003 | [INFO ] LmiUtils - convert_py: raise subprocess.CalledProcessError(proc.returncode, proc.args) |
| 1721194939755 | [INFO ] LmiUtils - convert_py: subprocess.CalledProcessError: Command 'python3 /usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/quantization/quantize.py --model_dir MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ --dtype float16 --output_dir /tmp/trtllm_llama_ckpt/ --qformat int4_awq --kv_cache_dtype int8 --calib_size 512 --batch_size 32 --tp_size 4 --awq_block_size 64' returned non-zero exit status 1. |
| 1721194939755 | [ERROR] ModelServer - Failed register workflow |
| 1721194939755 | java.util.concurrent.CompletionException: ai.djl.engine.EngineException: Model conversion process failed! |
| 1721194939755 | #011at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) ~[?:?] |
| 1721194939755 | #011at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320) [?:?] |
| 1721194939755 | #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770) [?:?] |
| 1721194939756 | #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) [?:?] |
| 1721194939756 | #011at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) [?:?] |
| 1721194939756 | #011at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) [?:?] |
| 1721194939756 | #011at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) [?:?] |
| 1721194939756 | #011at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) [?:?] |
| 1721194939756 | #011at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) [?:?] |
| 1721194939756 | Caused by: ai.djl.engine.EngineException: Model conversion process failed! |
| 1721194939756 | #011at ai.djl.serving.wlm.LmiUtils.buildTrtLlmArtifacts(LmiUtils.java:338) ~[wlm-0.28.0.jar:?] |
| 1721194939756 | #011at ai.djl.serving.wlm.LmiUtils.convertTrtLLM(LmiUtils.java:133) ~[wlm-0.28.0.jar:?] |
| 1721194939756 | #011at ai.djl.serving.wlm.ModelInfo.initialize(ModelInfo.java:538) ~[wlm-0.28.0.jar:?] |
| 1721194939756 | #011at ai.djl.serving.models.ModelManager.lambda$registerWorkflow$2(ModelManager.java:105) ~[serving-0.28.0.jar:?] |
| 1721194939756 | #011at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?] |
| 1721194941759 | #011... 6 more |
| 1721194941759 | [INFO ] ModelServer - Model server stopped. |
| 1721194941759 | [ERROR] ModelServer - Unexpected error |
| 1721194941759 | ai.djl.serving.http.ServerStartupException: Failed to initialize startup models and workflows |
| 1721194941759 | #011at ai.djl.serving.ModelServer.start(ModelServer.java:210) ~[serving-0.28.0.jar:?] |
| 1721194941759 | #011at ai.djl.serving.ModelServer.startAndWait(ModelServer.java:174) ~[serving-0.28.0.jar:?] |
| 1721194941759 | #011at ai.djl.serving.ModelServer.main(ModelServer.java:143) [serving-0.28.0.jar:?] |
| 1721194941759 | Caused by: java.util.concurrent.CompletionException: ai.djl.engine.EngineException: Model conversion process failed!
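For context: the ValueError above is raised by transformers' AwqConfig.post_init, which rejects an AWQ checkpoint whose quantization_config sets modules_to_not_convert when the installed autoawq is older than 0.1.8. A minimal sketch for inspecting that field outside the container (model id taken from the log; assumes transformers is installed and the Hub config is reachable):

```python
# Inspect the quantization_config of the AWQ checkpoint from the log.
# Assumption: the ValueError above fires because this config contains a
# "modules_to_not_convert" entry while the container ships an autoawq
# build older than 0.1.8. AutoConfig only loads the config, so this
# snippet itself does not raise.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ"
)
print(config.quantization_config)  # expect a "modules_to_not_convert" key
```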
How to Reproduce?
(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)
Steps to reproduce
(Paste the commands you ran that produced the error.)
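The commands were not included in the report. Below is a minimal SageMaker deployment sketch that should exercise the same conversion path; the region, account, image tag, IAM role, instance type, and endpoint name are placeholders, not values from the report:

```python
# Hypothetical reproduction sketch. The region, account, image tag, IAM role,
# instance type, and endpoint name are assumptions, not values from the report.
from sagemaker.model import Model

# LMI container matching the versions seen in the log
# (djl-serving 0.28.0, TensorRT-LLM 0.9.0); verify the tag for your region.
image_uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "djl-inference:0.28.0-tensorrtllm0.9.0-cu122"
)

model = Model(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # assumed
    env={
        # Environment-variable form of the reconstructed serving.properties.
        "HF_MODEL_ID": "MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ",
        "OPTION_ROLLING_BATCH": "trtllm",
        "OPTION_TENSOR_PARALLEL_DEGREE": "4",
        "OPTION_MAX_ROLLING_BATCH_SIZE": "8",
        "OPTION_QUANTIZE": "awq",
        "OPTION_MAX_NUM_TOKENS": "8192",
    },
)
model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",  # assumed; needs >= 4 GPUs for tp=4
    endpoint_name="mixtral-awq-trtllm",  # assumed
    container_startup_health_check_timeout=1800,
)
```

The conversion runs during container startup, so the failure appears in the endpoint logs before the health check passes.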
What have you tried to solve it?
Environment Info
Please run the command `./gradlew debugEnv` from the root directory of DJL (if necessary, clone DJL first). It will output information about your system, environment, and installation that can help us debug your issue. Paste the output of the command below: