ASC24-LLM inference optimization

The dataset and baseline code for the ASC24 LLM inference optimization challenge.

Challenge description

This competition focuses on LLM inference optimization. Participating teams must build an inference engine based on LLaMA2-70B that achieves high throughput on the 10,000-sample dataset provided by the ASC24 committee.

Dataset

The dataset used in the preliminary round has 10k samples, each consisting of a prompt and its length. It has the following characteristics:

Multi-domain Coverage: The dataset contains text from various domains, including news, encyclopedias, novels, and forums, covering different topics, styles, and viewpoints. This helps the model generalize across tasks in different domains.
Multilingual Support: The dataset includes text in multiple languages, such as English, Chinese, Korean, and Spanish, so the model must understand and generate text across languages.
Large-scale Data: The dataset is sampled from a massive text corpus, which helps exercise the language understanding and generation capabilities of the model.
Length Diversity: Sequences that are too long or too short are filtered out. The 10k samples have lengths ranging from 4 to 1024, covering the vast majority of lengths seen in everyday use.

In summary, the dataset provides high-quality language samples with multi-domain coverage, multilingual support, large-scale sourcing, and length diversity.
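The length filtering described above can be illustrated with a minimal sketch. It assumes each sample is a (prompt, length) pair, which matches the description of "prompts and prompt length" but is not confirmed as the exact layout of the JSON file. The helper names `filter_by_length` and `length_summary` are hypothetical and are not part of the baseline code.

```python
from typing import Dict, List, Tuple

# Length bounds taken from the dataset description (4 to 1024).
MIN_LEN, MAX_LEN = 4, 1024

# Assumed sample layout: (prompt text, prompt length).
Sample = Tuple[str, int]


def filter_by_length(samples: List[Sample],
                     min_len: int = MIN_LEN,
                     max_len: int = MAX_LEN) -> List[Sample]:
    """Keep only samples whose length lies in [min_len, max_len]."""
    return [(prompt, n) for prompt, n in samples if min_len <= n <= max_len]


def length_summary(samples: List[Sample]) -> Dict[str, float]:
    """Return count, min, max, and mean length of a non-empty sample list."""
    lengths = [n for _, n in samples]
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }
```

Running `length_summary` on the loaded dataset is a quick way to check that a local copy really spans the stated length range before benchmarking.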

Baseline code

The ASC24 committee supplies baseline code that benchmarks throughput and total tokens. Participants can start from it and modify it to achieve higher inference performance.
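As a rough illustration of what such a benchmark measures, the sketch below computes total tokens and throughput (tokens per second) for any generation callable. It is not the baseline's implementation: the `benchmark` function and its `generate` argument are hypothetical, and whether the official metric counts prompt tokens, generated tokens, or both is an assumption to check against the baseline script.

```python
import time
from typing import Callable, Iterable, List, Tuple


def benchmark(generate: Callable[[str], List[int]],
              prompts: Iterable[str],
              clock: Callable[[], float] = time.perf_counter
              ) -> Tuple[int, float, float]:
    """Run `generate` on every prompt and report (total_tokens, seconds, tokens/s).

    `generate` is a stand-in for an inference engine: it takes a prompt and
    returns the list of token ids it produced. Only those returned tokens are
    counted here, which is an assumption about the metric.
    """
    start = clock()
    total_tokens = 0
    for prompt in prompts:
        total_tokens += len(generate(prompt))
    elapsed = clock() - start
    # Guard against a zero-length interval on very fast runs.
    throughput = total_tokens / elapsed if elapsed > 0 else float("inf")
    return total_tokens, elapsed, throughput
```

Passing the clock as a parameter keeps the function easy to test with a fake timer. In a real run the default `time.perf_counter` is used.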

Usage: After downloading the dataset, run the script as in the following example. More information is available with -h.

The --num-samples parameter is intended only for quick tests. For this challenge, participants should benchmark on the whole 10k dataset.

CUDA_VISIBLE_DEVICES=0 python baseline.py --dataset /your_data_path/scrambled_sampled_dataset.json --model /your_model_path/hf_model_weights --num-samples=10
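To make the role of --num-samples concrete, here is a minimal sketch of how a script could parse the three flags shown in the command above and truncate the dataset for a quick test. It is a hypothetical stand-in, not the actual argument handling in baseline.py. In particular, treating an omitted --num-samples as "use the whole dataset" is an assumption.

```python
import argparse
from typing import List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags used in the example command."""
    parser = argparse.ArgumentParser(
        description="Hypothetical stand-in for the baseline command line")
    parser.add_argument("--dataset", required=True,
                        help="path to the dataset JSON file")
    parser.add_argument("--model", required=True,
                        help="path to the model weights directory")
    # Assumption: when the flag is omitted, all samples are used.
    parser.add_argument("--num-samples", type=int, default=None,
                        help="number of samples to run (testing only)")
    return parser


def select_samples(samples: Sequence, num_samples: Optional[int]) -> List:
    """Return the first `num_samples` samples, or all of them if None."""
    if num_samples is None:
        return list(samples)
    return list(samples[:num_samples])
```

With this layout, a smoke test passes `--num-samples=10`, while an official run simply omits the flag so that all 10k samples are processed.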

The model weights of LLaMA2-70B can be downloaded from: https://huggingface.co/meta-llama/Llama-2-70b.


heptagonhust/ASC24-Preliminary-llama2

Folders and files

Latest commit

History

Repository files navigation

ASC23-LLM inference optimization

Challenge description

Dateset

Baseline code

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages