Commit
1 parent b536cd3, commit d3eb82e
Showing 11 changed files with 125 additions and 289 deletions.
@@ -1,5 +1,4 @@
from internlm.simulator.profiler.perf_comm import gen_perf

if __name__ == "__main__":
    gen_perf()
@@ -0,0 +1,54 @@
# InternLM Simulator

## 1. Introduction
The solver consists of two main components:
1. `profiling`: Measures the time taken by each stage of model training in advance and saves the results as data files and image files.
2. `simulation`: Simulates the model training process from the collected data files and reports the time taken by each stage of training.

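To illustrate how the two components could fit together, here is a minimal sketch. It assumes, purely for illustration, that the profiling data is a table of (message size, measured time) samples per operation and that the simulation estimates unmeasured sizes by linear interpolation. The names `estimate_time` and `simulate_step`, the data layout, and the sample numbers are hypothetical and are not taken from the InternLM code.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical layout: (message size in bytes, measured seconds), sorted by size.
Samples = List[Tuple[int, float]]


def estimate_time(samples: Samples, size: int) -> float:
    """Estimate the time for `size` by linear interpolation between samples.

    Sizes outside the measured range are clamped to the nearest sample.
    """
    sizes = [s for s, _ in samples]
    i = bisect_left(sizes, size)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (x0, y0), (x1, y1) = samples[i - 1], samples[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def simulate_step(
    profile: Dict[str, Samples], workload: List[Tuple[str, int]]
) -> Dict[str, float]:
    """Sum the estimated time per operation over one simulated step."""
    totals: Dict[str, float] = {}
    for op, size in workload:
        totals[op] = totals.get(op, 0.0) + estimate_time(profile[op], size)
    return totals


# Made-up numbers, for illustration only.
profile = {"all_reduce": [(1024, 1.0), (2048, 2.0)]}
totals = simulate_step(profile, [("all_reduce", 1536), ("all_reduce", 1024)])
```

Keeping measurement and estimation separate in this way means the expensive profiling run happens once, while the simulation can be repeated cheaply for many configurations.
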
## 2. Usage

### 2.1 Generate profiling data

There are two types of profiling data:
1. `linear` profiling data, which includes: [`LINEAR`]
2. `Communication` profiling data, which includes: [`ALL2ALL`, `ALLREDUCE`, `REDUCESCATTER`, `ALLGATHER`, `BROADCAST`]

Note:
1. Collecting data on more than 64 GPUs is recommended so that the communication measurements are more accurate.
2. `Flash Attention` timings are not collected in advance. They are measured on the fly during the simulation and stored in a cache, because too many variables affect Flash Attention performance for advance profiling to cover them all.

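The on-the-fly collection described in note 2 can be sketched as a measure-once cache. This is a minimal illustration, not the simulator's actual code: the class `OnTheFlyProfileCache`, the key fields, and the stand-in cost function `fake_measure` are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: variables assumed to affect the kernel time.
Key = Tuple[int, int, int, int]  # (batch, seq_len, num_heads, head_dim)


class OnTheFlyProfileCache:
    """Measure a configuration the first time it is seen, then reuse the result."""

    def __init__(self, measure: Callable[[Key], float]) -> None:
        self._measure = measure
        self._cache: Dict[Key, float] = {}
        self.misses = 0  # number of real measurements performed

    def get(self, key: Key) -> float:
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._measure(key)
        return self._cache[key]


def fake_measure(key: Key) -> float:
    """Stand-in for a timed kernel launch; returns a made-up cost."""
    batch, seq_len, num_heads, head_dim = key
    return batch * num_heads * seq_len * seq_len * head_dim * 1e-12


cache = OnTheFlyProfileCache(fake_measure)
first = cache.get((1, 4096, 32, 128))   # measured
second = cache.get((1, 4096, 32, 128))  # served from the cache
```

Only configurations that the simulation actually encounters are measured, which avoids sweeping every combination of variables in advance.
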
```bash
# generate profiling data
torchrun --nproc-per-node=8 gen_profiler_data.py

# the profiling data will be saved in the following path
./prof_data
├── data.pt
└── pics
    ├── cal
    │   └── linear.jpg
    └── comm
        ├── all2all_intra_2_inter_1.jpg
        ├── all2all_intra_4_inter_1.jpg
        ├── all_gather_intra_2_inter_1.jpg
        ├── all_gather_intra_4_inter_1.jpg
        ├── all_reduce_intra_2_inter_1.jpg
        ├── all_reduce_intra_4_inter_1.jpg
        ├── broadcast_intra_2_inter_1.jpg
        ├── broadcast_intra_4_inter_1.jpg
        ├── reduce_scatter_intra_2_inter_1.jpg
        └── reduce_scatter_intra_4_inter_1.jpg

```
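
The plot files under `pics/comm` follow a visible `<op>_intra_<n>_inter_<m>.jpg` naming pattern. The sketch below shows one way to parse those names when browsing the output. The helper `parse_comm_plot_name` is hypothetical, and reading `intra` and `inter` as intra-node and inter-node sizes is an assumption that the README does not state.

```python
import re
from typing import NamedTuple


class CommPlot(NamedTuple):
    op: str     # e.g. "all_gather"
    intra: int  # number after "intra" (assumed: intra-node size)
    inter: int  # number after "inter" (assumed: inter-node size)


# Lazy match on the operation name so "all_gather" and "reduce_scatter",
# which contain underscores themselves, are captured whole.
_PATTERN = re.compile(
    r"^(?P<op>[a-z0-9_]+?)_intra_(?P<intra>\d+)_inter_(?P<inter>\d+)\.jpg$"
)


def parse_comm_plot_name(filename: str) -> CommPlot:
    """Split a plot file name into its operation and the two numeric fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected plot file name: {filename!r}")
    return CommPlot(match["op"], int(match["intra"]), int(match["inter"]))


plot = parse_comm_plot_name("all_gather_intra_4_inter_1.jpg")
```
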

### 2.2 Run simulation
```bash
python simulation_train_formulaic.py --pre_profiling_data_path ./data/profiling_data.json --config configs/exp_simluator.py
```

## 4. Contribution