FAQ
1. What is UCC?
2. What are the important components of UCC reference implementation?
3. How can I participate?
4. How to compile and run UCC with Open MPI?
5. How to compile and run UCC with PyTorch?
6. What is TL scoring and how to select a certain TL?
7. What are the dependencies for UCC?
8. How to compile all TLs?
9. How to compile a specific TL?
10. How to compile and run UCC with OpenSHMEM Applications?
11. How to implement new TL for UCC?
UCC is a collective communication operations API and library that is flexible, complete, and feature-rich for current and emerging programming models and runtimes.
Please refer to the components diagram: https://github.com/openucx/ucc/blob/master/docs/images/ucc_components.png
- Propose features, discuss issues, review design and code on GitHub
- Participate in the weekly working group meetings
- Mailing list: https://elist.ornl.gov/mailman/listinfo/ucx-group
Please refer to: https://github.com/openucx/ucc#open-mpi-and-ucc-collectives
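In short, the flow described at that link is: build UCC against an existing UCX installation, build Open MPI with UCC support, and then enable the UCC collective component at run time. A minimal sketch, where the install paths are placeholders to adjust for your system:

# Build UCC against an existing UCX installation
$ ./autogen.sh
$ ./configure --prefix=<ucc-install-path> --with-ucx=<ucx-install-path>
$ make -j install
# Build Open MPI with UCC support
$ ./configure --prefix=<ompi-install-path> --with-ucx=<ucx-install-path> --with-ucc=<ucc-install-path>
$ make -j install
# Run an MPI application with the UCC collective component enabled
$ mpirun -np 2 --mca coll_ucc_enable 1 --mca coll_ucc_priority 100 ./my_mpi_app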
Each collective implementation provided by a TL/CL has a score; for a given collective type, message range, memory type, and team size, UCC selects the implementation with the highest score. The default scores can be overridden via environment variables with the pattern UCC_<TL/CL>_<NAME>_TUNE=token1#token2#...#tokenn, a '#'-separated list of tokens, where each token is coll_type:msg_range:mem_type:team_size:score:alg, a ':'-separated list of qualifiers.
Each qualifier is optional. The only requirement is that either "score" or "alg" is provided.
Qualifiers:
- coll_type = coll_type_1,coll_type_2,...,coll_type_n - a ',' separated list of coll_types
- msg_range = m_start_1-m_end_1,m_start_2-m_end_2,..,m_start_n-m_end_n - a ',' separated list of msg ranges, where each range is represented by "start" and "end" values separated by "-". Values can be numbers with "Size" characters, e.g. 128, 256b, 4K, 1M. Special value "inf" means MAX msg size.
- mem_type = m1,m2,..,mn - ',' separated list of memory types
- team_size = [t_start_1-t_end_1,t_start_2-t_end_2,...,t_start_n-t_end_n] - a ',' separated list of team size ranges enclosed with [].
- score = <value> - an integer value from 0 to "inf"
- alg = @<value|str> - the character '@' followed by either an integer number or a string representing the collective algorithm.
Examples:
- UCC_TL_NCCL_TUNE=0 - disable all NCCL collectives (score 0 is applied to ALL collectives since no coll_type qualifier is specified; similarly it applies to ALL memory types, the default [0-inf] msg range, and the default [0-inf] team size).
- UCC_TL_NCCL_TUNE=allreduce:cuda:inf,alltoall:0 - force NCCL allreduce for "cuda" buffers and disable alltoall
- UCC_TL_UCP_TUNE=bcast:0-4K:cuda:0,bcast:65k-1M:[25-100]:cuda:inf - disable UCP bcast on cuda buffers for msg sizes 0-4K and force UCP bcast on cuda buffers for msg sizes 65K-1M only for teams with 25-100 ranks
- UCC_TL_UCP_TUNE=allreduce:0-4K:@0,allreduce:4K-inf:@sra_knomial - for TL_UCP, set the allreduce algorithm to 0 for the 0-4K msg range and to sra_knomial (algorithm 1) for 4K-inf.
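To apply one of these tuning strings at run time, export the variable to all launched processes. The sketch below assumes an Open MPI launch (its -x option exports environment variables to the ranks) and assumes the UCC_COLL_TRACE variable is available in your UCC build to log which CL/TL gets selected per collective; verify both against your installation:

# Hedged sketch: export a tuning string to every rank and trace CL/TL selection
$ mpirun -np 4 --mca coll_ucc_enable 1 --mca coll_ucc_priority 100 \
      -x UCC_TL_UCP_TUNE=allreduce:0-4K:@0 -x UCC_COLL_TRACE=info ./my_mpi_app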
It depends on the system configuration, the workload that uses UCC, and the TLs/CLs the user wants to enable:
- UCX
- NCCL
- Doxygen
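UCX is the dependency you will typically build first (the UCP TL is layered on top of it), while NCCL is only needed for the NCCL TL and Doxygen only for generating the documentation. A minimal sketch for building UCX from source; the install prefix is just an example:

# Build and install UCX (the prefix is an example path)
$ git clone https://github.com/openucx/ucx.git && cd ucx
$ ./autogen.sh
$ ./configure --prefix=$HOME/ucx-install
$ make -j install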
All available TLs are compiled by default (--with-tls=all)
The user can specify a list of specific TLs to compile, e.g. --with-tls=ucp builds only the "ucp" TL, while --with-tls=sharp,nccl builds tl/sharp and tl/nccl
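For instance, a configure invocation that selects specific TLs could look like the sketch below. The install prefixes and the --with-ucx/--with-cuda/--with-nccl paths are placeholders, and the exact optional-TL flags should be confirmed with ./configure --help for your UCC version:

# Build only the UCP TL
$ ./configure --prefix=$HOME/ucc-install --with-ucx=$HOME/ucx-install --with-tls=ucp
$ make -j install
# Additionally build the NCCL TL (assumes CUDA and NCCL installations are available)
$ ./configure --prefix=$HOME/ucc-install --with-ucx=$HOME/ucx-install \
      --with-cuda=<cuda-install-path> --with-nccl=<nccl-install-path> --with-tls=ucp,nccl
$ make -j install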
For instructions on compiling OSHMEM with Open MPI, please refer to: https://github.com/openucx/ucc#open-mpi-and-ucc-collectives
To run OpenSHMEM applications:
$ oshrun -np 2 --mca scoll_ucc_enable 1 --mca scoll_ucc_priority 100 ./my_openshmem_app
To run OpenSHMEM applications with one-sided collectives (i.e., Alltoall):
$ oshrun -np 2 --mca scoll_ucc_enable 1 --mca scoll_ucc_priority 100 -x UCC_TL_UCP_TUNE=alltoall:0-inf:@onesided ./my_openshmem_app