IME TG Minutes
Guido Araujo edited this page Aug 11, 2024 · 136 revisions
- Slides
- Video
- Agenda
- Update on TG schedule
- [Guido] Quick overview on IME bounds
- [Earl Killian] Proposal on IME architecture
- Discussion
- Guido presented two propositions proving that (a) the Computational Intensity (CI) is maximized when m = n, and (b) the CI is limited by the number of available vector registers assigned as accumulators. Earl presented an engine model that aims to minimize energy consumption and maximize the CI. The model does not use vector registers; instead, it proposes a new set of ISA accumulator registers for matrix multiplication operations. Steve pointed out that, since the group is discussing a matrix operation engine, it should also pay attention to other matrix operations that are highly relevant to HPC workloads.
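The two propositions can be illustrated with a small sketch. It assumes a common definition of CI that the minutes do not spell out: multiply-accumulates per loaded element for one rank-1 update of an m x n accumulator tile, which loads m + n elements and performs m * n MACs. The function names and the accumulator budgets below are hypothetical and are not taken from the slides.

```python
def computational_intensity(m: int, n: int) -> float:
    """MACs per loaded element for one rank-1 update of an m x n tile.

    Assumed model: each update loads m elements of A and n elements of B
    and performs m * n multiply-accumulates.
    """
    return (m * n) / (m + n)


def best_shape(acc_elems: int) -> tuple[int, int]:
    """Search integer tile shapes with m * n <= acc_elems for the highest CI."""
    best = (1, 1)
    for m in range(1, acc_elems + 1):
        n = acc_elems // m  # widest tile that still fits the accumulator budget
        if computational_intensity(m, n) > computational_intensity(*best):
            best = (m, n)
    return best


if __name__ == "__main__":
    # Hypothetical accumulator budgets, in elements.
    for budget in (64, 256, 1024):
        m, n = best_shape(budget)
        print(budget, (m, n), computational_intensity(m, n))
```

Under this model the square tile wins for every budget tried, and the best CI equals sqrt(m * n) / 2, so it grows only as fast as the accumulator capacity allows, which is consistent with propositions (a) and (b).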
- Slides
- Video (TBD)
- Agenda
- Update on working groups and TG schedule
- Revisiting options
- [Abel Bernabeu] Updates on Option D
- Discussion
- The group discussed the scalability of the Computational Intensity (CI). Steven and Greg pointed out that memory subsystem constraints should be considered. Phillip replied that this should not be a concern, given that the TG's focus is the IME ISA definition rather than architecture implementation aspects. Greg countered that implementation aspects such as memory and datapath width would constrain performance and might limit scalability. Erich claimed that, even with the current memory subsystem, a very high CI (on the order of 60) is achievable for a very large VLEN such as 16K. The group agreed that discussing other metrics would be useful. Abel presented updates on Option D.
- Slides
- Video
- Agenda
- Discussion
- Jose proposed a variation of Option C (called C*) in which tiles (of size lambda^2) are encoded into vector registers. This allows the microkernel to load as many matrix tiles as the size of the architected vector registers permits. Tile multiplication takes vector-encoded tiles as operands and performs an outer-product multiply-accumulate. Tile loads are controlled by a set of three nested loops, which use a special instruction to determine the number of ml, nl, and kl elements to load, zeroing the remaining positions of the vector registers. This approach may enable portability across architecture generations.
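The loop structure described above can be sketched in software as follows. This is only an interpretation of the minutes, not the C* proposal itself: the tile edge `LAM`, the helper names, and the use of `min()` in place of the special instruction that yields ml, nl, and kl are all assumptions made for illustration.

```python
LAM = 4  # hypothetical tile edge (lambda); one tile fills LAM * LAM register elements


def load_tile(mat, row0, col0, rows, cols):
    """Return a flat LAM x LAM 'register' holding a rows x cols block of mat.

    Positions outside the block stay zero, mimicking the zeroing of the
    remaining vector register elements on a partial tile load.
    """
    reg = [0] * (LAM * LAM)
    for i in range(rows):
        for j in range(cols):
            reg[i * LAM + j] = mat[row0 + i][col0 + j]
    return reg


def tile_mac(acc, a, b):
    """acc += a @ b on flat tiles, accumulated as LAM outer products (one per k)."""
    for k in range(LAM):
        for i in range(LAM):
            a_ik = a[i * LAM + k]
            for j in range(LAM):
                acc[i * LAM + j] += a_ik * b[k * LAM + j]


def tiled_matmul(A, B):
    """Multiply A (M x K) by B (K x N) using three nested tile loops."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, LAM):
        ml = min(LAM, M - i0)  # stand-in for the instruction that returns ml
        for j0 in range(0, N, LAM):
            nl = min(LAM, N - j0)  # stand-in for nl
            acc = [0] * (LAM * LAM)
            for k0 in range(0, K, LAM):
                kl = min(LAM, K - k0)  # stand-in for kl
                a = load_tile(A, i0, k0, ml, kl)
                b = load_tile(B, k0, j0, kl, nl)
                tile_mac(acc, a, b)
            # Store only the valid ml x nl part of the accumulator tile.
            for i in range(ml):
                for j in range(nl):
                    C[i0 + i][j0 + j] = acc[i * LAM + j]
    return C
```

Because partial tiles are zero-padded, `tile_mac` always runs over full LAM x LAM operands and needs no edge-case handling, and only `LAM` would change for a different register size. That is one way to read the portability remark in the minutes.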
- Slides
- Video
- Agenda
- Update on TG Chair/Vice-chair selection
- Update on TG schedule
- Kick-off on matrix data type and geometry configuration
- Discussion
- The group roadmap was revisited and updated.
- Abel and Greg pointed out that new Matrix CSRs (MCSRs) add new architectural state and do not reuse what is already in RVV. Options would be: (a) adopt larger instructions to enable matrix type/shape encoding; (b) try harder to re-use what is in RVV by, for example, passing type/shape information through scalar registers (Guido's group is working on this).
- Jose suggested the name Matrix Operation Shape (ms). Steve suggested generalizing it to any matrix operation. The slides have been updated accordingly. Steve also raised the possibility that, at execution time, the architecture could assign physical registers to logical registers depending on the number of physical vector registers available.
- Slides
- Video
- Agenda
- Definition of the group's schedule for 2024
- Update on workgroups definition
- Update on workloads and benchmarking
- [CN.Ke] Andes presentation on Cache and DRAM metrics
- [CN.Ke] Cache and DRAM evaluation tool
- Discussion
- A schedule for the group's work was proposed and discussed.
- Workloads for the architecture evaluation were discussed. Greg proposed using MLPerf, and the group agreed to use it as a reference for ML workloads. ConvBench and IBM POWER10 ML model profiling will still be available for those interested. As for HPC, GEMM-based OpenBLIS was discussed, and it was agreed that it should also be a reference.
- Memory access evaluation: CN.Ke extended the work on burst analysis to caches, discussed the impact on the various architectural options, and proposed a new matrix transpose instruction.
- Slides
- Video (TBD)
- Agenda
- Revisiting what we have achieved so far
- Definition of gaps, agenda, and working groups
- Slides
- Video
- Agenda
- NEC + BSC Presentation (cont.)
- Matrix Tile Extension: Portable ISA For Vector-Integrated Matrix Unit
- Erich Focht (NEC) and Marc Casas (BSC)
- NEC + BSC Presentation (cont.)
- Slides
- Video
- Agenda
- Chair/Vice-chair selection update
- Moving forward on qualitative analysis
- Computational Intensity
- Locality evaluation
- NEC + BSC Presentation
- Matrix Tile Extension: Portable ISA For Vector-Integrated Matrix Unit
- Erich Focht (NEC) and Marc Casas (BSC)
- Slides
- Video
- Agenda
- Slides
- Video
- Agenda
- Revisiting the IME preliminary options
- Qualitative vs Quantitative approaches
- Metrics and workloads
Before the IME TG was created, the SIG Vector group held a number of meetings that covered material related to the IME architecture. To save readers from jumping between the two groups' pages, we copied the SIG Vector minutes related to the IME architecture below.
- Video
- Agenda
- Update on status of Integrated Matrix Extensions (IME) Task Group proposal.
- Thoughts on variants for RISC-V matrix extensions
- Thoughts on sparsity support for RISC-V matrix extensions
- Video
- Agenda
- Update on status of Integrated Matrix Facility (IMF) Task Group proposal.
- Thoughts on variants for RISC-V matrix extensions
- Thoughts on sparsity support for RISC-V matrix extensions
- Slides
- Video
- Agenda
- Vote for proposing the creation of the Integrated Matrix Facility (IMF) Task Group
- Presentation by Abel Bernabeu on Option D for IMF
- Slides
- Video
- Agenda
- Continue the overview of possible vector-matrix extension approaches, that is, matrix extensions that use only the currently architected vector registers to store matrices; time permitting, include a comparison with the attached matrix facility approach
- Slides
- Video
- Agenda
- Continue the overview of possible vector-matrix extension approaches, that is, matrix extensions that use only the currently architected vector registers to store matrices; time permitting, include a comparison with the attached matrix facility approach
- Slides
- Agenda
- Overview of possible vector-matrix extension approaches, that is, matrix extensions that use only the currently architected vector registers to store matrices