From 4eec7c61cbe4e89aa08d6f97c2bed763ab7a0c07 Mon Sep 17 00:00:00 2001
From: xuchenhan-tri
Date: Fri, 12 Jul 2024 14:35:25 -0700
Subject: [PATCH] Typo fix in `branchless.md`

---
 content/english/hpc/pipelining/branchless.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/english/hpc/pipelining/branchless.md b/content/english/hpc/pipelining/branchless.md
index 31bd5a39..d651eec0 100644
--- a/content/english/hpc/pipelining/branchless.md
+++ b/content/english/hpc/pipelining/branchless.md
@@ -228,7 +228,7 @@ for (int i = 0; i < N; i++)
     s += a[i];
 ```
 
-It now works in ~0.3 per element, which is mainly [bottlenecked by the memory](/hpc/cpu-cache/bandwidth).
+It now works in ~0.3 cycles per element, which is mainly [bottlenecked by the memory](/hpc/cpu-cache/bandwidth).
 
 The compiler is usually able to vectorize any loop that doesn't have branches or dependencies between the iterations — and some specific small deviations from that, such as [reductions](/hpc/simd/reduction) or simple loops that contain just one if-without-else. Vectorization of anything more complex is a very nontrivial problem, which may involve various techniques such as [masking](/hpc/simd/masking) and [in-register permutations](/hpc/simd/shuffling).