From 4eec7c61cbe4e89aa08d6f97c2bed763ab7a0c07 Mon Sep 17 00:00:00 2001
From: xuchenhan-tri <xuchen.han@tri.global>
Date: Fri, 12 Jul 2024 14:35:25 -0700
Subject: [PATCH] Add missing "cycles" unit in `branchless.md`

---
 content/english/hpc/pipelining/branchless.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/english/hpc/pipelining/branchless.md b/content/english/hpc/pipelining/branchless.md
index 31bd5a39..d651eec0 100644
--- a/content/english/hpc/pipelining/branchless.md
+++ b/content/english/hpc/pipelining/branchless.md
@@ -228,7 +228,7 @@ for (int i = 0; i < N; i++)
         s += a[i];
 ```
 
-It now works in ~0.3 per element, which is mainly [bottlenecked by the memory](/hpc/cpu-cache/bandwidth).
+It now works in ~0.3 cycles per element, which is mainly [bottlenecked by the memory](/hpc/cpu-cache/bandwidth).
 
 The compiler is usually able to vectorize any loop that doesn't have branches or dependencies between the iterations — and some specific small deviations from that, such as [reductions](/hpc/simd/reduction) or simple loops that contain just one if-without-else. Vectorization of anything more complex is a very nontrivial problem, which may involve various techniques such as [masking](/hpc/simd/masking) and [in-register permutations](/hpc/simd/shuffling).