Core: Fix the distribution of Options.Range.triangular()

The original implementation was converting a continuous distribution of [a, b] to integers by rounding, but this results in the smallest and largest integers having half as many float values that round to them compared to other integers in the range.

For example, given the continuous range [0, 3]:
0.0 <= x < 0.5 rounds to 0 -> width of 0.5
0.5 <= x < 1.5 rounds to 1 -> width of 1.0
1.5 <= x < 2.5 rounds to 2 -> width of 1.0
2.5 <= x <= 3.0 rounds to 3 -> width of 0.5 (kind of plus an infinitesimal bit extra)

To convert to 4 integers uniformly given a uniform continuous distribution, the width of the continuous distribution would have to be 4, e.g. [-0.5, 3.5] or [0, 4].

This patch fixes the distribution of Options.Range.triangular() by increasing the width of the continuous distribution by 1. This requires adjusting the mode (`tri`) of the distribution to the new width of the distribution and accounting for the near zero chance of `random.triangular(a, b+1, adjusted_tri)` returning exactly `b+1`.
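The endpoint bias described above can be checked exactly for the uniform case used in the example. The following is a minimal sketch, not part of the patch: the helper names `rounded_uniform_probs` and `floored_uniform_probs` are hypothetical, and it assumes a uniform (not triangular) continuous distribution so that each integer's probability is just the width of its interval divided by the total width.

```python
from fractions import Fraction


def rounded_uniform_probs(a: int, b: int) -> dict[int, Fraction]:
    """Exact probability of each integer when x ~ Uniform[a, b] is rounded to the nearest integer."""
    width = Fraction(b - a)
    probs = {}
    for k in range(a, b + 1):
        # Values in [k - 0.5, k + 0.5) round to k, clipped to the sampled range [a, b].
        lo = max(Fraction(a), k - Fraction(1, 2))
        hi = min(Fraction(b), k + Fraction(1, 2))
        probs[k] = (hi - lo) / width
    return probs


def floored_uniform_probs(a: int, b: int) -> dict[int, Fraction]:
    """Exact probability of each integer when x ~ Uniform[a, b + 1) is floored."""
    width = Fraction(b + 1 - a)
    # Every integer k owns the unit-width interval [k, k + 1).
    return {k: Fraction(1) / width for k in range(a, b + 1)}


print(rounded_uniform_probs(0, 3))  # endpoints get 1/6, interior integers get 1/3
print(floored_uniform_probs(0, 3))  # every integer gets 1/4
```

Under these assumptions, rounding over [0, 3] gives the endpoints half the probability of the interior integers, while flooring over [0, 4) gives all four integers the same probability.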
ArchipelagoMW · Nov 29, 2024 · dd55eb5
1 parent ce210cd
commit dd55eb5
Showing 1 changed file with 45 additions and 1 deletion.
diff --git a/Options.py b/Options.py
@@ -740,7 +740,51 @@ def __str__(self) -> str:
 
     @staticmethod
     def triangular(lower: int, end: int, tri: typing.Optional[int] = None) -> int:
-        return int(round(random.triangular(lower, end, tri), 0))
+        if lower == end:
+            return lower
+
+        if lower > end:
+            # Swap the two so that `lower` is always smaller. This simplifies later code.
+            lower, end = end, lower
+
+        if tri is not None and (tri < lower or tri > end):
+            # random.triangular allows this for performance reasons, but it is not well-defined/documented behaviour, so
+            # we'll reject this scenario for simplicity.
+            raise Exception(f"Triangular distribution mode {tri} is outside the allowed range {lower}-{end}")
+
+        # To produce integers from [a, b] from a continuous distribution, it is easier to start with a continuous
+        # distribution that is [a, b+1). For example, for lower=0 and end=2, the continuous distribution of [0, 3) can
+        # be split into 3 groups: 0 <= x < 1, 1 <= x < 2 and 2 <= x < 3.
+        new_end = end + 1
+        if tri is not None:
+            # `tri` needs to be remapped from the original [lower, end) range to the new [lower, new_end) range.
+            # Normalize to the range [0, 1).
+            # '[lower, end)' - lower = '[0, end - lower)'
+            # '[0, end - lower)' / (end - lower) = '[0, 1)'
+            tri_normalized = (tri - lower) / (end - lower)
+            # Scale up to fit the new range and then offset back by lower.
+            # '[0, 1)' * (new_end - lower) = '[0, new_end - lower)'
+            # '[0, new_end - lower)' + lower = '[lower, new_end)'
+            tri_rescaled = tri_normalized * (new_end - lower) + lower
+        else:
+            tri_rescaled = None
+
+        # To produce integers from these floats, truncate towards `lower` by using `math.floor`.
+        # Truncating with `int(my_float)` truncates towards `0`, so would not work correctly with `lower < 0`.
+        # Given the previous example for a continuous distribution of [0, 3):
+        # 0 <= x < 1 -> 0
+        # 1 <= x < 2 -> 1
+        # 2 <= x < 3 -> 2
+        r_int = math.floor(random.triangular(lower, new_end, tri_rescaled))
+
+        # Unlike `random.random()`, which is a-inclusive to b-exclusive, [a, b), `random.triangular()` is a-inclusive to
+        # b-inclusive, [a, b], so there is a chance of getting exactly b.
+        # In 1 million calls, `random.triangular(0.9999999999, 1, 1)` might return `1` a single time. `lower` and
+        # `end` are integers so are always at least 1 apart, so the chance of getting exactly `end` should be very
+        # small.
+        # Because `tri` has been limited to `lower <= tri <= end` and `lower` and `end` have been swapped if `lower`
+        # was greater than `end`, `r_int` only needs to be checked for being larger than `end`.
+        return min(end, r_int)
 
 
 class NamedRange(Range):
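
The mode remapping in the patch can be followed with concrete numbers. The sketch below is a standalone illustration rather than the code from `Options.py`: the names `rescale_mode` and `triangular_int` are hypothetical, the `rng` parameter is an added assumption so the draw can be seeded, and the swap and validation steps from the patch are omitted (it assumes `lower < end` and `lower <= tri <= end`).

```python
import math
import random
from typing import Optional


def rescale_mode(lower: int, end: int, tri: int) -> float:
    """Map a mode from the original range [lower, end] onto the widened range [lower, end + 1]."""
    tri_normalized = (tri - lower) / (end - lower)  # position of the mode within 0..1
    return tri_normalized * (end + 1 - lower) + lower


def triangular_int(lower: int, end: int, tri: Optional[int] = None,
                   rng: Optional[random.Random] = None) -> int:
    """Draw an integer in [lower, end] by flooring a triangular sample over the widened range."""
    rng = rng or random.Random()
    mode = None if tri is None else rescale_mode(lower, end, tri)
    # Floor rather than int(): int() truncates towards 0, which is wrong for negative values.
    value = math.floor(rng.triangular(lower, end + 1, mode))
    # random.triangular() can return its upper bound exactly, so clamp that rare case.
    return min(end, value)


print(rescale_mode(0, 2, 1))    # 1.5, the centre of the widened range
print(rescale_mode(-3, 1, -1))  # -0.5
rng = random.Random(12345)
print(sorted({triangular_int(-3, 1, -1, rng) for _ in range(10_000)}))
```

For `lower=0`, `end=2`, `tri=1`, the mode moves from 1 to 1.5, so a distribution that was symmetric over the original range stays symmetric over the widened one.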