Core: Fix the distribution of Options.Range.triangular() #4283
The original implementation was converting a continuous distribution of [a, b] to integers by rounding, but this results in the smallest and largest integers having half as many float values that round to them compared to other integers in the range.
For example, given the continuous range [0, 3]:
0.0 <= x < 0.5 rounds to 0 -> width of 0.5
0.5 <= x < 1.5 rounds to 1 -> width of 1.0
1.5 <= x < 2.5 rounds to 2 -> width of 1.0
2.5 <= x <= 3.0 rounds to 3 -> width of 0.5 (kind of plus an infinitesimal bit extra)
To convert to 4 integers uniformly given a uniform continuous distribution, the width of the continuous distribution would have to be 4, e.g. [-0.5, 3.5] or [0, 4].
This patch fixes the distribution of Options.Range.triangular() by increasing the width of the continuous distribution by 1. This requires adjusting the mode (`tri`) of the distribution to the new width of the distribution and accounting for the near zero chance of `random.triangular(a, b+1, adjusted_tri)` returning exactly `b+1`.
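The width argument above can be checked numerically. The following is a minimal sketch, not code from the PR: it spreads evenly spaced points over a continuous range and counts how many map to each integer. The helper name `bucket_counts` and the sample count are illustrative assumptions.

```python
import math
from collections import Counter


def bucket_counts(width, to_int, samples=1200):
    """Count which integer each of `samples` evenly spaced points in (0, width) maps to."""
    # Use cell midpoints so that no point sits exactly on a rounding boundary.
    points = ((i + 0.5) / samples * width for i in range(samples))
    return dict(sorted(Counter(to_int(x) for x in points).items()))


# Rounding over [0, 3]: the endpoints 0 and 3 receive half as many points.
rounded = bucket_counts(3, round)

# Flooring over [0, 4): every integer receives the same number of points.
floored = bucket_counts(4, math.floor)

print(rounded)  # {0: 200, 1: 400, 2: 400, 3: 200}
print(floored)  # {0: 300, 1: 300, 2: 300, 3: 300}
```

Widening the range by 1 and flooring is what gives each integer an equal share of the continuous range.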
Ooh, it's not every day I see a math heavy PR. I did some number crunching and can confirm that the previous version was biased against the endpoints, and that this PR would remove that bias.
Options.py
Outdated
@@ -740,7 +740,51 @@ def __str__(self) -> str:

     @staticmethod
     def triangular(lower: int, end: int, tri: typing.Optional[int] = None) -> int:
-        return int(round(random.triangular(lower, end, tri), 0))
+        if lower == end:
+            return lower
This early return should go below the exception so that `tri < lower` or `tri > end` always raises an Exception, even when `lower == end`. I'll move it tomorrow.
So, this PR is correct mathematically from what I can tell. Now I like me some verbose code for some important functions, but this feels a bit overkill to me for what it is. I was expecting to open the diff and see like, a Are all of these additions necessary? Is `end < lower` an actual case we have?
I think it should be questioned whether this change is good or not. I've used random-low and random-high a lot before, and I wasn't working under the assumption that it's a perfect triangular distribution.
Currently |
I haven't really stepped through anything, but do you happen to know how well this handles negative numbers?
I prefer to add more comments than I think are necessary for maths stuff to help those that are less mathematically inclined (or me when half asleep). That is partially why I provided test code output and some graphs as a visual aid.
Core does not have `end < lower`, but none of these functions are marked as private, so removing that functionality results in a breaking API change (more so than removing `tri > end` or `tri < lower`, which is already a breaking change).
Personally, I would do more breaking changes and rewrite it as:

    @staticmethod
    def triangular(lower: int, end: int, tri: float = 0.5) -> int:
        """
        Integer triangular distribution for `lower` inclusive to `end` inclusive.
        Expects `lower <= end` and `0 <= tri <= 1`. The result of other inputs is undefined.
        """
        # Use the continuous range [lower, end + 1) to produce an integer result in [lower, end].
        # random.triangular is actually [a, b] and not [a, b), so there is a very small chance of getting exactly b, so
        # ensure the result is never more than `end`.
        return min(end, math.floor(random.triangular(0, 1, tri) * (end - lower + 1) + lower))

Adding

    if end < lower:
        end, lower = lower, end

at the start would additionally add support for `end < lower`.
The current distribution is pretty close to triangular for larger ranges, but more inaccurate for smaller ranges. If a random-low distribution that is weighted more heavily towards lower numbers, but does not weight the lowest number the highest, would be preferred, then that could be discussed, but I think it is outside the scope of this PR.
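As a rough check of the rewrite proposed above, including the optional swap and the earlier question about negative numbers, here is a standalone sketch. It copies the proposed body into a plain module-level function; the seed, the sample count, and the chosen bounds are arbitrary assumptions made for illustration.

```python
import math
import random


def triangular(lower: int, end: int, tri: float = 0.5) -> int:
    """Standalone copy of the proposed function, with the optional swap applied."""
    if end < lower:
        end, lower = lower, end
    # math.floor rounds towards negative infinity, so negative bounds need no special casing.
    return min(end, math.floor(random.triangular(0.0, 1.0, tri) * (end - lower + 1) + lower))


random.seed(1234)  # arbitrary seed, only to make the run repeatable

# Bounds passed in reverse order and crossing zero.
samples = [triangular(2, -3) for _ in range(10_000)]
seen = sorted(set(samples))

print(min(samples), max(samples))
print(seen)
```

Under these assumptions, every sample stays within the inclusive range from -3 to 2.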
Line 714 in 6f2464d
Yes, either |
Then how can `lower > end`?
A world can define their own options or use other code that calls Range.triangular() directly. |
Options.py
Outdated
        if tri is not None and (tri < lower or tri > end):
            # random.triangular allows this for performance reasons, but it is not well-defined/documented behaviour, so
            # we'll reject this scenario for simplicity.
            raise Exception(f"Triangular distribution mode {tri} is outside the allowed range {lower}-{end}")
Should this be an OptionError?
`OptionError` doesn't seem to be used much in this file. `Range.__init__`, `Range.weighted_range` and `Range.custom_range` can all raise `Exception`, so that's why I went with `Exception` here.
I think that's just a lack of updating to use the new error.
My personal two cents: this should not be a concern or something to try to handle. This would cut down a lot on these changes too. I get it if you don't want to, though.
Now expects `lower <= end` and `tri` as a float: `0.0 <= tri <= 1.0`. No warnings or exceptions will be raised if these expectations are not met.
It's expected that we can get `end`, it's the b in [a, b] that we don't want.
    def triangular(lower: int, end: int, tri: float = 0.5) -> int:
        """
        Integer triangular distribution for `lower` inclusive to `end` inclusive.

        Expects `lower <= end` and `0.0 <= tri <= 1.0`. The result of other inputs is undefined.
        """
        # Use the continuous range [lower, end + 1) to produce an integer result in [lower, end].
        # random.triangular is actually [a, b] and not [a, b), so there is a very small chance of getting exactly b even
        # when a != b, so ensure the result is never more than `end`.
        return min(end, math.floor(random.triangular(0.0, 1.0, tri) * (end - lower + 1) + lower))
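The purpose of the `min(end, ...)` clamp in the snippet above can be illustrated with fixed inputs instead of random ones. This is only a sketch: `scale_to_int` is a hypothetical helper that repeats the scaling step without the clamp, and the value `1.0` is fed in by hand to stand for the rare case the code comment describes.

```python
import math


def scale_to_int(u: float, lower: int, end: int) -> int:
    """Map a unit-interval sample `u` to an integer, without the final clamp."""
    return math.floor(u * (end - lower + 1) + lower)


lower, end = 0, 3

# A sample strictly below 1.0 already lands in [lower, end].
below_one = scale_to_int(0.999, lower, end)

# A sample of exactly 1.0 overshoots to end + 1 ...
unclamped = scale_to_int(1.0, lower, end)

# ... which the min() clamp pulls back to `end`.
clamped = min(end, unclamped)

print(below_one)  # 3
print(unclamped)  # 4
print(clamped)    # 3
```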
"lower" was getting confusing for me, you could use "start" instead, maybe? I think "mode" is also more descriptive than "tri" for this distribution.
-    def triangular(lower: int, end: int, tri: float = 0.5) -> int:
-        """
-        Integer triangular distribution for `lower` inclusive to `end` inclusive.
-        Expects `lower <= end` and `0.0 <= tri <= 1.0`. The result of other inputs is undefined.
-        """
-        # Use the continuous range [lower, end + 1) to produce an integer result in [lower, end].
-        # random.triangular is actually [a, b] and not [a, b), so there is a very small chance of getting exactly b even
-        # when a != b, so ensure the result is never more than `end`.
-        return min(end, math.floor(random.triangular(0.0, 1.0, tri) * (end - lower + 1) + lower))
+    def triangular(start: int, end: int, mode: float = 0.5) -> int:
+        """
+        Integer triangular distribution from `start` inclusive to `end` inclusive.
+        Expects `start <= end` and `0.0 <= mode <= 1.0`. The result of other inputs is undefined.
+        """
+        # Use the continuous range [start, end + 1) to produce an integer result in [start, end].
+        # random.triangular is actually [a, b] and not [a, b), so there is a very small chance of getting exactly b even
+        # when a != b, so ensure the result is never more than `end`.
+        return min(end, math.floor(random.triangular(0.0, 1.0, mode) * (end - start + 1) + start))
Would be on board with this personally; however, since `tri` is a kwarg, we'd have to make sure it's not passed as e.g. `tri=0.8` anywhere.
Why? That seems like an intended thing to do. The default is 0.5, and a world could itself use other values, like 0.8, when calling this.
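The keyword-argument concern raised in this thread can be sketched as follows. Both functions are hypothetical stand-ins with dummy bodies, since only the parameter names matter here: a caller that passes `tri=0.8` by keyword would fail after a rename to `mode`, while positional callers would be unaffected.

```python
def triangular_old(lower: int, end: int, tri: float = 0.5) -> str:
    # Stand-in body; only the signature matters for this illustration.
    return f"lower={lower}, end={end}, tri={tri}"


def triangular_renamed(start: int, end: int, mode: float = 0.5) -> str:
    # Same stand-in, using the parameter names from the suggestion.
    return f"start={start}, end={end}, mode={mode}"


# Works with the current parameter name.
print(triangular_old(0, 5, tri=0.8))

# The same keyword call fails once the parameter is renamed.
keyword_call_failed = False
try:
    triangular_renamed(0, 5, tri=0.8)
except TypeError as error:
    keyword_call_failed = True
    print(f"TypeError: {error}")

# Positional calls are unaffected by the rename.
positional_result = triangular_renamed(0, 5, 0.8)
print(positional_result)
```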
Did some test trials and saw the desired results/distribution come out. Also analyzed the PDF directly.
What is this fixing or adding?
The original implementation was converting a continuous distribution of [a, b] to integers by rounding, but this results in the smallest and largest integers having half as many float values that round to them compared to other integers in the range.
For example, given the continuous range [0, 3]:
0.0 <= x < 0.5 rounds to 0 -> width of 0.5
0.5 <= x < 1.5 rounds to 1 -> width of 1.0
1.5 <= x < 2.5 rounds to 2 -> width of 1.0
2.5 <= x <= 3.0 rounds to 3 -> width of 0.5 (kind of plus an infinitesimal bit extra)
To convert to 4 integers uniformly, given a uniform continuous distribution, the width of the continuous distribution would have to be 4, e.g. [-0.5, 3.5] or [0, 4].
This patch fixes the distribution of Options.Range.triangular() by increasing the width of the continuous distribution by 1. For simplicity, this changes the mode (`tri`) of the distribution to a float in [0.0, 1.0] where `0.0` corresponds to `lower` and `1.0` corresponds to `end`.
There is a near zero chance of `random.triangular(a, b)` returning exactly `b` when `a != b`, but this is accounted for by ensuring the returned value is never more than `end`.
How was this tested?
Before
(low=0, high=5, tri=low).
1 and 2 end up more common than 0.
After
(low=0, high=5, tri=low).
0 is correctly the most common value and if you were to plot the points, you would get a straight line.
I used math.floor over the range [0,6] in the end so the diagram doesn't match with the code, but it should still illustrate the point.
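The before and after results for (low=0, high=5, tri=low) can also be derived exactly rather than by sampling. The sketch below is not the test code used for the PR: it uses the textbook CDF of the continuous triangular distribution, and the helper name `triangular_cdf` is an assumption made for this example.

```python
from fractions import Fraction


def triangular_cdf(x, a, b, c):
    """CDF of the continuous triangular distribution on [a, b] with mode c."""
    x, a, b, c = map(Fraction, (x, a, b, c))
    if x <= a:
        return Fraction(0)
    if x >= b:
        return Fraction(1)
    if x < c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    return 1 - (b - x) ** 2 / ((b - a) * (b - c))


low, high = 0, 5
half = Fraction(1, 2)

# Before: sample [low, high] with mode low, then round to the nearest integer.
before = [
    triangular_cdf(k + half, low, high, low) - triangular_cdf(k - half, low, high, low)
    for k in range(low, high + 1)
]

# After: sample [low, high + 1) with mode low, then floor.
after = [
    triangular_cdf(k + 1, low, high + 1, low) - triangular_cdf(k, low, high + 1, low)
    for k in range(low, high + 1)
]

print([float(p) for p in before])    # [0.19, 0.32, 0.24, 0.16, 0.08, 0.01]
print([int(p * 36) for p in after])  # [11, 9, 7, 5, 3, 1]
```

With rounding, 1 and 2 are more likely than 0. With the widened range and floor, the probabilities fall in equal steps of 2/36, which is the straight line described above.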