Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Fix the distribution of Options.Range.triangular() #4283

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Mysteryem
Copy link
Contributor

@Mysteryem Mysteryem commented Nov 29, 2024

What is this fixing or adding?

The original implementation was converting a continuous distribution of [a, b] to integers by rounding, but this results in the smallest and largest integers having half as many float values that round to them compared to other integers in the range.

For example, given the continuous range [0, 3]:
0.0 <= x < 0.5 rounds to 0 -> width of 0.5
0.5 <= x < 1.5 rounds to 1 -> width of 1.0
1.5 <= x < 2.5 rounds to 2 -> width of 1.0
2.5 <= x <= 3.0 rounds to 3 -> width of 0.5 (kind of plus an infinitesimal bit extra)

To convert to 4 integers uniformly, given a uniform continuous distribution, the width of the continuous distribution would have to be 4, e.g [-0.5, 3.5] or [0, 4].

This patch fixes the distribution of Options.Range.triangular() by increasing the width of the continuous distribution by 1. For simplicity, this changes the mode (tri) of the distribution to a float in [0.0, 1.0] where 0.0 corresponds to lower and 1.0 corresponds to end.

There is a near zero chance of random.triangular(a, b) returning exactly b when a != b, but this is accounted for by ensuring the returned value is never more than end.

How was this tested?

from collections import Counter
from Options import Range
# Change these values to your liking.
low = 0
high = 5
# For random-range-low, `tri=low`.
# For random-range-middle, `tri=None`,
# For random-range-high, `tri=high`.
tri = low
counts = Counter(Range.triangular(low, high, tri) for _ in range(1_000_000))
min_count = min(counts.values())
# Scale everything down such that the most infrequent value has a count of 1.0.
scaled_sorted_counts = {k: v/min_count for k, v in sorted(counts.items(), reverse=True, key=lambda t: t[1])}
print(*scaled_sorted_counts.items(), sep="\n")

Before

(low=0, high=5, tri=low).
1 and 2 end up more common than 0.
image

(1, 32.88061538461538)
(2, 24.562358974358975)
(0, 19.511794871794873)
(3, 16.38871794871795)
(4, 8.220615384615385)
(5, 1.0)

After

(low=0, high=5, tri=low).
0 is correctly the most common value and if you were to plot the points, you would get a straight line.
image
I used math.floor over the range [0,6] in the end so the diagram doesn't match with the code, but it should still illustrate the point.

(0, 11.004496888153398)
(1, 9.00525236536317)
(2, 6.9815087959132285)
(3, 4.989711119905026)
(4, 2.994136057847969)
(5, 1.0)

The original implementation was converting a continuous distribution of
[a, b] to integers by rounding, but this results in the smallest and
largest integers having half as many float values that round to them
compared to other integers in the range.

For example, given the continuous range [0, 3]:
0.0 <= x < 0.5 rounds to 0 -> width of 0.5
0.5 <= x < 1.5 rounds to 1 -> width of 1.0
1.5 <= x < 2.5 rounds to 2 -> width of 1.0
2.5 <= x <= 3.0 rounds to 3 -> width of 0.5 (kind of plus an
infinitesimal bit extra)

To convert to 4 integers uniformly given a uniform continuous
distribution, the width of the continuous distribution would have to be
4, e.g [-0.5, 3.5] or [0, 4].

This patch fixes the distribution of Options.Range.triangular() by
increasing the width of the continuous distribution by 1. This requires
adjusting the mode (`tri`) of the distribution to the new width of the
distribution and accounting for the near zero chance of
`random.triangular(a, b+1, adjusted_tri)` returning exactly `b+1`.
@github-actions github-actions bot added affects: core Issues/PRs that touch core and may need additional validation. waiting-on: peer-review Issue/PR has not been reviewed by enough people yet. labels Nov 29, 2024
Copy link
Contributor

@Bicoloursnake Bicoloursnake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, it's not every day I see a math heavy PR. I did some number crunching and can confirm that the previous version was biased against the endpoints, and that this PR would remove that bias.

Options.py Outdated
@@ -740,7 +740,51 @@ def __str__(self) -> str:

@staticmethod
def triangular(lower: int, end: int, tri: typing.Optional[int] = None) -> int:
return int(round(random.triangular(lower, end, tri), 0))
if lower == end:
return lower
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This early return should go below the exception so that tri < lower or tri > end always raises an Exception, even when lower == end. I'll move it tomorrow.

@NewSoupVi
Copy link
Member

NewSoupVi commented Nov 29, 2024

So, this PR is correct mathematically from what I can tell

Now I like me some verbose code for some important functions, but this feels a bit overkill to me for what it is.

I was expecting to open the diff and see like, a + 1 and a - 0.5, or a + 1 and replacing round with floor. And then maybe a 1-2 line comment. :D

Are all of these additions necessary? Is end < lower an actual case we have?

@beauxq
Copy link
Collaborator

beauxq commented Nov 29, 2024

I think it should be questioned whether this change is good or not.
We have to remember that what users want, and what makes users happy, is more important than what is "correct".
If players have been using a different distribution for a long time, and they like that distribution, and we change that distribution to something they don't enjoy as much, then we're making it worse. It doesn't matter what's more "correct".

I've used random-low and random-high before a lot, and I wasn't working under the assumption that it's perfect triangular distribution.
I think it might make better gameplay experiences to lessen the extreme endpoints a bit.

@Exempt-Medic Exempt-Medic added the is: bug/fix Issues that are reporting bugs or pull requests that are fixing bugs. label Nov 29, 2024
@Exempt-Medic
Copy link
Member

Exempt-Medic commented Nov 29, 2024

So, this PR is correct mathematically from what I can tell

Now I like me some verbose code for some important functions, but this feels a bit overkill to me for what it is.

I was expecting to open the diff and see like, a + 1 and a - 0.5, or a + 1 and replacing round with floor. And then maybe a 1-2 line comment. :D

Are all of these additions necessary? Is end < lower an actual case we have?

Currently random-range-low-10-1 skews towards 10 which... I feel like we should just error and not "fix" it, since it's unclear what the user actually wanted. But I'm also very "error-happy"

@Exempt-Medic
Copy link
Member

Exempt-Medic commented Nov 29, 2024

I haven't really stepped through anything, but do you happen to know how well this considers negative numbers ?

@Mysteryem
Copy link
Contributor Author

So, this PR is correct mathematically from what I can tell

Now I like me some verbose code for some important functions, but this feels a bit overkill to me for what it is.

I was expecting to open the diff and see like, a + 1 and a - 0.5, or a + 1 and replacing round with floor. And then maybe a 1-2 line comment. :D

Are all of these additions necessary? Is end < lower an actual case we have?

I prefer to add more comments than I think are necessary for maths stuff to help those that are less mathematically inclined (or me when half asleep). Partially why I provided test code output and some graphs as a visual aid.

Core does not have end < lower, but none of these functions are marked as private, so removing that functionality results in a breaking API change (more so than removing tri > end or tri < lower which is already a breaking change).

Personally, I would do more breaking changes and rewrite Range.triangular and its usages so that it is expecting tri to always be in the range [0,1] and then the function could be squashed down to

    @staticmethod
    def triangular(lower: int, end: int, tri: float = 0.5) -> int:
        """
        Integer triangular distribution for `lower` inclusive to `end` inclusive.
        
        Expects `lower <= end` and `0 <= tri <= 1`. The result of other inputs is undefined.
        """
        # Use the continuous range [lower, end + 1) to produce an integer result in [lower, end].
        # random.triangular is actually [a, b] and not [a, b), so there is a very small chance of getting exactly b, so
        # ensure the result is never more than `end`.
        return min(end, math.floor(random.triangular(0, 1, tri) * (end - lower + 1) + lower))

Adding a

        if end < lower:
            end, lower = lower, end

at the start would additionally add support for end < lower.


I think it should be questioned whether this change is good or not. We have to remember that what users want, and what makes users happy, is more important than what is "correct". If players have been using a different distribution for a long time, and they like that distribution, and we change that distribution to something they don't enjoy as much, then we're making it worse. It doesn't matter what's more "correct".

I've used random-low and random-high before a lot, and I wasn't working under the assumption that it's perfect triangular distribution. I think it might make better gameplay experiences to lessen the extreme endpoints a bit.

The current distribution is pretty close to triangular for larger ranges, but more inaccurate for smaller ranges. If a random-low distribution that is more heavily weighted towards lower numbers, but doesn't weight the lowest number the highest could be preferred, then that could be discussed, but I think it is outside the scope of this PR.

image

@Mysteryem
Copy link
Contributor Author

Currently random-range-low-10-1 skews towards 10 which... I feel like we should just error and not "fix" it, since it's unclear what the user actually wanted. But I'm also very "error-happy"

Range.custom_range() sorts the two values before passing them to Range.triangular, so I think it would actually skew towards 1.

random_range.sort()

I haven't really stepped through anything, but do you happen to know how well this considers negative numbers ?

Yes, either lower or end being negative, or both is accounted for. This is why the code uses math.floor() to always truncate towards lower rather than int() which truncates towards 0.

@Exempt-Medic
Copy link
Member

Range.custom_range() sorts the two values before passing them to Range.triangular, so I think it would actually skew towards 1.

random_range.sort()

Then how can lower > end ?

@Mysteryem
Copy link
Contributor Author

Range.custom_range() sorts the two values before passing them to Range.triangular, so I think it would actually skew towards 1.

random_range.sort()

Then how can lower > end ?

A world can define their own options or use other code that calls Range.triangular() directly.

Options.py Outdated
if tri is not None and (tri < lower or tri > end):
# random.triangular allows this for performance reasons, but it is not well-defined/documented behaviour, so
# we'll reject this scenario for simplicity.
raise Exception(f"Triangular distribution mode {tri} is outside the allowed range {lower}-{end}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an OptionError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OptionError doesn't seem to be used much in this file. Range.__init__, Range.weighted_range and Range.custom_range can all raise Exception, so that's why I went with Exception here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's just a lack of updating to use the new error

@Exempt-Medic
Copy link
Member

Then how can lower > end ?

A world can define their own options or use other code that calls Range.triangular() directly.

My personal two cents: this should not be a concern or something to try to handle. This would cut down a lot on these changes too. I get if you don't want to though

Now expects `lower <= end` and `tri` as a float: `0.0 <= tri <= 1.0`.
No warnings or exceptions will be raised if these expectations are not
met.
It's expected that we can get `end`, it's the b in [a, b] that we don't
want.
Comment on lines +742 to +751
def triangular(lower: int, end: int, tri: float = 0.5) -> int:
"""
Integer triangular distribution for `lower` inclusive to `end` inclusive.

Expects `lower <= end` and `0.0 <= tri <= 1.0`. The result of other inputs is undefined.
"""
# Use the continuous range [lower, end + 1) to produce an integer result in [lower, end].
# random.triangular is actually [a, b] and not [a, b), so there is a very small chance of getting exactly b even
# when a != b, so ensure the result is never more than `end`.
return min(end, math.floor(random.triangular(0.0, 1.0, tri) * (end - lower + 1) + lower))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"lower" was getting confusing for me, you could use "start" instead, maybe? I think "mode" is also more descriptive than "tri" for this distribution.

Suggested change
def triangular(lower: int, end: int, tri: float = 0.5) -> int:
"""
Integer triangular distribution for `lower` inclusive to `end` inclusive.
Expects `lower <= end` and `0.0 <= tri <= 1.0`. The result of other inputs is undefined.
"""
# Use the continuous range [lower, end + 1) to produce an integer result in [lower, end].
# random.triangular is actually [a, b] and not [a, b), so there is a very small chance of getting exactly b even
# when a != b, so ensure the result is never more than `end`.
return min(end, math.floor(random.triangular(0.0, 1.0, tri) * (end - lower + 1) + lower))
def triangular(start: int, end: int, mode: float = 0.5) -> int:
"""
Integer triangular distribution from `start` inclusive to `end` inclusive.
Expects `start <= end` and `0.0 <= mode <= 1.0`. The result of other inputs is undefined.
"""
# Use the continuous range [start, end + 1) to produce an integer result in [start, end].
# random.triangular is actually [a, b] and not [a, b), so there is a very small chance of getting exactly b even
# when a != b, so ensure the result is never more than `end`.
return min(end, math.floor(random.triangular(0.0, 1.0, mode) * (end - start + 1) + start))

Copy link
Member

@NewSoupVi NewSoupVi Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be on board with this personally, however since tri is a kwarg, we'd have to make sure it's not passed as e.g. tri=0.8 anywhere

Copy link
Member

@Exempt-Medic Exempt-Medic Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? That seems like an intended thing to do. The default is 0,5, and then other values, like 0.8, a world could use itself when calling this

Copy link
Member

@Exempt-Medic Exempt-Medic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some test trials and saw the desired results/distribution come out. Also analyzed the PDF directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects: core Issues/PRs that touch core and may need additional validation. is: bug/fix Issues that are reporting bugs or pull requests that are fixing bugs. waiting-on: peer-review Issue/PR has not been reviewed by enough people yet.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants