[bug] util.py function std_positive(mean, std, minimal) not working as intended. #36

Open
basvdl97 opened this issue Dec 2, 2022 · 3 comments

Comments

@basvdl97
Contributor

basvdl97 commented Dec 2, 2022

So I was looking at the std_positive function because I was wondering about the impact of the minimal on the distribution, and I noticed a weird effect when the minimal gets close to the mean.

import numpy as np
import random
import matplotlib.pyplot as plt

NEG_SCALE = -50
SCALE = 250
N = 10000

MEAN = 100
STD = 33
MINIMAL = 90

def std_positive(mean, std, minimal):
    if minimal > mean:
        raise Exception('minimal must be lower than mean')
    sample = np.random.normal(mean, std)
    if sample < minimal:
        sample += (minimal-sample) + (mean - minimal)*random.random()
    return sample

counts = {i: 0 for i in range(NEG_SCALE, SCALE)}
for i in range(N): 
    counts[int(std_positive(MEAN, STD, MINIMAL))] += 1
y = [counts[i] for i in range(NEG_SCALE, SCALE)]
x = [i for i in range(NEG_SCALE, SCALE)]

colors = [i for i in range(SCALE-NEG_SCALE)]
plt.scatter(x, y, c=colors, alpha=0.5)
plt.title(f"Distribution of std_positive, mean={MEAN}, std={STD}, min={MINIMAL}")
plt.show()

Figure_1
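
The size of the bump can also be estimated without plotting. Every sample that lands below minimal is moved to minimal + (mean - minimal) * U, with U uniform on [0, 1), so all of that probability mass ends up in the narrow interval [minimal, mean). With the settings above that is roughly 38% of all samples squeezed into a width of 10. A minimal sketch using only the standard library; the helper name `redistributed_mass` is hypothetical and not part of util.py:

```python
from statistics import NormalDist

def redistributed_mass(mean, std, minimal):
    """Return (fraction of samples below minimal, width of the interval they are moved into)."""
    below = NormalDist(mean, std).cdf(minimal)
    width = mean - minimal
    return below, width

below, width = redistributed_mass(100, 33, 90)

# Extra probability per unit of width inside [minimal, mean) caused by the moved samples
extra_density = below / width

# Peak density of the untouched normal curve, for comparison
peak_density = NormalDist(100, 33).pdf(100)

print(f"{below:.1%} of samples are moved into an interval of width {width}")
print(f"extra density {extra_density:.4f} vs. normal peak {peak_density:.4f}")
```

If the extra density is well above the normal peak, the interval just above minimal will stick out of the plot, which matches the effect in the figure.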

@basvdl97
Contributor Author

basvdl97 commented Dec 2, 2022

Currently working on submitting a solution. Adding a random number of up to ~3.33 std's gets too jagged.

# -||-

def std_positive(mean, std, minimal):
    if minimal > mean:
        raise Exception('minimal must be lower than mean')
    sample = np.random.normal(mean, std)
    if sample < minimal:
        sample += (minimal-sample) + (3.333*std)*random.random()
    return sample

# -||-

Figure_2

Maybe we can find something that smooths it out, and doesn't shift the mean of the distribution too far.
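
To put a number on how far each variant shifts the mean, one option is to simulate both and compare. This is only a rough sketch: it uses vectorized NumPy instead of the per-sample function, and `simulate` and its `spread` parameter are names made up for this comparison, not anything in util.py.

```python
import numpy as np

def simulate(mean, std, minimal, spread, n=200_000, seed=0):
    """Draw n normal samples; move those below minimal to minimal + spread * U, U ~ Uniform[0, 1)."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, std, size=n)
    low = samples < minimal
    # Same effect as `sample += (minimal - sample) + spread * random.random()`
    samples[low] = minimal + spread * rng.random(np.count_nonzero(low))
    return samples

MEAN, STD, MINIMAL = 100, 33, 90

# Original behaviour: moved samples are spread over [minimal, mean)
original = simulate(MEAN, STD, MINIMAL, spread=MEAN - MINIMAL)

# Variant from this comment: moved samples are spread over 3.333 standard deviations
wider = simulate(MEAN, STD, MINIMAL, spread=3.333 * STD)

print(f"original variant:    mean = {original.mean():.1f}")
print(f"3.333 * std variant: mean = {wider.mean():.1f}")
```

Both variants end up with a mean above 100, because everything below minimal is pushed upward; the wider spread pushes it further.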

@droefs
Owner

droefs commented Dec 12, 2022

That's a very thorough investigation, thank you!

Maybe we can add some more randomness to solve the problem :)

sample += (minimal-sample) + (5 * random.random() * std) * random.random()

more_randomness

Complete code:

import numpy as np
import random
import matplotlib.pyplot as plt

NEG_SCALE = -50
SCALE = 250
N = 10000

MEAN = 100
STD = 33
MINIMAL = 90

def std_positive(mean, std, minimal):
    if minimal > mean:
        raise Exception('minimal must be lower than mean')
    sample = np.random.normal(mean, std)
    if sample < minimal:
        sample += (minimal-sample) + (3.333 * random.random() * std) * random.random()
    return sample

counts = {i: 0 for i in range(NEG_SCALE, SCALE)}
for i in range(N): 
    counts[int(std_positive(MEAN, STD, MINIMAL))] += 1
y = [counts[i] for i in range(NEG_SCALE, SCALE)]
x = [i for i in range(NEG_SCALE, SCALE)]

colors = [i for i in range(SCALE-NEG_SCALE)]
plt.scatter(x, y, c=colors, alpha=0.5)
plt.title(f"Distribution of std_positive, mean={MEAN}, std={STD}, min={MINIMAL}")
plt.show()
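
One way to see why the doubled random.random() looks smoother: the product of two independent uniform values is not uniform. It has density -ln(x) on (0, 1) and mean 1/4, so the moved samples pile up near minimal and thin out gradually instead of stopping at a hard edge. A small check of that claim, with purely illustrative names:

```python
import random

def product_of_uniforms(rng):
    """Return U1 * U2 for two independent Uniform[0, 1) draws."""
    return rng.random() * rng.random()

rng = random.Random(0)  # fixed seed so the check is repeatable
draws = [product_of_uniforms(rng) for _ in range(100_000)]

mean_product = sum(draws) / len(draws)
share_below_quarter = sum(d < 0.25 for d in draws) / len(draws)

print(f"mean of U1 * U2 ~ {mean_product:.3f} (theory: 0.25)")
print(f"share of draws below 0.25 ~ {share_below_quarter:.3f}")
```

More than half of the draws fall below 0.25, which is why most of the moved samples stay close to minimal.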

@basvdl97
Contributor Author

I like the smoothness of the solution. However, I do wonder what you think about the missing rounding at the top, which is expected from a normal distribution. With your solution it is there when the minimal is further away from the mean, like in this example:

image

But as you can see, in your example it is missing, whereas it should be there. I used this (dreaded while loop) example to show that there should be some rounding.

def std_positive(mean, std, minimal):
    if minimal > mean:
        raise Exception('minimal must be lower than mean')
    sample = np.random.normal(mean, std)
    while sample < minimal:
        sample = np.random.normal(mean, std)
    return sample

image

We would need a different way than the while solution to solve it cleanly.
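
One possible loop-free route, offered as a sketch rather than a finished fix, is inverse-CDF sampling: draw a uniform value from the part of [0, 1) that corresponds to values at or above minimal, then map it back through the inverse normal CDF. This should give the same truncated normal shape as the while loop. The function name `std_positive_truncated` and the optional `rng` parameter are assumptions for illustration; `statistics.NormalDist` needs Python 3.8 or newer.

```python
import random
from statistics import NormalDist

def std_positive_truncated(mean, std, minimal, rng=random):
    """Sample from a normal distribution restricted to values >= minimal, without a retry loop."""
    if minimal > mean:
        raise ValueError('minimal must be lower than mean')
    dist = NormalDist(mean, std)
    # Probability mass below minimal; uniform draws are taken from [lowest_p, 1)
    lowest_p = dist.cdf(minimal)
    u = lowest_p + (1.0 - lowest_p) * rng.random()
    # inv_cdf is only defined on the open interval (0, 1), so clamp the edge cases
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Guard against floating-point round-off landing a hair below minimal
    return max(minimal, dist.inv_cdf(u))

rng = random.Random(1)  # fixed seed so the example is repeatable
samples = [std_positive_truncated(100, 33, 90, rng) for _ in range(20_000)]
print(f"min = {min(samples):.2f}, mean = {sum(samples) / len(samples):.2f}")
```

Each call does a fixed amount of work regardless of how close minimal is to the mean, at the cost of one cdf and one inv_cdf evaluation per sample.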

Personally I would be fine with settling for the solution you provided, but I don't know how thorough you want to be. I think your solution still contains the essence of the purpose of using the normal distribution, aka "numbers closer to the mean occur more often".
