keyerror with higher degrees #21
Comments
Hi, thanks for your feedback. Regarding the KeyError, can you provide more information? For example, is the KeyError still raised when epsilon=0? Could you also share the complete error report from the Python interpreter, or any other information that may be helpful? |
Hmmm, very strange. Now when I try to recreate the error, it seems to run. However, it takes several hours for a single degree-4 Bayesian network. Is this normal? Here is another error I ran into while trying to recreate the original one, with epsilon = 1 and Bayesian network degree = 5:
|
Hi, the error is traced to

def normalize_given_distribution(frequencies):
    try:
        distribution = np.array(frequencies, dtype=float)
        distribution = distribution.clip(0)  # replace negative values with 0
        summation = distribution.sum()
        if summation > 0:
            if np.isinf(summation):
                return normalize_given_distribution(np.isinf(distribution))
            else:
                return distribution / summation
        else:
            return np.full_like(distribution, 1 / distribution.size)
    except:
        raise Exception(f'An error happens when frequencies={frequencies}')
|
I'm also getting key errors. `================ Constructing Bayesian Network (BN) ================
|
Please check out the latest code (commit 9f476eb), and see if this KeyError is fixed. |
@haoyueping How do I choose the value of k for constructing a Bayesian network? My CSV file contains 40 attributes. |
@hamzanaeem1999 In theory, a higher value of k makes the Bayesian network more accurate, while a lower value of k reduces the time and space complexity. In practice, you can start with k = 1 and gradually increase k until you find a proper k. |
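To give a rough feel for why a lower k is cheaper, here is a minimal sketch that counts how many (child, parent set) candidates a greedy structure search would have to score. It assumes the search pairs every not-yet-placed attribute with every parent set of size min(k, number of placed attributes); that is an illustrative assumption, not a description of DataSynthesizer's actual implementation, and `candidate_parent_sets` is a hypothetical helper name.

```python
from math import comb


def candidate_parent_sets(num_attributes, k):
    """Count (child, parent set) candidates under the assumed greedy search."""
    total = 0
    # `placed` is the number of attributes already in the network.
    for placed in range(1, num_attributes):
        remaining = num_attributes - placed
        # Each remaining attribute is paired with every parent set of size
        # min(k, placed) drawn from the attributes placed so far.
        total += remaining * comb(placed, min(k, placed))
    return total


# Compare the search effort for a 40-attribute table at several degrees.
for k in range(1, 5):
    print(k, candidate_parent_sets(40, k))
```

The count grows quickly with k, and the conditional table stored for each attribute also grows with the number of parents, which fits the advice to start at k = 1 and increase gradually.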
Thanks. Kindly answer one more question: if my data is already in numerical form, do I still need to specify categorical attributes?
What about epsilon, should I increase it too?
|
@hamzanaeem1999 DataSynthesizer works best for categorical attributes. When it handles numerical values, it uses histograms to model the distribution, so it won't be accurate within each bin of the histogram. A greater epsilon value corresponds to less noise, so you need to try different epsilon values to find a tradeoff between privacy and utility. |
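As a rough illustration of "greater epsilon, less noise", here is a minimal sketch of the generic Laplace mechanism applied to a small histogram, followed by the same clip-and-normalize step used in normalize_given_distribution. The noise scale sensitivity / epsilon is the textbook choice and is an assumption here; DataSynthesizer's own noise calibration may differ. All function names are hypothetical, and epsilon must be positive in this sketch.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_distribution(counts, epsilon, sensitivity=1.0, seed=0):
    """Add Laplace noise to counts, clip negatives to 0, and normalize."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # smaller epsilon means larger noise
    noisy = [max(c + laplace_noise(scale, rng), 0.0) for c in counts]
    total = sum(noisy)
    if total == 0:
        # Everything was clipped away: fall back to a uniform distribution.
        return [1 / len(counts)] * len(counts)
    return [v / total for v in noisy]


def mean_abs_error(counts, epsilon, trials=200):
    """Average L1 distance between the noisy and the true distribution."""
    true_total = sum(counts)
    truth = [c / true_total for c in counts]
    error = 0.0
    for seed in range(trials):
        noisy = noisy_distribution(counts, epsilon, seed=seed)
        error += sum(abs(a - b) for a, b in zip(noisy, truth))
    return error / trials


counts = [50, 30, 15, 5]
for eps in (0.1, 1.0, 10.0):
    print(eps, round(mean_abs_error(counts, eps), 4))
```

Running it with a few epsilon values on your own histogram is one way to see how much utility you give up for a given privacy level.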
Then why do you mark only Education as categorical in your repository, when there are other columns that are categorical too?
|
@hamzanaeem1999 DataSynthesizer identifies categorical attributes by the parameter |
Hi,
Thank you so much for this! It's been a lifesaver. I got your model to run on one of my datasets, but I ran into a problem with higher degrees. With k = 2 and k = 3, the code ran without errors on my dataset at several epsilons up to 2.5, but with k = 4 and higher, for all epsilons, it prints:
================ Constructing Bayesian Network (BN) ================
Adding ROOT accrued_holidays
Adding attribute org
Adding attribute office
Adding attribute start_date
Adding attribute bonus
Adding attribute birth_date
Adding attribute salary
Adding attribute title
Adding attribute gender
========================== BN constructed ==========================
But then the cell just freezes there until KeyError (6,5,0,0) occurs.
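The thread does not pin down the cause, but a tuple key with four entries at k = 4 would be consistent with a lookup of a parent-value combination that is missing from a conditional probability table. Below is a minimal sketch of that failure mode and one defensive fallback. It is a guess under that assumption, not DataSynthesizer's code and not the fix in commit 9f476eb; `sample_child` and the table layout are hypothetical.

```python
import random


def sample_child(conditional, parent_values, domain_size, rng):
    """Sample a child value given its parents' values.

    `conditional` maps a tuple of parent values to a list of probabilities
    over the child's domain. Combinations that never appeared when the table
    was built have no entry, so a plain `conditional[key]` raises KeyError.
    """
    probs = conditional.get(tuple(parent_values))
    if probs is None:
        # Unseen parent combination: fall back to a uniform distribution.
        probs = [1 / domain_size] * domain_size
    return rng.choices(range(domain_size), weights=probs, k=1)[0]


rng = random.Random(0)
table = {(0, 0, 0, 0): [1.0, 0.0]}  # only one parent combination is known
print(sample_child(table, (0, 0, 0, 0), 2, rng))  # key present in the table
print(sample_child(table, (6, 5, 0, 0), 2, rng))  # key absent, no KeyError
```

With more parents there are more possible combinations, so it is more likely that some combination was never recorded, which would match the error showing up only at higher degrees.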