Translate Data to ARFF Format #3

Open
mbernste opened this issue Dec 8, 2013 · 1 comment

mbernste (Owner) commented Dec 8, 2013

Create script to translate data sets to ARFF format where continuous attributes are binned and missing values are handled (either imputed using expectation-maximization or simply discarded).

ghost assigned schulzca Dec 8, 2013

schulzca (Collaborator) commented Dec 9, 2013

I finished the script and put it in data/src/data/arff/. Choices I made that are open for discussion:

  • If an instance has a missing value, that instance is discarded. (If we want to impute instead, I think that has to be done after the net is created.)
  • Bins are only created for a feature that has 15 or more unique numeric values
  • 4 or 5 bins are created, depending on the number of unique values for the feature
  • Bins are named 'X_Y', where X and Y are the endpoints of the bin's range
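
For readers without access to the script, here is a rough Python sketch of the behavior listed above. It is not the actual code in data/src/data/arff/. Several details are assumptions made only for illustration: the missing-value marker `?`, equal-width bins, the rule that picks 5 bins at 30 or more unique values (the thread only says the count depends on the number of unique values), and every function name.

```python
MISSING = "?"    # assumed missing-value marker (ARFF's own convention)
MIN_UNIQUE = 15  # binning threshold stated in the comment above


def is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def drop_incomplete(rows):
    """Discard every instance that has at least one missing value."""
    return [row for row in rows if MISSING not in row]


def bin_edges(values, n_bins):
    """Equal-width bin edges (an assumption; the real script may differ)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


def bin_label(x, edges):
    """Return the 'X_Y' name of the bin containing x (last bin includes Y)."""
    for left, right in zip(edges[:-2], edges[1:-1]):
        if x < right:
            return f"{left:g}_{right:g}"
    return f"{edges[-2]:g}_{edges[-1]:g}"


def discretize(rows):
    """Bin each column that has MIN_UNIQUE or more unique numeric values."""
    out = []
    for col in zip(*rows):
        if all(is_number(v) for v in col):
            nums = [float(v) for v in col]
            n_unique = len(set(nums))
            if n_unique >= MIN_UNIQUE:
                n_bins = 5 if n_unique >= 30 else 4  # hypothetical rule
                edges = bin_edges(nums, n_bins)
                col = [bin_label(x, edges) for x in nums]
        out.append(col)
    return [list(row) for row in zip(*out)]


def to_arff(relation, names, rows):
    """Render rows of nominal values as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(names):
        values = sorted({row[i] for row in rows})
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines)


# Example: 20 complete instances plus one with a missing value.
rows = [[str(i), "a" if i % 2 else "b"] for i in range(20)] + [["?", "a"]]
clean = drop_incomplete(rows)
binned = discretize(clean)
arff = to_arff("demo", ["x", "y"], binned)
```

In this sketch, columns with fewer than 15 unique numeric values pass through unchanged and are written as nominal attributes, which matches the threshold described above but may not match how the real script declares them.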
