Could order of columns affect performance of synthetic data quality? #65

Open
efstathios-chatzikyriakidis opened this issue Mar 3, 2024 · 2 comments


efstathios-chatzikyriakidis commented Mar 3, 2024

Hi @avsolatorio!

Could the order of the columns (categorical first, then numerical/datetime, or the opposite) affect the quality of the synthetic data? Furthermore, the categorical columns could also be ordered by cardinality. Correlations exist across all columns, and I am wondering whether placing the categorical columns first or last, or sorting them by ascending or descending cardinality, would allow better learning.

Thanks!

@echatzikyriakidis

I have run some tests, and it seems that the column order doesn't matter. I observed similar results in every case: categorical columns first or last, and sorted by increasing or decreasing cardinality.
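For anyone who wants to repeat the comparison, the orderings described above could be generated with something like the sketch below. It assumes pandas and treats every column that is neither numeric nor datetime as categorical. `reorder_columns` is a hypothetical helper written for illustration, not part of any library and not the exact code used in the tests, and the toy DataFrame is made up.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def reorder_columns(df, categorical_first=True, ascending_cardinality=True):
    """Return df with categorical columns grouped first or last,
    sorted by their number of distinct values (hypothetical helper)."""
    # Assumption: anything that is not numeric or datetime is categorical.
    other_cols = [
        c for c in df.columns
        if is_numeric_dtype(df[c]) or is_datetime64_any_dtype(df[c])
    ]
    cat_cols = [c for c in df.columns if c not in other_cols]
    # Sort the categorical columns by cardinality (distinct non-null values).
    cat_cols = sorted(
        cat_cols,
        key=lambda c: df[c].nunique(),
        reverse=not ascending_cardinality,
    )
    ordered = cat_cols + other_cols if categorical_first else other_cols + cat_cols
    return df[ordered]


# Toy data for illustration only.
df = pd.DataFrame({
    "age": [30, 41, 30, 25],
    "city": ["A", "B", "C", "D"],        # cardinality 4
    "signup": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]
    ),
    "gender": ["F", "M", "F", "M"],      # cardinality 2
})

cat_first_asc = reorder_columns(df, categorical_first=True, ascending_cardinality=True)
cat_last_desc = reorder_columns(df, categorical_first=False, ascending_cardinality=False)
print(list(cat_first_asc.columns))
print(list(cat_last_desc.columns))
```

Each reordered DataFrame can then be used to train a separate model, and the resulting synthetic data can be compared with the same quality metrics.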

@echatzikyriakidis

This issue can be closed.
