Update schema with missing values #154

Open
gperonato opened this issue Nov 26, 2020 · 2 comments
@gperonato
Contributor

I have a source dataset in which missing values are encoded as NULL.
In my flow, I use:
update_schema(None, missingValues=["NULL"])
The resulting datapackage.json has the missingValues field set as above, while the dumped files have empty fields (if I dump to CSV) or null (if I dump to JSON). As a result, I cannot parse the dumped files using the datapackage.json, because its schema still describes the original source file. Is this the expected behavior? Or is there another way of dealing with missing values?
I am sorry, this is probably a basic question. I hope someone can help.
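
The mismatch described above can be reproduced without dataflows. What follows is a minimal sketch, assuming a simplified reader: `parse_integer_column` is a hypothetical helper (not part of dataflows or tableschema) that treats a cell as missing only if it appears in the schema's missingValues list, which is how the Table Schema specification defines the property (its default is `[""]`). The column names and sample rows are made up for illustration.

```python
import csv
import io

def parse_integer_column(csv_text, field, missing_values):
    """Cast one column to int, mapping any missingValues token to None.

    Hypothetical helper for illustration only; it mimics the Table Schema
    rule that cells listed in missingValues are read as null.
    """
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row[field]
        if cell in missing_values:
            values.append(None)
        else:
            values.append(int(cell))  # raises ValueError on an unexpected token
    return values

source_csv = "id,value\n1,10\n2,NULL\n"  # original file: missing cells are "NULL"
dumped_csv = "id,value\n1,10\n2,\n"      # dumped file: missing cells are empty

# The source file parses with the declared missingValues.
source_values = parse_integer_column(source_csv, "value", ["NULL"])

# The dump does not: "" is not listed in missingValues, so int("") fails.
try:
    parse_integer_column(dumped_csv, "value", ["NULL"])
    dump_parses_with_source_schema = True
except ValueError:
    dump_parses_with_source_schema = False

# With the Table Schema default, missingValues == [""], the dump parses.
dump_values = parse_integer_column(dumped_csv, "value", [""])
```

In this sketch the dumped file is only readable once the schema's missingValues matches what was actually written to disk, which is the inconsistency the question is about.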

@akariv
Member

akariv commented Nov 26, 2020 via email

@gperonato
Contributor Author

gperonato commented Nov 26, 2020

I was considering leaving the datapackage.json unchanged (i.e., with the updated schema) and preserving the missingValues in the dumped files. The reason is that if I update the schema in my flow, I'd like to see that change in the output datapackage.json. The disadvantage would be ending up with non-standard missingValues, so your approach probably gives a cleaner result. Either way would be OK, as long as the resulting schema allows the dumped file to be parsed.
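
The first option mentioned here, keeping the declared missingValues and writing that token into the dumped file, could look roughly like the sketch below. `dump_csv` is a hypothetical writer introduced for illustration; the dataflows dump processors are not claimed to behave this way, and the sample rows are made up.

```python
import csv
import io

def dump_csv(rows, fieldnames, missing_values):
    """Write rows to CSV text, serializing None as the first missingValues token.

    Hypothetical writer for illustration only.
    """
    token = missing_values[0] if missing_values else ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({k: token if v is None else v for k, v in row.items()})
    return buffer.getvalue()

missing_values = ["NULL"]  # same value as declared in the schema
rows = [{"id": 1, "value": 10}, {"id": 2, "value": None}]
dumped = dump_csv(rows, ["id", "value"], missing_values)

# Reading the dump back with the same missingValues recovers the nulls.
reread = [
    None if record["value"] in missing_values else int(record["value"])
    for record in csv.DictReader(io.StringIO(dumped))
]
```

Under these assumptions the schema and the dumped file stay consistent, at the cost of carrying the non-default NULL token into the output.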
