Update schema with missing values #154
You are right.
I think the correct behaviour here would be to clear the missingValues
field in the schema before writing the datapackage.json file.
wdyt?
(Email reply to Giuseppe Peronato's original post of Thu, Nov 26, 2020 at 11:25 AM.)
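A minimal sketch of the suggested fix, assuming the descriptor is available as a plain Python dict in the usual Data Package layout (resources, then schema, then missingValues). The helper name clear_missing_values is hypothetical and not part of dataflows. Removing the key makes Table Schema readers fall back to the default missingValues of [""], which matches the empty cells written to the dumped CSV.

```python
import copy
import json


def clear_missing_values(descriptor):
    """Return a copy of a Data Package descriptor without schema missingValues.

    Hypothetical helper (not a dataflows API). With the key removed, Table Schema
    readers fall back to the default missingValues of [""], which matches the
    empty cells written for missing values in the dumped CSV files.
    """
    cleaned = copy.deepcopy(descriptor)  # leave the caller's descriptor untouched
    for resource in cleaned.get("resources", []):
        schema = resource.get("schema")
        if isinstance(schema, dict):
            schema.pop("missingValues", None)
    return cleaned


if __name__ == "__main__":
    descriptor = {
        "name": "example",
        "resources": [
            {
                "name": "data",
                "path": "data.csv",
                "schema": {
                    "fields": [{"name": "value", "type": "integer"}],
                    "missingValues": ["NULL"],
                },
            }
        ],
    }
    # This cleaned version is what would be written to datapackage.json.
    print(json.dumps(clear_missing_values(descriptor), indent=2))
```

Setting the key explicitly to [""] instead of deleting it should be equivalent for readers that follow the spec.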
I was considering leaving the datapackage.json unchanged (i.e., with the updated schema) and preserving the missingValues in the dumped files, because if I update the schema in my flow, I'd like to see that change in the output datapackage.json. The disadvantage would be non-standard missingValues in the output, so your approach probably gives a cleaner result. Either way would be fine, as long as the resulting schema allows parsing the dumped files.
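The alternative described above (keeping ["NULL"] in the schema and writing that token into the dumped files) can be sketched with the standard csv module. This is an illustration under assumptions, not how dataflows dumps files: write_csv_with_missing_token is a hypothetical helper, and it simply uses the first declared missing value for every None cell.

```python
import csv
import io


def write_csv_with_missing_token(rows, fieldnames, missing_values, out):
    """Write dict rows as CSV, emitting None as the first declared missing value.

    Hypothetical helper: keeps missingValues (e.g. ["NULL"]) meaningful by
    serializing missing cells with that token, so the dumped file still parses
    with the updated datapackage.json. Falls back to "" if none are declared.
    """
    token = missing_values[0] if missing_values else ""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {name: token if row.get(name) is None else row[name] for name in fieldnames}
        )


if __name__ == "__main__":
    buf = io.StringIO()
    rows = [{"id": 1, "value": 10}, {"id": 2, "value": None}]
    write_csv_with_missing_token(rows, ["id", "value"], ["NULL"], buf)
    print(buf.getvalue(), end="")
    # id,value
    # 1,10
    # 2,NULL
```

The trade-off is the one noted in the comment: the output stays consistent with the edited schema, but downstream tools must honour a non-default missing-value token.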
I have a source dataset with missing values corresponding to NULL. In my flow, I use:

update_schema(None, missingValues=["NULL"])

The resulting datapackage.json has the missingValues field set as above, while the dumped files have empty fields (if I use CSV) or null (if I use JSON). Now I cannot parse the dumped files using the datapackage.json, as its schema corresponds to the original source file. Is this the expected behavior? Or is there another way of dealing with missing values?

I am sorry, this is probably a basic understanding question. Hope that someone can help.
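The parsing failure can be reproduced with a simplified reader that treats a cell as missing only if it equals one of the schema's missingValues. This is a sketch under that assumption: read_cell is a hypothetical name, and real Table Schema implementations handle many more types than the two shown here.

```python
def read_cell(raw, missing_values, field_type):
    """Simplified, hypothetical reader for one CSV cell under a Table Schema.

    Only "integer" and "string" types are handled. A cell equal to one of the
    schema's missingValues becomes None; anything else is cast to the type.
    """
    if raw in missing_values:
        return None
    if field_type == "integer":
        return int(raw)  # raises ValueError for "" when "" is not a missing value
    return raw


if __name__ == "__main__":
    dumped_cell = ""  # how the dumped CSV represents a missing value
    print(read_cell(dumped_cell, [""], "integer"))  # None under the default missingValues
    try:
        read_cell(dumped_cell, ["NULL"], "integer")
    except ValueError as exc:
        print("cannot parse with missingValues=['NULL']:", exc)
```

With the schema still declaring ["NULL"], an empty cell in an integer column is no longer recognised as missing, so casting it fails.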