Update schema with missing values #154

gperonato · 2020-11-26T09:25:11Z

I have a source dataset with missing values corresponding to NULL.
In my flow, I use:
update_schema(None, missingValues=["NULL"])
The resulting datapackage.json has the missingValues field set as above, while the dumped files have empty fields (if I use CSV) or null (if I use JSON). Now I cannot parse the dumped file using the datapackage.json, as its schema corresponds to the original source file. Is this the expected behavior? Or is there another way of dealing with missing values?
I am sorry, this is probably a basic understanding question. Hope that someone can help.
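
To make the mismatch concrete, here is a minimal standard-library sketch, not dataflows code. The sample CSV contents and the helper names `cast_integer` and `parse_column` are assumptions made up for illustration. It only relies on the Table Schema rule that a cell equal to one of the `missingValues` strings is read as null, and that the default for `missingValues` is `[""]`.

```python
import csv
import io


def cast_integer(cell, missing_values):
    """Return None for a missing-value marker, otherwise the cell as an int."""
    if cell in missing_values:
        return None
    return int(cell)


def parse_column(csv_text, column, missing_values):
    """Parse one integer column of a CSV string using the given markers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [cast_integer(row[column], missing_values) for row in reader]


# Hypothetical source file: missing values are written as NULL.
source_csv = "id,value\n1,10\n2,NULL\n"
# Hypothetical dumped file: the same missing value is now an empty field.
dumped_csv = "id,value\n1,10\n2,\n"

# The updated schema matches the source file.
print(parse_column(source_csv, "value", ["NULL"]))

# The dumped file parses with the Table Schema default markers.
print(parse_column(dumped_csv, "value", [""]))

# The dumped file does not parse with the schema written to datapackage.json.
try:
    parse_column(dumped_csv, "value", ["NULL"])
except ValueError as error:
    print("cannot parse dumped file:", error)
```

The last call fails because the empty field is no longer recognized as missing and cannot be cast to an integer.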

akariv · 2020-11-26T09:45:00Z

You are right - I think the correct behaviour here should be to clear the missingValues field in the schema prior to writing the datapackage.json file. wdyt?
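
A rough sketch of that idea, working on a plain descriptor dictionary rather than on dataflows internals. The helper name `clear_missing_values` and the sample descriptor are hypothetical; the only assumption about the format is that `missingValues` sits inside each resource's `schema`, as in the Table Schema specification.

```python
import copy
import json


def clear_missing_values(descriptor):
    """Return a copy of the descriptor without schema-level missingValues."""
    cleaned = copy.deepcopy(descriptor)
    for resource in cleaned.get("resources", []):
        # Without the key, readers fall back to the default marker: [""].
        resource.get("schema", {}).pop("missingValues", None)
    return cleaned


# Hypothetical descriptor as it might look after update_schema.
descriptor = {
    "name": "example",
    "resources": [
        {
            "name": "data",
            "path": "data.csv",
            "schema": {
                "fields": [{"name": "value", "type": "integer"}],
                "missingValues": ["NULL"],
            },
        }
    ],
}

cleaned = clear_missing_values(descriptor)
print(json.dumps(cleaned, indent=2))
```

The copy leaves the in-flow descriptor untouched, so only the written datapackage.json would lose the custom markers.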

gperonato · 2020-11-26T09:53:23Z

I was considering leaving the datapackage.json unchanged (i.e., with the updated schema) and preserving the missingValues markers in the dumped files. This is because, if I update the schema in my flow, I'd like to see that change in the output datapackage.json. The disadvantage would be having non-standard missingValues, so your approach probably gives a cleaner result. Either way would be OK, as long as the resulting schema allows the dumped file to be parsed.
