Fix xlsx parsing of empty cells with no formatting #591
I found a spreadsheet in the wild with a bunch of empty-value cells.

Because there was no `format` specified for the cell, it had at least one child XML node, and there was no value, the default behavior was to treat the cell as a `number` type and try to parse the value (which had been converted to an empty string). Using the kernel-level method `Integer()`, this threw an exception. The exception was also surprisingly hard to trace: it didn't have a full stack trace in the wild and didn't appear in `$!` in IRB; I only got a trace once I reproduced it in RSpec.

Apple's "Numbers" program converts the cell to zero; Google Sheets leaves it blank. I'm not sure what Excel itself does with this kind of XML. This sheet was produced by KendoUI, some Angular framework with tables, so it's probably not very common.
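
For reference, the strict-parsing behavior behind the exception can be reproduced in plain Ruby, without Roo. This is only an illustration: `strict_parse_error` is a hypothetical helper written for this example and is not part of Roo.

```ruby
# Kernel#Integer is strict: unlike String#to_i, it raises on input that is
# not a valid number, and an empty string is not a valid number.
# Hypothetical helper: returns the ArgumentError raised, or nil on success.
def strict_parse_error(value)
  Integer(value)
  nil
rescue ArgumentError => e
  e
end

error = strict_parse_error("")
puts error.class   # ArgumentError
puts "".to_i       # 0 (the lenient conversion silently falls back to zero)
```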
This change adds a nil check to the typecast, and returns `0` or `0.0` on empty-string values. I've included a spec and an example file. The file from the wild contains sensitive data, but I reproduced the empty-cell XML by hand-editing an export from Google Sheets.

I'm not 100% sure this is the right behavior, and I'm open to alternatives. Maybe we could check in `SheetDoc` and return an `Empty` cell type instead? Or add an option for "throw exception" vs. "default to zero"? But since Numbers and Sheets can open these files, it feels like Roo probably shouldn't throw an exception here.
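
As a rough sketch of the "default to zero" behavior described above (not the actual diff): `cast_number` and its `float:` keyword are hypothetical names chosen for this example, and the real typecast in Roo may be structured differently.

```ruby
# Hypothetical helper; the name and signature are assumptions, not Roo's API.
# Missing or empty values become zero instead of reaching Integer()/Float(),
# which would raise (TypeError for nil, ArgumentError for "").
def cast_number(raw, float: false)
  zero = float ? 0.0 : 0
  return zero if raw.nil? || raw.empty?

  float ? Float(raw) : Integer(raw)
end

puts cast_number("")                  # 0
puts cast_number("", float: true)     # 0.0
puts cast_number("42")                # 42
```

Non-empty values still go through the strict `Integer()`/`Float()` conversion, so genuinely malformed numbers would continue to raise under this sketch.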