Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix xlsx parsing of empty cells with no formatting #591

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

agius
Copy link

@agius agius commented Apr 4, 2023

I found a spreadsheet in the wild with a bunch of empty-value cells.

<c r="K58" s="2" t="s">
  <v>65</v>
</c>
<c r="L58">
  <v/>
</c>
<c r="M58" t="s">
  <v>158</v>
</c>

Because there was no format specified for the cell, it had at least one child XML node, and there was no value, the default behavior was to treat the cell as a number type and try to parse the value (which was converted to empty-string). Using the kernel-level method Integer() , this threw an exception. The exception was also surprisingly hard to trace; it didn't have a full stacktrace in the wild and didn't appear in $! in IRB; only got a trace once I got it into RSpec.

Apple's "Numbers" program converts the cell to zero; Google Sheets leaves it blank. I'm not sure what Excel itself does with this kind of XML. This sheet was produced by KendoUI, some angular framework with tables, so it's probably not very common.

This change adds a nil check to the typecast, and returns 0 or 0.0 on empty-string values. I've included a spec and an example file. The file from the wild contains sensitive data, but I reproduced the empty-cell XML by hand-editing an export from Google Sheets.

I'm not 100% sure this is the right behavior, and I'm open to alternatives. Maybe we could check in SheetDoc and return an Empty cell type instead? Or add an option for "throw exception" vs "default to zero?" But it feels like since Numbers and Sheets can open these files, Roo probably shouldn't throw an exception here.

I found a spreadsheet in the wild with a bunch of empty-value cells.

```
<c r="K58" s="2" t="s">
  <v>65</v>
</c>
<c r="L58">
  <v/>
</c>
<c r="M58" t="s">
  <v>158</v>
</c>
```

Because there was no `format` specified for the cell, it was not an empty cell, and there was no value, the default behavior was to treat the cell as a `number` type and try to parse the value (which was converted to empty-string). Using the kernel-level method `Integer()` , this threw an exception. The exception was also surprisingly hard to trace; it didn't have a full stacktrace in the wild and didn't appear in `$!` in IRB.

Apple's "Numbers" program converts the cell to zero; Google Sheets leaves it blank. I'm not sure what Excel itself does with this kind of XML. This sheet was [produced by KendoUI](https://www.telerik.com/kendo-angular-ui/components/excel-export/), some angular framework with tables, so it's probably not very common.

This change adds a nil check to the typecast, and returns `0` or `0.0` on empty-string values. I'm not 100% sure this is the right behavior, and I'm open to alternatives. Maybe we could check in `SheetDoc` and return an `Empty` cell type instead? Or add an option for "throw exception" vs "default to zero?"

But it feels like since Numbers and Sheets can open these files, Roo probably shouldn't throw an exception here.
@agius
Copy link
Author

agius commented Apr 12, 2023

cc @patrickkulling ? Not sure who to ask for review on this. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant