Add a "full series" option to the stack transforms #351
Comments
* This example plot computes the median of cars' economy (mpg), grouped by number of cylinders.

  The bins are sorted by decreasing r, so that they are all visible. The example would benefit from stackR (#197). It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).
* Update test/plots/cars-mpg.js

  Co-authored-by: Mike Bostock <[email protected]>
* Update test/plots/cars-mpg.js

  Co-authored-by: Mike Bostock <[email protected]>
* zero, not filter
* group, not bin
* remove console.log
* stroke, not fill

Co-authored-by: Mike Bostock <[email protected]>
Datadog calls this “default zero” interpolation: https://docs.datadoghq.com/dashboards/functions/interpolation/#default-zero

I wonder to what degree this is specific to time series. I can certainly imagine cases where it’s not specific to time series, but when it is, it seems like the bin transform with filter: null is an option for fixing the missing data.

Edit: Okay, the example histogram you made is pretty convincing that we shouldn’t think of this as only a time-series problem. (Also in a related irony, this Cloud Costs notebook demonstrates the problem, but has another problem of time being represented as ordinal strings.)
It's worse than this: using the empty bin approach is necessary for a continuous ("binnable") domain, but far from sufficient. As soon as you have z or facets, you need a point (real or fake) for each element of the domain times each of the series.
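To make the "domain times series" requirement concrete, here is a minimal sketch in plain JavaScript. The `fillSeries` helper and the toy rows are hypothetical and made up for illustration; this is not part of Plot's API, only one way the padding could be done before stacking.

```javascript
// Hypothetical helper, not part of Observable Plot: pad a tidy dataset so that
// every (x, z) combination has a row, using `fill` for the missing y values.
function fillSeries(data, {x = "x", y = "y", z = "z", fill = 0} = {}) {
  const xs = [...new Set(data.map((d) => d[x]))]; // distinct domain values
  const zs = [...new Set(data.map((d) => d[z]))]; // distinct series
  const seen = new Set(data.map((d) => JSON.stringify([d[x], d[z]])));
  const padded = data.slice();
  for (const zi of zs) {
    for (const xi of xs) {
      // Add a fake point only where no real point exists.
      if (!seen.has(JSON.stringify([xi, zi]))) {
        padded.push({[x]: xi, [y]: fill, [z]: zi});
      }
    }
  }
  return padded;
}

// Toy data (invented for this example): "pears" has no row for 2019.
const sales = [
  {year: 2019, fruit: "apples", units: 3},
  {year: 2020, fruit: "apples", units: 5},
  {year: 2020, fruit: "pears", units: 2}
];
const full = fillSeries(sales, {x: "year", y: "units", z: "fruit"});
console.log(full.length); // 4: one row per year × fruit
```

With facets, the same idea would presumably have to be applied per facet as well, which is the part that empty bins alone do not cover.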
But the bin domain is the same across all groups and facets, so as long as you have at least one data point in a given group, you’ll get all the bins?
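A rough sketch of that claim, in plain JavaScript rather than Plot itself: with one shared set of thresholds and empty bins kept, every group that has at least one row gets the full list of bins. The `binByGroup` function and the toy rows are hypothetical and do not reflect Plot's actual implementation.

```javascript
// Hypothetical sketch, not Plot's implementation: count values per bin for
// each group, using one shared set of thresholds and keeping empty bins.
function binByGroup(data, {value, group, thresholds}) {
  const result = new Map();
  for (const d of data) {
    const g = d[group];
    if (!result.has(g)) {
      // Every group that appears gets the full list of bins, initialized to 0.
      result.set(
        g,
        thresholds.slice(0, -1).map((x0, i) => ({x0, x1: thresholds[i + 1], count: 0}))
      );
    }
    const bins = result.get(g);
    const v = d[value];
    // Bins are half-open [x0, x1), except the last one, which includes x1.
    const bin = bins.find(
      (b, i) => v >= b.x0 && (v < b.x1 || (i === bins.length - 1 && v === b.x1))
    );
    if (bin) bin.count++;
  }
  return result;
}

// Toy data, invented for this example.
const toy = [
  {species: "A", mass: 3100},
  {species: "A", mass: 5200},
  {species: "B", mass: 4100}
];
const binned = binByGroup(toy, {
  value: "mass",
  group: "species",
  thresholds: [3000, 4000, 5000, 6000]
});
console.log(binned.get("A").map((b) => b.count)); // [1, 0, 1]
console.log(binned.get("B").map((b) => b.count)); // [0, 1, 0]
console.log(binned.has("C")); // false: a group with no rows gets no bins at all
```

The last line shows the remaining gap: a series with no data at all never appears, so it cannot contribute zero-valued bins.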
In https://observablehq.com/d/f6a7975f2ad4519a there is just one empty bin (4,750 in the chinstrap facet), which should be mapped to 0 for data fidelity. If we push up the number of bins to 200, we start to see that issue creeping in everywhere: outlined in red in the image below, all the areas should drop to zero since there is no data point in this position. (I'm not sure that we can find a generic way to do both operations; maybe imputing missing values is something that should be left to the data-wrangling section?)
But that example doesn’t use filter: null on the bin transform, right?
Closing since this example is solved with filter: null.
See discussion at #325 and #348 (comment)