sort and filter for bin and group #495

Merged
merged 20 commits into from
Aug 11, 2021

Conversation

mbostock
Member

@mbostock mbostock commented Aug 11, 2021

This introduces a filter output for the bin transform which defaults to count such that by default only non-empty bins are returned. By setting the filter to null you can return all bins; by setting it to a function you can apply whatever test you like on the bins before any other output channels are evaluated. Like the other reducers, this is specified on outputs rather than options.
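To make the three filter settings concrete, here is a minimal sketch in plain JavaScript. It is not Plot's implementation: the helper `binWithFilter`, its fixed-width `{min, max, step}` binning, and the sample values are all hypothetical. Only the semantics (a default count filter, null to keep every bin, a function that tests each bin before other outputs are computed) come from the description above.

```javascript
// Hypothetical helper, not Plot's code: bins numeric values into fixed-width
// bins, applies the filter, and only then computes the other outputs.
function binWithFilter(values, {min, max, step}, filter = (bin) => bin.length) {
  const n = Math.ceil((max - min) / step);
  const bins = Array.from({length: n}, (_, i) => {
    const bin = [];
    bin.x0 = min + i * step; // lower bound of the bin
    bin.x1 = min + (i + 1) * step; // upper bound of the bin
    return bin;
  });
  for (const v of values) {
    if (v < min || v >= max) continue; // ignore values outside the domain
    bins[Math.floor((v - min) / step)].push(v);
  }
  // filter === null keeps every bin; otherwise keep bins where the test is truthy.
  const kept = filter === null ? bins : bins.filter((bin) => filter(bin));
  // Other outputs (here just a count) are evaluated only for the kept bins.
  return kept.map((bin) => ({x0: bin.x0, x1: bin.x1, count: bin.length}));
}

const values = [1, 2, 2, 7, 8, 8, 8];
const extent = {min: 0, max: 10, step: 2};

const nonEmpty = binWithFilter(values, extent); // default: drop empty bins
const all = binWithFilter(values, extent, null); // null: keep all bins
const big = binWithFilter(values, extent, (bin) => bin.length >= 3); // custom test
```

With this sample data the bin covering [4, 6) is empty, so it appears in `all` but not in `nonEmpty`.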

I’m not sure this is strictly better than #489. It does allow more fine-grained control over which bins to return, and it is faster than applying the filter transform after binning (since it skips evaluating the other channels if the bin will be dropped). However, it is likely (a tiny bit?) slower than #489 since we cannot special-case the test for empty bins. And it is less explicit than a dedicated empty option.

Supersedes #491.
Supersedes #489.

@mbostock mbostock requested a review from Fil August 11, 2021 00:11
@mbostock mbostock changed the title bin filter reducer bin filter output Aug 11, 2021
@mbostock mbostock changed the title bin filter output bin output filter Aug 11, 2021
@mbostock
Member Author

This works well in the athletesHeightWeightBinStroke example: rather than setting the strokeOpacity to zero or one, we can filter the elements we don’t want to show.

This was referenced Aug 11, 2021
@mbostock
Member Author

I see two limitations of this filter option vis à vis the sort option you propose in #334:

  1. I only added support to the bin transform, but we should probably also support the filter option on the group transform with the same semantics (although it still won’t be possible to return empty groups yet for the full Cartesian product).
  2. The filter option always takes the data as input; you can’t specify a filter input channel to pass to the reducer (e.g., if you want the sum of weights to be greater than zero).
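The second limitation can be sketched roughly as follows, in plain JavaScript. This is an assumption about what a filter with an input channel might look like, not code from this pull request: `filterBins`, the `value` channel, and the sample weights are hypothetical.

```javascript
// Hypothetical sketch: a filter that reduces an input channel rather than the
// raw data. Each bin is an array of indexes into the data.
function filterBins(bins, {value, reduce}) {
  return bins.filter((bin) => reduce(bin.map((i) => value[i])));
}

const sum = (values) => values.reduce((a, b) => a + b, 0);

const weight = [0, 0, 3, 5, 0]; // made-up channel values, one per datum
const bins = [[0, 1], [2, 3], [4], []]; // made-up index arrays, one per bin

// Filtering on the data alone (count) keeps every non-empty bin.
const byCount = bins.filter((bin) => bin.length);

// Filtering on the channel keeps only bins whose summed weight exceeds zero.
const byWeight = filterBins(bins, {value: weight, reduce: (w) => sum(w) > 0});
```

Here `byCount` keeps three bins, while `byWeight` keeps only the bin holding indexes 2 and 3.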

I think both of these are possible and would love assistance.

Fil added a commit that referenced this pull request Aug 11, 2021
… by number of cylinders

The bins are sorted by decreasing r, so that they are all visible.

The example would benefit from stackR (#197).

It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).
@Fil Fil mentioned this pull request Aug 11, 2021
@Fil
Contributor

Fil commented Aug 11, 2021

I like this: it solves the "availability" example which triggered the question, and makes athletesHeightWeightBinStroke better. (Note that the "data availability" test plot works equally well both in the "broken line" mode and "0 value" mode for empty bins; I kept the broken line version.)

wrt remark 1: In what situation would we generate empty groups?

  • When grouping on 1 dimension: if we're using z, we could want all the z subgroups to share a common base of groups, which would generate empty groups for certain values of z. That doesn't work with bins either, as I illustrate in the cars mpg example plot (#496).
  • When grouping on 2 dimensions: the Cartesian product (x, y) (or (x, y, z)?).

I tend to think that it would be desirable in all these cases to be able to generate empty groups, as we would want to solve #351.
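As a rough illustration of what generating empty groups could mean, here is a plain JavaScript sketch. It is not a proposal for Plot's API: `groupCartesian` and the three sample rows are hypothetical. It emits one group for every combination of the distinct x and z keys, including combinations that have no rows.

```javascript
// Hypothetical sketch: group rows by (x, z), then emit every combination of
// the distinct x and z keys, including combinations with no rows.
function groupCartesian(rows, x, z) {
  const xs = [...new Set(rows.map(x))];
  const zs = [...new Set(rows.map(z))];
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify([x(row), z(row)]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  const out = [];
  for (const zk of zs) {
    for (const xk of xs) {
      const members = groups.get(JSON.stringify([xk, zk])) ?? []; // empty group
      out.push({x: xk, z: zk, count: members.length});
    }
  }
  return out;
}

// Made-up rows for illustration only.
const cars = [
  {cylinders: 4, year: 70},
  {cylinders: 4, year: 71},
  {cylinders: 8, year: 70}
];

const groups = groupCartesian(cars, (d) => d.year, (d) => d.cylinders);
const empty = groups.filter((g) => g.count === 0);
```

In this sample, the (year 71, 8 cylinders) combination has no rows, so it is the one empty group; each z series then shares the same base of x values.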

Remark 2 seems a bit similar to #472 (comment) ; but it makes me think, what if we did something totally different and passed the actual bin to a transform option?

  • the default might be transform: bin => bin.length ? bin : null which would remove empty bins.
  • using transform: bin => bin or transform: "identity" ("keep-empty"), would keep empty bins
  • using transform: bin => bin.length < 20 ? null : bin would do for athletesHeightWeightBinStroke
    I explore this in “alternative to bin{filter} : bin {transform}” (#497).
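A minimal sketch of how such a transform option might behave, in plain JavaScript. The helper `applyTransform` and the sample bins are hypothetical, and the threshold is lowered from 20 to 2 so the toy data stays small.

```javascript
// Hypothetical sketch of the proposed transform option: each bin is passed to
// `transform`, and bins mapped to null are dropped.
function applyTransform(bins, transform = (bin) => (bin.length ? bin : null)) {
  if (transform === "identity") transform = (bin) => bin;
  return bins.map(transform).filter((bin) => bin !== null);
}

const bins = [[1, 2], [], [3]]; // made-up bins

const nonEmpty = applyTransform(bins); // default: remove empty bins
const all = applyTransform(bins, "identity"); // keep empty bins
const large = applyTransform(bins, (bin) => (bin.length < 2 ? null : bin));
```

Because the callback returns a bin rather than a boolean, it could also return a modified bin, which is what distinguishes it from a filter.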

@mbostock
Member Author

The transform option feels like new semantics (and could arguably be called a map option); if we’re going to support the sort option (#334), then the filter option here feels more simpatico. And we can make filter and sort more similar if we allow an input channel to the filter reducer and extend the filter option to the group transform (though, as I said, setting filter: null on group won’t have any effect because you can’t generate empty groups; you could still use a filter to suppress other groups).

mbostock added a commit that referenced this pull request Aug 11, 2021
* This example plot computes the median of cars' economy (mpg), grouped by number of cylinders

The bins are sorted by decreasing r, so that they are all visible.

The example would benefit from stackR (#197).

It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).

* Update test/plots/cars-mpg.js

Co-authored-by: Mike Bostock <[email protected]>

* Update test/plots/cars-mpg.js

Co-authored-by: Mike Bostock <[email protected]>

* zero, not filter

* group, not bin

* remove console.log

* stroke, not fill

Co-authored-by: Mike Bostock <[email protected]>
mbostock and others added 5 commits August 11, 2021 09:10
* uses the filter option of the bin transform
* uses an explicit null if empty, or sum, accessor to create a broken line
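The "null if empty, or sum" idea can be sketched as follows in plain JavaScript. The names `sumOrNull` and `definedRuns` and the sample bins are hypothetical; `definedRuns` only imitates how a line might be split at null values and is not Plot's rendering code.

```javascript
// Hypothetical reducer: null for an empty bin, otherwise the sum of its values.
const sumOrNull = (values) =>
  values.length ? values.reduce((a, b) => a + b, 0) : null;

// Imitates a line breaking at null values: returns the runs of defined values.
function definedRuns(ys) {
  const runs = [];
  let run = [];
  for (const y of ys) {
    if (y == null) {
      if (run.length) runs.push(run);
      run = [];
    } else {
      run.push(y);
    }
  }
  if (run.length) runs.push(run);
  return runs;
}

const bins = [[1, 2], [], [4]]; // made-up bins, with the empty bin retained

const broken = definedRuns(bins.map(sumOrNull)); // two runs: the line breaks
const zeroed = definedRuns(bins.map((b) => b.reduce((a, v) => a + v, 0))); // one run
```

This only works if the empty bin is returned in the first place; if it were filtered out, neither the null nor the zero would exist.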
@mbostock mbostock force-pushed the mbostock/empty-bins branch from 21aaf4d to eb9deba Compare August 11, 2021 16:10
@mbostock mbostock changed the title bin output filter sort and filter for bin and group Aug 11, 2021
@mbostock
Member Author

The only thing that this is now missing is supporting the sort option being specified as a comparator rather than reducer. I think that’s possible: you’d compute the bins or groups for the sort’s input channel (or the data), and then you’d pass those arrays to the comparator during sorting. But it would require some extra finagling to get this to work, and it doesn’t seem urgent since the common case will be things like sorting by count or some other accessor. So, I’ll punt! Forward!
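The difference between the two forms of sort can be sketched in plain JavaScript. Both helpers and the sample bins are hypothetical and are not how this pull request implements sorting; they only contrast reducing each bin to a value with handing the bins themselves to a comparator.

```javascript
// Reducer form: reduce each bin to one value, then order bins by that value.
function sortByReducer(bins, reduce, {reverse = false} = {}) {
  const sorted = bins
    .map((bin) => ({bin, value: reduce(bin)}))
    .sort((a, b) => a.value - b.value)
    .map(({bin}) => bin);
  return reverse ? sorted.reverse() : sorted;
}

// Comparator form: the comparator receives the two bins (arrays) directly.
function sortByComparator(bins, compare) {
  return bins.slice().sort(compare);
}

const count = (bin) => bin.length;
const bins = [["a", "b"], ["c"], ["d", "e", "f"]]; // made-up bins

// Common case: sort by count, largest first.
const byCount = sortByReducer(bins, count, {reverse: true});

// Comparator case: order by each bin's first element, descending.
const byFirst = sortByComparator(bins, (a, b) =>
  a[0] < b[0] ? 1 : a[0] > b[0] ? -1 : 0
);
```

The reducer form covers things like sorting by count; the comparator form needs the binned arrays to be available at sort time, which is the extra work described above.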

@mbostock mbostock merged commit 9a4d55a into main Aug 11, 2021
@mbostock mbostock deleted the mbostock/empty-bins branch August 11, 2021 19:46
This was referenced Aug 11, 2021
@Fil
Contributor

Fil commented Aug 20, 2021

An image for the CHANGELOG:
[image: penguins-bins]

@mbostock
Member Author

I was thinking we could use the sfCovidDeaths example because that shows the necessity of returning empty bins.

@Fil
Contributor

Fil commented Aug 20, 2021

Yes. But, can we use both? this example is a bit contrived but it shows the filter as a function that receives the bins.

@mbostock
Member Author

I prefer less contrived if possible but I’ll take a look.
