Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

group aesthetics #761

Merged
merged 6 commits into from
Feb 17, 2022
Merged

group aesthetics #761

merged 6 commits into from
Feb 17, 2022

Conversation

mbostock
Copy link
Member

@mbostock mbostock commented Feb 16, 2022

This PR allows areas and lines to have varying colors and other aesthetics, such as titles. For example, here is a line with varying stroke and stroke width:

Screen Shot 2022-02-16 at 2 06 41 PM

And here is an area with varying fill:

Screen Shot 2022-02-16 at 2 35 26 PM

This works by first grouping the index by Z, as before, and then iterating over the values. If the aesthetics for the current value do not match the previous value, a new group is created. The current index is repeated in the old and new groups. In effect this means that the aesthetics for the current value apply to the current value and the curve extending to the following value. (In the future we could perhaps have an option to control this behavior.)

A side-effect of this change is that when some data is undefined, multiple path elements are generated. This is because undefined data and varying aesthetics are handled within the same loop, rather than relying on d3-shape’s line.defined and area.defined. It would probably be slightly better to use a single path element where possible, but this would complicate the grouping. (We’d need some other way to signal to D3 how to interrupt the path.) I figured out how to do this using -1 to indicate undefined data.

The area and line marks now implement their own filtering for all channels, rather than just for the positional channels. I’m not super happy with this aspect of the design and would like to think about whether I can make it better. This also necessitated the axis marks implementing mark.filter, which I did by way of introducing a Decoration base class. I took another pass at this and am now reasonably satisfied.

Fixes #760. Fixes #724. Fixes #155.

@mbostock mbostock requested a review from Fil February 16, 2022 22:56
@mbostock mbostock changed the title group aeshetics group aesthetics Feb 16, 2022
@mbostock mbostock force-pushed the mbostock/group-aesthetics branch from 6f8a77c to 0650bef Compare February 16, 2022 22:56
@mbostock mbostock force-pushed the mbostock/group-aesthetics branch 2 times, most recently from c6eef2e to 6b3a578 Compare February 16, 2022 23:32
@mbostock mbostock force-pushed the mbostock/group-aesthetics branch from 6b3a578 to 35c0589 Compare February 16, 2022 23:38
@mbostock mbostock force-pushed the mbostock/group-aesthetics branch from c0c96e5 to 0cab3cf Compare February 17, 2022 00:45
@mbostock
Copy link
Member Author

mbostock commented Feb 17, 2022

I could maybe make this smarter regarding z if I defer the implicit z determination until render. (As an added bonus, this means that it won’t compute z separately from the channel that it is fill or stroke channel it is derived from.) In render, if the z option was undefined (but not null), then check the fill or stroke channel and also test whether the color scale is ordinal: the fill or stroke will only be promoted to z if the color scale is ordinal. That might be a little magical, though…

Update: I got pretty close with this.

diff --git a/src/marks/area.js b/src/marks/area.js
index ed0c1562..594108c5 100644
--- a/src/marks/area.js
+++ b/src/marks/area.js
@@ -1,7 +1,7 @@
 import {area as shapeArea, create} from "d3";
 import {Curve} from "../curve.js";
 import {Mark} from "../plot.js";
-import {indexOf, maybeZ} from "../options.js";
+import {indexOf} from "../options.js";
 import {applyDirectStyles, applyIndirectStyles, applyTransform, applyGroupedChannelStyles, groupIndex} from "../style.js";
 import {maybeIdentityX, maybeIdentityY} from "../transforms/identity.js";
 import {maybeStackX, maybeStackY} from "../transforms/stack.js";
@@ -15,7 +15,7 @@ const defaults = {
 
 export class Area extends Mark {
   constructor(data, options = {}) {
-    const {x1, y1, x2, y2, curve, tension} = options;
+    const {x1, y1, x2, y2, z, curve, tension} = options;
     super(
       data,
       [
@@ -23,21 +23,23 @@ export class Area extends Mark {
         {name: "y1", value: y1, scale: "y"},
         {name: "x2", value: x2, scale: "x", optional: true},
         {name: "y2", value: y2, scale: "y", optional: true},
-        {name: "z", value: maybeZ(options), optional: true}
+        {name: "z", value: z, optional: true}
       ],
       options,
       defaults
     );
+    this.z = z;
     this.curve = Curve(curve, tension);
   }
-  render(I, {x, y}, channels, dimensions) {
+  render(I, scales, channels, dimensions) {
+    const {x, y} = scales;
     const {x1: X1, y1: Y1, x2: X2 = X1, y2: Y2 = Y1} = channels;
     const {dx, dy} = this;
     return create("svg:g")
         .call(applyIndirectStyles, this, dimensions)
         .call(applyTransform, x, y, dx, dy)
         .call(g => g.selectAll()
-          .data(groupIndex(I, [X1, Y1, X2, Y2], channels))
+          .data(groupIndex(I, [X1, Y1, X2, Y2], this, scales, channels))
           .join("path")
             .call(applyDirectStyles, this)
             .call(applyGroupedChannelStyles, this, channels)
diff --git a/src/marks/line.js b/src/marks/line.js
index 540ac3eb..1e278722 100644
--- a/src/marks/line.js
+++ b/src/marks/line.js
@@ -1,7 +1,7 @@
 import {create, line as shapeLine} from "d3";
 import {Curve} from "../curve.js";
 import {Mark} from "../plot.js";
-import {indexOf, identity, maybeTuple, maybeZ} from "../options.js";
+import {indexOf, identity, maybeTuple} from "../options.js";
 import {applyDirectStyles, applyIndirectStyles, applyTransform, applyGroupedChannelStyles, offset, groupIndex} from "../style.js";
 import {applyGroupedMarkers, markers} from "./marker.js";
 
@@ -16,28 +16,30 @@ const defaults = {
 
 export class Line extends Mark {
   constructor(data, options = {}) {
-    const {x, y, curve, tension} = options;
+    const {x, y, z, curve, tension} = options;
     super(
       data,
       [
         {name: "x", value: x, scale: "x"},
         {name: "y", value: y, scale: "y"},
-        {name: "z", value: maybeZ(options), optional: true}
+        {name: "z", value: z, optional: true}
       ],
       options,
       defaults
     );
+    this.z = z;
     this.curve = Curve(curve, tension);
     markers(this, options);
   }
-  render(I, {x, y}, channels, dimensions) {
+  render(I, scales, channels, dimensions) {
+    const {x, y} = scales;
     const {x: X, y: Y} = channels;
     const {dx, dy} = this;
     return create("svg:g")
         .call(applyIndirectStyles, this, dimensions)
         .call(applyTransform, x, y, offset + dx, offset + dy)
         .call(g => g.selectAll()
-          .data(groupIndex(I, [X, Y], channels))
+          .data(groupIndex(I, [X, Y], this, scales, channels))
           .join("path")
             .call(applyDirectStyles, this)
             .call(applyGroupedChannelStyles, this, channels)
diff --git a/src/scales.js b/src/scales.js
index 73e5e8df..060ace17 100644
--- a/src/scales.js
+++ b/src/scales.js
@@ -55,6 +55,7 @@ export function Scales(channels, {
           scale.insetTop = +insetTop;
           scale.insetBottom = +insetBottom;
         }
+        scale.scale.type = scale.type; // allow marks to check scale type
         scales[key] = scale;
       }
     }
diff --git a/src/style.js b/src/style.js
index 751eed74..e1abc958 100644
--- a/src/style.js
+++ b/src/style.js
@@ -2,6 +2,7 @@ import {group, isoFormat, namespaces} from "d3";
 import {defined, nonempty} from "./defined.js";
 import {formatNumber} from "./format.js";
 import {string, number, maybeColorChannel, maybeNumberChannel, isTemporal, isNumeric, keyof} from "./options.js";
+import {isOrdinalScale} from "./scales.js";
 
 export const offset = typeof window !== "undefined" && window.devicePixelRatio > 1 ? 0 : 0.5;
 
@@ -180,14 +181,21 @@ function groupAesthetics({ariaLabel: AL, title: T, fill: F, fillOpacity: FO, str
   return [AL, T, F, FO, S, SO, SW, O, H].filter(c => c !== undefined);
 }
 
-export function* groupIndex(I, position, channels) {
-  const {z: Z} = channels; // group channel
+function groupZ({z}, {color}, {z: Z, fill: F, stroke: S}) {
+  return Z !== undefined ? Z
+      : z !== undefined || !color || !isOrdinalScale(color) ? null
+      : F !== undefined ? F
+      : S;
+}
+
+export function* groupIndex(I, position, mark, scales, channels) {
+  const Z = groupZ(mark, scales, channels); // group channel
   const A = groupAesthetics(channels); // aesthetic channels
   const C = [...position, ...A]; // all channels
 
   // Group the current index by Z (if any).
   for (const G of Z ? group(I, i => Z[i]).values() : [I]) {
-    let Ag; // the A-values (aesthetics) of the current group, if any
+    let Ag; // the aesthetic values of the current group (a map of A), if any
     let Gg; // the current group index (a subset of G, and I), if any
     out: for (const i of G) {
 

However, the problem is that by deferring the detection of z until render, the scales have already been applied to the fill and stroke channel. And if you have an ordinal color scale, it can recycle values. So, you’ll think you only have 10 series when in fact you should have 15.

It would be possible to do value-based inference in the Line and Area constructors: they could greedily compute the values of the fill or stroke channel if z is not defined, and check whether they appear to be ordinal. But I’m not a big fan of this approach as scales should be the primary determinant of how data is interpreted.

So, I think it’s best to leave the behavior as-is, and expect folks to set z explicitly if they want varying aesthetics.

@Fil
Copy link
Contributor

Fil commented Feb 17, 2022

Re: previous comment: is this "z determination" issue an indication that the way we scale channels (by replacing the data with the scaled data) should be updated? We could pass both versions (scaled and unscaled) to render, layout etc. This could be done without breaking the current model (for example the unscaled x could be accessed through _x).

Similarly, doesn't the scale.scale.type = scale.type; // allow marks to check scale type change indicate that we should pass the scale descriptors rather than the scale functions? (It would be a slightly breaking API change, but I think it would allow "smarter" layouts and renders.)

As a more general comment on this PR, I love the clean and radical genericity of the approach!

A few caveats:

  Plot.line(data, Plot.map({stroke: data => new Float64Array(data.length).fill(d3.median(data))}, {x: "date", y: "unemployment", stroke: "unemployment", z: "division"})),

(in my understanding #724 is about deciding on a better API to express this type of "map/reduce uniform channel", and secondarily about optimization).

  • It takes a moment to remember that if you specify a variable fill (especially a continuous value) you need to specify z explicitly or the chart might display nothing (or group on quantized colors); maybe this could be added as a warning? (add to More warnings #755)

  • Having to specify x: {round: true} to avoid the anti-aliasing slivers sort of works for areaY, but the way the lines are disjoint doesn't work well with a large stroke-width: in the example below, the line is colored by y and has strokeWidth: 12:

Capture d’écran 2022-02-17 à 12 00 37

  • Once segmented by fill color, an area cannot use a stroke; this is easy to fix by adding a separate area mark on top, with fill: "none", but then the curves might be different (some curves hold: bumpX, step… others need a few points "before" and "after" the first and last points). More generally, the loss of some curves is a bit uncomfortable; I wonder if we could sample the (full) curve and create intermediary control points?

To try and address the two latter issues, I think we should consider having groupIndex return nested z-groups of segments (#764). Even if we don't change the area and line mark, it would make it easier to create an alternative line mark (maybe one that respects the curves, or even just to manipulate the segmented lines one by one with D3 or in an SVG editor).

@Fil Fil mentioned this pull request Feb 17, 2022
@mbostock
Copy link
Member Author

doesn't the scale.scale.type = scale.type; // allow marks to check scale type change indicate that we should pass the scale descriptors rather than the scale functions

Maybe. Previously the channel values were not passed-in as scaled, so the marks had to apply the scales and I wanted more concise syntax. Now that the channels are passed to marks already scaled, it’s probably equally convenient (and more informational) to pass the whole scale descriptor. However, we shouldn’t pass the internal scale descriptor as-is because that is an internal API—we would need to make it match the same public API that we return in plot.scale.

I don't think it closes #724 and #155

It does in the sense that we won’t need to support these features. The idea is that now that area and line no longer do reducing, you don’t need to specify a reducer. I suppose we could still support these features, but it seems less important if the area and line marks no longer arbitrarily apply the first reducer.

It takes a moment to remember that if you specify a variable fill (especially a continuous value) you need to specify z explicitly or the chart might display nothing (or group on quantized colors); maybe this could be added as a warning?

That would be nice, but I’m not sure what heuristic we would use. This has similar challenges to making the implicit z smarter based on whether the fill or stroke is ordinal. We shouldn’t warn for all cases of implicit z; it’s only a problem if the fill or stroke is continuous and there’s an undeclared channel that is also ordinal or categorical. You could have a warning for a line that has non-monotonic x-values to detect when it loops back, but in the case of connected scatterplots this behavior is expected.

the lines are disjoint doesn't work well with a large stroke-width

This is the expected behavior and you can fix it with a round strokeLinecap. It’s worth considering whether round should be the default (and perhaps whether round should also be the default strokeLinejoin).

Once segmented by fill color, an area cannot use a stroke

True, but using stroke with area is pretty rare; more commonly a separate line is used for stroke. I think this is fine.

More generally, the loss of some curves is a bit uncomfortable; I wonder if we could sample the (full) curve and create intermediary control points?

I don’t think this is worth the effort/complexity. Some curves already have problems with series that have a small number of data points. (Some curves also have other problems, like failing to preserve the monotonicity of data.) I prefer to keep this behavior as simple as possible both for us in terms of maintenance and for the user in terms of understanding what’s happening.

@mbostock
Copy link
Member Author

As a more general comment on this PR, I love the clean and radical genericity of the approach!

Thank you! And thank you for the detailed review.

It takes a moment to remember that if you specify a variable fill (especially a continuous value) you need to specify z explicitly or the chart might display nothing (or group on quantized colors); maybe this could be added as a warning?

I thought of a possible warning heuristic: we could look at the cardinality of the implicit z channel, and if it’s ≥50% of the size of the dataset, that would suggest that it’s likely to be wrong and the user needs to set an explicit z channel (possibly to null).

@Fil
Copy link
Contributor

Fil commented Feb 17, 2022

whether round should be the default (and perhaps whether round should also be the default strokeLinejoin).

Yes, I think so. I don't see a drawback.

@mbostock
Copy link
Member Author

I thought of a possible warning heuristic: we could look at the cardinality of the implicit z channel, and if it’s ≥50% of the size of the dataset, that would suggest that it’s likely to be wrong and the user needs to set an explicit z channel (possibly to null).

I have implemented this warning.

@mbostock mbostock force-pushed the mbostock/group-aesthetics branch from bfc5e87 to d15050b Compare February 17, 2022 18:02
@mbostock
Copy link
Member Author

When we use a color scale there will typically be a maximum of 256 different colors—if your data has more than 512 values this test will never pass?

This is being computed on the z channel which is not passed through the color scale (even though it is an implicit z that uses the same source definition as the fill or stroke channel).

Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only remaining question for me is the grouping or path segments by z, #764, but it can be done later.

@mbostock mbostock merged commit 978c54e into main Feb 17, 2022
@mbostock mbostock deleted the mbostock/group-aesthetics branch February 17, 2022 18:31
@nspessot
Copy link

@mbostock do you have a sample of Plot.line with multi color depending on the "y" axis value, like if temperature is 80 on day 1, color red, t=60 day=2 color blue, etc.

@Fil
Copy link
Contributor

Fil commented Mar 14, 2022

@nspessot I think this example might be what you're looking for?

Fil added a commit that referenced this pull request Jun 2, 2023
…61 (comment))) is the following: when a mark only¹ makes sense with grouping—currently area, line, linearRegression and density—, we want to warn when the number of groups (G.size) is higher than the smallest possible number of _meaningful_ groups, which is I.length >> 1 (obtained when the groups are pairs).

However when there is an odd number of elements, having 1 element that falls out is to be expected and should not invalidate the chart.

Closes #1667

¹ It's not a hard rule: the areaY mark with a stroke, the lineY mark with a marker, and the density mark tolerate single points. But these are not the default usage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants