better ordinal axes with intervals #1790

Merged
mbostock merged 23 commits into main from mbostock/ordinal-time-axis
Aug 24, 2023

mbostock
Member

@mbostock mbostock commented Aug 4, 2023

This applies the nice multi-line time axis to ordinal scales that have time intervals. For example:

image

Plot.plot({
  x: {interval: "month"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

Still to-do:

  • Figure out if the time interval is UTC or local time?
  • Make sure we don’t choose a time interval for ticks that isn’t compatible with the domain?

I’m excited about this usability improvement, since people often want to represent time ordinally.

Fixes #1789.

@mbostock mbostock requested a review from Fil August 4, 2023 23:17
@mbostock
Member Author

mbostock commented Aug 4, 2023

Here’s an example where it breaks and I’m not sure yet what to do:

Screenshot 2023-08-04 at 4 23 20 PM
Plot.plot({
  x: {interval: "4 weeks", ticks: "months"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

It breaks because the 4 weeks interval doesn’t align with the months interval (because weeks by default start on Sunday, and months start on whatever day they start at). Perhaps, for example, we have to floor the generated tick values according to the other interval?

@mbostock
Member Author

mbostock commented Aug 4, 2023

Okay, I’ve done what I suggested above.

Screenshot 2023-08-04 at 4 38 39 PM
Plot.plot({
  x: {interval: "4 weeks", ticks: "year"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

It’s a little surprising, perhaps, since that 2013 actually corresponds to December 29, 2013.

Screenshot 2023-08-04 at 4 39 21 PM
Plot.plot({
  x: {interval: "4 weeks", ticks: "year", tickFormat: Plot.formatIsoDate},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

That’s because January 1, 2014 rounds down to December 29, 2013 when using the 4 weeks interval, but the default time format sees the ticks about a year apart, and hence only shows the year. It might be better therefore to round up rather than down, but technically our RangeImplementation interface only guarantees that interval.floor is available. But I think we could reasonably introduce a requirement that you implement interval.ceil, too, or at the very least use it if available.
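To make the floor-versus-ceil difference concrete, here is a minimal sketch. The `fourWeeks` object and the `snapTick` helper are hypothetical stand-ins, and the sketch assumes 4-week periods are counted from Sunday 1969-12-28 UTC; Plot's actual interval may be anchored differently.

```javascript
// Hypothetical 4-week interval, for illustration only. It assumes periods are
// counted from Sunday 1969-12-28 UTC; Plot's own interval may differ.
const DAY = 864e5;
const PERIOD = 28 * DAY;
const ANCHOR = Date.UTC(1969, 11, 28);

const fourWeeks = {
  floor: (d) => new Date(ANCHOR + Math.floor((+d - ANCHOR) / PERIOD) * PERIOD),
  ceil: (d) => new Date(ANCHOR + Math.ceil((+d - ANCHOR) / PERIOD) * PERIOD)
};

// Snap a tick onto the scale's interval, using ceil when it is implemented
// and falling back to floor otherwise.
function snapTick(interval, tick) {
  return typeof interval.ceil === "function" ? interval.ceil(tick) : interval.floor(tick);
}

const iso = (d) => d.toISOString().slice(0, 10);
const yearTick = new Date(Date.UTC(2014, 0, 1)); // a tick from the "year" interval

const floored = iso(fourWeeks.floor(yearTick)); // "2013-12-29"
const snapped = iso(snapTick(fourWeeks, yearTick)); // "2014-01-26"
```

With floor, the tick lands in the previous calendar year; with ceil, it stays in the year that the label names.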

@mbostock
Member Author

mbostock commented Aug 4, 2023

With interval.ceil, it’s more consistent with the default behavior.

Screenshot 2023-08-04 at 4 44 37 PM
Plot.plot({
  x: {interval: "4 weeks", ticks: "year"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

Although now I notice that the default ticks are broken with the 4 weeks interval… 😬

Screenshot 2023-08-04 at 4 46 10 PM
Plot.plot({
  x: {interval: "4 weeks"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

@mbostock mbostock force-pushed the mbostock/ordinal-time-axis branch 2 times, most recently from 25f2a52 to 2afe52a Compare August 4, 2023 23:47
@mbostock
Member Author

mbostock commented Aug 5, 2023

A nice property would be if e.g. “2018” always meant the same thing in the default time axis, i.e. 2018-01-01. So perhaps a better strategy here would be:

  1. Compute the positive integer n such that taking every nth value from the scale’s domain produces as close as possible to the desired number of ticks. For example, if the domain has 100 values and 5 ticks are desired, n = 20.
  2. Compute the median step s between adjacent values from the scale’s sorted domain, and use this to determine the standard time interval i that is closest to the median step s times n (per 1). For example, if the scale’s interval is day and n = 20, then i = month; if the scale’s interval is day and n = 7, then i = week.
  3. If the standard interval i (per 2) is subsumed by the scale’s interval, i.e. if the ticks generated by i are not altered by the scale’s interval, then use the standard interval i to generate ticks with the default multi-line conditional format. For example, if the scale’s interval is day and i = week, use week ticks.
  4. Otherwise, use n (per 1) to select every nth value from the scale’s domain, and apply the default ISO date format. For example, if the scale’s interval is week and i = month and n = 3, use every third week as YYYY-MM-DD.

The nice thing about this strategy is that it avoids needing to know exactly what the scale’s actual interval is; the scale’s interval can be a black box, and we can infer everything from applying the interval and looking at domain values.
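A rough sketch of the building blocks for the four steps above, treating the scale's interval as a black box that only exposes `floor`. The helper names (`tickStride`, `medianStep`, `isSubsumed`, `everyNth`) and the `day` interval are hypothetical, not part of Plot's API, and matching the median step to a standard interval is left out.

```javascript
// Step 1: choose n so that every nth domain value gives about `desired` ticks.
function tickStride(domainLength, desired) {
  return Math.max(1, Math.round(domainLength / desired));
}

// Step 2: median step (in milliseconds) between adjacent sorted domain values.
function medianStep(domain) {
  const steps = [];
  for (let i = 1; i < domain.length; ++i) steps.push(domain[i] - domain[i - 1]);
  if (steps.length === 0) return undefined; // fewer than two values
  steps.sort((a, b) => a - b);
  const m = steps.length >> 1;
  return steps.length % 2 ? steps[m] : (steps[m - 1] + steps[m]) / 2;
}

// Step 3: candidate ticks are subsumed by the scale's interval if flooring
// them with that interval leaves every tick unchanged.
function isSubsumed(ticks, scaleInterval) {
  return ticks.every((t) => +scaleInterval.floor(t) === +t);
}

// Step 4 (fallback): take every nth value from the domain.
function everyNth(domain, n) {
  return domain.filter((d, i) => i % n === 0);
}

// Example: 100 daily values and 5 desired ticks.
const day = {floor: (d) => new Date(Math.floor(+d / 864e5) * 864e5)};
const domain = Array.from({length: 100}, (_, i) => new Date(Date.UTC(2023, 0, 1 + i)));
const n = tickStride(domain.length, 5); // 20
const step = medianStep(domain); // 864e5, i.e. one day
const fallback = everyNth(domain, n); // 5 values
```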

@mbostock
Member Author

Got some pieces working here: https://observablehq.com/d/61f1586d16d3ec0c

@mbostock mbostock force-pushed the mbostock/ordinal-time-axis branch 6 times, most recently from 11d50fd to 054ed6e Compare August 16, 2023 23:40
@mbostock
Member Author

Now also addresses part of #74 when a numeric interval is specified.

@mbostock mbostock marked this pull request as ready for review August 17, 2023 23:38
@mbostock
Member Author

mbostock commented Aug 18, 2023

I went through several iterations on this, and this is still not quite ready.

I explored dropping tick labels instead of dropping ticks. I feel this produces a better result in some cases:

image
image
image

However, I decided against it because it feels inconsistent. For example, if you specify ticks as an array of values, we only draw those, but if you specify ticks as an interval or a number, or you use tickSpacing, you get ticks for every value in the domain but only some of them are labeled. And if you specify the tickFormat option, suddenly all the ticks appear regardless of the interval option? I could see there being a version of this approach that works in the future, but I think we need more options, or to redesign the existing options, to make it work.

I also explored how we handled “misaligned” intervals. For example, what do we do if the scale interval is 4 weeks, but the tick interval is year? These two intervals are not aligned (there are 365 or 366 days in a year, which is not a multiple of 28). One strategy that worked fairly well is to draw the first tick in each interval. So, the first 4 week interval within each year interval has a tick, like so:

Screenshot 2023-08-17 at 4 52 45 PM

The logic for this is:

data.filter((d, i) => i === 0 || ticks.floor(d) > ticks.floor(data[i - 1]));

However, I think computing the intersection of the intervals is simpler and easier to understand:

data.filter((d) => ticks.floor(d) >= d);

And perhaps more importantly, it avoids the problem of the first and second tick being likely to overlap.
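As an illustration of how the two filters differ, here is a small self-contained sketch. The `ticks` object is a hypothetical stand-in for a UTC year interval that only implements `floor`, and the monthly `data` array is made up for the example.

```javascript
// Hypothetical stand-in for the "year" tick interval (UTC), exposing only floor.
const ticks = {floor: (d) => new Date(Date.UTC(d.getUTCFullYear(), 0, 1))};

// A made-up ordinal domain: monthly values from November 2013 through October 2015.
const data = Array.from({length: 24}, (_, i) => new Date(Date.UTC(2013, 10 + i, 1)));

// First value within each tick interval; this always keeps the first domain value.
const firstInEach = data.filter((d, i) => i === 0 || ticks.floor(d) > ticks.floor(data[i - 1]));

// Intersection: keep only the values that are themselves aligned with the tick interval.
const intersection = data.filter((d) => ticks.floor(d) >= d);

const iso = (d) => d.toISOString().slice(0, 10);
console.log(firstInEach.map(iso)); // ["2013-11-01", "2014-01-01", "2015-01-01"]
console.log(intersection.map(iso)); // ["2014-01-01", "2015-01-01"]
```

In the first result, the November 2013 and January 2014 ticks are only two positions apart, which is where labels would tend to collide; the intersection drops the unaligned first value.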

But I also realized that there’s a more general problem: even if the intervals are aligned, the tick interval could be much farther apart than a “standard” time interval, making the default multi-line time format inappropriate. For example, if you have the 52 week tick interval on top of a 4 week scale interval, and you try to display that using the default multi-line format for 4 week (which is day, %-d %b), you get this:

Screenshot 2023-08-17 at 4 57 12 PM

I attempted to avoid this problem by checking if most of the ticks were dropped by applying the tick interval, and switching to the default ISO format. But writing this up, I realize we have the same problem even if you don’t specify the tick interval at all and you just specify a scale interval of 52 weeks:

Screenshot 2023-08-17 at 4 58 57 PM

So… I’ll put this back in draft mode. 🤔 I’m guessing maybe the multi-line format has to be smarter about comparing with the previous tick value; it can’t rely solely on whether the partial format is the same (because e.g. “Nov” could refer to Nov 2014 or Nov 2015).
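One possible reading of that guess, sketched with hypothetical helpers (`multiLine`, `byLabel`, `byValue`): decide whether to repeat the second line by comparing the underlying month of each tick with the previous tick, rather than comparing the formatted text. This is only an illustration of the comparison, not the format Plot ends up using.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
const dayOf = (d) => String(d.getUTCDate());
const monthOf = (d) => MONTHS[d.getUTCMonth()];

// Month identity that includes the year, so Nov 2014 and Nov 2015 differ.
const monthKey = (d) => d.getUTCFullYear() * 12 + d.getUTCMonth();

// Two-line labels: the second line is shown for the first tick and whenever
// `changed` reports a difference from the previous tick.
function multiLine(ticks, changed) {
  return ticks.map((d, i) =>
    i === 0 || changed(d, ticks[i - 1]) ? `${dayOf(d)}\n${monthOf(d)}` : dayOf(d)
  );
}

const byLabel = (a, b) => monthOf(a) !== monthOf(b); // compares formatted text only
const byValue = (a, b) => monthKey(a) !== monthKey(b); // compares the underlying months

// Two ticks that are 52 weeks apart.
const ticks = [new Date(Date.UTC(2014, 10, 23)), new Date(Date.UTC(2015, 10, 22))];
const naive = multiLine(ticks, byLabel); // ["23\nNov", "22"]
const aware = multiLine(ticks, byValue); // ["23\nNov", "22\nNov"]
```

The value-based comparison notices that the second "Nov" is a different month; choosing a format that also shows the year in that situation would be a further step.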

And there’s still an outstanding problem that we don’t have a way of indicating whether intervals associated with an ordinal scale are in local or UTC time, so for the most part we default to UTC time. This means if you have an ordinal time axis with hourly data, there’s no easy way to get the multi-line format to show local time. I think this will require a new option, or perhaps exposing the multi-line format.

@mbostock mbostock marked this pull request as draft August 18, 2023 00:04
@mbostock mbostock marked this pull request as ready for review August 18, 2023 01:20
@mbostock
Member Author

Okay! Ready to go! 🚀

@mbostock mbostock changed the title from “default ordinal time axis” to “better ordinal axes with intervals” Aug 18, 2023
@mbostock
Member Author

There’s still some room for improvement here, but I wonder if it’s a blocker. In the example in the top post, we no longer generate yearly ticks, but instead generate monthly ticks every 8 months:

Screenshot 2023-08-17 at 6 38 57 PM
Plot.plot({
  x: {interval: "month"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

It would be nicer in this case if we could use aligned 6 months ticks instead:

Screenshot 2023-08-17 at 6 40 39 PM
Plot.plot({
  x: {interval: "month", ticks: "6 months"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

Or, if you prefer, year ticks:

Screenshot 2023-08-17 at 6 41 11 PM
Plot.plot({
  x: {interval: "month", ticks: "year"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

The challenge is that it can be hard to know which interval is specified, and therefore how to generalize it. It seems obvious in the case that the interval option is month (the string), but you could set it to d3.utcMonth, or d3.utcMonth.every(3), or some such, and we’d have a harder time detecting that this is a “standard” interval and therefore know how to generalize it automatically to reduce the number of ticks. That’s the appeal of simply showing every nth tick. But we could perhaps treat named intervals specially.
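Treating named intervals specially could look roughly like the sketch below. The `standardIntervals` ladder, its approximate durations, and the `generalizeInterval` helper are all hypothetical, chosen so that each coarser entry is aligned with the finer ones; an interval passed as an object (such as d3.utcMonth) is treated as opaque and returns undefined, leaving the every-nth-tick approach as the fallback.

```javascript
// Hypothetical ladder of nested "standard" intervals, finest first, with
// approximate durations in days. Names and durations are illustrative only.
const standardIntervals = [
  ["day", 1],
  ["month", 30],
  ["3 months", 91],
  ["6 months", 182],
  ["year", 365]
];

// Given the scale's interval and the stride n needed to thin the ticks, return
// the name of the standard interval closest to n times the scale's interval,
// or undefined when the interval is not a recognized name.
function generalizeInterval(interval, n) {
  if (typeof interval !== "string") return undefined; // opaque interval object
  const start = standardIntervals.findIndex(([name]) => name === interval);
  if (start < 0) return undefined;
  const target = standardIntervals[start][1] * n;
  let best = standardIntervals[start];
  for (const candidate of standardIntervals.slice(start)) {
    if (Math.abs(candidate[1] - target) < Math.abs(best[1] - target)) best = candidate;
  }
  return best[0];
}

console.log(generalizeInterval("month", 8)); // "6 months"
console.log(generalizeInterval("month", 12)); // "year"
console.log(generalizeInterval("day", 20)); // "month"
```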

@mbostock
Member Author

Okay! The latest commit detects and generalizes standard time intervals, so we get the best possible result for the common case where the scale’s interval is something common like month or day.

Screenshot 2023-08-18 at 10 34 54 AM
Plot.plot({
  x: {interval: "month"},
  marks: [Plot.barY(aapl, Plot.groupX({y: "median", title: "min"}, {title: "Date", x: "Date", y: "Close"}))]
})

I believe I’m done polishing this now. I feel like even though this was a long journey, the resulting logic isn’t too complicated. Let me know if I can clarify anything, @Fil.

Contributor

this chart is a bit surprising now; maybe we'd need to replace the y grid by a ruleY?

Member Author

You can say grid: 10 if you want more grid lines:

Screenshot 2023-08-23 at 12 56 30 PM

Feels okay to me. I think it would be better if the 2 was formatted as 2.0, and that should be possible to detect from the interval: 0.1.

Member Author

Something like…

    : tickFormat === undefined && data && isNumeric(data) && scale.interval?.[intervalDuration]
    ? format(`.${`${scale.interval?.[intervalDuration] % 1}`.length - 2}f`)

but hopefully not so messy. 😁
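A tidier way to express the same idea, as a sketch: read the number of decimal places from the numeric interval and format tick values with that many digits. The helpers `decimalsOf` and `formatForInterval` are hypothetical, and the sketch assumes the interval prints in plain decimal notation (it would report zero decimals for a value such as 1e-7).

```javascript
// Number of digits after the decimal point in the interval's string form.
// Assumes plain decimal notation, e.g. "0.1" or "0.25", not "1e-7".
function decimalsOf(step) {
  const [, fraction = ""] = String(step).split(".");
  return fraction.length;
}

// Returns a tick format that always shows as many decimals as the interval has.
function formatForInterval(step) {
  const digits = decimalsOf(step);
  return (value) => value.toFixed(digits);
}

const format = formatForInterval(0.1);
console.log(format(2)); // "2.0"
console.log(format(1.9)); // "1.9"
```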

> You can say grid: 10 if you want more grid lines:
>
> Screenshot 2023-08-23 at 12 56 30 PM
>
> Feels okay to me. I think it would be better if the 2 was formatted as 2.0, and that should be possible to detect from the interval: 0.1.

How did you remove the tick marks from y-axis in this plot?

Contributor

I'd add more ticks in this case, but it's nice to show what it gets automatically.

Contributor

so much better!

Contributor

this test loses an empty text

Contributor

… and this test gains an empty text; not a blocker at all, but out of curiosity what's the logic here?

Member Author

I noticed that too but didn’t investigate. 🤷

Contributor

nice!!

@Fil
Contributor

Fil commented Aug 22, 2023

None of the above comments is blocking.

I poked at the downloadsOrdinal example a bit and maliciously tried x: {interval: 1} instead of "day". My browser didn't let me create an ordinal scale of a bazillion instants (one per ms), which is fortunate. I wonder though if there should be more of a control on this, beyond the browser balking with a RangeError: Invalid array length. Maybe not, but the error message is not super friendly.

(Note that x: {interval: "second"} takes a dozen seconds but does end up returning a chart, except of course that the bars are too thin to be visible.)
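If more control were wanted, one option is to estimate the size of the domain before materializing it and fail with a clearer message. The sketch below is only an illustration: `checkDomainSize` and its default limit of 10,000 values are hypothetical, not something Plot provides.

```javascript
// Hypothetical guard: estimate how many values a numeric interval would
// produce over [min, max] before building the ordinal domain.
function checkDomainSize(min, max, step, limit = 1e4) {
  const count = Math.floor((max - min) / step) + 1;
  if (count > limit) {
    throw new RangeError(
      `interval ${step} would produce ${count} domain values; the limit is ${limit}`
    );
  }
  return count;
}

const start = Date.UTC(2018, 0, 1);
const end = Date.UTC(2018, 0, 31);

console.log(checkDomainSize(start, end, 864e5)); // 31 daily values

try {
  checkDomainSize(start, end, 1); // one value per millisecond
} catch (error) {
  console.log(error.name); // "RangeError", with a message naming the interval
}
```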

@Fil
Contributor

Fil commented Aug 22, 2023

Using temporal intervals & tickSpacing on the horizontal scale is really flowing. I'm adding a test that uses it on the fy facet scale.

@mbostock mbostock force-pushed the mbostock/ordinal-time-axis branch from 90f71ce to 701a8b7 Compare August 24, 2023 00:42
@mbostock mbostock merged commit e3e794e into main Aug 24, 2023
@mbostock mbostock deleted the mbostock/ordinal-time-axis branch August 24, 2023 00:53
chaichontat pushed a commit to chaichontat/plot that referenced this pull request Jan 14, 2024
* ordinal time axis

* filter ordinal ticks with numeric intervals

* checkpoint

* simplify hasTimeTicks

* fix nullish check

* filter approach

* inferTimeFormat

* tidy

* prune redundant formats

* tidy

* comment

* filter ticks, not just text

* warn on misaligned intervals

* dense grid for sparseCell

* add missing test snapshot

* more robust inferTimeFormat

* detect and generalize standard time intervals

* test: temporal interval on the facet scale

* improve temporal scales, too

* better edge cases

* tweak comment

* move tickFormat function detection

* minimize diff

---------

Co-authored-by: Philippe Rivière <[email protected]>
Development

Successfully merging this pull request may close these issues.

Multi-line time axis for ordinal time scales
3 participants