Skip to content

Latest commit

 

History

History
418 lines (282 loc) · 22.1 KB

README.md

File metadata and controls

418 lines (282 loc) · 22.1 KB

∧p∨d

Area-Proportional Venn Diagram generator (WIP)

Demo: gradient descent toward target region sizes

Live app: runsascoded.com/apvd

Here's a sample run with targets reflecting the proportion of natural numbers that are divisible by 3, 5, 7, or any combination thereof:

fizz.buzz.bazz.example.mp4

It's fairly "alpha" (see /issues), but I believe it's "state of the art" at creating area-proportional Venn diagrams using up to 4 ellipses.

Examples

4 Ellipses

Here's Fig. 3, along with the best layout I've found using ∧p∨d:

Venn Diagram comprised of 4 ellipses

The error is just under 0.2%: about 1/500th of the overall area is in the wrong region.

Here's a closer look at the center, with region-size labels:

The "1.86" (Red ∩ Blue ∩ Yellow) should be 0, and there a missing { Red ∩ Green ∩ Yellow } of size 1. I use a rudimentary penalty that moves shapes closer together when a region is missing (the converse happens naturally via gradient descent), but it could definitely be made smarter.

Definitely lots of low-hanging UX improvements as well!

Fig. 7, paper and ∧p∨d versions:

From the supplement:

∧p∨d struggles a bit with this one:

Error is 22.9 (4.64%); half of that is { Red ∩ Yellow ∩ Blue }, which is 1.56 instead of 12. Incorporating each region's relative error would likely produce more intuitive results (see shapes#9).

(Also, the "182" for "TP53 only" is on the { TP53 ∩ STK11 } region; see #7)

3 Circles

TODO: show ∧p∨d solutions to these…

Prior art

Area-proportional + Ellipses

eulerAPE

Java applet, up to 3 ellipses, area-proportional:

Here is its solution to a simple 3-set example featured below:

Their paper is a great reference, and includes many examples from published papers.

Web app, "Euler Diagrams Drawn with Ellipses Area-Proportionally (Edeap)":

The area-proportionality seems pretty imprecise, in this example and others. It seems to use a force-directed algorithm that doesn't actually seek a perfect/converged solutions.

Area-proportional + Circles

Several tools generate area-proportional Venn diagrams using circles, but this doesn't allow for modeling most inputs.

For example, here's ∧p∨d trying to fit the first example below using three circles:

benfred.circles_1M.mp4

(initial layout, best approximation; the reported "7.36%" error is "earth-mover distance", as a percentage of the overall diagram size)

Allowing just one set to be an ellipse (even "aligned" to the axes, with no rotation) is enough for this diagram to converge:

"Venn Diagrams with D3.js"

Web app, 3 circles:

Venn Diagram comprised of 3 circles, with region areas displayed

Note that the solution above is imprecise: $|A \cap B \cap C|$ should be equal to $|A \cap B \setminus C|$ and $|A \cap C \setminus B|$, and twice the size of $|B \cap C \setminus A|$, but it's smaller than all three. See discussion above about ellipses supporting convergence for this example.

Web app by the BioVenn author, up to 10 sets, circles only:

Windows executable, 2-3 circles:

Non-area-proportional

Not area-proportional, but will draw up to 6 sets:

Web app, up to 4 ellipses, not area-proportional:

Web app, up to 4 rectangles, not area-proportional:

R package, 2-7 sets, not area-proportional:

Web app, up to 5 sets, not area-proportional:

Status

The demo at runsascoded.com/apvd supports up to 4 ellipses (including allowing them to rotate relative to the axes), and arbitrary initial layouts and target region-sizes. It can gradient-descend for 100,000's of steps and converge, or reach negligible error levels, on most examples I've tested it on.

Future work could involve:

  • command-line / server-based version (evolving models from multiple initial layouts in parallel)
  • more shapes (rectangles, polygons, splines)
  • more ability to configure examples, and save and share state

See /issues for more info.

Background

When I first saw diagrams like this in the wild, I hadn't realized that 4 ellipses can intersect to form all 15 ($2^4-1$) possible regions. I knew it was impossible with 4 circles, but the added power from slightly relaxing the shapes was intriguing!

The wildly disproportionate areas bugged me, though. I read up on it a bit, and found some relatively accessible open problems related to generating "area-proportional Venn diagrams".

I believe this library pushes the state of the art forward a bit (e.g. by supporting 4 ellipses), and hopefully provides a platform for further progress.

Methods

This repo is primarily the "frontend" / web app; the interesting computation occurs in runsascoded/shapes.

The basic approach is to:

  1. model a "scene" with various intersecting shapes
  2. compute intersection points
  3. infer regions and sizes for each subset of the set of shapes
  4. compute error ($|desired size - actual size|$, for all regions)
  5. gradient-descend the shapes' coordinates (e.g. center x/y and radius x/y) in the direction of decreasing error

Several of these steps turn out to be nontrivial, especially:

  • computing intersection points between 2 ellipses (involves solving a quartic equation, as ellipses can intersect in up to 4 points)
  • propagating useful gradients through all calculations (requires an autodiff abstraction, and some care to avoid numeric instability)

Quartic equation-solving for ellipse intersections

Computing ellipse intersections can be represented as solving a system of equations that looks like:

$$ \begin{eqnarray} A_1x^2 + B_1xy + C_1y^2 + D_1x + E_1y + F_1 &=& 0 \\ A_2x^2 + B_2xy + C_2y^2 + D_2x + E_2y + F_2 &=& 0 \\ \end{eqnarray} $$

With some algebraic manipulation, this can be represented as a quartic equation in one variable (reflectin the fact that two ellipses can intersect at up to 4 points). I initially used the Rust roots crate, but hit issues (#29, #30), and wrote quartic.rs instead.

The nb/ folder contains notebooks with relevant math and derivations, especially:

Autodiff

Most of the computations related to shapes' intersections and corresponding region areas are differentiable. I use a thin wrapper (dual.rs) around num_dual's DualDVec64 for this.

shapes#1 describes moving to statically-sized Duals, which would likely be faster, and make the code much cleaner.

Future directions

Better "missing region" penalty

The "missing region penalty" should be proportional to the distance each shape is from the required region existing. This would allow shapes to rotate to get closer, whereas currently I just move their centers closer together, which often ends up thrashing against added error in other regions.

See shapes#10.

Proportional errors

Currently, absolute difference between target and actual region sizes is all that is considered. Incorporating relative error is appealing, but it raises questions about what to do with missing or erroneous (shouldn't exist but do) regions.

A mix of absolute and relative may be most intuitive/aesthetic; see shapes#9.

More shape types

See also: shapes#11.

Polygons

My approach should be extensible to arbitrary polygons (including non-convex), which would allow for intersections of 5 (or more?) sets.

I suspect the usefulness of even a perfect layout at that level to be limited, but it would be nice to have the option, and 5-set diagrams exist in the wild:

Venn Diagram comprised of 5 nonconvex blobs

Polygons with rounded corners should also be doable, and might be more aesthetically pleasing.

Splines

Cubic-bezier splines would be ideal, but the math is much harder; I'm not sure how to compute it exactly (using my autodiff-based approach).

This SO answer quotes a lost forum post outlining how to use the divergence theorem to compute the area of a poly-bezier loop. It's possible that (or some approximation that is nevertheless autodiff-able) would work…

Other notes/references

Earlier versions of apvd

Draggable ellipses demo

I previously implemented an interface for computing ellipse intersections, which included draggable and resizable shapes:

drag-ellipses.mp4

(live demo: runsascoded.com/apvd/ellipses)

It's implemented in JS, pre-shapes, and computes intersections and region sizes, but isn't differentiable / can't gradient-descend.

Scala.js port

There's also a partial Scala.js implementation in this repo's @scala branch, including cubic and quartic equation solvers.

combinatorics.org Survey

https://www.combinatorics.org/files/Surveys/ds5/ds5v3-2005/VennEJC.html

5 symmetric triangles

https://www.combinatorics.org/files/Surveys/ds5/ds5v3-2005/VennSymmExamples.html

6 triangles

https://www.combinatorics.org/files/Surveys/ds5/ds5v3-2005/VennTriangleEJC.html

Polyominoes

https://www.combinatorics.org/files/Surveys/ds5/ds5v3-2005/VennPoly67EJC.html

Shown below is a 6-Venn diagram formed entirely from curves drawn from axis-aligned edges. It is a minimum-area diagram; that is, each region is composed of a single square of unit area. Note that many edges overlap, so the diagram is infinitely intersecting. As with many other diagrams in these pages, regions are coloured by weight. The diagrams on this page are from [CR05].

The six component curves of the diagram, overlaid on a grayed-out version of the entire diagram:

This is a 7-Venn diagram formed entirely from curves drawn from axis-aligned edges. Like the above it is minimum-area and infinitely intersecting.

The seven component curves:

Related libraries

These libraries aim to convey set relationships:

Some other relevant libraries I've used or studied:

Rust

Dual / Autodiff libraries:

JS

quartic.js (last release 2008) Core code is from a web solver written by David Binner