Condensing JGF #1255

jameshcorbett · 2024-07-25T00:02:47Z

JGF is verbose, and Rabbit-y JGF on elcap systems can become very large. We discussed offline several ways to shrink JGF, both while maintaining the same format and compatibility with the standard (which?) and breaking the standard to achieve greater reductions in size.

Performance tests are necessary to see just how bad the problems are, so we can decide how radical of changes we need to make.

trws · 2024-07-29T21:01:56Z

To get this going, here's the JGF specification website: https://jsongraphformat.info/

We're currently using JGF v1, and I don't think we necessarily want to switch to v2 unless there's a material win for doing so. The main things that I can think of (@milroy I'm pretty sure I'm missing at least one of your suggestions, please correct me) that we've discussed are:

we send a lot of default data over the wire. Here's a vertex in flux JGF:
```
{
  "id": "2",
  "metadata": {
    "type": "node",
    "basename": "node",
    "name": "node0",
    "id": 0,
    "uniq_id": 2,
    "rank": -1,
    "exclusive": false,
    "unit": "",
    "size": 1,
    "paths": {
      "containment": "/tiny0/rack0/node0"
    }
  }
},
```
Just off top of my head:
1. no need to send default values
  1. basename defaults to type
  2. name defaults to {type}{id}
  3. rank defaults to -1 when resource doesn't belong to a rank, we could
    also use -2 to say "same as ancestor" to avoid sending it in almost
    all cases
  4. exclusive is almost never used except in allocated JGF, and then
    defaults to false
  5. unit is almost never used, and defaults to "each" or "empty"
  6. size is almost always 1, except for ram
  7. paths can be calculated entirely from edges, which we're sending
    separately anyway as long as we guarantee that jgf always includes all
    ancestor vertices up to the root
  8. uniq_id always matches the outer "id" field
2. the inner "id" is the logical id under the parent vertex, if we have the
  uniq_id, not sure we need that
With just that, we could send this as essentially equivalent:
```
{
  "id": "2",
  "metadata": {
    "type": "node",
  }
},
```
Edges are a bit harder, but since we're using a bidirectional graph finally
we should be able to remove the value of the name object, and containment
would be the default, so we could take these:
```
{
  "source": "2",
  "target": "4",
  "metadata": {
    "name": {
      "containment": "contains"
    }
  }
},
{
  "source": "1",
  "target": "5",
  "metadata": {
    "name": {
      "power": "supplies_to"
    }
  }
},
```
and make them
```
{
  "source": "2",
  "target": "4",
},
{
  "source": "1",
  "target": "5",
  "metadata": {
    "type": "power"
  }
},
```
All of that is still JGF compatible
Structural sharing, or expansion like GRUG does, as @milroy noted could help
a lot. JGF doesn't support this natively, but we could use the metadata to
include the information, or use subgraphs with references. Upside to this is
the vast majority of our vertices are identical below some point, so we could
take say a node, make an exemplar graph as a separate graph, then have a node
that says "template these based on this hostlist" or something and we're
sending relatively little more than with RV1
If we're willing to throw JGF the spec out the window, the cheapest, easiest
win would be to switch the edge storage from an array of objects with keys to
an array of tuple-like arrays like this:
```
[
  // [source, target, (optional) type]
  [2,4],
  [1,5,"power"]
]
```
Or even switch to a stream of json objects like some other formats use so we
can do the same to vertices and process them without having to build the
whole object in memory, but this will break any and all JGF parsers.

trws · 2024-07-29T21:02:15Z

CC: @garlick, @grondo

jameshcorbett · 2024-07-31T01:09:50Z

As I understand it I'm on the hook to try out elcap-sized JGF and see what the performance is like, which I intend to do as soon as I get some pressing rabbit issues out of the way.

Problem: as described in flux-framework#1255, JGF is too large and unwieldy, with lots of redundant and repeated data. To begin to cut back, make most vertex metadata optional.

jameshcorbett · 2024-09-19T05:05:12Z

OK so on a representative system:

Before the changes introduced by #1293 and #1297, JGF was 8458557 bytes. After the changes in those two PRs, it dropped to 2122574 bytes. If the path field of vertices could be dropped in all cases (currently it has to be supplied in all cases), that would shrink it to 1298046 bytes. That makes it around 1/7 the original size. Not bad.

Problem: as described in flux-framework#1255, JGF is too large and unwieldy, with lots of redundant and repeated data. To begin to cut back, make most vertex metadata optional.

jameshcorbett · 2024-10-01T02:29:43Z

If #1299 goes in, the last obvious work I'm aware of will be to drop each vertex's .metadata.paths.containment which in theory should be simple, however I'm not quite sure how to actually implement it. Also, there is some code that fetches all the paths to SSD vertices in JGF in order to be able to mark the vertices as UP or DOWN via set-status RPCs so that would need some sort of replacement.

jameshcorbett · 2024-10-08T18:43:16Z

On Hetchy, JGF for jobs appears to be around 15K per node.

jameshcorbett self-assigned this Jul 25, 2024

jameshcorbett mentioned this issue Sep 21, 2024

Flux picks computes that are attached to missing rabbits NearNodeFlash/NearNodeFlash.github.io#159

Open

jameshcorbett mentioned this issue Oct 1, 2024

Marking rabbits as down without paths to vertices flux-framework/flux-coral2#225

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Condensing JGF #1255

Condensing JGF #1255

jameshcorbett commented Jul 25, 2024

trws commented Jul 29, 2024

trws commented Jul 29, 2024

jameshcorbett commented Jul 31, 2024

jameshcorbett commented Sep 19, 2024

jameshcorbett commented Oct 1, 2024 •

edited

Loading

jameshcorbett commented Oct 8, 2024

Condensing JGF #1255

Condensing JGF #1255

Comments

jameshcorbett commented Jul 25, 2024

trws commented Jul 29, 2024

trws commented Jul 29, 2024

jameshcorbett commented Jul 31, 2024

jameshcorbett commented Sep 19, 2024

jameshcorbett commented Oct 1, 2024 • edited Loading

jameshcorbett commented Oct 8, 2024

jameshcorbett commented Oct 1, 2024 •

edited

Loading