Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Condensing JGF #1255

Open
jameshcorbett opened this issue Jul 25, 2024 · 6 comments
Open

Condensing JGF #1255

jameshcorbett opened this issue Jul 25, 2024 · 6 comments
Assignees

Comments

@jameshcorbett
Copy link
Member

JGF is verbose, and Rabbit-y JGF on elcap systems can become very large. We discussed offline several ways to shrink JGF, both while maintaining the same format and compatibility with the standard (which?) and breaking the standard to achieve greater reductions in size.

Performance tests are necessary to see just how bad the problems are, so we can decide how radical of changes we need to make.

@jameshcorbett jameshcorbett self-assigned this Jul 25, 2024
@trws
Copy link
Member

trws commented Jul 29, 2024

To get this going, here's the JGF specification website: https://jsongraphformat.info/

We're currently using JGF v1, and I don't think we necessarily want to switch to v2 unless there's a material win for doing so. The main things that I can think of (@milroy I'm pretty sure I'm missing at least one of your suggestions, please correct me) that we've discussed are:

  1. we send a lot of default data over the wire. Here's a vertex in flux JGF:

    {
      "id": "2",
      "metadata": {
        "type": "node",
        "basename": "node",
        "name": "node0",
        "id": 0,
        "uniq_id": 2,
        "rank": -1,
        "exclusive": false,
        "unit": "",
        "size": 1,
        "paths": {
          "containment": "/tiny0/rack0/node0"
        }
      }
    },

    Just off top of my head:

    1. no need to send default values
      1. basename defaults to type
      2. name defaults to {type}{id}
      3. rank defaults to -1 when resource doesn't belong to a rank, we could
        also use -2 to say "same as ancestor" to avoid sending it in almost
        all cases
      4. exclusive is almost never used except in allocated JGF, and then
        defaults to false
      5. unit is almost never used, and defaults to "each" or "empty"
      6. size is almost always 1, except for ram
      7. paths can be calculated entirely from edges, which we're sending
        separately anyway as long as we guarantee that jgf always includes all
        ancestor vertices up to the root
      8. uniq_id always matches the outer "id" field
    2. the inner "id" is the logical id under the parent vertex, if we have the
      uniq_id, not sure we need that

    With just that, we could send this as essentially equivalent:

    {
      "id": "2",
      "metadata": {
        "type": "node",
      }
    },

    Edges are a bit harder, but since we're using a bidirectional graph finally
    we should be able to remove the value of the name object, and containment
    would be the default, so we could take these:

    {
      "source": "2",
      "target": "4",
      "metadata": {
        "name": {
          "containment": "contains"
        }
      }
    },
    {
      "source": "1",
      "target": "5",
      "metadata": {
        "name": {
          "power": "supplies_to"
        }
      }
    },

    and make them

    {
      "source": "2",
      "target": "4",
    },
    {
      "source": "1",
      "target": "5",
      "metadata": {
        "type": "power"
      }
    },

    All of that is still JGF compatible

  2. Structural sharing, or expansion like GRUG does, as @milroy noted could help
    a lot. JGF doesn't support this natively, but we could use the metadata to
    include the information, or use subgraphs with references. Upside to this is
    the vast majority of our vertices are identical below some point, so we could
    take say a node, make an exemplar graph as a separate graph, then have a node
    that says "template these based on this hostlist" or something and we're
    sending relatively little more than with RV1

  3. If we're willing to throw JGF the spec out the window, the cheapest, easiest
    win would be to switch the edge storage from an array of objects with keys to
    an array of tuple-like arrays like this:

    [
      // [source, target, (optional) type]
      [2,4],
      [1,5,"power"]
    ]

    Or even switch to a stream of json objects like some other formats use so we
    can do the same to vertices and process them without having to build the
    whole object in memory, but this will break any and all JGF parsers.

@trws
Copy link
Member

trws commented Jul 29, 2024

CC: @garlick, @grondo

@jameshcorbett
Copy link
Member Author

As I understand it I'm on the hook to try out elcap-sized JGF and see what the performance is like, which I intend to do as soon as I get some pressing rabbit issues out of the way.

jameshcorbett added a commit to jameshcorbett/flux-sched that referenced this issue Sep 5, 2024
Problem: as described in flux-framework#1255, JGF is too large and unwieldy,
with lots of redundant and repeated data.

To begin to cut back, make most vertex metadata optional.
jameshcorbett added a commit to jameshcorbett/flux-sched that referenced this issue Sep 18, 2024
Problem: as described in flux-framework#1255, JGF is too large and unwieldy,
with lots of redundant and repeated data.

To begin to cut back, make most vertex metadata optional.
jameshcorbett added a commit to jameshcorbett/flux-sched that referenced this issue Sep 18, 2024
Problem: as described in flux-framework#1255, JGF is too large and unwieldy,
with lots of redundant and repeated data.

To begin to cut back, make most vertex metadata optional.
jameshcorbett added a commit to jameshcorbett/flux-sched that referenced this issue Sep 18, 2024
Problem: as described in flux-framework#1255, JGF is too large and unwieldy,
with lots of redundant and repeated data.

To begin to cut back, make most vertex metadata optional.
jameshcorbett added a commit to jameshcorbett/flux-sched that referenced this issue Sep 18, 2024
Problem: as described in flux-framework#1255, JGF is too large and unwieldy,
with lots of redundant and repeated data.

To begin to cut back, make most vertex metadata optional.
@jameshcorbett
Copy link
Member Author

OK so on a representative system:

Before the changes introduced by #1293 and #1297, JGF was 8458557 bytes. After the changes in those two PRs, it dropped to 2122574 bytes. If the path field of vertices could be dropped in all cases (currently it has to be supplied in all cases), that would shrink it to 1298046 bytes. That makes it around 1/7 the original size. Not bad.

jameshcorbett added a commit to jameshcorbett/flux-sched that referenced this issue Sep 19, 2024
Problem: as described in flux-framework#1255, JGF is too large and unwieldy,
with lots of redundant and repeated data.

To begin to cut back, make most vertex metadata optional.
jameshcorbett added a commit to jameshcorbett/flux-sched that referenced this issue Sep 24, 2024
Problem: as described in flux-framework#1255, JGF is too large and unwieldy,
with lots of redundant and repeated data.

To begin to cut back, make most vertex metadata optional.
jameshcorbett added a commit to jameshcorbett/flux-sched that referenced this issue Sep 30, 2024
Problem: as described in flux-framework#1255, JGF is too large and unwieldy,
with lots of redundant and repeated data.

To begin to cut back, make most vertex metadata optional.
@jameshcorbett
Copy link
Member Author

jameshcorbett commented Oct 1, 2024

If #1299 goes in, the last obvious work I'm aware of will be to drop each vertex's .metadata.paths.containment which in theory should be simple, however I'm not quite sure how to actually implement it. Also, there is some code that fetches all the paths to SSD vertices in JGF in order to be able to mark the vertices as UP or DOWN via set-status RPCs so that would need some sort of replacement.

@jameshcorbett
Copy link
Member Author

On Hetchy, JGF for jobs appears to be around 15K per node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants