I need to pass a dict or another complex object as an argument to a composite_solid in a Dagster project. This object is an output of another solid or composite_solid, contains a lot of information, and is organized like a Russian doll: containers within containers within containers. I just discovered what seems to be a major limitation in Dagster that prohibits doing this for a composite_solid. It is perfectly good practice for a solid, but apparently not for a composite_solid. The problem was discussed previously under the question "How to use dictionary yielded from other solid in a composite Solid?", and the answer was given by Gregory Van Seghbroeck. My summary of that answer is: "this cannot be done; a composite_solid treats its arguments differently from a regular solid, so dict arguments to composite_solids are not allowed". My question to Dagster developers on this issue is two-fold:
Let me explain what I mean by the second question. In my view, a big benefit of a system like Dagster would come from large Python data-handling projects that ingest many different data sets and perform many transformations and computations on them. This is exactly what happens in my area (statistical modeling/data science). For these applications, it is critically important to organize the DAG hierarchically, in a Russian-doll manner, which enables visual representation in Dagit, browsing up and down the hierarchy, and general logical clarity. We cannot put, say, 100 solids in one pipeline and browse them all in Dagit; such a diagram would make no sense and have no value. For a visual representation to bring real value, it has to fit on one screen, so it should contain no more than 10 (more likely 6) blocks (solids or composite solids).
So what I would expect to see in a Dagster representation of any complex system is one high-level pipeline consisting of 5 to 10 complex computational sub-blocks (composite_solids in Dagster terminology) that exchange complex data sets by connecting the output of one sub-block (composite_solid) to the input of another. Each of these sub-blocks would itself contain 5 to 10 computational sub-blocks organized the same way. On the surface, Dagster, with Dagit and its ability to browse hierarchical structures (composite_solids), seems perfectly positioned to implement exactly this Russian-doll architecture of algorithmic data handling. The main Dagster element one would rely on to create such a hierarchy is the composite_solid.
For this to work, a composite_solid needs to be able to do the same manipulations with its inputs that a solid can. In particular, a composite_solid must be able to take a complex, Russian-doll-style input and unpack it into a sequence of smaller objects. So why do composite_solids have these limitations in Dagster?
Hi @hfrcomm, in your ideal world, where do you envision the dictionary-unpacking code executing? In production Dagster setups, each solid typically executes in its own process (or K8s pod). Are you planning to use such a setup? If so, would you like the dictionary-unpacking code to execute once in its own process, or once in each of the
I think the core issue causing the difficulty here is that composite solids are only meant for organizing solids, not for directly defining computations. We're actually discussing a rename that will make this clearer, emphasizing that composite solids are not solids but containers for solids: #2902. Because all computations in a pipeline must happen inside a solid, I would implement this by including a solid that does the unpacking. E.g.
Does that make sense? Are there reasons I'm missing that this doesn't work well for you?