-
I currently have a DAG in a different tool that processes updates and dependencies for loading data into a large collection of complicated databases with a proprietary format. The data volumes are often large, the ingested formats are sometimes quite complicated, and in many cases the way inputs are made available is complicated as well. For these reasons, we have never brought any of the data we process into the DAG's execution context: the DAG only orchestrates the work, and the actual ETL is handled by the proprietary database's own code. There are also no third-party tools that work efficiently with this database's native storage format, so serialization and deserialization would be expensive.

I'd still like to evaluate Dagster, since I really appreciate its emphasis on strong typing and discrete modeling, which our current solution lacks.

With all that context, my question is this: What exactly are Outputs expected to be? Do they always have to be data? Most of the time, what our current DAG processes and returns seems closer to AssetMaterializations. Is there such a concept as a "ProxyOutput" that stands in for the data itself? Maybe I'm just making a hash of the concepts and this is all a lot simpler than I'm making it out to be.
Replies: 1 comment
-
Hi @nugend - outputs don't always need to be data. If you don't need to pass anything between a solid and its downstream solids, you can use a "Nothing" dependency, which only enforces execution order. If the location of the data that one solid produces is dynamic, the solid can output a String that describes that location (e.g. the name of a database table or a path in a filesystem).
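To make the "output a location, not the data" idea concrete, here is a minimal framework-agnostic sketch in plain Python. It deliberately does not use the Dagster API. `TableRef`, `load_updates`, `validate_updates`, and the table and database names are all hypothetical, chosen only for illustration. In Dagster, each function would roughly correspond to a solid, and the reference (or a plain string) would be the solid's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical handle describing where data lives; it is not the data."""

    database: str
    table: str

    def qualified_name(self) -> str:
        return f"{self.database}.{self.table}"


def load_updates(database: str) -> TableRef:
    # Stand-in for the upstream step. The real ETL would run inside the
    # proprietary database; only the location of the result is returned.
    return TableRef(database=database, table="daily_updates")


def validate_updates(ref: TableRef) -> str:
    # Stand-in for the downstream step. It receives the reference and would
    # ask the database to do the work, without pulling rows into this process.
    return f"validated {ref.qualified_name()}"


# The orchestrator only ever passes the small reference between steps.
ref = load_updates("warehouse")
message = validate_updates(ref)
print(message)
```

A small typed handle like this keeps the strong typing the question asks about while avoiding serialization of the underlying data. If a downstream step needs nothing at all from upstream, the "Nothing" dependency mentioned above is the better fit.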