Create a single repository structure, image with relevant dependencies for data pipelines #198

Open
eoriorda opened this issue Aug 16, 2022 · 5 comments

@eoriorda

No description provided.

@MichaelTiemannOSC
Contributor

Related: #201

The above is related because AT PRESENT, virtually all of our data pipelines have library requirements that no existing default runtime environment meets. The AICoE tutorial explains how to use Thoth, Sesheta, and other CI/CD tools to build and deploy one's own images into Quay, but that quickly leads to runtime image fragmentation. The Highlander approach is in development but not yet ready.

From an ARCHITECTURAL perspective, we need to describe how people should write pipelines today (using notebooks and suffering the costs of installing and loading their own modules on a case-by-case basis), as well as where we are going (the basic approach to deciding when to create a new image, who maintains it, and how it fits into the larger data reproducibility story). I think it is unsatisfying to achieve our data reproducibility solution by creating a runtime image manageability problem.
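
To make the "requirements not met by the default runtime" point concrete, here is a minimal sketch of a check that a notebook could run before a pipeline starts. It is only an illustration: the `missing_packages` helper and the package list in the usage note are hypothetical, not part of any existing OS-Climate tooling, and it compares names only, not versions.

```python
import re
from importlib import metadata


def _normalize(name):
    """Normalize a distribution name (PEP 503 style) so 'dbt_core' matches 'dbt-core'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_packages(required, installed=None):
    """Return the names in `required` that are not present in the environment.

    `installed` is an optional iterable of distribution names; when omitted,
    the distributions visible to the running interpreter are used.
    """
    if installed is None:
        # Ask the running interpreter which distributions it can see.
        installed = (dist.metadata["Name"] or "" for dist in metadata.distributions())
    installed_names = {_normalize(name) for name in installed}
    return sorted(name for name in required if _normalize(name) not in installed_names)
```

For example, calling `missing_packages(["pandas", "dbt-core"])` at the top of a notebook would list whichever of those two illustrative packages the current runtime image lacks, which is the per-notebook installation cost described above.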

@caldeirav changed the title from "Up to date Arch diagram for Data Commons including inventory of all apps" to "Create a single repository structure, image with relevant dependencies for data pipelines" on Sep 12, 2022
@eoriorda
Author

Produce a dedicated image pipeline repository. The goal is to have an image created in the OSC Quay account. @erikerlandson

@HeatherAck
Contributor

Defer until after COP27; create a standard image with pre-loaded libraries.

@HeatherAck moved this from "In Progress" to "LF- Maintenance Backlog" in Data Commons Platform on Oct 4, 2022
@HeatherAck
Contributor

See also: #98

@HeatherAck
Contributor

Still in progress; need to break out "default" versions of Jupyter notebooks (AI and non-AI).
Need a DBT directory structure beyond the simple use case (@mtiemann - reached out to DBT community for guidance).

Status: LF- Maintenance Backlog