Rebuild Data Commons DEV on separate cluster #118
Since everything I've done has followed the "data as code" approach (more or less), it's no problem for the pipelines I've created to rerun from scratch. Of course, each of the pipelines needs a bit of scrubbing to properly use Iceberg and implement its metadata and other best practices correctly. It would be great to do that with help from (and in view of) members as a next step in onboarding resources.
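As a rough illustration of the kind of Iceberg cleanup involved, here is a minimal sketch of creating a partitioned Iceberg table through Trino (the catalog, schema, table, and column names are hypothetical, not the actual pipeline tables):

```sql
-- Hypothetical names for illustration only
CREATE TABLE osc_datacommons.demo.emissions (
    company_id   VARCHAR,
    report_year  INTEGER,
    scope1_tco2e DOUBLE
)
WITH (
    format = 'PARQUET',
    partitioning = ARRAY['report_year']
);
```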
FYI, the current configurations are stored under the operate-first org so we can manage them next to our other services (this makes it easier to share configurations between deployments). Regardless, extending configurations to multiple environments should be straightforward. We should identify the number of users for both clusters so that we can work out namespace quotas, resource requirements, etc.
I would expect the number of users to be essentially the same, at least initially. The main difference is that in the new "prod" cluster, only certain Trino groups will have privileges beyond "select": the pipeline processes that run workloads like data ingest will have table write privileges and, if needed, access to the underlying S3 buckets. All other users will have only "select" and will not have S3 credentials. Eventually, as the OS-Climate community grows, most community members will be using the prod cluster only, so the prod cluster will grow larger, but to start with both should have the same set of users.
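A minimal sketch of how this split could be expressed with Trino's file-based access control table rules (the group names and exact privilege set below are hypothetical, not the actual OS-Climate configuration):

```json
{
  "tables": [
    {
      "group": "osc-ingest-pipelines",
      "privileges": ["SELECT", "INSERT", "DELETE", "OWNERSHIP"]
    },
    {
      "group": ".*",
      "privileges": ["SELECT"]
    }
  ]
}
```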
To clarify -- by 2 clusters, are we talking about OCP clusters?
CC @HeatherAck to keep abreast of usage and cost.
@HumairAK Yes, we are talking about two separate OCP clusters. The data layers (from storage volumes all the way up to Trino) should also be totally separate. As for keeping the configurations in an operate-first org: this would mean that ultimately, when we move to a full GitOps model, the OS-C owners of all our repositories will also be contributing to / managing their config from operate-first. If this is fine and understood, then it would be good to document the process (unless documentation already exists?). On capacity planning, Erik is right if we exclude the Airbus onboarding stream. I am assuming we will use a separate cluster for SOSTrades / Witness when we onboard them, in which case we don't need to cater for much more capacity.
This workflow already exists to some degree; we don't have the documentation yet. An example of what that looks like: operate-first/apps#1418. We would just need to add these members to an OWNERS file like this within the repo where these OSC configs are held, so they can then approve changes themselves. It would be helpful to identify the configs that we expect to be changed by various users, so we can separate them out within the directory structure and provide an OWNERS file that specifically lists the members who need to touch only those files.
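For reference, a minimal sketch of what such an OWNERS file typically looks like in operate-first style repos (the usernames below are placeholders, not the actual members):

```yaml
# OWNERS file scoped to the OS-C config directory (usernames are placeholders)
approvers:
  - osc-data-commons-admin
reviewers:
  - osc-data-commons-contributor
```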
Alright, let's go ahead with this, and I will bring the repo info / management of OWNERS into our OS-C Data Commons doc then.
@redmikhail With the access to our partner portal to get OpenShift subs, can you confirm there is no showstopper now to creating the required clusters, and also when these could be available for @HumairAK to do the platform deployment?
@caldeirav I have added entitlements to the account, so we should now be able to build the clusters. I will create a vanilla cluster setup (control plane and infrastructure nodes) so we can add ArgoCD and proceed with configuration using the operate-first repo. I will also start adding "sub-tasks" to this issue so we can track actionable items.
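As a rough sketch of the GitOps wiring being described, an Argo CD Application pointing the new cluster at the operate-first config repo could look roughly like this (the repo path, namespaces, and sync policy are assumptions for illustration, not the real configuration):

```yaml
# Illustrative only: path and namespace names are assumptions
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: os-climate-dev
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://github.com/operate-first/apps
    targetRevision: HEAD
    path: os-climate/overlays/dev   # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: os-climate
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```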
With the creation of these two clusters, I expect we will need to inform the community of the new links for the various environments. Therefore I have brought up issue #44 again, as it may make sense to have one dashboard where community members can access Trino, Jupyter, CloudBeaver, the token generator, etc. for both dev and prod in one place.
Just to put this on the record: we'll need to confirm that these new clusters have access to GPUs with >= 16GB of RAM (and access to larger ones is almost certainly going to be desirable).
@erikerlandson For now we are planning to use a combination of p3.2xlarge nodes (https://aws.amazon.com/blogs/aws/new-amazon-ec2-instances-with-up-to-8-nvidia-tesla-v100-gpus-p3/), which have 16GB of GPU memory, and g4dn instances for the other type of GPU workload (these will need to be tainted differently).
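A minimal sketch of what "tainted differently" could mean in practice: a taint on the g4dn MachineSet and a matching toleration on workloads meant for those nodes (the key and value below are placeholders, not the actual cluster configuration):

```yaml
# Hypothetical taint in the g4dn MachineSet's spec.template.spec
taints:
  - key: osc/gpu-type        # placeholder key
    value: t4
    effect: NoSchedule
---
# Matching toleration on a pod that should schedule onto those nodes
tolerations:
  - key: osc/gpu-type
    operator: Equal
    value: t4
    effect: NoSchedule
```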
Based on the discussion above and the somewhat large scope of some of the tasks, adding a task list here with references to the individual sub-tasks:
@redmikhail As I go through the tasks, and in order to avoid misunderstanding: we want a separate Trino for DEV / PROD (with a different single catalog for each instance). The reason is that we will upgrade Trino in DEV and PROD separately.
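For illustration, each instance's single catalog would be one properties file along these lines (the file name, metastore URI, and S3 endpoint are placeholders, not the actual deployment values):

```properties
# etc/catalog/osc_datacommons_dev.properties  (names/URIs are placeholders)
connector.name=iceberg
hive.metastore.uri=thrift://hive-metastore.osc.svc:9083
hive.s3.endpoint=https://s3.example.com
hive.s3.path-style-access=true
```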
OS-Climate - Configuration of Dev/Prod servers meeting (11/02/2022)
Will raise separate issues for the changes to the environment summarised above so we can track them separately.
FYI, on the dev cluster these services are up:
- JupyterHub: https://jupyterhub-odh-jupyterhub.apps.odh-cl2.apps.os-climate.org/
- Trino / CloudBeaver (admin account same as before)

Console link: https://console-openshift-console.apps.odh-cl2.apps.os-climate.org/k8s/all-namespaces/machine.openshift.io~v1beta1~MachineSet

Not up:
We definitely need this; all of our Trino authentication story is based on JWT. That, or some other solution for generating JWTs. I think we should stand yours up: it gives us JWT, and authentication via GitHub, both of which are important to how we designed the OSC platform. I do not want to block this by getting into a long discussion, but we might also run it as an Open Services Group service, if there is a reasonable OSG-affiliated cluster to run it on.
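For context, pointing Trino at JWT authentication is a small coordinator config change along these lines (the key/JWKS location is a placeholder; the actual deployment values may differ):

```properties
# config.properties on the Trino coordinator (key location is a placeholder)
http-server.authentication.type=JWT
http-server.authentication.jwt.key-file=https://token-service.example.com/.well-known/jwks.json
```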
Okay, sure. I can set it up. But I think we might need a more robust solution for a prod environment in the future.
I agree, but I am not currently sure what that solution should be, and your tool does the job effectively.
Trino token service added: https://das-odh-trino.apps.odh-cl2.apps.os-climate.org
As the eventual PROD cluster is going to have some significant differences from the new DEV cluster, I am going to close this issue out given the recent creation of DEV. Future discussion and progress on PROD will be tracked in #136.
Going forward we want proper isolation between DEV and PROD environments for Data Commons, which means:
@MichaelTiemannOSC @erikerlandson