Commit 6faa91c

Merge branch 'master' into fix/fe-bundle

jawadqur authored Dec 5, 2024
2 parents 46efca6 + a464375
Showing 15 changed files with 209 additions and 45 deletions.
2 changes: 1 addition & 1 deletion helm/etl/Chart.yaml
@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.1.5
+version: 0.1.6
 
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
14 changes: 6 additions & 8 deletions helm/etl/README.md
@@ -1,6 +1,6 @@
 # etl
 
-![Version: 0.1.5](https://img.shields.io/badge/Version-0.1.5-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: master](https://img.shields.io/badge/AppVersion-master-informational?style=flat-square)
+![Version: 0.1.6](https://img.shields.io/badge/Version-0.1.6-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: master](https://img.shields.io/badge/AppVersion-master-informational?style=flat-square)
 
 A Helm chart for gen3 etl
 
@@ -14,6 +14,7 @@ A Helm chart for gen3 etl
 | esGarbageCollect.enabled | bool | `false` | Whether to create es garbage collect cronjob. |
 | esGarbageCollect.schedule | string | `"0 0 * * *"` | The cron schedule expression to use in the es garbage collect cronjob. Runs once a day by default. |
 | esGarbageCollect.slack_webhook | string | `"None"` | Slack webhook endpoint to use for cronjob. |
+| etlForced | string | `"TRUE"` | |
 | etlMapping.mappings[0].aggregated_props[0].fn | string | `"count"` | |
 | etlMapping.mappings[0].aggregated_props[0].name | string | `"_samples_count"` | |
 | etlMapping.mappings[0].aggregated_props[0].path | string | `"samples"` | |
@@ -93,17 +94,14 @@ A Helm chart for gen3 etl
 | image.tube.repository | string | `"quay.io/cdis/tube"` | The Docker image repository for the tube service |
 | image.tube.tag | string | `"master"` | Overrides the image tag whose default is the chart appVersion. |
 | imagePullSecrets | list | `[]` | Docker image pull secrets. |
+| legacySupport | bool | `false` | |
 | podAnnotations | map | `{}` | Annotations to add to the pod |
-| resources | map | `{"spark":{"limits":{"cpu":1,"memory":"2Gi"},"requests":{"cpu":0.3,"memory":"128Mi"}},"tube":{"limits":{"cpu":1,"memory":"2Gi"},"requests":{"cpu":0.3,"memory":"128Mi"}}}` | Resource requests and limits for the containers in the pod |
-| resources.spark.limits | map | `{"cpu":1,"memory":"2Gi"}` | The maximum amount of resources that the container is allowed to use |
-| resources.spark.limits.cpu | string | `1` | The maximum amount of CPU the container can use |
-| resources.spark.limits.memory | string | `"2Gi"` | The maximum amount of memory the container can use |
+| resources | map | `{"spark":{"requests":{"cpu":0.3,"memory":"128Mi"}},"tube":{"requests":{"cpu":0.3,"memory":"128Mi"}}}` | Resource requests and limits for the containers in the pod |
 | resources.spark.requests | map | `{"cpu":0.3,"memory":"128Mi"}` | The amount of resources that the container requests |
 | resources.spark.requests.cpu | string | `0.3` | The amount of CPU requested |
 | resources.spark.requests.memory | string | `"128Mi"` | The amount of memory requested |
-| resources.tube.limits | map | `{"cpu":1,"memory":"2Gi"}` | The maximum amount of resources that the container is allowed to use |
-| resources.tube.limits.cpu | string | `1` | The maximum amount of CPU the container can use |
-| resources.tube.limits.memory | string | `"2Gi"` | The maximum amount of memory the container can use |
 | resources.tube.requests | map | `{"cpu":0.3,"memory":"128Mi"}` | The amount of resources that the container requests |
 | resources.tube.requests.cpu | string | `0.3` | The amount of CPU requested |
 | resources.tube.requests.memory | string | `"128Mi"` | The amount of memory requested |
+| schedule | string | `"*/30 * * * *"` | |
+| suspendCronjob | bool | `true` | |
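For reference, the new toggles added in this chart version can be set together from a values override. A minimal sketch, assuming the chart is deployed as the etl dependency of the umbrella gen3 chart (the values shown are simply the chart defaults):

etl:
  # Keep the CronJob suspended until you are ready to run the ETL.
  suspendCronjob: true
  # Cron expression used by the etl-cronjob CronJob.
  schedule: "*/30 * * * *"
  # Any value other than "false" makes run_etl.py run with --force.
  etlForced: "TRUE"
  # Mount settings.py from the etl-secret Secret for older tube images.
  legacySupport: false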
40 changes: 22 additions & 18 deletions helm/etl/templates/etl-job.yaml
@@ -3,7 +3,8 @@ kind: CronJob
 metadata:
   name: etl-cronjob
 spec:
-  schedule: "0 0 1 1 */5"
+  suspend: {{ .Values.suspendCronjob }}
+  schedule: {{ .Values.schedule | quote }}
   jobTemplate:
     spec:
       backoffLimit: 0
@@ -35,6 +36,12 @@ spec:
                 values:
                 - ONDEMAND
           volumes:
+          {{- if .Values.legacySupport }}
+          - name: config-volume
+            secret:
+              defaultMode: 420
+              secretName: etl-secret
+          {{- end }}
           - name: signal-volume
             emptyDir: {}
           - name: creds-volume
@@ -80,9 +87,6 @@ spec:
               requests:
                 cpu: {{ .Values.resources.spark.requests.cpu }}
                 memory: {{ .Values.resources.spark.requests.memory }}
-            # limits:
-            #   cpu: {{ .Values.resources.spark.limits.cpu }}
-            #   memory: {{ .Values.resources.spark.limits.memory }}
             command: ["/bin/bash" ]
             args:
             - "-c"
@@ -105,7 +109,6 @@ spec:
              while true; do sleep 5; done
           - name: tube
             imagePullPolicy: IfNotPresent
-            # image: quay.io/cdis/tube:feat_helm_test
             image: {{ .Values.image.tube.repository }}:{{ .Values.image.tube.tag }}
             ports:
             - containerPort: 80
@@ -153,7 +156,7 @@ spec:
             - name: SPARK_DRIVER_MEMORY
               value: 6g
             - name: ETL_FORCED
-              value: "TRUE"
+              value: {{ .Values.etlForced }}
             - name: gen3Env
               valueFrom:
                 configMapKeyRef:
@@ -166,11 +169,11 @@ spec:
                   key: slack_webhook
                   optional: true
             volumeMounts:
-            # - name: "creds-volume"
-            #   readOnly: true
-            #   mountPath: "/gen3/tube/creds.json"
-            #   subPath: creds.json
-            # Volume to signal when to kill spark
+            {{- if .Values.legacySupport }}
+            - mountPath: /tube/tube/settings.py
+              name: config-volume
+              subPath: settings.py
+            {{- end }}
             - mountPath: /usr/share/pod
               name: signal-volume
             - name: "etl-mapping"
@@ -185,9 +188,6 @@ spec:
               requests:
                 cpu: {{ .Values.resources.tube.requests.cpu }}
                 memory: {{ .Values.resources.tube.requests.memory }}
-            # limits:
-            #   cpu: {{ .Values.resources.tube.limits.cpu }}
-            #   memory: {{ .Values.resources.tube.limits.memory }}
             command: ["/bin/bash"]
             args:
             - "-c"
@@ -199,13 +199,17 @@ spec:
              # Port 9000 is open, continue with the rest of the script
              echo "Port 9000 is now open. Continuing with the script..."
-             echo "python run_config.py && python run_etl.py"
-             python run_config.py && python run_etl.py
+             if [[ $ETL_FORCED != "false" ]]; then
+               echo "python run_config.py && python run_etl.py --force"
+               python run_config.py && python run_etl.py --force
+             else
+               echo "python run_config.py && python run_etl.py"
+               python run_config.py && python run_etl.py
+             fi
              exitcode=$?
-             # Kill sidecar and all processes
              echo "Exit code: $exitcode"
-             pkill -u root && exit $exitcode
+             exit "$exitcode" &
           restartPolicy: Never
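With the chart defaults above (suspendCronjob: true, schedule "*/30 * * * *"), the CronJob renders suspended, so no ETL runs until the flag is flipped. A sketch of the rendered spec under those default values:

spec:
  suspend: true
  schedule: "*/30 * * * *"

Setting suspendCronjob to false activates the half-hourly schedule, and setting etlForced to "false" makes the tube container run run_etl.py without --force.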
104 changes: 104 additions & 0 deletions helm/etl/templates/etl-secret.yaml
@@ -0,0 +1,104 @@
{{ if .Values.legacySupport }}
kind: Secret
apiVersion: v1
metadata:
  name: etl-secret
stringData:
  settings.py: |-
    import os

    import tube.enums as enums
    from cdislogging import get_logger
    from tube.config_helper import find_paths, load_json

    from .utils.general import get_resource_paths_from_yaml

    logger = get_logger("__name__", log_level="warn")

    LIST_TABLES_FILES = "tables.txt"

    #
    # Load db credentials from a creds.json file.
    # See config_helper.py for paths searched for creds.json
    # ex: export XDG_DATA_HOME="$HOME/.local/share"
    # and setup $XDG_DATA_HOME/.local/share/gen3/tube/creds.json
    #
    conf_data = load_json("creds.json", "tube")
    DB_HOST = os.getenv("DB_HOST") or conf_data.get("db_host", "localhost")
    DB_PORT = os.getenv("DB_PORT") or conf_data.get("db_port", "5432")
    DB_DATABASE = os.getenv("DB_DATABASE") or conf_data.get("db_database", "sheepdog")
    DB_USERNAME = os.getenv("DB_USERNAME") or conf_data.get("db_username", "peregrine")
    DB_PASSWORD = os.getenv("DB_PASSWORD") or conf_data.get("db_password", "unknown")
    DB_USE_SSL = os.getenv("DB_USE_SSL") or conf_data.get(
        "db_use_ssl", False
    )  # optional property to db_use_ssl

    JDBC = (
        "jdbc:postgresql://{}:{}/{}".format(DB_HOST, DB_PORT, DB_DATABASE)
        if DB_USE_SSL is False
        else "jdbc:postgresql://{}:{}/{}?sslmode=require".format(
            DB_HOST, DB_PORT, DB_DATABASE
        )
    )
    PYDBC = "postgresql://{}:{}@{}:{}/{}".format(
        DB_USERNAME, DB_PASSWORD, DB_HOST, DB_PORT, DB_DATABASE
    )
    DICTIONARY_URL = os.getenv(
        "DICTIONARY_URL",
        "https://s3.amazonaws.com/dictionary-artifacts/datadictionary/develop/schema.json",
    )
    ES_URL = os.getenv("ES_URL", "esproxy-service")

    HDFS_DIR = "/result"
    # Three modes: Test, Dev, Prod
    RUNNING_MODE = os.getenv("RUNNING_MODE", enums.RUNNING_MODE_DEV)  # 'Prod' or 'Dev'

    PARALLEL_JOBS = 1
    LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")

    ES = {
        "es.nodes": ES_URL,
        "es.port": "9200",
        "es.input.json": "yes",
        "es.nodes.client.only": "false",
        "es.nodes.discovery": "false",
        "es.nodes.data.only": "false",
        "es.nodes.wan.only": "true",
    }

    HADOOP_HOME = os.getenv("HADOOP_HOME", "/usr/local/Cellar/hadoop/3.1.0/libexec/")
    JAVA_HOME = os.getenv(
        "JAVA_HOME", "/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home"
    )
    HADOOP_URL = os.getenv("HADOOP_URL", "http://spark-service:9000")
    ES_HADOOP_VERSION = os.getenv("ES_HADOOP_VERSION", "")
    ES_HADOOP_HOME_BIN = "{}/elasticsearch-hadoop-{}".format(
        os.getenv("ES_HADOOP_HOME", ""), os.getenv("ES_HADOOP_VERSION", "")
    )
    HADOOP_HOST = os.getenv("HADOOP_HOST", "spark-service")

    # Searches same folders as load_json above
    try:
        MAPPING_FILE = find_paths("etlMapping.yaml", "tube")[0]
    except:
        MAPPING_FILE = None

    try:
        USERYAML_FILE = find_paths("user.yaml", "tube")[0]
    except IndexError:
        USERYAML_FILE = None

    PROJECT_TO_RESOURCE_PATH = get_resource_paths_from_yaml(USERYAML_FILE)

    SPARK_MASTER = os.getenv("SPARK_MASTER", "local[1]")  # 'spark-service'
    SPARK_EXECUTOR_MEMORY = os.getenv("SPARK_EXECUTOR_MEMORY", "2g")
    SPARK_DRIVER_MEMORY = os.getenv("SPARK_DRIVER_MEMORY", "512m")
    APP_NAME = "Gen3 ETL"

    os.environ[
        "PYSPARK_SUBMIT_ARGS"
    ] = "--jars {}/dist/elasticsearch-spark-20_2.11-{}.jar pyspark-shell".format(
        ES_HADOOP_HOME_BIN, ES_HADOOP_VERSION
    )
    os.environ["HADOOP_CLIENT_OPTS"] = os.getenv("HADOOP_CLIENT_OPTS", "")
{{- end }}
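The settings.py above still loads database credentials through load_json("creds.json", "tube"), searching the config_helper paths mentioned in its comments. A minimal creds.json sketch covering only the keys this file reads (all values are illustrative placeholders):

{
  "db_host": "postgres.internal",
  "db_port": "5432",
  "db_database": "sheepdog",
  "db_username": "peregrine",
  "db_password": "change-me",
  "db_use_ssl": false
}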
20 changes: 8 additions & 12 deletions helm/etl/values.yaml
@@ -33,25 +33,13 @@ resources:
       cpu: 0.3
       # -- (string) The amount of memory requested
       memory: 128Mi
-    # -- (map) The maximum amount of resources that the container is allowed to use
-    limits:
-      # -- (string) The maximum amount of CPU the container can use
-      cpu: 1.0
-      # -- (string) The maximum amount of memory the container can use
-      memory: 2Gi
   spark:
     # -- (map) The amount of resources that the container requests
     requests:
       # -- (string) The amount of CPU requested
       cpu: 0.3
       # -- (string) The amount of memory requested
       memory: 128Mi
-    # -- (map) The maximum amount of resources that the container is allowed to use
-    limits:
-      # -- (string) The maximum amount of CPU the container can use
-      cpu: 1.0
-      # -- (string) The maximum amount of memory the container can use
-      memory: 2Gi
 
 
 esEndpoint: gen3-elasticsearch-master
@@ -154,3 +142,11 @@
   custom_image:
   # -- (string) Slack webhook endpoint to use for cronjob.
   slack_webhook: None
+
+schedule: "*/30 * * * *"
+
+suspendCronjob: true
+
+legacySupport: false
+
+etlForced: "TRUE"
4 changes: 2 additions & 2 deletions helm/gen3/Chart.yaml
@@ -28,7 +28,7 @@ dependencies:
     version: 0.1.16
     repository: file://../common
   - name: etl
-    version: 0.1.5
+    version: 0.1.6
     repository: file://../etl
     condition: etl.enabled
   - name: frontend-framework
@@ -80,7 +80,7 @@ dependencies:
     repository: "file://../sheepdog"
     condition: sheepdog.enabled
   - name: ssjdispatcher
-    version: 0.1.13
+    version: 0.1.14
     repository: "file://../ssjdispatcher"
     condition: ssjdispatcher.enabled
   - name: sower
4 changes: 2 additions & 2 deletions helm/gen3/README.md
@@ -24,7 +24,7 @@ Helm chart to deploy Gen3 Data Commons
 | file://../audit | audit | 0.1.16 |
 | file://../aws-es-proxy | aws-es-proxy | 0.1.13 |
 | file://../common | common | 0.1.16 |
-| file://../etl | etl | 0.1.5 |
+| file://../etl | etl | 0.1.6 |
 | file://../fence | fence | 0.1.26 |
 | file://../frontend-framework | frontend-framework | 0.1.6 |
 | file://../gen3-network-policies | gen3-network-policies | 0.1.2 |
@@ -40,7 +40,7 @@ Helm chart to deploy Gen3 Data Commons
 | file://../revproxy | revproxy | 0.1.19 |
 | file://../sheepdog | sheepdog | 0.1.20 |
 | file://../sower | sower | 0.1.15 |
-| file://../ssjdispatcher | ssjdispatcher | 0.1.13 |
+| file://../ssjdispatcher | ssjdispatcher | 0.1.14 |
 | file://../wts | wts | 0.1.18 |
 | https://charts.bitnami.com/bitnami | postgresql | 11.9.13 |
 | https://helm.elastic.co | elasticsearch | 7.10.2 |
2 changes: 1 addition & 1 deletion helm/ssjdispatcher/Chart.yaml
@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.1.13
+version: 0.1.14
 
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
4 changes: 3 additions & 1 deletion helm/ssjdispatcher/README.md
@@ -1,6 +1,6 @@
 # ssjdispatcher
 
-![Version: 0.1.13](https://img.shields.io/badge/Version-0.1.13-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: master](https://img.shields.io/badge/AppVersion-master-informational?style=flat-square)
+![Version: 0.1.14](https://img.shields.io/badge/Version-0.1.14-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: master](https://img.shields.io/badge/AppVersion-master-informational?style=flat-square)
 
 A Helm chart for gen3 ssjdispatcher
 
@@ -32,6 +32,8 @@ A Helm chart for gen3 ssjdispatcher
 | commonLabels | map | `nil` | Will completely override the commonLabels defined in the common chart's _label_setup.tpl |
 | criticalService | string | `"true"` | Valid options are "true" or "false". If an invalid option is set, the value will default to "false". |
 | dispatcherJobNum | string | `"10"` | Ssjdispatcher job number. |
+| externalSecrets | map | `{"credsFile":null}` | External secrets configuration |
+| externalSecrets.credsFile | string | `nil` | Will override the name of the AWS Secrets Manager secret. Default is "credentials.json" |
 | fullnameOverride | string | `""` | Override the full name of the deployment. |
 | gen3Namespace | string | `"default"` | Namespace to deploy the job. |
 | global.aws | map | `{"awsAccessKeyId":null,"awsSecretAccessKey":null,"enabled":false}` | AWS configuration |
7 changes: 7 additions & 0 deletions helm/ssjdispatcher/templates/_helpers.tpl
@@ -66,3 +66,10 @@ Create the name of the service account to use
 {{- default "default" .Values.serviceAccount.name }}
 {{- end }}
 {{- end }}
+
+{{/*
+ssjdispatcher credentials.json Secrets Manager Name
+*/}}
+{{- define "creds-file" -}}
+{{- default "credentials.json" .Values.externalSecrets.credsFile }}
+{{- end }}
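The helper falls back to "credentials.json" whenever externalSecrets.credsFile is unset. A hedged override sketch for values.yaml (the secret name here is an illustrative placeholder, not a real default):

externalSecrets:
  credsFile: ssjdispatcher-prod-credentials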
19 changes: 19 additions & 0 deletions helm/ssjdispatcher/templates/external-secret.yaml
@@ -0,0 +1,19 @@
{{ if .Values.global.externalSecrets.deploy }}
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ssjdispatcher-creds
spec:
  refreshInterval: 5m
  secretStoreRef:
    name: {{ include "common.SecretStore" . }}
    kind: SecretStore
  target:
    name: ssjdispatcher-creds
    creationPolicy: Owner
  data:
    - secretKey: credentials.json
      remoteRef:
        # name of the secret in Secrets Manager; matches the "creds-file" helper defined in _helpers.tpl
        key: {{ include "creds-file" . }}
{{- end }}
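This manifest only renders when external secrets are deployed globally. A minimal values sketch, assuming the SecretStore produced by the common chart's common.SecretStore helper is already configured:

global:
  externalSecrets:
    deploy: true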
12 changes: 12 additions & 0 deletions helm/ssjdispatcher/templates/role-binding.yaml
@@ -0,0 +1,12 @@
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ssjdispatcher-binding
subjects:
  - kind: ServiceAccount
    name: {{ include "ssjdispatcher.serviceAccountName" . }}
    apiGroup: ""
roleRef:
  kind: ClusterRole
  name: admin
  apiGroup: rbac.authorization.k8s.io
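Since this is a namespaced RoleBinding referencing the built-in admin ClusterRole, it grants the ssjdispatcher service account admin rights only within the release namespace, not cluster-wide. A quick verification sketch (the namespace and service-account name are placeholders for whatever the release renders):

kubectl auth can-i create jobs -n <namespace> \
  --as=system:serviceaccount:<namespace>:<ssjdispatcher-serviceaccount>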