Production Deploy #980

Merged
merged 134 commits into from
Nov 22, 2024
134 commits
7b27ef1
add infrastructure/batch directory, add batch/roles.tf file which inc…
avrohomgottlieb Sep 18, 2024
e14f579
add batch/security.tf file with batch security group and db resource …
avrohomgottlieb Sep 18, 2024
3ff3441
add batch/computed_environment.tf
avrohomgottlieb Sep 19, 2024
f073407
change compute_environment name
avrohomgottlieb Sep 25, 2024
7b6b0d6
update ecs_task role name
avrohomgottlieb Sep 25, 2024
ad8699f
add first pass at job_queue
avrohomgottlieb Sep 25, 2024
1443cdb
Merge pull request #889 from AlexsLemonade/feature/loader
avrohomgottlieb Sep 26, 2024
a08b3da
fix typo in job_queue, format batch terraform files
avrohomgottlieb Sep 26, 2024
695004a
update queue and compute_environment names
avrohomgottlieb Sep 26, 2024
9d3adda
first pass at batch job_definition
avrohomgottlieb Sep 26, 2024
8d49eb4
update name property of job queue
avrohomgottlieb Sep 26, 2024
9cc543d
update resourceRequirements to current version
avrohomgottlieb Sep 26, 2024
93672a4
Merge pull request #899 from AlexsLemonade/avrohom/844-batch-computed…
avrohomgottlieb Sep 27, 2024
559c96f
add ephemeralStorage and fargatePlatformConfiguration in job_definiti…
avrohomgottlieb Sep 27, 2024
5f328b3
update dockerhub image name
avrohomgottlieb Sep 27, 2024
8de8591
update command in job_definition::container_properties
avrohomgottlieb Sep 27, 2024
2ac9ab8
Merge pull request #910 from AlexsLemonade/avrohom/900-add-batch-job-…
avrohomgottlieb Oct 1, 2024
a1142bb
update command in job_definition
avrohomgottlieb Oct 1, 2024
16d247f
remove aws provider version comment
avrohomgottlieb Oct 7, 2024
89f0048
Merge pull request #911 from AlexsLemonade/avrohom/901-add-batch-job-…
avrohomgottlieb Oct 7, 2024
0a641b8
move aws provider from networking.tf to provider.tf
avrohomgottlieb Oct 9, 2024
8a1a67a
update aws provider version to 5.12.0
avrohomgottlieb Oct 9, 2024
0ab77b9
Merge pull request #929 from AlexsLemonade/avrohom/918-aws-provider-i…
avrohomgottlieb Oct 11, 2024
1aa5b61
add batch/variables.tf file, include environment variable
avrohomgottlieb Oct 13, 2024
da616ca
update reference in job_definition, rename var from environment to ba…
avrohomgottlieb Oct 13, 2024
9b29737
move batch_image and batch_resource_requirement fields from job_defin…
avrohomgottlieb Oct 13, 2024
7d0b4ba
move image and resource_requirement var values back into job_definition
avrohomgottlieb Oct 14, 2024
a28f960
add retry_strategy job_definition::container_properties
avrohomgottlieb Oct 14, 2024
b5a9f4e
move environment_variables from batch/variables.tf back into job_defi…
avrohomgottlieb Oct 14, 2024
92a7849
fix hidden computed_file query bug in Project::purge_computed_files
avrohomgottlieb Oct 14, 2024
82ce51e
add Sample::purge_computed_files method
avrohomgottlieb Oct 14, 2024
b578d27
add ComputedFile::purge, fix a few Project and Sample CF related name…
avrohomgottlieb Oct 15, 2024
b9d8476
add break up loader::_create_computed_file into loader::_create_compu…
avrohomgottlieb Oct 15, 2024
8df351d
add loader::generate_computed_file function
avrohomgottlieb Oct 15, 2024
19bd10e
convert all args to kwargs in loader::generate_computed_file
avrohomgottlieb Oct 15, 2024
77ae571
add generate_computed_file management command
avrohomgottlieb Oct 15, 2024
6a8c4f5
add validation to generate_computed_file command
avrohomgottlieb Oct 15, 2024
262e2a4
Merge pull request #937 from AlexsLemonade/dev
avrohomgottlieb Oct 16, 2024
0bd2539
remove computed_file_name parameter from CF::get_project|sample_file
avrohomgottlieb Oct 16, 2024
4a30d4e
improve readability of loader::generate_computed_file
avrohomgottlieb Oct 16, 2024
f2c04b3
remove return early statements, clean up logging statements
avrohomgottlieb Oct 16, 2024
99a1175
Merge branch 'feature/batch' into avrohom/915-add-generate-computed-f…
avrohomgottlieb Oct 16, 2024
a2bf77c
add dispatch_to_batch command
avrohomgottlieb Oct 16, 2024
d12761f
improve logging statements, fix download config reference bug
avrohomgottlieb Oct 28, 2024
5b10871
improve code quality in projects query
avrohomgottlieb Oct 28, 2024
b8bfdbc
enforce usage of computed_file property in project and sample model m…
avrohomgottlieb Oct 28, 2024
84cc335
Merge pull request #933 from AlexsLemonade/avrohom/920-add-batch-envars
avrohomgottlieb Oct 28, 2024
5ab62f6
Merge pull request #936 from AlexsLemonade/avrohom/915-add-generate-c…
avrohomgottlieb Oct 29, 2024
4fb9a64
Add access to local public key for dev env in deploy script
avrohomgottlieb Oct 31, 2024
2de4984
Rename dockerhub_repo var to dockerhub_account throughout infrastruct…
avrohomgottlieb Oct 31, 2024
29aeec0
update infrastructure readme with new dev stack instructions
avrohomgottlieb Oct 31, 2024
e838b34
add public ssh key and 1password integrations to gitignore
avrohomgottlieb Oct 31, 2024
5d7ac2f
fix typo in infrastructure readme
avrohomgottlieb Oct 31, 2024
0b6ec04
Merge pull request #940 from AlexsLemonade/avrohom/939-update-deploy-…
avrohomgottlieb Nov 1, 2024
ce649a9
add batch module file, and batch/variables file in order to get resou…
avrohomgottlieb Nov 3, 2024
701624a
update database depricated field name
avrohomgottlieb Nov 3, 2024
b6978ff
update reference to db field
avrohomgottlieb Nov 3, 2024
19164c9
update references to resources passed in as vars, fix typos
avrohomgottlieb Nov 3, 2024
d2c9155
fix all deprecation errors
avrohomgottlieb Nov 3, 2024
cfde68e
update aws_vpc and aws_cloudwatch_log_stream resource names to confor…
avrohomgottlieb Nov 4, 2024
d308bc5
update ecs_task policy resource names to conform to naming convention…
avrohomgottlieb Nov 4, 2024
b2d3b70
keep job_queue::compute_environments for now (will update to job_queu…
avrohomgottlieb Nov 4, 2024
73385ce
rename aws_vpc and aws_cloudwatch_log_stream resource names to origin…
avrohomgottlieb Nov 4, 2024
aa891e2
add project property method all_samples_no_multiplexed_duplicates
avrohomgottlieb Nov 5, 2024
6603b1c
remove locks from ComputedFile::get_sample_file
avrohomgottlieb Nov 5, 2024
3af36ae
add ComputedFile::bulk_create_multiplexed_files
avrohomgottlieb Nov 5, 2024
4af3ea1
update loader::_create_computed_file to account for new multiplexed s…
avrohomgottlieb Nov 5, 2024
acf9744
fix bugs in loader::_create_computed_file, ComputedFile::bulk_create_…
avrohomgottlieb Nov 5, 2024
5ae96dc
update multiplexed sample test in test_loader, along with expected va…
avrohomgottlieb Nov 5, 2024
38cbb1a
update dispatch to batch caller with new project.sample query
avrohomgottlieb Nov 6, 2024
c7461cc
clarify comments, update typos
avrohomgottlieb Nov 6, 2024
dd82522
rename Project::all_samples_no_multipexed_duplicates to Project::samp…
avrohomgottlieb Nov 7, 2024
bee5265
rename ComputedFile::bulk_create_multiplexed_files to ComputedFile::g…
avrohomgottlieb Nov 7, 2024
3afe296
Merge pull request #945 from AlexsLemonade/avrohom/941-debug-batch-im…
avrohomgottlieb Nov 8, 2024
7e50523
clean up ComputedFile::get_multiplexed_computed_files by swapping out…
avrohomgottlieb Nov 8, 2024
42bfd74
add batch_job_role, add jobRoleArn to job_definition, tighten up batc…
avrohomgottlieb Nov 10, 2024
2201241
wire up job queueand job definition access through terraform for job …
avrohomgottlieb Nov 10, 2024
d2c2712
fix typos in dispatch_to_batch
avrohomgottlieb Nov 10, 2024
4ebee70
add batch_submit_job policy to api permissions
avrohomgottlieb Nov 10, 2024
c033dff
allow public network access for batch jobs, fix incorrect reference
avrohomgottlieb Nov 10, 2024
b0f76f0
ensure access to AWS credential envars inside docker container
avrohomgottlieb Nov 10, 2024
2be5043
add load-metadata, generate-computed-files, and dispatch-to-batch to …
avrohomgottlieb Nov 10, 2024
82e2239
fix json errors in new batch_job_role s3_access_policy
avrohomgottlieb Nov 10, 2024
a0921dd
fix spatial input metadata file exclusion bug for batch
avrohomgottlieb Nov 13, 2024
6843730
Merge pull request #946 from AlexsLemonade/avrohom/943-fix-multiplexe…
avrohomgottlieb Nov 13, 2024
73beeba
merge in updates from 743-fix-multiplexed-for-batch
avrohomgottlieb Nov 13, 2024
00c1fe3
remove dispatch_to_batch and mock-data from sportal
avrohomgottlieb Nov 13, 2024
6b37c6c
invert return logic in Library::get_data_file_paths
avrohomgottlieb Nov 13, 2024
a9e2d25
move envars accession out of app code and into config
avrohomgottlieb Nov 13, 2024
3e7425e
move AWS_REGION from confing/prod to config/common
avrohomgottlieb Nov 14, 2024
7e9bad3
Merge pull request #947 from AlexsLemonade/avrohom/debug-dispatch-to-…
avrohomgottlieb Nov 14, 2024
efefb62
Merge branch 'dev' into feature/batch
avrohomgottlieb Nov 14, 2024
0c0c4e1
Merge pull request #953 from AlexsLemonade/avrohom/merge-dev-into-fea…
avrohomgottlieb Nov 15, 2024
2191d6b
Merge pull request #890 from AlexsLemonade/feature/batch
avrohomgottlieb Nov 15, 2024
eb5bd3f
Added corrected svg
dvenprasad Nov 20, 2024
de44ba4
upgrade GHA terraform version to 1.9.8, and actions/checkout to v4
avrohomgottlieb Nov 21, 2024
a4873ed
upgrade hashicorp/setup-terraform from v1 to v3
avrohomgottlieb Nov 21, 2024
3473f7a
Merge pull request #968 from AlexsLemonade/avrohom/964-address-terraf…
avrohomgottlieb Nov 21, 2024
41ea6cf
downgrade to tf 0.13.0 in github gha workflows
avrohomgottlieb Nov 21, 2024
8019958
Merge pull request #969 from AlexsLemonade/avrohom/downgrade-to-tf-0-13
avrohomgottlieb Nov 21, 2024
41120fa
change aws provider version to 4.0.0
avrohomgottlieb Nov 21, 2024
fe5260c
comment out batch to go around new feature usages while changing aws …
avrohomgottlieb Nov 21, 2024
28ee16f
comment out batch module output vars and add TODOs to uncomment later
avrohomgottlieb Nov 21, 2024
600cf09
Merge pull request #970 from AlexsLemonade/avrohom/upgrade-aws-provid…
davidsmejia Nov 21, 2024
a7176c2
downgrade aws provider version from 4.0.0 to 3.76.1
avrohomgottlieb Nov 21, 2024
5c8c446
Merge pull request #971 from AlexsLemonade/avrohom/downgrade-aws-vers…
avrohomgottlieb Nov 21, 2024
bdc5d94
Merge pull request #967 from AlexsLemonade/deepa-fix-about-page-image…
dvenprasad Nov 22, 2024
db8128b
change aws_provider version back to 3.37.0, revert deprecation warnin…
avrohomgottlieb Nov 22, 2024
a2001d0
handle merge conflict
avrohomgottlieb Nov 22, 2024
56c6432
comment out newer 5.12.0 version of aws resources
avrohomgottlieb Nov 22, 2024
ca9fe3d
fix typo
avrohomgottlieb Nov 22, 2024
1c2d784
add upgrade flag to init_terraform script
avrohomgottlieb Nov 22, 2024
e538730
Update infrastructure/variables.tf
avrohomgottlieb Nov 22, 2024
8d8a88f
Update infrastructure/api.tf
avrohomgottlieb Nov 22, 2024
11105d6
Update infrastructure/s3.tf
avrohomgottlieb Nov 22, 2024
d049c27
Update infrastructure/s3.tf
avrohomgottlieb Nov 22, 2024
3a0fc2f
Update infrastructure/s3.tf
avrohomgottlieb Nov 22, 2024
127f372
Update infrastructure/networking.tf
avrohomgottlieb Nov 22, 2024
923d24f
Update infrastructure/database.tf
avrohomgottlieb Nov 22, 2024
bf43e39
Merge pull request #972 from AlexsLemonade/avrohom/revert-tf-resource…
avrohomgottlieb Nov 22, 2024
6f7b666
move aws provider version to provider block
avrohomgottlieb Nov 22, 2024
b9bc9b0
Merge pull request #973 from AlexsLemonade/avrohom/tf-move-version-to…
avrohomgottlieb Nov 22, 2024
a499cd9
set tf og to trace in init_terraform script
avrohomgottlieb Nov 22, 2024
01d28a0
Merge pull request #974 from AlexsLemonade/avrohom/set-tf-log-to-trace
avrohomgottlieb Nov 22, 2024
e1929ce
update popen command with TF_LOG envar
avrohomgottlieb Nov 22, 2024
d52df33
Merge pull request #975 from AlexsLemonade/avrohom/add-tf-log-to-popen
avrohomgottlieb Nov 22, 2024
4a21884
remove unnecessary double TF_LOG
avrohomgottlieb Nov 22, 2024
c2cdfdf
Merge pull request #976 from AlexsLemonade/avrohom/remove-double-tf-logg
avrohomgottlieb Nov 22, 2024
696e578
try different source and newer requirement version locking
avrohomgottlieb Nov 22, 2024
d7ff5f9
Merge pull request #977 from AlexsLemonade/avrohom/terraform-provider…
avrohomgottlieb Nov 22, 2024
29a934a
remove default tags as they're already included in the provider block
avrohomgottlieb Nov 22, 2024
46b2f20
Merge pull request #978 from AlexsLemonade/avrohom/fix-tags-and-acl-i…
avrohomgottlieb Nov 22, 2024
2e60cb7
remove extra db tag
avrohomgottlieb Nov 22, 2024
59327f9
Merge pull request #979 from AlexsLemonade/avrohom/fix-extra-db-tag
avrohomgottlieb Nov 22, 2024
6 changes: 3 additions & 3 deletions .github/workflows/deploy_prod_backend.yml
@@ -11,7 +11,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
 
       - name: Load 1Password Secrets
         id: op-load-secrets
@@ -31,9 +31,9 @@ jobs:
           SENTRY_DSN: "${{ secrets.OP_SENTRY_DSN }}"
 
       - name: Setup Terraform
-        uses: hashicorp/setup-terraform@v1
+        uses: hashicorp/setup-terraform@v3
         with:
-          terraform_version: 0.12.26
+          terraform_version: 0.13.0
 
       - name: Deploy
         run: cd infrastructure && python3 deploy.py -e prod -u deployer -d ccdl -v $(git rev-parse HEAD)
6 changes: 3 additions & 3 deletions .github/workflows/deploy_staging_backend.yml
@@ -11,7 +11,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
 
       - name: Load 1Password Secrets
         id: op-load-secrets
@@ -31,9 +31,9 @@ jobs:
           SENTRY_DSN: "${{ secrets.OP_SENTRY_DSN }}"
 
       - name: Setup Terraform
-        uses: hashicorp/setup-terraform@v1
+        uses: hashicorp/setup-terraform@v3
         with:
-          terraform_version: 0.12.26
+          terraform_version: 0.13.0
 
       - name: Deploy
         run: cd infrastructure && python3 deploy.py -e staging -u deployer -d ccdlstaging -v $(git rev-parse HEAD)
10 changes: 7 additions & 3 deletions .gitignore
@@ -128,6 +128,13 @@ infrastructure/.terraform.lock.hcl
 .vscode
 *.code-workspace
 
+# SSH keys
+*.pem
+*.pub
+
+# 1Password integration
+.op/
+
 #
 # Client
 #
@@ -147,9 +154,6 @@ client/out/
 # production
 client/build
 
-# misc
-*.pem
-
 # debug
 client/npm-debug.log*
 client/yarn-debug.log*
3 changes: 3 additions & 0 deletions api/scpca_portal/config/common.py
@@ -184,3 +184,6 @@ class Common(Configuration):
     CORS_ALLOW_HEADERS = default_headers + (API_KEY_HEADER,)
 
     TERMS_AND_CONDITIONS = "PLACEHOLDER"
+
+    # AWS
+    AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
7 changes: 4 additions & 3 deletions api/scpca_portal/config/production.py
@@ -15,13 +15,14 @@ class Production(Common):
 
     UPDATE_S3_DATA = True
 
-    # AWS
-    AWS_REGION = os.getenv("AWS_REGION")
-
     # AWS S3
     AWS_S3_INPUT_BUCKET_NAME = "scpca-portal-inputs"
     AWS_S3_OUTPUT_BUCKET_NAME = os.getenv("AWS_S3_BUCKET_NAME")
 
+    # AWS Batch
+    AWS_BATCH_JOB_QUEUE_NAME = os.environ.get("AWS_BATCH_JOB_QUEUE_NAME")
+    AWS_BATCH_JOB_DEFINITION_NAME = os.environ.get("AWS_BATCH_JOB_DEFINITION_NAME")
+
     # https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching#cache-control
     # Response can be cached by browser and any intermediary caches
     # (i.e. it is "public") for up to 1 day
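Taken together, the common.py and production.py hunks move `AWS_REGION` onto `Common` with a `us-east-1` fallback and drop the production override. Dropping the override matters because a subclass attribute shadows the base-class one: had `Production` kept its default-less `os.getenv("AWS_REGION")`, it would still resolve to `None` whenever the variable is unset. The minimal sketch below illustrates that shadowing with plain classes instead of django-configurations, and reads from a passed-in mapping rather than `os.environ` so it can be tested; `build_settings` and all three class names are hypothetical.

```python
from typing import Mapping, Optional, Tuple


def build_settings(env: Mapping[str, str]) -> Tuple[type, type, type]:
    """Build three hypothetical settings classes against the given environment."""

    class CommonSketch:
        # As in this PR: the region is defined once, with a fallback default.
        AWS_REGION: Optional[str] = env.get("AWS_REGION", "us-east-1")

    class ShadowingProductionSketch(CommonSketch):
        # A subclass that re-reads the variable with no default shadows the
        # inherited value, so it is None whenever AWS_REGION is unset.
        AWS_REGION: Optional[str] = env.get("AWS_REGION")

    class InheritingProductionSketch(CommonSketch):
        # No override: the CommonSketch value (and its default) is inherited.
        pass

    return CommonSketch, ShadowingProductionSketch, InheritingProductionSketch
```

With an empty environment, `build_settings({})` yields a shadowing class whose region is `None` and an inheriting class whose region is `us-east-1`.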
70 changes: 50 additions & 20 deletions api/scpca_portal/loader.py
@@ -1,7 +1,6 @@
 import shutil
 from concurrent.futures import ThreadPoolExecutor
 from functools import partial
-from threading import Lock
 from typing import Any, Dict, List, Set
 
 from django.conf import settings
@@ -10,7 +9,14 @@
 
 from scpca_portal import common, metadata_file, s3
 from scpca_portal.config.logging import get_and_configure_logger
-from scpca_portal.models import ComputedFile, Contact, ExternalAccession, Project, Publication
+from scpca_portal.models import (
+    ComputedFile,
+    Contact,
+    ExternalAccession,
+    Project,
+    Publication,
+    Sample,
+)
 
 logger = get_and_configure_logger(__name__)
 
@@ -136,25 +142,55 @@ def create_project(
     return project
 
 
-def _create_computed_file(future, *, update_s3: bool, clean_up_output_data: bool) -> None:
+def _create_computed_file(
+    computed_file: ComputedFile, update_s3: bool, clean_up_output_data: bool
+) -> None:
     """
     Save computed file returned from future to the db.
     Upload file to s3 and clean up output data depending on passed options.
     """
-    if computed_file := future.result():
-
-        # Only upload and clean up projects and the last sample if multiplexed
-        if computed_file.project or computed_file.sample.is_last_multiplexed_sample:
-            if update_s3:
-                s3.upload_output_file(computed_file.s3_key, computed_file.s3_bucket)
-            if clean_up_output_data:
-                computed_file.clean_up_local_computed_file()
+    if update_s3:
+        s3.upload_output_file(computed_file.s3_key, computed_file.s3_bucket)
+    if clean_up_output_data:
+        computed_file.clean_up_local_computed_file()
+
+    if computed_file.sample and computed_file.has_multiplexed_data:
+        computed_files = computed_file.get_multiplexed_computed_files()
+        ComputedFile.objects.bulk_create(computed_files)
+    else:
         computed_file.save()
 
+
+def _create_computed_file_callback(future, *, update_s3: bool, clean_up_output_data: bool) -> None:
+    """
+    Wrap computed file saving and uploading to s3 in a way that accommodates multiprocessing.
+    """
+    if computed_file := future.result():
+        _create_computed_file(computed_file, update_s3, clean_up_output_data)
+
     # Close DB connection for each thread.
     connection.close()
 
+
+def generate_computed_file(
+    *,
+    download_config: Dict,
+    project: Project | None = None,
+    sample: Sample | None = None,
+    update_s3: bool = True,
+) -> None:
+
+    # Purge old computed file
+    if old_computed_file := (project or sample).get_computed_file(download_config):
+        old_computed_file.purge(update_s3)
+
+    if project and (computed_file := ComputedFile.get_project_file(project, download_config)):
+        _create_computed_file(computed_file, update_s3, clean_up_output_data=False)
+    if sample and (computed_file := ComputedFile.get_sample_file(sample, download_config)):
+        _create_computed_file(computed_file, update_s3, clean_up_output_data=False)
+        sample.project.update_downloadable_sample_count()
+
 
 def generate_computed_files(
     project: Project,
     max_workers: int,
@@ -170,33 +206,27 @@ def generate_computed_files(
 
     # Prep callback function
     on_get_file = partial(
-        _create_computed_file,
+        _create_computed_file_callback,
         update_s3=update_s3,
         clean_up_output_data=clean_up_output_data,
     )
-    # Prepare a threading.Lock for each sample, with the chief purpose being to protect
-    # multiplexed samples that share a zip file.
-    locks = {}
 
     with ThreadPoolExecutor(max_workers=max_workers) as tasks:
         # Generated project computed files
         for config in common.GENERATED_PROJECT_DOWNLOAD_CONFIGS:
             tasks.submit(
                 ComputedFile.get_project_file,
                 project,
                 config,
-                project.get_output_file_name(config),
             ).add_done_callback(on_get_file)
 
         # Generated sample computed files
-        for sample in project.samples.all():
+        for sample in project.samples_to_generate:
             for config in common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS:
-                sample_lock = locks.setdefault(sample.get_config_identifier(config), Lock())
                 tasks.submit(
                     ComputedFile.get_sample_file,
                     sample,
                     config,
-                    sample.get_output_file_name(config),
-                    sample_lock,
                 ).add_done_callback(on_get_file)
 
     project.update_downloadable_sample_count()
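The loader refactor splits the old future-consuming `_create_computed_file` into a plain function that takes a computed file and a thin `_create_computed_file_callback` that unwraps the future, so the same save logic serves both the thread-pool path (`generate_computed_files`) and the new single-file path (`generate_computed_file`). A self-contained sketch of that split with `concurrent.futures`, where `build_item`, `save_result`, `save_result_callback`, `generate_all`, and `generate_one` are hypothetical stand-ins and no database or S3 is involved:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import partial
from threading import Lock
from typing import List, Optional

saved: List[str] = []
saved_lock = Lock()  # callbacks may run on worker threads, so guard the shared list


def build_item(name: str) -> Optional[str]:
    """Stand-in for ComputedFile.get_project_file/get_sample_file; may return None."""
    return None if name.startswith("skip") else f"{name}.zip"


def save_result(item: str, *, upload: bool) -> None:
    """Plain function that can be called directly, like the new _create_computed_file."""
    with saved_lock:
        saved.append(f"{item} (uploaded={upload})")


def save_result_callback(future: Future, *, upload: bool) -> None:
    """Callback wrapper: unwrap the future, then delegate to the plain function."""
    if item := future.result():
        save_result(item, upload=upload)
    # The real callback also closes the thread's DB connection at this point.


def generate_all(names: List[str], max_workers: int = 4) -> None:
    """Thread-pool path: each finished future triggers the wrapper callback."""
    on_done = partial(save_result_callback, upload=False)
    with ThreadPoolExecutor(max_workers=max_workers) as tasks:
        for name in names:
            tasks.submit(build_item, name).add_done_callback(on_done)


def generate_one(name: str) -> None:
    """Synchronous path: no future involved, so call the plain function directly."""
    if item := build_item(name):
        save_result(item, upload=False)
```

Leaving the `with` block waits for all workers to finish, and done callbacks run before a worker moves on, so every save has happened by the time `generate_all` returns.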
91 changes: 91 additions & 0 deletions api/scpca_portal/management/commands/dispatch_to_batch.py
@@ -0,0 +1,91 @@
+import logging
+
+from django.conf import settings
+from django.core.management.base import BaseCommand
+
+import boto3
+
+from scpca_portal import common
+from scpca_portal.models import Project
+
+batch = boto3.client(
+    "batch",
+    region_name=settings.AWS_REGION,
+)
+logger = logging.getLogger()
+logger.setLevel(logging.INFO)
+logger.addHandler(logging.StreamHandler())
+
+
+class Command(BaseCommand):
+    help = """
+    Submits all computed file combinations to the specified AWS Batch job queue
+    for projects for which computed files have not yet been generated.
+    If a project-id is passed, then computed files are only submitted for that specific project.
+    """
+
+    def add_arguments(self, parser):
+        parser.add_argument("--project-id", type=str)
+
+    def handle(self, *args, **kwargs):
+        self.dispatch_to_batch(**kwargs)
+
+    def submit_job(
+        self,
+        *,
+        download_config_name: str,
+        project_id: str = "",
+        sample_id: str = "",
+    ) -> None:
+        """
+        Submit a job to AWS Batch for the given resource_id and download_config combination.
+        """
+        resource_flag = "--project-id" if project_id else "--sample-id"
+        resource_id = project_id if project_id else sample_id
+        job_name = f"{resource_id}-{download_config_name}"
+
+        response = batch.submit_job(
+            jobName=job_name,
+            jobQueue=settings.AWS_BATCH_JOB_QUEUE_NAME,
+            jobDefinition=settings.AWS_BATCH_JOB_DEFINITION_NAME,
+            containerOverrides={
+                "command": [
+                    "python",
+                    "manage.py",
+                    "generate_computed_file",
+                    resource_flag,
+                    resource_id,
+                    "--download-config-name",
+                    download_config_name,
+                ],
+            },
+        )
+
+        logger.info(f'{job_name} submitted to Batch with jobId {response["jobId"]}')
+
+    def dispatch_to_batch(self, project_id: str = "", **kwargs):
+        """
+        Iterate over all projects that don't have computed files and submit each
+        resource_id and download_config combination to the Batch queue.
+        If a project id is passed, then computed files are created for all combinations
+        within that project.
+        """
+        projects = (
+            Project.objects.filter(project_computed_files__isnull=True)
+            if not project_id
+            else Project.objects.filter(scpca_id=project_id)
+        )
+
+        for project in projects:
+            for download_config_name in common.PROJECT_DOWNLOAD_CONFIGS.keys():
+                self.submit_job(
+                    project_id=project.scpca_id,
+                    download_config_name=download_config_name,
+                )
+
+            for sample in project.samples_to_generate:
+                for download_config_name in common.SAMPLE_DOWNLOAD_CONFIGS.keys():
+                    self.submit_job(
+                        sample_id=sample.scpca_id,
+                        download_config_name=download_config_name,
+                    )
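`dispatch_to_batch` fans out one Batch job per (resource, download config) pair: every project config for each project, then every sample config for each sample in `project.samples_to_generate`. The sketch below reproduces only that fan-out and the shape of the `containerOverrides` command as plain data, so it runs without AWS credentials; `JobRequest`, `make_request`, `build_job_requests`, and the config names in the usage are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class JobRequest:
    job_name: str
    command: List[str]


def make_request(resource_flag: str, resource_id: str, config_name: str) -> JobRequest:
    # Mirrors submit_job: the job name is "<resource_id>-<config_name>" and the
    # container runs the generate_computed_file management command.
    return JobRequest(
        job_name=f"{resource_id}-{config_name}",
        command=[
            "python",
            "manage.py",
            "generate_computed_file",
            resource_flag,
            resource_id,
            "--download-config-name",
            config_name,
        ],
    )


def build_job_requests(
    projects: Dict[str, Iterable[str]],
    project_configs: Iterable[str],
    sample_configs: Iterable[str],
) -> List[JobRequest]:
    """`projects` maps each project id to the ids of its samples to generate."""
    project_configs = list(project_configs)
    sample_configs = list(sample_configs)
    requests: List[JobRequest] = []
    for project_id, sample_ids in projects.items():
        for config_name in project_configs:
            requests.append(make_request("--project-id", project_id, config_name))
        for sample_id in sample_ids:
            for config_name in sample_configs:
                requests.append(make_request("--sample-id", sample_id, config_name))
    return requests
```

For one project with two samples, two project configs, and one sample config, this yields 2 + 2 × 1 = 4 requests. In the command itself, each request corresponds to one `batch.submit_job(jobName=..., jobQueue=..., jobDefinition=..., containerOverrides={"command": [...]})` call, with the queue and definition names taken from `settings.AWS_BATCH_JOB_QUEUE_NAME` and `settings.AWS_BATCH_JOB_DEFINITION_NAME`.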
75 changes: 75 additions & 0 deletions api/scpca_portal/management/commands/generate_computed_file.py
@@ -0,0 +1,75 @@
+import logging
+
+from django.core.management.base import BaseCommand
+
+from scpca_portal import common, loader
+from scpca_portal.models import Project, Sample
+
+logger = logging.getLogger()
+logger.setLevel(logging.INFO)
+logger.addHandler(logging.StreamHandler())
+
+
+class Command(BaseCommand):
+    help = """
+    This command is meant to be called as the entrypoint of an AWS Batch Fargate job instance.
+    Individual files are computed according to:
+    - The project or sample id
+    - An appropriate corresponding download config
+
+    When computation is completed, files are uploaded to S3, and the job is marked as completed.
+
+    At that point, the instance that generated this computed file will receive a new job
+    from the job queue and begin computing the next file.
+    """
+
+    def add_arguments(self, parser):
+        parser.add_argument("--project-id", type=str)
+        parser.add_argument("--sample-id", type=str)
+        parser.add_argument("--download-config-name", type=str)
+
+    def handle(self, *args, **kwargs):
+        self.generate_computed_file(**kwargs)
+
+    def generate_computed_file(
+        self,
+        project_id: str,
+        sample_id: str,
+        download_config_name: str,
+        **kwargs,
+    ) -> None:
+        """Generate one computed file for a project or sample from a named download config."""
+        loader.prep_data_dirs()
+
+        ids_not_mutually_exclusive = project_id and sample_id or (not project_id and not sample_id)
+        if ids_not_mutually_exclusive:
+            logger.error(
+                "Invalid id combination. Passed ids must be mutually exclusive. "
+                "Either a project_id or a sample_id must be passed, but not both or neither."
+            )
+
+        if project_id:
+            project = Project.objects.filter(scpca_id=project_id).first()
+            if not project:
+                logger.error(f"{project_id} does not exist.")
+            if download_config_name not in common.PROJECT_DOWNLOAD_CONFIGS.keys():
+                logger.error(f"{download_config_name} is not a valid project download config name.")
+                logger.info(
+                    f"Here are valid download_config_name values for projects: "
+                    f"{common.PROJECT_DOWNLOAD_CONFIGS.keys()}"
+                )
+            download_config = common.PROJECT_DOWNLOAD_CONFIGS[download_config_name]
+            loader.generate_computed_file(project=project, download_config=download_config)
+
+        if sample_id:
+            sample = Sample.objects.filter(scpca_id=sample_id).first()
+            if not sample:
+                logger.error(f"{sample_id} does not exist.")
+            if download_config_name not in common.SAMPLE_DOWNLOAD_CONFIGS.keys():
+                logger.error(f"{download_config_name} is not a valid sample download config name.")
+                logger.info(
+                    f"Here are valid download_config_name values for samples: "
+                    f"{common.SAMPLE_DOWNLOAD_CONFIGS.keys()}"
+                )
+            download_config = common.SAMPLE_DOWNLOAD_CONFIGS[download_config_name]
+            loader.generate_computed_file(sample=sample, download_config=download_config)
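`generate_computed_file` checks the id combination and the config name by hand after parsing, logging an error and then continuing. One alternative worth considering is to let the parser reject bad input up front: a required mutually exclusive group rejects both-or-neither ids, and `choices` rejects unknown config names. The sketch below uses plain `argparse` (Django's `CommandParser` is an `argparse.ArgumentParser` subclass, so the same calls should be available in `add_arguments`); `PROJECT_CONFIGS`, `SAMPLE_CONFIGS`, `build_parser`, and `parse` are hypothetical placeholders, not the portal's real config tables.

```python
import argparse
from typing import Dict, List

# Placeholder config tables; the real ones live in scpca_portal.common.
PROJECT_CONFIGS: Dict[str, dict] = {"PROJECT_CONFIG_A": {}, "PROJECT_CONFIG_B": {}}
SAMPLE_CONFIGS: Dict[str, dict] = {"SAMPLE_CONFIG_A": {}}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="generate_computed_file")
    # Exactly one of --project-id / --sample-id must be supplied.
    ids = parser.add_mutually_exclusive_group(required=True)
    ids.add_argument("--project-id", type=str)
    ids.add_argument("--sample-id", type=str)
    parser.add_argument(
        "--download-config-name",
        required=True,
        choices=sorted(set(PROJECT_CONFIGS) | set(SAMPLE_CONFIGS)),
    )
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # `choices` only checks membership in the union, so also confirm the
    # config name belongs to the resource type that was passed.
    valid = PROJECT_CONFIGS if args.project_id else SAMPLE_CONFIGS
    if args.download_config_name not in valid:
        parser.error(f"{args.download_config_name} is not valid for this resource type")
    return args
```

Every rejection path above ends in `parser.error`, which prints usage and exits with status 2. A nonzero exit is also what AWS Batch uses to mark a job as failed, so invalid input surfaces as a failed job instead of a logged error followed by a later exception.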