Annotation Versioning Feature (#184)
* Refactored annotation_task to use annotation_units. Updated tests and linted. Should be passing.

* Refactored annotation_task to use annotation units in AnnotationTaskInterface

* Wrapped up refactor. All tests should be passing.

* Added some comments, refactored annotation-task further to get genomic unit and dataset directly from annotation-unit

* Refactored annotation-task further. Added get_genomic_unit_type() and get_dependencies() in AnnotationUnit. Clarified some function names and docstrings. Tests and linting passing locally.

* Processing versioning tasks (#182)

* Pushing up pulled in code changes from the annotation-task refactor, pairing and other work done in the last week, before losing power and wifi.

* Was able to process versioning tasks for annotation units. Skipped all other tasks for now, will be working on that next. Updated test fixture's annotation-configuration to match current Rosalution annotation configuration. Paired with Angelina on some of this stuff.

* testing process tasks for datasets without dependencies and datasets with dependencies

* Tests pass for CPAM0002, need to rework CPAM0046

* Paired with Rabab to refactor how we manage skipping a dependency for unit tests when processing annotation unit tasks; agreed upon a base set of datasets to use in configuration; and updated the necessary code patching

* Missed a file.

* got it working; heck ya

* tests passing, linting & formatting passing

* wip for genomic units;linting; and formatting

* finished cleaning up genomic unit unit tests and added parameterized test methods to have more than one test case per unit test

* wip

* wip to get annotation by analysis name

* backend wip for getting annotations by analysis

* rabab & angelina pair for getting version result

* formatted backend files

* Able to retrieve version for 'rest' versioning type. Hardcoded 'rosalution' type version for rosalution's manifest. Paired with Angelina on Wednesday to create a couple of helper functions for testing. Thursday - Rabab worked on combining & testing all 3 versioning types in one test.

* Fixed some of the linting errors

* Retrieve and show annotations (#180)

* Updated backend to include annotation retrieval for dependencies and ui

* Updated unit tests and integration to pass

* finished adding the version calcs and fixed version retrieval; checking whether transcripts exist is still broken: the case where a transcript_id is given but no transcripts are listed in the variant still needs to be fixed

* fixed creating multiple genomic units when uploading twice; investigating why transcripts are showing as not existing when they do

* wrapped up cleaning the tests; removed extra logging; linted and formatted; paired with Rabab

---------

Co-authored-by: SeriousHorncat <[email protected]>

* wrapping up first draft of migration script; and tidying up (#183)

* wrapping up first draft of migration script; and tidying up feature

* added a port for local mongodb development so vscode can connect to mongodb in the container; fixed documentation in script for the example run command

* updated the script to handle the rename and removed overwrite input

* Adding back in the test case Rabab created that I accidentally removed via a force-push; updated initial seed fixtures to use calculated annotation versions; cleaned up the documentation for the create annotation manifest script for migration

* Added back the feature to append annotations to an existing dataset within genomic units

* caught a mistake with the initial seed fixtures; changes were incomplete; this should fix that

* missed fixing some unit tests affected by a change, ended up refactoring the tests to test more clearly

---------

Co-authored-by: SeriousHorncat <[email protected]>
fatimarabab and SeriousHorncat authored Sep 30, 2024
1 parent 0531fba commit 1be997d
Showing 34 changed files with 26,592 additions and 23,958 deletions.
116 changes: 73 additions & 43 deletions backend/src/core/annotation.py
@@ -3,7 +3,10 @@
 import logging
 import queue

-from .annotation_task import AnnotationTaskFactory
+from ..repository.analysis_collection import AnalysisCollection
+from ..repository.genomic_unit_collection import GenomicUnitCollection
+
+from .annotation_task import AnnotationTaskFactory, VersionAnnotationTask
 from ..models.analysis import Analysis
 from ..repository.annotation_config_collection import AnnotationConfigCollection
 from ..core.annotation_unit import AnnotationUnit
@@ -83,69 +86,96 @@ def queue_annotation_tasks(self, analysis: Analysis, annotation_task_queue: Anno
                 annotation_task_queue.put(annotation_unit_queued)

     @staticmethod
-    def process_tasks(annotation_queue, genomic_unit_collection):  # pylint: disable=too-many-locals
+    def process_tasks(
+        annotation_queue: AnnotationQueue, analysis_name: str, genomic_unit_collection: GenomicUnitCollection,
+        analysis_collection: AnalysisCollection
+    ):  # pylint: disable=too-many-branches,too-many-locals
         """Processes items that have been added to the queue"""
         logger.info("%s Processing annotation tasks queue ...", annotation_log_label())

         with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
             annotation_task_futures = {}
             while not annotation_queue.empty():
                 annotation_unit = annotation_queue.get()
-                latest = False
-                if genomic_unit_collection.annotation_exist(annotation_unit.genomic_unit, annotation_unit.dataset
-                                                            ) and annotation_unit.is_version_latest():
-                    logger.info('%s Annotation Exists...', format_annotation_logging(annotation_unit))
-                    latest = True
-                    continue
-                ready = True

-                if annotation_unit.has_dependencies():
-                    missing_dependencies = annotation_unit.get_missing_dependencies()
-                    for missing in missing_dependencies:
-                        annotation_value = genomic_unit_collection.find_genomic_unit_annotation_value(
-                            annotation_unit.genomic_unit, missing
-                        )
-                        ready = annotation_unit.ready_for_annotation(annotation_value, missing)

-                if not ready and not latest:
-                    if annotation_unit.should_continue_annotation():
-                        logger.info(
-                            '%s Delaying Annotation, Missing %s Dependencies...',
-                            format_annotation_logging(annotation_unit), annotation_unit.get_missing_dependencies()
-                        )
-                        annotation_queue.put(annotation_unit)
-                    else:
-                        logger.info(
-                            '%s Canceling Annotation, Missing %s Dependencies...',
-                            format_annotation_logging(annotation_unit), annotation_unit.get_missing_dependencies()
-                        )
+                if not annotation_unit.version_exists():
+                    version_task = AnnotationTaskFactory.create_version_task(annotation_unit)
+                    logger.info('%s Creating Task To Version...', format_annotation_logging(annotation_unit))
+                    annotation_task_futures[executor.submit(version_task.annotate)] = version_task
+                else:
+                    if genomic_unit_collection.annotation_exist(annotation_unit):
+                        logger.info('%s Annotation Exists...', format_annotation_logging(annotation_unit))
+                        continue
+
+                    if annotation_unit.has_dependencies():
+                        missing_dependencies = annotation_unit.get_missing_dependencies()
+                        for missing_dataset_name in missing_dependencies:
+                            analysis_manifest_dataset = analysis_collection.get_manifest_dataset_config(
+                                analysis_name, missing_dataset_name
+                            )
+                            if analysis_manifest_dataset is None:
+                                continue

-                    continue
+                            dependency_annotation_unit = AnnotationUnit(
+                                annotation_unit.genomic_unit, analysis_manifest_dataset
+                            )
+                            dependency_annotation_unit.set_latest_version(analysis_manifest_dataset['version'])
+                            annotation_value = genomic_unit_collection.find_genomic_unit_annotation_value(
+                                dependency_annotation_unit
+                            )
+                            if annotation_value:
+                                annotation_unit.set_annotation_for_dependency(missing_dataset_name, annotation_value)

-                task = AnnotationTaskFactory.create(annotation_unit.genomic_unit, annotation_unit.dataset)
-                logger.info('%s Creating Task To Annotate...', format_annotation_logging(annotation_unit))
+                    if not annotation_unit.conditions_met_to_gather_annotation():
+                        if annotation_unit.should_continue_annotation():
+                            logger.info(
+                                '%s Delaying Annotation, Missing %s Dependencies %s/10 times...',
+                                format_annotation_logging(annotation_unit), annotation_unit.get_missing_dependencies(),
+                                annotation_unit.get_delay_count() + 1
+                            )
+                            annotation_queue.put(annotation_unit)
+                        else:
+                            logger.info(
+                                '%s Canceling Annotation, Missing %s Dependencies...',
+                                format_annotation_logging(annotation_unit), annotation_unit.get_missing_dependencies()
+                            )
+                        continue
+
+                    annotation_task = AnnotationTaskFactory.create_annotation_task(annotation_unit)
+                    logger.info('%s Creating Task To Annotate...', format_annotation_logging(annotation_unit))

-                annotation_task_futures[executor.submit(task.annotate)] = (annotation_unit.genomic_unit, task)
+                    annotation_task_futures[executor.submit(annotation_task.annotate)] = annotation_task

                 for future in concurrent.futures.as_completed(annotation_task_futures):
-                    annotation_unit.genomic_unit, annotation_task = annotation_task_futures[future]
-                    logger.info('%s Query completed...', format_annotation_logging(annotation_unit))
+                    task = annotation_task_futures[future]

                     try:
-                        result_temp = future.result()
-
-                        for annotation in annotation_task.extract(result_temp):
+                        task_process_result = future.result()
+                        if isinstance(task, VersionAnnotationTask):
+                            annotation_unit = task.annotation_unit
+                            version = task.extract_version(task_process_result)
+                            annotation_unit.set_latest_version(version)
                             logger.info(
-                                '%s Saving %s...',
-                                format_annotation_logging(annotation_unit, annotation_task.dataset['data_set']),
-                                annotation['value']
+                                '%s Version Calculated %s...', format_annotation_logging(annotation_unit), version
                             )
-                            genomic_unit_collection.annotate_genomic_unit(annotation_task.genomic_unit, annotation)
+                            analysis_collection.add_dataset_to_manifest(analysis_name, annotation_unit)
+                            annotation_queue.put(annotation_unit)
+                        else:
+                            for annotation in task.extract(task_process_result):
+                                logger.info(
+                                    '%s Saving %s...',
+                                    format_annotation_logging(annotation_unit, task.annotation_unit.get_dataset_name()),
+                                    annotation['value']
+                                )
+
+                                genomic_unit_collection.annotate_genomic_unit(
+                                    task.annotation_unit.genomic_unit, annotation
+                                )

                     except FileNotFoundError as error:
                         logger.info(
                             '%s exception happened %s with %s and %s', annotation_log_label(), error,
-                            annotation_unit.genomic_unit, annotation_task
+                            annotation_unit.genomic_unit, task
                         )

                     del annotation_task_futures[future]
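
The two-pass flow in the updated process_tasks (calculate a version first, re-queue the unit, then annotate on the next pass) can be sketched in isolation as below. This is only an illustrative sketch and not the Rosalution implementation: the Unit class, fetch_version, fetch_annotation, process, and the dataset names are hypothetical stand-ins, and the real code relies on AnnotationUnit, AnnotationTaskFactory, and the repository collections instead.

```python
import concurrent.futures
import queue


class Unit:
    """Hypothetical stand-in for AnnotationUnit: a dataset name plus a version."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.version = None

    def version_exists(self):
        return self.version is not None


def fetch_version(unit):
    # Placeholder for a version lookup; no network call is made here.
    return f"{unit.dataset}-v1"


def fetch_annotation(unit):
    # Placeholder for the annotation request itself.
    return {"data_set": unit.dataset, "version": unit.version}


def process(work_queue):
    saved = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = {}
        while not work_queue.empty():
            unit = work_queue.get()
            if not unit.version_exists():
                futures[executor.submit(fetch_version, unit)] = ("version", unit)
            else:
                futures[executor.submit(fetch_annotation, unit)] = ("annotate", unit)

            # Drain finished work before checking the queue again, so a unit
            # that was re-queued after versioning is seen by the outer loop.
            for future in concurrent.futures.as_completed(list(futures)):
                kind, done_unit = futures.pop(future)
                result = future.result()
                if kind == "version":
                    done_unit.version = result
                    work_queue.put(done_unit)  # the next pass annotates it
                else:
                    saved.append(result)
    return saved


units = queue.Queue()
for name in ("dataset_a", "dataset_b"):  # hypothetical dataset names
    units.put(Unit(name))
annotations = process(units)
```

Each unit passes through the queue twice in this sketch: once to obtain a version and once to be annotated with that version attached. Missing-dependency delays and manifest updates are left out to keep the example short.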