2. Data ingest
Scott Warchal edited this page Feb 23, 2023 · 6 revisions
Most of the relevant code is contained in the ingest module, with some functions from utils.
- plaque_assay is launched when a pair of plates for a given workflow_id and variant is exported. The only data passed directly to plaque_assay is a list containing 2 strings, which are the paths to the 2 replicate plates (plate_list: List[str]).
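# plate_list: List[str], the paths to the 2 replicate plates
# session is assumed to be an existing database session, used for the variant lookup
variant = utils.get_variant_from_plate_list(plate_list, session)
workflow_id = utils.get_workflow_id_from_plate_list(plate_list)
# read the plate data and the index files into DataFrames
dataset = ingest.read_data_from_list(plate_list)
indexfiles = ingest.read_indexfiles_from_list(plate_list)
# attach the variant name to both
dataset["variant"] = variant
indexfiles["variant"] = variant
# do stuff with dataset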
- The workflow_id is parsed directly from the plate barcodes within the path by utils.get_workflow_id_from_plate_list(). It raises an error if the workflow_id is not the same for the two plates.
- The variant is parsed as an integer from the path. That integer is then used to query the NE_available_strains table to obtain the name of the variant with utils.get_variant_from_plate_list(). It fails with a VariantLookupError if there is no match in the database for that integer.
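The two checks described above can be sketched roughly as follows. This is an illustration only, not the code in utils: the path layout, the toy_extract helper, the function names and the strain names are all hypothetical, and a plain dict stands in for the NE_available_strains table.

```python
from typing import Callable, Dict, List


class VariantLookupError(Exception):
    """Raised when a variant integer has no match in the lookup table."""


def toy_extract(path: str) -> int:
    # Hypothetical layout: the last 6 characters of the path are the workflow_id.
    return int(path[-6:])


def common_workflow_id(plate_list: List[str], extract: Callable[[str], int]) -> int:
    """Return the workflow_id shared by all plates, or raise if they differ."""
    ids = {extract(path) for path in plate_list}
    if len(ids) != 1:
        raise ValueError(f"plates have differing workflow_ids: {sorted(ids)}")
    return ids.pop()


def lookup_variant(variant_int: int, strains: Dict[int, str]) -> str:
    """Look up a variant name; `strains` stands in for the database table."""
    try:
        return strains[variant_int]
    except KeyError:
        raise VariantLookupError(f"no strain found for variant {variant_int}") from None


if __name__ == "__main__":
    plates = ["/data/A000123", "/data/B000123"]  # made-up paths
    print(common_workflow_id(plates, toy_extract))
    print(lookup_variant(1, {1: "variant_a", 2: "variant_b"}))
```

Raising as early as possible means a mismatched plate pair or an unknown variant stops the run before any plate data is read.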
- The PlateResults.txt files are read in as pandas DataFrames with ingest.read_data_from_list(), which:
  - Reads in dataframes and concatenates them.
  - Re-labels wells to 96-well format.
  - Adds metadata: variant, barcode, dilution etc.
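A minimal sketch of these three steps with pandas is shown below. It makes several assumptions that are not confirmed by this page: that the files are tab-delimited, that the column names are Well and Plate_barcode, and that re-labelling collapses each 2x2 block of a 384-well plate onto one 96-well position. The function names are hypothetical and the real ingest.read_data_from_list() may work differently.

```python
import io
from typing import Dict, IO

import pandas as pd


def well_384_to_96(well: str) -> str:
    """Map a 384-well label (A01..P24) to a 96-well label (A01..H12).

    Assumption: each 2x2 block of 384-well positions maps to one 96-well well.
    """
    row = ord(well[0].upper()) - ord("A")  # 0..15
    col = int(well[1:]) - 1                # 0..23
    return f"{chr(ord('A') + row // 2)}{col // 2 + 1:02d}"


def read_plates(sources: Dict[str, IO[str]], variant: str) -> pd.DataFrame:
    """Read one table per plate, concatenate them, re-label wells, add metadata."""
    frames = []
    for barcode, handle in sources.items():
        df = pd.read_csv(handle, sep="\t")  # assumed tab-delimited
        df["Plate_barcode"] = barcode       # hypothetical column name
        frames.append(df)
    dataset = pd.concat(frames, ignore_index=True)
    dataset["Well"] = dataset["Well"].map(well_384_to_96)
    dataset["variant"] = variant
    return dataset


if __name__ == "__main__":
    # In-memory stand-ins for two PlateResults.txt files
    plate_1 = io.StringIO("Well\tvalue\nA01\t1\nB02\t2\n")
    plate_2 = io.StringIO("Well\tvalue\nP24\t3\n")
    print(read_plates({"plate_1": plate_1, "plate_2": plate_2}, "variant_a"))
```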
- The indexfile.txt is read in as a pandas DataFrame with ingest.read_indexfiles_from_list().