
2. Data ingest

Scott Warchal edited this page Feb 14, 2022 · 6 revisions

Most of the relevant code is contained in the ingest module, with some functions from utils.

  • plaque_assay is launched when a pair of plates for a given workflow_id and variant is exported. The only data passed directly to plaque_assay is a list of 2 strings, the paths to the 2 replicate plates (plate_list: List[str]):
    variant = utils.get_variant_from_plate_list(plate_list, session)
    workflow_id = utils.get_workflow_id_from_plate_list(plate_list)
    dataset = ingest.read_data_from_list(plate_list)
    indexfiles = ingest.read_indexfiles_from_list(plate_list)
    dataset["variant"] = variant
    indexfiles["variant"] = variant
    
    # do stuff with dataset

Parsing metadata from paths

  • The workflow_id is parsed directly from the plate barcodes within the paths by utils.get_workflow_id_from_plate_list(). It raises an error if the workflow_id is not the same for the two plates.

  • The variant is parsed as an integer from the path, which is then used to query the NE_available_strains table to obtain the name of the variant with utils.get_variant_from_plate_list(). It will fail with VariantLookupError if there is no match in the database.
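
The workflow_id check described above can be sketched roughly as follows. This is a minimal illustration and not the code in utils: the directory naming ("<barcode>__<suffix>"), the position of the workflow_id (last 6 digits of the barcode), the function names and the ValueError are all assumptions made for the example.

```python
import os
from typing import List


def barcode_from_path(plate_path: str) -> str:
    # Assumption: the plate directory is named "<barcode>__<suffix>"
    dirname = os.path.basename(os.path.normpath(plate_path))
    return dirname.split("__")[0]


def workflow_id_from_plate_list(plate_list: List[str]) -> int:
    # Assumption: the workflow_id is the last 6 digits of the barcode
    ids = {int(barcode_from_path(path)[-6:]) for path in plate_list}
    if len(ids) != 1:
        # The replicate plates must belong to the same workflow
        raise ValueError(f"plates have different workflow_ids: {sorted(ids)}")
    return ids.pop()
```

Collecting the parsed IDs into a set makes the "same for both plates" check a simple length test, and it would extend unchanged to more than two plates.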

Reading in tables

  • The PlateResults.txt files are read in as pandas DataFrames with ingest.read_data_from_list().

    • Reads in dataframes and concatenates them.
    • Re-labels wells to 96-well format.
    • Adds metadata: variant, barcode, dilution etc.
  • The indexfile.txt files are read in as a pandas DataFrame with ingest.read_indexfiles_from_list().
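
As a rough sketch of the read-and-concatenate step, assuming each plate path is a directory containing a tab-separated PlateResults.txt with a single header row, and that the directory name can stand in for the barcode. The function name read_plate_results is hypothetical, and the real ingest.read_data_from_list() also re-labels wells and adds further metadata, which is not shown here.

```python
import os
from typing import List

import pandas as pd


def read_plate_results(plate_list: List[str]) -> pd.DataFrame:
    frames = []
    for plate_path in plate_list:
        # Assumption: tab-separated file with one header row
        results_file = os.path.join(plate_path, "PlateResults.txt")
        df = pd.read_csv(results_file, sep="\t")
        # Assumption: the directory name is used as the plate barcode
        df["barcode"] = os.path.basename(os.path.normpath(plate_path))
        frames.append(df)
    # Stack the replicate plates into one table with a fresh row index
    return pd.concat(frames, ignore_index=True)
```

Tagging each row with its barcode before concatenating keeps the two replicates distinguishable in the combined DataFrame.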
