
2. Data ingest

Scott Warchal edited this page Feb 14, 2022 · 6 revisions

Most of the relevant code is contained in the ingest module, with some functions from utils.

  • plaque_assay is launched when a pair of plates for a given workflow_id and variant is exported. The only data passed directly to plaque_assay is a list of 2 strings, which are the paths to the 2 replicate plates (plate_list: List[str]).

  • The workflow_id is parsed directly from the plate barcodes within the paths with utils.get_workflow_id_from_plate_list(). It raises an error if the workflow_id is not the same for the two plates.

  • The variant is handled by utils.get_variant_from_plate_list(): it is parsed as an integer from the path, and that integer is used to query the NE_available_strains table for the name of the variant. It fails with VariantLookupError if there is no match in the database.

  • The PlateResults.txt files are read in as pandas DataFrames with ingest.read_data_from_list().

    • Reads in dataframes and concatenates them.
    • Re-labels wells to 96-well format.
    • Adds metadata: variant, barcode, dilution etc.
  • The indexfile.txt is read in as a pandas DataFrame.
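
The workflow_id consistency check described above can be sketched as follows. This is a minimal illustration, not the code in utils: the function name workflow_id_from_plates is hypothetical, and the path layout (a final directory named "<barcode>__<timestamp>") and the barcode format (a 3-character prefix followed by the workflow_id digits) are assumptions made for the example.

```python
import os
from typing import List


def workflow_id_from_plates(plate_list: List[str]) -> int:
    """Hypothetical stand-in for utils.get_workflow_id_from_plate_list()."""
    workflow_ids = set()
    for path in plate_list:
        # Assumption: the last path component is "<barcode>__<timestamp>".
        plate_dir = os.path.basename(os.path.normpath(path))
        barcode = plate_dir.split("__")[0]
        # Assumption: a 3-character prefix, then the workflow_id digits.
        workflow_ids.add(int(barcode[3:]))
    # Both replicate plates must belong to the same workflow.
    if len(workflow_ids) != 1:
        raise ValueError(
            f"plates have differing workflow_ids: {sorted(workflow_ids)}"
        )
    return workflow_ids.pop()
```

Under these assumptions, two replicate paths whose barcodes differ only in the prefix yield a single workflow_id, while paths from different workflows raise an error before any data is read.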
