DPL-471-1: malformed root_sample_ids that have duplicates in MLWH, and WERE picked #539
Labels
Data integrity
data fix
Enhancement
New feature or request
GSU
Delivers work for the GSU unit
Heron
RVI
RVI Project
User Story
Part of the wider DPL-471 issue which spawned in turn from DPL-048. Related are DPL-048-2, DPL-048-3 and DPL-048-4.
This story concerns the 94 malformed root_sample_ids in the MLWH lighthouse_sample table (+ MongoDB) which have duplicate samples with the correct root_sample_id. These came from the MK lighthouse lab in August 2021 and have an extra substring (something like '_RNA123456789') concatenated on the end of the correct ID.
The samples were picked at some point and are therefore found in SequenceScape and Event Warehouse as well as in the MLWH sample table. As such these need to be addressed in all 5 places (MLWH x2, Mongo, SS, EW). These are also in the iq_seq_flowcell table, meaning they have been picked/sequenced and we need to investigate how far they have gone. Indeed these 94 samples all show up twice in the iq_seq_flowcell table, so they have been picked/sequenced twice.
Fix
The main issue is that since these are duplicated, we cannot simply fix the root_sample_id in the databases. The root_sample_id/plate_barcode/coordinate combination must be unique, and the fixed IDs break this uniqueness. We also can't really delete the rows since these samples were used and have a paper trail of picking -> sequencing etc. that shouldn't be broken.
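To make the uniqueness problem concrete, here is a minimal Python sketch of how the collisions could be detected before any fix is attempted. It assumes the extra substring always has the shape '_RNA' followed by digits (based only on the '_RNA123456789' example above), and that rows are available as dicts with root_sample_id, plate_barcode and coordinate keys. The function names and sample values are hypothetical and are not taken from any existing codebase.

```python
import re
from collections import defaultdict

# Assumption: the malformed suffix is '_RNA' followed by digits at the end of the ID.
SUFFIX_RE = re.compile(r"_RNA\d+$")


def strip_suffix(root_sample_id):
    """Return the ID with the assumed '_RNA<digits>' suffix removed, if present."""
    return SUFFIX_RE.sub("", root_sample_id)


def find_collisions(rows):
    """Group rows by the key they would have after the fix.

    Returns only the keys shared by more than one row, i.e. the cases where
    correcting the ID would break the
    root_sample_id/plate_barcode/coordinate uniqueness.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (
            strip_suffix(row["root_sample_id"]),
            row["plate_barcode"],
            row["coordinate"],
        )
        groups[key].append(row["root_sample_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical example rows, for illustration only.
example_rows = [
    {"root_sample_id": "ABC001_RNA123456789", "plate_barcode": "PLATE1", "coordinate": "A01"},
    {"root_sample_id": "ABC001", "plate_barcode": "PLATE1", "coordinate": "A01"},
    {"root_sample_id": "XYZ002", "plate_barcode": "PLATE1", "coordinate": "B01"},
]
collisions = find_collisions(example_rows)
```

Running something like this against an export of the affected rows would confirm whether all 94 malformed IDs really do collide with an existing correct row, or whether some could be fixed in place.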
For SequenceScape and the MLWH sample table, we can add a flag in the description or comments which shows that the sample is a duplicate. For the MLWH lighthouse_sample table and MongoDB, we may not be able to do this, and the data might have to be left as it is (not ideal) unless some other option can be found. This could be difficult/complicated, but the good news is that it is just 94 samples.
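As an illustration of the flagging idea, the sketch below builds a description value that marks a sample as a duplicate without losing any existing text. The marker wording and the function name are assumptions made for this example, and how the resulting value would be written back to SequenceScape or the MLWH is deliberately left out.

```python
def flag_as_duplicate(description, correct_root_sample_id):
    """Append a duplicate marker to a description, leaving existing text intact.

    The marker format is hypothetical. The function is idempotent, so running
    a data fix twice does not add the marker twice.
    """
    note = f"[DUPLICATE of {correct_root_sample_id}]"
    if not description:
        return note
    if note in description:
        return description
    return f"{description} {note}"


# Hypothetical usage with made-up values.
flagged = flag_as_duplicate("Heron sample", "ABC001")
```

Keeping the marker idempotent and appended, rather than overwriting the field, means the original description remains readable alongside the flag.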
Who are the primary contacts for this story
Jonnie B
Alan K
Acceptance criteria