Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPL-897 Housekeeping - leading and trailing space for the plate_barcode, rna_id [Hinxton #713427] [Heron] #267

Open
rl15 opened this issue Feb 25, 2021 · 6 comments
Assignees

Comments

@rl15
Copy link

rl15 commented Feb 25, 2021

Describe the Housekeeping
Step 1: Find records if in Lighthouse_sample table & LabWhere
select distinct plate_barcode from lighthouse_sample where plate_barcode like '% %';
25 records
RNA008583
AP-rna-00115445
RNA020797
AP-rna-00026995
GLA008074R
GLA008028R
002127987
GLA010285R
AP-rna-00128103
GLA019321R
GLA019716R
GLA019715R
GLA019659R
GLA019717R
GLA019650R
GLA019651R
AP-rna-00142151
AP-rna-00145460
AP-rna-00146061
AP-rna-00130886
AP-rna-00148159
RNA427769
AP-rna-00152533-rna-00152877
AP-rna-00190145
RNA459749

Blocking issues
Describe any other issues or tickets that may be blocking this change.

Additional context
Ian W wrote (Thursday, 25 February 2021 at 08:18)

We've noticed that some records in lighthouse_sample have a leading space for the plate_barcode, rna_id

Sometimes this leading space also occurs in LabWhere and sometimes it doesn't

Please can you fix this

Thanks,

Ian

As an example:

mysql> select root_sample_id, plate_barcode FROM lighthouse_sample WHERE rna_id = 'AP-rna-00145460_A05';
Empty set (11.98 sec)

mysql> select root_sample_id, plate_barcode FROM lighthouse_sample WHERE rna_id = ' AP-rna-00145460_A05';
+----------------+------------------+
| root_sample_id | plate_barcode |
+----------------+------------------+
| AAD02265461 | AP-rna-00145460 |
+----------------+------------------+
1 row in set (11.59 sec)

mysql> SELECT labware.friendly_name AS labware_name , DATE(e.occured_at) AS updated_at , location.friendly_name AS location_barcode , mt.value AS location_parentage FROM mlwh_events.events e JOIN mlwh_events.metadata mt ON mt.event_id=e.id JOIN mlwh_events.roles labware_r ON e.id = labware_r.event_id JOIN mlwh_events.subjects labware ON labware.id = labware_r.subject_id JOIN mlwh_events.role_types labware_rt ON labware_r.role_type_id = labware_rt.id JOIN mlwh_events.roles location_r ON e.id = location_r.event_id JOIN mlwh_events.subjects location ON location.id = location_r.subject_id JOIN mlwh_events.role_types location_rt ON location_r.role_type_id = location_rt.id JOIN ( SELECT labware.friendly_name AS plate , MAX(e.occured_at) AS occured_at FROM mlwh_events.events e JOIN mlwh_events.roles labware_r ON e.id = labware_r.event_id JOIN mlwh_events.subjects labware ON labware.id = labware_r.subject_id JOIN mlwh_events.role_types labware_rt ON labware_r.role_type_id = labware_rt.id JOIN mlwh_events.roles location_r ON e.id = location_r.event_id JOIN mlwh_events.subjects location ON location.id = location_r.subject_id JOIN mlwh_events.role_types location_rt ON location_r.role_type_id = location_rt.id WHERE e.lims_id='LABWHERE' AND labware_rt.key = 'labware' AND location_rt.key = 'location' GROUP BY plate ) AS last_plate_event ON labware.friendly_name = last_plate_event.plate AND e.occured_at = last_plate_event.occured_at WHERE e.lims_id='LABWHERE' AND labware_rt.key = 'labware' AND location_rt.key = 'location' AND mt.key = "location_info" AND labware.friendly_name IN ('AP-rna-00115445', 'AP-rna-00145460', ' GLA019659R');
+-----------------+------------+-----------------------------------+---------------------------------------------------------------------+
| labware_name | updated_at | location_barcode | location_parentage |
+-----------------+------------+-----------------------------------+---------------------------------------------------------------------+
| GLA019659R | 2020-09-16 | lw-uk-biocentre-box-gsw-171-17143 | Sanger / Ogilvie / Heron Boxes Destroyed - UK Biocentre box GSW 171 |
| AP-rna-00115445 | 2020-07-23 | lw-ap-replacement-3-16358 | Sanger / Ogilvie / Heron Boxes Destroyed - AP replacement 3 |
| AP-rna-00145460 | 2020-09-30 | lw-uk-biocentre-box-ap-318-17255 | Sanger / Ogilvie / Heron Boxes Destroyed - UK Biocentre box AP 318 |
+-----------------+------------+-----------------------------------+---------------------------------------------------------------------+

@rl15 rl15 changed the title GPL-897 Housekeeping - leading space for the plate_barcode, rna_id [Hinxton #713427] GPL-897 Housekeeping - leading space for the plate_barcode, rna_id [Hinxton #713427] [Heron] Mar 1, 2021
@emrojo emrojo self-assigned this Mar 24, 2021
@KatyTaylor KatyTaylor changed the title GPL-897 Housekeeping - leading space for the plate_barcode, rna_id [Hinxton #713427] [Heron] GPL-897 Housekeeping - leading and trailing space for the plate_barcode, rna_id [Hinxton #713427] [Heron] Mar 24, 2021
@KatyTaylor
Copy link
Contributor

Also deal with trailing whitespace as per #276

@emrojo
Copy link
Contributor

emrojo commented Mar 24, 2021

EDIT:
My previous comparison using = didnt match as expected with exact comparison because = does padding trailing comparison which is a feature of mysql default collation: https://bugs.mysql.com/bug.php?id=64772

This is the corrected query for plate barcodes not re-inserted without whitespaces by another upload (25 rows):

SELECT barcodes_without_whitespaces.plate_barcode, barcodes_without_whitespaces.created_at 
FROM
(SELECT DISTINCT replace(barcodes_with_whitespaces.plate_barcode, ' ', '') AS plate_barcode, created_at FROM (
    SELECT DISTINCT plate_barcode, created_at FROM lighthouse_sample WHERE rna_id LIKE '% %'
  ) AS barcodes_with_whitespaces
) AS barcodes_without_whitespaces
LEFT OUTER JOIN lighthouse_sample ON lighthouse_sample.plate_barcode LIKE barcodes_without_whitespaces.plate_barcode 
WHERE lighthouse_sample.plate_barcode IS NULL
GROUP BY barcodes_without_whitespaces.plate_barcode;

What it means is none of the files have been re-uploaded with corrected barcode for these plates.

@emrojo
Copy link
Contributor

emrojo commented Mar 25, 2021

None of the plates were in dart.

  1. Plan to update mlwh:
UPDATE lighthouse_sample 
SET rna_id=trim(rna_id), plate_barcode=trim(plate_barcode) 
WHERE rna_id LIKE '% %' OR plate_barcode LIKE '% %';
  1. Plan to update mongodb:
db.samples.update(
  // Find all with plate barcodes and whitespaces, or rna id with whitespaces
  {"$or": [{"plate_barcode": /.* .*/},{"RNA ID": /.* .*/ }] },
  // Update removing the whitespaces
  [{
    "$set": 
      {
        "plate_barcode": {  "$trim": { "input": "$plate_barcode"} }, 
        "RNA ID": {"$trim": {"input": "$RNA ID"} }
      }
  }], 
  // Apply to all records not just the first one
  {
    "multi": true
  });

@emrojo
Copy link
Contributor

emrojo commented Mar 25, 2021

No plates were created with whitespaces after 15/Dec/2020, code was modified to fix that behaviour from 16/Dec as says the issue: #214

@harrietc52
Copy link
Contributor

harrietc52 commented Mar 29, 2021

Should Step 1 about have the below check on plate_barcode too?
eg

WHERE rna_id LIKE '% %' OR WHERE plate_barcode LIKE '% %';

And the same with Step 2

db.samples.update(
{
  "plate_barcode": /.* .*/
  OR 
  "rna_id": /.* .*/
}, ...)

@emrojo
Copy link
Contributor

emrojo commented Apr 7, 2021

All changes applied, many thanks for code review @harrietc52

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants