-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Y24 -473 generalize multiple source identifier filtering logic #1508
base: develop
Are you sure you want to change the base?
Changes from 5 commits
3fee40c
cd097c5
3f6707d
515c681
b53bebf
aed0dc4
7c2ee5b
f689072
af7092d
891304d
be17370
eb7ce11
d2130ae
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
# frozen_string_literal: true | ||
|
||
# SourceIdentifierFilterable | ||
# | ||
# This module provides functionality for filtering records based on source identifiers. | ||
# A source identifier can be a plate barcode, tube barcode, or a combination of plate barcode | ||
# and well position. | ||
# Models that include this module will have methods to filter records using these identifiers. | ||
# | ||
# ## Methods: | ||
# | ||
# * apply_source_identifier_filter(records, value, plate_join: :plate, | ||
# tube_join: :tube, well_join: :well) | ||
# - Filters the given records based on the provided source identifiers. | ||
# - Parameters: | ||
# - records: The ActiveRecord relation to filter. | ||
# - value: An array of source identifiers to filter by. | ||
# - plate_join: The association name for joining with the plate table (default: :plate). | ||
# - tube_join: The association name for joining with the tube table (default: :tube). | ||
# - well_join: The association name for joining with the well table (default: :well). | ||
# | ||
# ## Example Usage: | ||
# | ||
# class MyModel < ApplicationRecord | ||
# include SourceIdentifierFilterable | ||
# end | ||
# | ||
# records = MyModel.all | ||
# source_identifiers = ['PLATE123', 'TUBE456', 'PLATE789:A1'] | ||
# filtered_records = MyModel.apply_source_identifier_filter(records, source_identifiers) | ||
# | ||
# # Custom join associations | ||
# filtered_records = MyModel.apply_source_identifier_filter(records, source_identifiers, | ||
# plate_join: :source_plate, | ||
# tube_join: :source_tube) | ||
# | ||
module SourceIdentifierFilterable | ||
extend ActiveSupport::Concern | ||
|
||
class_methods do # rubocop:disable Metrics/BlockLength | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure it is the right idea to ignore this. This is a great piece of work but if you split it into functions it will be a lot more readable. Happy to go through it with you. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. I will see I can break it down further. 👍 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suspect Rubocop would have some ideas too but because of the Metrics/BlockLength it won't make any more suggestions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I’ve broken the code into functions and modularized it using module functions. If these are changed to private class methods, RuboCop still raises concerns about the module size. Alternatively, the other option to wrap these methods into another module and include it within this concern will allow to remove 'block length' warning . I’m not entirely convinced about the module size limits imposed by RuboCop or the need to artificially modify the code just to satisfy it. While there is definitely value in splitting functions to improve readability and modularity, the imposed limit and the way RuboCop evaluates the structure feels bit unjustified. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand your frustration and Rubocop can be dogmatic sometimes but looking at the result I find it far easier to understand what is going on. It is a question of maintainability. I think if we ignore Rubocop when we disagree with it we just end up with lots of stuff disabled which can defeat the purpose of it. I really struggle with long methods because I get overwhelmed and need to break it down. I would now feel far more confident modifying it because I know what is going on. The method names help me to understand. I hope that adds some context? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another thing this does for me is it helps me to identify places where we can make stuff more flexible. For example in this filter you have used plate, tube and well distinctions. I see this a lot in our code which leads me to believe that we need to do some work around labware. Having a big method can sometimes obfuscate this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, absolutely. I fully agree with your point, and breaking down the structures absolutely made more sense. As you said, it is more readable, modular, and clear now. Though it was mostly trial and error, I learned more through this process. Thanks for your suggestions 👍
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are correct. It would have been better to have private in-class methods. Looking at what you have done I think there is a way we could reduce the size significantly by changing the structure. For example if we turned joins into a hash we could probably use map or reduce and build the filter in one go using Arel. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not entirely sure how to do it, may be we can do by pairing on it at sometime later after finishing y24-149, if that's ok. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure. I was going to have a crack but not quite sure either so pairing is a good idea. |
||
def apply_source_identifier_filter(records, value, # rubocop:disable Metrics/MethodLength,Metrics/PerceivedComplexity | ||
plate_join: :plate, tube_join: :tube, well_join: :well) | ||
rec_ids = [] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. personally I would use full names as there is no discernable difference from a syntactic view e.g record_ids, filtered_records. It removes ambiguity |
||
value.each do |val| | ||
# Check if the source identifier contains a colon | ||
if val.include?(':') | ||
# Split the source identifier into plate and well | ||
plate, well = val.split(':') | ||
# Filter records based on plate and well | ||
if plate.present? | ||
filtered_recs = records.joins(plate_join).where(plate_join => { barcode: plate }) | ||
if well.present? | ||
filtered_recs = filtered_recs.joins(well_join).where(well_join => { position: well }) | ||
end | ||
else | ||
Rails.logger.warn("Malformed source identifier: '#{val}'. Plate part is missing.") | ||
next | ||
end | ||
else | ||
filtered_recs = records.joins(plate_join).where(plate_join => { barcode: val }) | ||
if filtered_recs.empty? | ||
filtered_recs = records.joins(tube_join).where(tube_join => { barcode: val }) | ||
end | ||
end | ||
# Add the filtered record ids to the list | ||
rec_ids.concat(filtered_recs.pluck(:id)) | ||
rescue StandardError => e | ||
Rails.logger.warn("Invalid source identifier: #{val}, error: #{e.message}") | ||
end | ||
records.where(id: rec_ids) | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,138 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'rails_helper' | ||
|
||
RSpec.describe SourceIdentifierFilterable do | ||
let(:dummy_class) do | ||
Class.new do | ||
include SourceIdentifierFilterable | ||
|
||
def apply_source_identifier_filter(records, value, plate_join: :plate, tube_join: :tube, well_join: :well) | ||
self.class.apply_source_identifier_filter(records, value, plate_join: plate_join, tube_join: tube_join, well_join: well_join) | ||
end | ||
end | ||
end | ||
|
||
let(:dummy_instance) { dummy_class.new } | ||
|
||
describe '#apply_source_identifier_filter' do | ||
context 'when the source_identifier belongs to a plate' do | ||
it 'returns the correct records' do | ||
pacbio_plate = create(:plate_with_wells_and_requests, pipeline: 'pacbio') | ||
pacbio_plate_requests = pacbio_plate.wells.flat_map(&:pacbio_requests) | ||
records = Pacbio::Request.all | ||
|
||
filtered_records = dummy_instance.apply_source_identifier_filter(records, [pacbio_plate.barcode]) | ||
|
||
expect(filtered_records.count).to eq(pacbio_plate_requests.count) | ||
expect(filtered_records.map(&:id)).to match_array(pacbio_plate_requests.map(&:id)) | ||
end | ||
end | ||
|
||
context 'when the source_identifier belongs to a tube' do | ||
it 'returns the correct records' do | ||
pacbio_tube = create(:tube_with_pacbio_request) | ||
records = Pacbio::Request.all | ||
|
||
filtered_records = dummy_instance.apply_source_identifier_filter(records, [pacbio_tube.barcode]) | ||
|
||
expect(filtered_records.count).to eq(pacbio_tube.pacbio_requests.count) | ||
expect(filtered_records.map(&:id)).to match_array(pacbio_tube.pacbio_requests.map(&:id)) | ||
end | ||
end | ||
|
||
context 'when the source_identifier belongs to a plate and well separated by colon' do | ||
it 'returns the correct records' do | ||
pacbio_plate = create(:plate_with_wells_and_requests, pipeline: 'pacbio') | ||
pacbio_plate_requests = pacbio_plate.wells.first.pacbio_requests | ||
records = Pacbio::Request.all | ||
|
||
filtered_records = dummy_instance.apply_source_identifier_filter(records, ["#{pacbio_plate.barcode}:#{pacbio_plate.wells.first.position}"]) | ||
|
||
expect(filtered_records.count).to eq(pacbio_plate_requests.count) | ||
expect(filtered_records.map(&:id)).to match_array(pacbio_plate_requests.map(&:id)) | ||
end | ||
end | ||
|
||
context 'when the source_identifier contains multiple tubes, plates and invalid values' do | ||
it 'returns the correct records and logs warnings for invalid values' do | ||
pacbio_tube1 = create(:tube_with_pacbio_request) | ||
pacbio_tube2 = create(:tube_with_pacbio_request) | ||
pacbio_plate1 = create(:plate_with_wells_and_requests, pipeline: 'pacbio') | ||
pacbio_plate2 = create(:plate_with_wells_and_requests, pipeline: 'pacbio') | ||
pacbio_plate1_requests = pacbio_plate1.wells.first.pacbio_requests | ||
pacbio_plate2_requests = pacbio_plate2.wells.flat_map(&:pacbio_requests) | ||
|
||
source_identifiers = [ | ||
pacbio_tube1.barcode, | ||
pacbio_tube2.barcode, | ||
"#{pacbio_plate1.barcode}:#{pacbio_plate1.wells.first.position}", | ||
pacbio_plate2.barcode, | ||
'INVALID_IDENTIFIER' | ||
] | ||
|
||
records = Pacbio::Request.all | ||
|
||
filtered_records = dummy_instance.apply_source_identifier_filter(records, source_identifiers) | ||
|
||
total_requests = pacbio_tube1.pacbio_requests.length + | ||
pacbio_tube2.pacbio_requests.length + | ||
pacbio_plate1_requests.length + | ||
pacbio_plate2_requests.length | ||
|
||
expect(filtered_records.count).to eq(total_requests) | ||
expect(filtered_records.map(&:id)).to match_array( | ||
pacbio_tube1.pacbio_requests.map(&:id) + | ||
pacbio_tube2.pacbio_requests.map(&:id) + | ||
pacbio_plate1_requests.map(&:id) + | ||
pacbio_plate2_requests.map(&:id) | ||
) | ||
end | ||
end | ||
|
||
context 'when the source_identifier contains malformed strings' do | ||
it 'logs warnings for malformed strings' do | ||
source_identifiers = [':test'] | ||
records = Pacbio::Request.all | ||
|
||
expect(Rails.logger).to receive(:warn).with("Malformed source identifier: ':test'. Plate part is missing.").at_least(:once) | ||
|
||
filtered_records = dummy_instance.apply_source_identifier_filter(records, source_identifiers) | ||
|
||
expect(filtered_records.count).to eq(0) | ||
end | ||
end | ||
|
||
context 'when the source_identifer contains colon which includes plate barcode without well' do | ||
it 'returns the correct records for plate' do | ||
pacbio_plate1 = create(:plate_with_wells_and_requests, pipeline: 'pacbio') | ||
pacbio_plate_requests = pacbio_plate1.wells.flat_map(&:pacbio_requests) | ||
source_identifiers = ["#{pacbio_plate1.barcode}:"] | ||
records = Pacbio::Request.all | ||
filtered_records = dummy_instance.apply_source_identifier_filter(records, source_identifiers) | ||
expect(filtered_records.count).to eq(pacbio_plate_requests.count) | ||
end | ||
end | ||
|
||
context 'when using custom join conditions' do | ||
it 'returns the correct records for source_plate and source_tube' do | ||
pacbio_tube = create(:tube_with_pacbio_request) | ||
create(:pacbio_library, request: pacbio_tube.pacbio_requests[0]) | ||
pacbio_plate = create(:plate_with_wells_and_requests, pipeline: 'pacbio') | ||
pacbio_plate.wells.first.pacbio_requests[0] | ||
create(:pacbio_library, request: pacbio_plate.wells.first.pacbio_requests[0]) | ||
source_identifiers = [ | ||
pacbio_tube.barcode, | ||
"#{pacbio_plate.barcode}:#{pacbio_plate.wells.first.position}" | ||
] | ||
records = Pacbio::Library.all | ||
filtered_records = dummy_instance.apply_source_identifier_filter(records, source_identifiers, plate_join: :source_plate, | ||
tube_join: :source_tube, | ||
well_join: :source_well) | ||
|
||
expect(filtered_records.count).to eq(records.count) | ||
expect(filtered_records.map(&:id)).to match_array(records.map(&:id)) | ||
end | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be in shared, resources/v1/shared instead of as a model concern, similar to run_suitability?