Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Y24 -473 generalize multiple source identifier filtering logic #1508

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 73 additions & 0 deletions app/models/concerns/source_identifier_filterable.rb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in shared, resources/v1/shared instead of as a model concern, similar to run_suitability?

Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# frozen_string_literal: true

# SourceIdentifierFilterable
#
# This module provides functionality for filtering records based on source identifiers.
# A source identifier can be a plate barcode, tube barcode, or a combination of plate barcode
# and well position.
# Models that include this module will have methods to filter records using these identifiers.
#
# ## Methods:
#
# * apply_source_identifier_filter(records, value, plate_join: :plate,
# tube_join: :tube, well_join: :well)
# - Filters the given records based on the provided source identifiers.
# - Parameters:
# - records: The ActiveRecord relation to filter.
# - value: An array of source identifiers to filter by.
# - plate_join: The association name for joining with the plate table (default: :plate).
# - tube_join: The association name for joining with the tube table (default: :tube).
# - well_join: The association name for joining with the well table (default: :well).
#
# ## Example Usage:
#
# class MyModel < ApplicationRecord
# include SourceIdentifierFilterable
# end
#
# records = MyModel.all
# source_identifiers = ['PLATE123', 'TUBE456', 'PLATE789:A1']
# filtered_records = MyModel.apply_source_identifier_filter(records, source_identifiers)
#
# # Custom join associations
# filtered_records = MyModel.apply_source_identifier_filter(records, source_identifiers,
# plate_join: :source_plate,
# tube_join: :source_tube)
#
module SourceIdentifierFilterable
extend ActiveSupport::Concern

class_methods do # rubocop:disable Metrics/BlockLength
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it is the right idea to ignore this. This is a great piece of work but if you split it into functions it will be a lot more readable. Happy to go through it with you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will see I can break it down further. 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect Rubocop would have some ideas too but because of the Metrics/BlockLength it won't make any more suggestions.

Copy link
Contributor Author

@seenanair seenanair Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve broken the code into functions and modularized it using module functions. If these are changed to private class methods, RuboCop still raises concerns about the module size. Alternatively, the other option to wrap these methods into another module and include it within this concern will allow to remove 'block length' warning .

I’m not entirely convinced about the module size limits imposed by RuboCop or the need to artificially modify the code just to satisfy it. While there is definitely value in splitting functions to improve readability and modularity, the imposed limit and the way RuboCop evaluates the structure feels bit unjustified.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your frustration and Rubocop can be dogmatic sometimes but looking at the result I find it far easier to understand what is going on. It is a question of maintainability.

I think if we ignore Rubocop when we disagree with it we just end up with lots of stuff disabled which can defeat the purpose of it. I really struggle with long methods because I get overwhelmed and need to break it down. I would now feel far more confident modifying it because I know what is going on. The method names help me to understand. I hope that adds some context?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing this does for me is it helps me to identify places where we can make stuff more flexible. For example in this filter you have used plate, tube and well distinctions. I see this a lot in our code which leads me to believe that we need to do some work around labware. Having a big method can sometimes obfuscate this.

Copy link
Contributor Author

@seenanair seenanair Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely. I fully agree with your point, and breaking down the structures absolutely made more sense. As you said, it is more readable, modular, and clear now.
What I was referring to was the structure it seems to be approving of. It would have been equally good, or even better, to have the new methods added as private in-class methods, but RuboCop is not happy with that because of the size. This is the only point I am not convinced about.

Though it was mostly trial and error, I learned more through this process. Thanks for your suggestions 👍

I understand your frustration and Rubocop can be dogmatic sometimes but looking at the result I find it far easier to understand what is going on. It is a question of maintainability.

I think if we ignore Rubocop when we disagree with it we just end up with lots of stuff disabled which can defeat the purpose of it. I really struggle with long methods because I get overwhelmed and need to break it down. I would now feel far more confident modifying it because I know what is going on. The method names help me to understand. I hope that adds some context?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. It would have been better to have private in-class methods. Looking at what you have done I think there is a way we could reduce the size significantly by changing the structure. For example if we turned joins into a hash we could probably use map or reduce and build the filter in one go using Arel.

Copy link
Contributor Author

@seenanair seenanair Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure how to do it, may be we can do by pairing on it at sometime later after finishing y24-149, if that's ok.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I was going to have a crack but not quite sure either so pairing is a good idea.

def apply_source_identifier_filter(records, value, # rubocop:disable Metrics/MethodLength,Metrics/PerceivedComplexity
plate_join: :plate, tube_join: :tube, well_join: :well)
rec_ids = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I would use full names as there is no discernable difference from a syntactic view e.g record_ids, filtered_records. It removes ambiguity

value.each do |val|
# Check if the source identifier contains a colon
if val.include?(':')
# Split the source identifier into plate and well
plate, well = val.split(':')
# Filter records based on plate and well
if plate.present?
filtered_recs = records.joins(plate_join).where(plate_join => { barcode: plate })
if well.present?
filtered_recs = filtered_recs.joins(well_join).where(well_join => { position: well })
end
else
Rails.logger.warn("Malformed source identifier: '#{val}'. Plate part is missing.")
next
end
else
filtered_recs = records.joins(plate_join).where(plate_join => { barcode: val })
if filtered_recs.empty?
filtered_recs = records.joins(tube_join).where(tube_join => { barcode: val })
end
end
# Add the filtered record ids to the list
rec_ids.concat(filtered_recs.pluck(:id))
rescue StandardError => e
Rails.logger.warn("Invalid source identifier: #{val}, error: #{e.message}")

Check warning on line 68 in app/models/concerns/source_identifier_filterable.rb

View check run for this annotation

Codecov / codecov/patch

app/models/concerns/source_identifier_filterable.rb#L68

Added line #L68 was not covered by tests
end
records.where(id: rec_ids)
end
end
end
53 changes: 36 additions & 17 deletions app/resources/v1/pacbio/library_resource.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,46 @@

module V1
module Pacbio
# @todo This documentation does not yet include a detailed description of what this resource represents.
# @todo This documentation does not yet include detailed descriptions for relationships, attributes and filters.
# @todo This documentation does not yet include any example usage of the API via cURL or similar.
#
# @note Access this resource via the `/v1/pacbio/libraries/` endpoint.
#
# Provides a JSON:API representation of {Pacbio::Library}.
#
# For more information about JSON:API see the [JSON:API Specifications](https://jsonapi.org/format/)
# or look at the [JSONAPI::Resources](http://jsonapi-resources.com/) package
# for the service implementation of the JSON:API standard.
# This resource represents a Pacbio Library and can return all libraries, a single library or
# multiple libraries along with their relationships.
#
# ## Filters:
#
# * sample_name
# * barcode
# * source_identifier
#
# ## Primary relationships:
#
# * request {V1::Pacbio::RequestResource}
# * tube {V1::Pacbio::TubeResource}
# * pool {V1::Pacbio::PoolResource}
#
# ## Relationship trees:
#
# * request.sample
# * tube.requests
# * pool.libraries
#
# @example
# curl -X GET http://localhost:3000/v1/pacbio/libraries/1
# curl -X GET http://localhost:3000/v1/pacbio/libraries/
# curl -X GET http://localhost:3000/v1/pacbio/libraries/1?include=request,tube,pool
#
# https://localhost:3000/v1/pacbio/libraries?filter[sample_name]=sample_name
# https://localhost:3000/v1/pacbio/libraries?filter[barcode]=TRAC-2-12068
#
# https://localhost:3000/v1/pacbio/libraries?filter[barcode]=TRAC-2-12068,TRAC-2-12066,TRAC-2-12067
#
# https://localhost:3000/v1/pacbio/libraries?filter[barcode]=TRAC-2-12068,TRAC-2-12066,TRAC-2-12067&include=request.sample,tube.requests,pool.libraries
class LibraryResource < JSONAPI::Resource
include Shared::RunSuitability
include SourceIdentifierFilterable

model_name 'Pacbio::Library'

Expand Down Expand Up @@ -90,17 +117,9 @@ def self.default_sort
records.joins(:tube).where(tubes: { barcode: value })
}
filter :source_identifier, apply: lambda { |records, value, _options|
# First we check tubes to see if there are any given the source identifier
recs = records.joins(:source_tube).where(source_tube: { barcode: value })
return recs unless recs.empty?

# If no tubes match the source identifier we check plates
# If source identifier specifies a well we need to match samples to well
# TODO: The below value[0] means we only take the first value passed in the filter
# If we want to support multiple values in one filter we would need to update this
plate, well = value[0].split(':')
recs = records.joins(:source_plate).where(source_plate: { barcode: plate })
well ? recs.joins(:source_well).where(source_well: { position: well }) : recs
apply_source_identifier_filter(records, value, plate_join: :source_plate,
tube_join: :source_tube,
well_join: :source_well)
}

def self.records_for_populate(*_args)
Expand Down
37 changes: 3 additions & 34 deletions app/resources/v1/pacbio/request_resource.rb
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,8 @@ module Pacbio
#
# https://localhost:3000/v1/pacbio/requests?filter[source_identifier]=TRAC-2-12068,TRAC-2-12066,TRAC-2-12067&include=well.plate,plate.wells,tube.requests
class RequestResource < JSONAPI::Resource
include SourceIdentifierFilterable

model_name 'Pacbio::Request', add_model_hint: false

# @!attribute [rw] library_type
Expand Down Expand Up @@ -89,40 +91,7 @@ class RequestResource < JSONAPI::Resource
}

filter :source_identifier, apply: lambda { |records, value, _options|
# Initialize an empty result set
rec_ids = []

# Iterate over each value in the filter
value.each do |val|
if val.include?(':')
# If the value contains a colon, it's a plate and well identifier
plate, well = val.split(':')
if plate.present?
filtered_recs = records.joins(:plate).where(plate: { barcode: plate })
if well.present?
filtered_recs = filtered_recs.joins(:well).where(well: { position: well })
end
else
Rails.logger.warn("Malformed source identifier: '#{val}'. Plate part is missing.")
next
end
else
# If the value does not contain a colon, it's a tube or plate identifier
filtered_recs = records.joins(:plate).where(plate: { barcode: val })
# If no records are found by plate, try to find by tube
if filtered_recs.empty?
filtered_recs = records.joins(:tube).where(tube: { barcode: val })
end
end
# Collect the IDs of the filtered records
rec_ids.concat(filtered_recs.pluck(:id))
rescue StandardError => e
# Log the error and continue with the next value
Rails.logger.warn("Invalid source identifier: #{val}, error: #{e.message}")
end
# Perform a final query to fetch the records by their IDs
combined_recs = records.where(id: rec_ids)
combined_recs
apply_source_identifier_filter(records, value)
}

def self.default_sort
Expand Down
138 changes: 138 additions & 0 deletions spec/models/concerns/source_identifier_filterable_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
# frozen_string_literal: true

require 'rails_helper'

RSpec.describe SourceIdentifierFilterable do
let(:dummy_class) do
Class.new do
include SourceIdentifierFilterable

def apply_source_identifier_filter(records, value, plate_join: :plate, tube_join: :tube, well_join: :well)
self.class.apply_source_identifier_filter(records, value, plate_join: plate_join, tube_join: tube_join, well_join: well_join)
end
end
end

let(:dummy_instance) { dummy_class.new }

describe '#apply_source_identifier_filter' do
context 'when the source_identifier belongs to a plate' do
it 'returns the correct records' do
pacbio_plate = create(:plate_with_wells_and_requests, pipeline: 'pacbio')
pacbio_plate_requests = pacbio_plate.wells.flat_map(&:pacbio_requests)
records = Pacbio::Request.all

filtered_records = dummy_instance.apply_source_identifier_filter(records, [pacbio_plate.barcode])

expect(filtered_records.count).to eq(pacbio_plate_requests.count)
expect(filtered_records.map(&:id)).to match_array(pacbio_plate_requests.map(&:id))
end
end

context 'when the source_identifier belongs to a tube' do
it 'returns the correct records' do
pacbio_tube = create(:tube_with_pacbio_request)
records = Pacbio::Request.all

filtered_records = dummy_instance.apply_source_identifier_filter(records, [pacbio_tube.barcode])

expect(filtered_records.count).to eq(pacbio_tube.pacbio_requests.count)
expect(filtered_records.map(&:id)).to match_array(pacbio_tube.pacbio_requests.map(&:id))
end
end

context 'when the source_identifier belongs to a plate and well separated by colon' do
it 'returns the correct records' do
pacbio_plate = create(:plate_with_wells_and_requests, pipeline: 'pacbio')
pacbio_plate_requests = pacbio_plate.wells.first.pacbio_requests
records = Pacbio::Request.all

filtered_records = dummy_instance.apply_source_identifier_filter(records, ["#{pacbio_plate.barcode}:#{pacbio_plate.wells.first.position}"])

expect(filtered_records.count).to eq(pacbio_plate_requests.count)
expect(filtered_records.map(&:id)).to match_array(pacbio_plate_requests.map(&:id))
end
end

context 'when the source_identifier contains multiple tubes, plates and invalid values' do
it 'returns the correct records and logs warnings for invalid values' do
pacbio_tube1 = create(:tube_with_pacbio_request)
pacbio_tube2 = create(:tube_with_pacbio_request)
pacbio_plate1 = create(:plate_with_wells_and_requests, pipeline: 'pacbio')
pacbio_plate2 = create(:plate_with_wells_and_requests, pipeline: 'pacbio')
pacbio_plate1_requests = pacbio_plate1.wells.first.pacbio_requests
pacbio_plate2_requests = pacbio_plate2.wells.flat_map(&:pacbio_requests)

source_identifiers = [
pacbio_tube1.barcode,
pacbio_tube2.barcode,
"#{pacbio_plate1.barcode}:#{pacbio_plate1.wells.first.position}",
pacbio_plate2.barcode,
'INVALID_IDENTIFIER'
]

records = Pacbio::Request.all

filtered_records = dummy_instance.apply_source_identifier_filter(records, source_identifiers)

total_requests = pacbio_tube1.pacbio_requests.length +
pacbio_tube2.pacbio_requests.length +
pacbio_plate1_requests.length +
pacbio_plate2_requests.length

expect(filtered_records.count).to eq(total_requests)
expect(filtered_records.map(&:id)).to match_array(
pacbio_tube1.pacbio_requests.map(&:id) +
pacbio_tube2.pacbio_requests.map(&:id) +
pacbio_plate1_requests.map(&:id) +
pacbio_plate2_requests.map(&:id)
)
end
end

context 'when the source_identifier contains malformed strings' do
it 'logs warnings for malformed strings' do
source_identifiers = [':test']
records = Pacbio::Request.all

expect(Rails.logger).to receive(:warn).with("Malformed source identifier: ':test'. Plate part is missing.").at_least(:once)

filtered_records = dummy_instance.apply_source_identifier_filter(records, source_identifiers)

expect(filtered_records.count).to eq(0)
end
end

context 'when the source_identifer contains colon which includes plate barcode without well' do
it 'returns the correct records for plate' do
pacbio_plate1 = create(:plate_with_wells_and_requests, pipeline: 'pacbio')
pacbio_plate_requests = pacbio_plate1.wells.flat_map(&:pacbio_requests)
source_identifiers = ["#{pacbio_plate1.barcode}:"]
records = Pacbio::Request.all
filtered_records = dummy_instance.apply_source_identifier_filter(records, source_identifiers)
expect(filtered_records.count).to eq(pacbio_plate_requests.count)
end
end

context 'when using custom join conditions' do
it 'returns the correct records for source_plate and source_tube' do
pacbio_tube = create(:tube_with_pacbio_request)
create(:pacbio_library, request: pacbio_tube.pacbio_requests[0])
pacbio_plate = create(:plate_with_wells_and_requests, pipeline: 'pacbio')
pacbio_plate.wells.first.pacbio_requests[0]
create(:pacbio_library, request: pacbio_plate.wells.first.pacbio_requests[0])
source_identifiers = [
pacbio_tube.barcode,
"#{pacbio_plate.barcode}:#{pacbio_plate.wells.first.position}"
]
records = Pacbio::Library.all
filtered_records = dummy_instance.apply_source_identifier_filter(records, source_identifiers, plate_join: :source_plate,
tube_join: :source_tube,
well_join: :source_well)

expect(filtered_records.count).to eq(records.count)
expect(filtered_records.map(&:id)).to match_array(records.map(&:id))
end
end
end
end
31 changes: 31 additions & 0 deletions spec/requests/v1/pacbio/libraries_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -230,6 +230,37 @@
'created_at' => pacbio_library.created_at.to_fs(:us)
)
end

it 'when the source_identifier contains multiple values' do
pacbio_tube = create(:tube_with_pacbio_request)
pacbio_library1 = create(:pacbio_library, request: pacbio_tube.pacbio_requests[0])

pacbio_plate = create(:plate_with_wells_and_requests, pipeline: 'pacbio')
pacbio_requests = pacbio_plate.wells.flat_map(&:pacbio_requests)
pacbio_library2 = create(:pacbio_library, request: pacbio_requests[0])

source_identifiers = [
pacbio_tube.barcode,
"#{pacbio_plate.barcode}:#{pacbio_plate.wells.first.position}"
]

get "#{v1_pacbio_libraries_path}?filter[source_identifier]=#{source_identifiers.join(',')}",
headers: json_api_headers

expect(response).to have_http_status(:success)
expect(json['data'].length).to eq(2)
[pacbio_library1, pacbio_library2].each do |library|
library_attributes = find_resource(type: 'libraries', id: library.id)['attributes']
expect(library_attributes).to include(
'concentration' => library.concentration,
'volume' => library.volume,
'template_prep_kit_box_barcode' => library.template_prep_kit_box_barcode,
'insert_size' => library.insert_size,
'state' => library.state,
'created_at' => library.created_at.to_fs(:us)
)
end
end
end

context 'filters - barcode' do
Expand Down
Loading