Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates from production, Support for new human-filtering impl #141

Merged
merged 15 commits into from
Jun 13, 2024

Conversation

charles-cowart
Copy link
Collaborator

@charles-cowart charles-cowart commented Jun 11, 2024

Updates from production.
Support for new MOVI-based human-filtering implementation.
Tests updated to support recent changes.

Updates from production.
Support for new MOVI-based human-filtering implementation.
@coveralls
Copy link

coveralls commented Jun 12, 2024

Pull Request Test Coverage Report for Build 9474339226

Details

  • 5 of 8 (62.5%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-1.6%) to 81.079%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/tests/test_QCJob.py 2 3 66.67%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Files with Coverage Reduction New Missed Lines %
sequence_processing_pipeline/tests/test_QCJob.py 2 90.11%
Totals Coverage Status
Change from base Build 8445280085: -1.6%
Covered Lines: 2417
Relevant Lines: 2761

💛 - Coveralls

Copy link
Contributor

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @charles-cowart; looking great, just a few comments.

sequence_processing_pipeline/NuQCJob.py Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
Removed legacy QCJob from the current source tree as it's no longer
needed.
Updates for NuQCJob including the use of MMI files based on machine and
assay-type based configuration.
@coveralls
Copy link

coveralls commented Jun 12, 2024

Pull Request Test Coverage Report for Build 9490067895

Details

  • 29 of 33 (87.88%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.349%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 24 26 92.31%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2035
Relevant Lines: 2323

💛 - Coveralls

@charles-cowart charles-cowart changed the title WIP: Updates from production, Support for new human-filtering impl Updates from production, Support for new human-filtering impl Jun 12, 2024
@charles-cowart
Copy link
Collaborator Author

charles-cowart commented Jun 12, 2024

@antgonza ready for review! It's missing a test for the new method that creates the minimap2/samtools commands, but I thought it best for you confirm that this is what you were thinking of first. I can confirm the method does work correctly in my stub.

Copy link
Contributor

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, thank you.

sequence_processing_pipeline/NuQCJob.py Outdated Show resolved Hide resolved
sequence_processing_pipeline/NuQCJob.py Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9505416011

Details

  • 33 of 37 (89.19%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.367%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2038
Relevant Lines: 2326

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9505701263

Details

  • 39 of 43 (90.7%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

sequence_processing_pipeline/NuQCJob.py Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
cat ${r1_filt} | \
gzip -c > ${r1_adapter_only} &
cat ${r2_filt} | \
gzip -c > ${r2_adapter_only} &
wait

rm ${r1_filt} &
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rm ${r1_filt} &

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antgonza this appears related to the comment above.

sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
sequence_processing_pipeline/templates/nuqc_job.sh Outdated Show resolved Hide resolved
@wasade
Copy link
Member

wasade commented Jun 13, 2024

For a reference for how this can be done with mxdx, which I'm testing out-of-band for something else, but which does work on test data. Adapting here would assume fastp is moved into an upstream isolated step.

The change to the filter_pmls script is minor, just adding the named options and only writing out the non-human data.

This does not currently split/re-pair. I most likely will adopt seqkit pair which importantly will write gzip and doesn't require writing unpaired. seqkit's --id-regexp might allow for the split, and remove the need for splitter, but I haven't tried it just yet.

function mx () {                                                             
    mxdx mux \                                                                  
         --file-map ${FILEMAP} \                                                
         --batch ${SLURM_ARRAY_TASK_ID} \                                       
         --batch-size ${BATCHSIZE}                                              
}                                                                               
                                                                                
function dx () {                                                             
    mxdx demux  \                                                               
         --file-map ${FILEMAP} \                                                
         --batch ${SLURM_ARRAY_TASK_ID} \                                       
         --batch-size ${BATCHSIZE} \                                            
         --output-base ${OUTPUT} \                                              
         --extension ${EXT}                                                     
}                                                                               
set -x                                                                          
set -e                                                                          
set -o pipefail                                                                 
                                                                                
${movi} query --stdout -i ${DBIDX} -r <(mx) > ${mpmlbin}                     
python filter_pmls_click.py \                                                   
    --pml-txt ${mpmlbin} \                                                      
    --fastq-data <(mx) | dx

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9506270527

Details

  • 39 of 44 (88.64%) changed or added relevant lines in 5 files are covered.
  • 5 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
sequence_processing_pipeline/NuQCJob.py 26 29 89.66%
Files with Coverage Reduction New Missed Lines %
sequence_processing_pipeline/NuQCJob.py 5 62.42%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9506297323

Details

  • 39 of 43 (90.7%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9506329160

Details

  • 39 of 43 (90.7%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9506334894

Details

  • 39 of 43 (90.7%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9506339410

Details

  • 39 of 43 (90.7%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9506380184

Details

  • 39 of 43 (90.7%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9506389996

Details

  • 39 of 43 (90.7%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

@coveralls
Copy link

coveralls commented Jun 13, 2024

Pull Request Test Coverage Report for Build 9506399088

Details

  • 39 of 43 (90.7%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.3%) to 81.402%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sequence_processing_pipeline/NuQCJob.py 26 28 92.86%
sequence_processing_pipeline/Pipeline.py 2 4 50.0%
Totals Coverage Status
Change from base Build 8445280085: -1.3%
Covered Lines: 2044
Relevant Lines: 2332

💛 - Coveralls

@wasade
Copy link
Member

wasade commented Jun 13, 2024

Quick follow up. If using seqkit for pairing, it's necessary to inform how PE reads are described:

$ seqkit pair -1 some.R1.fastq.gz -2 some.R2.fastq.gz -O foo --id-regexp '^(\S+)\/[12]'

On the upside, it accepts gzip'd files, will write gzip, and will only save unpaired if you really want it

@charles-cowart charles-cowart merged commit b9bdc00 into biocore:master Jun 13, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants