Updating bisulfite pipeline to accept bams and fastqs #996

Draft: wants to merge 7 commits into base: master


chrisamiller
Collaborator

Also bumps the biscuit version. Depends on merging genome/docker-biscuit#4 first, for the docker container.

@chrisamiller
Collaborator Author

docker image is live - this is ready for review

Member

@tmooney tmooney left a comment

This generally looks good from a CWL perspective. I'm kind of wondering if there might be a way to avoid the subworkflow adapter layer (both here and for bwa mem sequence alignment) with something along the lines of:

- { valueFrom: "$(inputs.sequence.sequence.hasOwnProperty('bam')? inputs.sequence.sequence.bam : null)", prefix: 'bam', position: -1 }
- { valueFrom: "$(inputs.sequence.sequence.hasOwnProperty('fastq1')? inputs.sequence.sequence.fastq1 : null)", prefix: '--FASTQ' }
- { valueFrom: "$(inputs.sequence.sequence.hasOwnProperty('fastq2')? inputs.sequence.sequence.fastq2 : null)", prefix: '--FASTQ2' }

But we could worry about that later.

/usr/bin/biscuit align -t "$NTHREADS" -M -R "$READGROUP" "$REFERENCE" "$FASTQ1" "$FASTQ2" | /usr/bin/sambamba view -S -f bam -l 0 /dev/stdin | /usr/bin/sambamba sort -t "$SORT_THREADS" -m 8G -o "$OUTDIR/aligned.bam" /dev/stdin
else
/opt/flexbar/flexbar --adapters "$TRIMMING_ADAPTERS" --reads "$FASTQ1" --reads2 "$FASTQ2" --adapter-trim-end LTAIL --adapter-min-overlap "$TRIMMING_ADAPTER_MIN_OVERLAP" --adapter-error-rate 0.1 --max-uncalled 300 --stdout-reads \
| /usr/bin/biscuit align -t "$NTHREADS" -M -R "$READGROUP" "$REFERENCE" /dev/stdin | /usr/bin/sambamba view -S -f bam -l 0 /dev/stdin | /usr/bin/sambamba sort -t "$SORT_THREADS" -m 8G -o "$OUTDIR/aligned.bam" /dev/stdin
Member

Do the various uses of biscuit align after a pipe in this script need the -p option?

       -p            smart pairing (ignoring in2.fq)

as seen here, for example:

| /usr/local/bin/bwa mem -K 100000000 -t "$NTHREADS" -Y -p -R "$READGROUP" "$REFERENCE" /dev/stdin | /usr/local/bin/samblaster -a --addMateTags | /opt/samtools/bin/samtools view -b -S /dev/stdin
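For context on why the flag matters: when a trimmer writes both mates to a single stream, the aligner receives one input in which R1 and R2 records alternate. The sketch below only illustrates what such an interleaved stream looks like, using standard tools. The `interleave_fastq` helper and the toy reads are hypothetical and not part of this pipeline, and it assumes 4-line FASTQ records with the same number of reads in both files.

```shell
# Hypothetical helper (not part of the pipeline): interleave two FASTQ files
# record by record. Assumes 4-line records and equal read counts.
interleave_fastq() {
    awk -v r2="$2" '
        { print }                                  # emit the current R1 line
        NR % 4 == 0 {                              # after each complete R1 record...
            for (i = 0; i < 4; i++)                # ...emit the matching R2 record
                if ((getline line < r2) > 0) print line
        }
    ' "$1"
}

# Toy input: two read pairs
tmpdir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTTT\n+\nIIII\n' > "$tmpdir/R1.fq"
printf '@r1/2\nGGCC\n+\nIIII\n@r2/2\nAAAA\n+\nIIII\n' > "$tmpdir/R2.fq"

# Records come out in the order r1/1, r1/2, r2/1, r2/2
interleave_fastq "$tmpdir/R1.fq" "$tmpdir/R2.fq"
rm -rf "$tmpdir"
```

With a stream like this on /dev/stdin, the bwa mem command above passes -p so that consecutive records are treated as mates. Whether the piped biscuit align calls need the same flag is the open question here.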

@chrisamiller chrisamiller marked this pull request as draft February 9, 2021 19:43