`.sam` to amino acids – can I just translate? #1556

gordonkoehn · 2024-12-02T08:34:18Z


Dear Nextclade Community,

I have nucleotide sequences that I need to translate. So far, I have been running Nextclade via the CLI (`nextclade run ...`) to get the amino acid translations and alignments, e.g. `nextclade.cds_translation.ORF1a.fasta`.

In my workflow, my nucleotide sequences are already aligned in .sam/.bam files. Can I avoid the realignment that Nextclade performs as its first step? Currently, I just convert my .sam files to FASTA and give them to Nextclade.

I understand that Nextclade assumes the input nucleotide sequences are not aligned and aligns them as a first step. Can I tell Nextclade to skip this step and use the existing alignment information?

Looking forward to your responses,

Gordon

Answered by rneher


ivan-aksamentov (Maintainer) · 2024-12-02T09:23:36Z

Hi Gordon,

Can you tell us a little more about your use case? Why do you want to skip the alignment process? Just to save compute cycles or is there another reason?

In any case, this is currently not possible.

As for how this could be implemented: Nextclade performs pairwise alignment against the reference, and after alignment it also strips insertions so that it can operate on the sequences in reference coordinates. This is currently a prerequisite for many of the underlying algorithms. So if you were to feed it, say, a multiple sequence alignment (MSA), this may or may not work.
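To make "strips insertions so it can operate in reference coordinates" concrete: given the two rows of a pairwise alignment (query vs. reference, same length, `-` for gaps), dropping every column where the reference has a gap leaves a query exactly as long as the reference, so query position i corresponds to reference position i. This is only a minimal sketch of the idea, not Nextclade's actual implementation; the function name `strip_insertions` is hypothetical, and the removed insertions are returned separately because a tool would typically still want to report them.

```python
def strip_insertions(ref_aln: str, qry_aln: str) -> tuple[str, list[tuple[int, str]]]:
    """Project an aligned query onto reference coordinates (hypothetical helper).

    ref_aln / qry_aln: rows of a pairwise alignment, equal length, '-' = gap.
    Columns with a gap in the reference are insertions in the query; they are
    removed and returned as (ref_pos, bases), where ref_pos is the 0-based
    reference position the insertion follows (-1 = before the first base).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    kept: list[str] = []
    insertions: list[tuple[int, str]] = []
    ref_pos = -1                   # index of the last reference base seen
    current_ins: list[str] = []    # bases of the insertion being collected
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":               # reference gap: query base is an insertion
            if q != "-":
                current_ins.append(q)
            continue
        if current_ins:            # close the insertion that just ended
            insertions.append((ref_pos, "".join(current_ins)))
            current_ins = []
        ref_pos += 1
        kept.append(q)             # may be '-' (a deletion in the query)
    if current_ins:                # insertion at the very end
        insertions.append((ref_pos, "".join(current_ins)))
    return "".join(kept), insertions
```

For example, `strip_insertions("ACG--TACGT", "ACGTTTA-GT")` gives `("ACGTA-GT", [(2, "TT")])`: the query now has the reference's length of 8, with the deletion kept as `-` and the `TT` insertion set aside.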


gordonkoehn (Author) · 2024-12-02T09:42:58Z

Hi Ivan, I'm pleased to get such a quick reply.

I'm working on V-Pipe, which currently outputs large (100 MB+) aligned nucleotide files (.bam).

What I need is aligned amino acid files derived from my aligned nucleotides, ideally the same output that Nextclade produces.

So far, I've been running Nextclade on small test data, ignoring my existing alignments. First I fetch the reference and other dataset files with:

```
nextclade dataset get sars-cov-2 ...
```

Then I replace sequences.fasta with the .fasta I extract from my .bam using pysam, and run:

```
nextclade run ...
```
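The pysam extraction step described above could look roughly like the sketch below. It is an assumption about the approach, not Gordon's actual script: one FASTA record per primary, mapped read, with the hypothetical helper names `write_reads_as_fasta` and `bam_to_fasta`. The writer accepts any iterable of `pysam.AlignedSegment`-like objects, so it can be exercised without a BAM file; only `bam_to_fasta` needs pysam.

```python
from typing import Iterable, TextIO


def write_reads_as_fasta(reads: Iterable, out: TextIO) -> int:
    """Write primary, mapped reads as FASTA records; return how many were written.

    `reads` yields objects with pysam.AlignedSegment-style attributes:
    query_name, query_sequence, is_unmapped, is_secondary, is_supplementary.
    """
    written = 0
    for read in reads:
        # Skip unmapped reads and secondary/supplementary records so each read appears once.
        if read.is_unmapped or read.is_secondary or read.is_supplementary:
            continue
        # Records stored without a sequence (SEQ '*') have query_sequence None.
        if not read.query_sequence:
            continue
        # BAM stores SEQ of reverse-strand reads already reverse-complemented,
        # i.e. in the same orientation as the reference.
        out.write(f">{read.query_name}\n{read.query_sequence}\n")
        written += 1
    return written


def bam_to_fasta(bam_path: str, fasta_path: str) -> int:
    import pysam  # imported here so write_reads_as_fasta works without pysam installed

    with pysam.AlignmentFile(bam_path, "rb") as bam, open(fasta_path, "w") as out:
        return write_reads_as_fasta(bam, out)
```

Records are streamed straight to the output file, so memory use stays flat regardless of BAM size; any memory/time cost then comes from the downstream `nextclade run`.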

This works fine on small files, but in my current setup it takes a lot of memory and time to process whole files.

I assumed this was due to the realignment, and I hoped I could circumvent it.

My .fasta files:

```
>AV233803:AV044:2411515907:1:11404:1260:0787|79
AAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGC....
```

Nextclade's aligned .fasta output:

```
>AV233803:AV044:2411515907:1:11404:1260:0787|79
------------------------------------------------------------------------------AAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGC....
```
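Since the .bam already records where each read aligns (its start position plus CIGAR string), the reference-coordinate form in Nextclade's output, with `-` for positions the read does not cover, can in principle be rebuilt directly without realigning. A minimal sketch, assuming the reads were aligned to the same reference sequence (same coordinates) that the Nextclade dataset uses; the function name is hypothetical, the operation codes follow pysam's `cigartuples` convention, and insertions are dropped as Ivan describes for Nextclade. Matching Nextclade's output exactly is not guaranteed.

```python
# CIGAR operation codes in pysam's cigartuples order (MIDNSHP=X)
M, I, D, N, S, H, P, EQ, X = range(9)


def to_reference_coordinates(ref_start: int, cigar: list[tuple[int, int]],
                             query: str, ref_length: int, gap: str = "-") -> str:
    """Place one aligned read onto the reference, dropping insertions (hypothetical helper).

    ref_start: 0-based leftmost aligned reference position (pysam: reference_start).
    cigar: list of (operation, length) pairs (pysam: cigartuples).
    Returns a string of length ref_length; uncovered and deleted positions are `gap`.
    """
    out = [gap] * ref_length
    r, q = ref_start, 0  # current reference / query positions
    for op, length in cigar:
        if op in (M, EQ, X):          # aligned bases: consume both, copy into reference slots
            if r + length > ref_length:
                raise ValueError("alignment extends past the end of the reference")
            out[r:r + length] = query[q:q + length]
            r += length
            q += length
        elif op in (I, S):            # insertion / soft clip: consume query only (dropped)
            q += length
        elif op in (D, N):            # deletion / skipped region: consume reference only
            r += length
        # H (hard clip) and P (padding) consume neither
    return "".join(out)
```

With pysam, the inputs per mapped read would be `read.reference_start`, `read.cigartuples` and `read.query_sequence`, and the reference length can come from the BAM header (e.g. `bam.get_reference_length(name)`). Padding every short read to full genome length is memory-hungry, so for 100 MB+ files it is safer to write each record out as soon as it is built.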

Disclaimer: I recently moved into bioinformatics as a software engineer, so I may overlook some biological details.

For context, we are trying to import short-read wastewater data into the Loculus database.

So greetings from next door,

Kind Regards,

Gordon


rneher (Maintainer) · 2024-12-02T11:17:08Z

Hi Gordon,

I think it would be quite roundabout to use Nextclade to translate short reads that are already aligned.

A simple script could translate those already aligned reads. You'd need to figure out which ORF each read falls into and its reading frame, and then use something like `Bio.Seq.translate` from the Biopython package to translate the sequence.

richard
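Once a read is in reference coordinates (padded with `-` like Nextclade's aligned output shown earlier in the thread), the "which ORF, which frame" question reduces to slicing by the ORF's reference coordinates: codon n always starts at `orf_start + 3n`. A minimal sketch under that assumption. The ORF bounds are hypothetical inputs you would take from the dataset's genome annotation, and the standard codon table is written out by hand to keep the example dependency-free (`Bio.Seq.translate` does the same job for ungapped codons). Using `-` for uncovered codons and `X` for partial or ambiguous ones is my choice here, not necessarily Nextclade's convention. It does not handle reverse-strand genes or programmed ribosomal frameshifts such as the one in SARS-CoV-2 ORF1ab.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG x TCAG x TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(ref_coord_seq: str, orf_start: int, orf_end: int) -> str:
    """Translate one forward-strand ORF from a sequence in reference coordinates.

    orf_start / orf_end: 0-based, end-exclusive ORF bounds on the reference.
    Fully uncovered codons ('---') become '-'; codons that are partially covered,
    contain a deletion, or contain ambiguous bases (e.g. 'N') become 'X'.
    """
    orf = ref_coord_seq[orf_start:orf_end].upper()
    protein = []
    for pos in range(0, len(orf) - len(orf) % 3, 3):  # ignore a trailing partial codon
        codon = orf[pos:pos + 3]
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)
```

For example, `translate_orf("--ATGAAA---TGA--", 2, 14)` returns `"MK-*"`: the read's uncovered third codon stays a gap, and the reading frame comes from the ORF start rather than from the read.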

rneher (Maintainer) · Dec 2, 2024, in reply:

Happy to help and provide further input here, but I imagine others in your group can also advise you on this topic.

gordonkoehn (Author) · 2024-12-02T12:44:17Z

Thank you for your guidance! Much appreciated!

Yes, that sounds like exactly what I want. I was hesitant to write my own solution for this, expecting to run into corner cases beyond my biological understanding, so I had been looking for well-tested tools for the task.

Thank you for the recommendation; I'll try this route. I'd appreciate any further suggestions you may have.

Kind Regards,
Gordon
