standard output: h1 to h3, h5 to h1, etc #3

Open
tkanderson opened this issue Dec 7, 2020 · 4 comments
Labels
enhancement New feature or request

@tkanderson
Member

Read in an nt or aa FASTA file and export a tab-delimited text file where the first column is the "HX" numbering for every position (based on the desired --subtype flag) and each position is annotated using the desired annotation tables or --wiley/caton.

The use case here is to read in hundreds or thousands of sequences and have each position numbered using the desired numbering scheme plus annotations. In other words, a "desktop" version of the IRD numbering translator tool, with the added benefit of being able to include custom annotations and the published epitope sites.
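As a rough illustration of the numbering step, here is a minimal sketch that assumes each query has already been aligned to a reference sequence for the requested scheme (e.g. an H3 reference). It walks the two gapped alignment rows together and assigns each query residue a reference-based label. The function names, the "H3:" label format, and the letter-suffix convention for insertions are all hypothetical choices for illustration, not flutile's actual behavior.

```python
import string
from typing import Iterator, Tuple


def number_positions(
    ref_aln: str, query_aln: str, scheme: str = "H3"
) -> Iterator[Tuple[str, str]]:
    """Yield (label, residue) for every residue in the query.

    ref_aln and query_aln are the two rows of one pairwise alignment
    (equal length, '-' for gaps). Reference residues are numbered from 1;
    query residues opposite a reference gap (insertions) reuse the
    preceding reference number plus a letter suffix, e.g. "H3:133a".
    This sketch supports at most 26 consecutive inserted residues.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    ref_pos = 0       # number of reference residues seen so far
    insert_count = 0  # consecutive reference gaps since the last reference residue
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            ref_pos += 1
            insert_count = 0
            label = f"{scheme}:{ref_pos}"
        else:
            insert_count += 1
            suffix = string.ascii_lowercase[insert_count - 1]
            label = f"{scheme}:{ref_pos}{suffix}"
        if q != "-":  # positions deleted in the query produce no output row
            yield label, q


def to_tsv(seq_id: str, ref_aln: str, query_aln: str, scheme: str = "H3") -> str:
    """Render one sequence as tab-delimited rows: numbering, sequence id, residue."""
    return "\n".join(
        f"{label}\t{seq_id}\t{residue}"
        for label, residue in number_positions(ref_aln, query_aln, scheme)
    )
```

For example, `to_tsv("query1", "MK-TA", "MKLT-")` produces four rows. The inserted `L` is labeled `H3:2a`, and the query's trailing gap opposite reference position 4 is skipped. A long (one row per residue) layout like this keeps the file tab-delimited even when thousands of sequences of different lengths are combined.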

@tkanderson tkanderson added the enhancement New feature or request label Dec 7, 2020
@tkanderson
Member Author

This may no longer fall under the "aadiff" tool, so perhaps an "annotation" option would be better.

@arendsee
Collaborator

arendsee commented Dec 7, 2020

This would be fairly easy to do. I'd probably want to batch the input sequences for performance's sake. The command might look something like:

$ flutile annotate --subtype=H1 --caton82 mydata.faa > mydata.txt
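Batching could be as simple as streaming FASTA records and grouping them into fixed-size chunks before aligning each chunk. The minimal sketch below assumes a plain FASTA input and a batch size chosen by the caller. `read_fasta` and `in_batches` are hypothetical helpers, not flutile functions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Stream (header, sequence) records from FASTA-formatted lines.

    Multi-line sequences are concatenated; blank lines are ignored, and any
    sequence lines appearing before the first header are discarded.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def in_batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size`, without reading everything into memory."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch
```

Usage might look like `for batch in in_batches(read_fasta(open("mydata.faa")), 500): ...`, where each batch is handed to the aligner in one call. On Python 3.12+, `itertools.batched` does the same grouping but yields tuples.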

I am not quite happy trusting the --subtype=H1 info from the user. I would like a way to double-check that the user's sequences really are members of the subtype. This would involve finding some sort of scoring metric and cutoff that can be calculated from the alignments. An obvious choice would be to use the same metrics as BLAST.
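A simplified, BLAST-inspired check could compute two statistics from each query-to-reference alignment: percent identity and query coverage. Here, identity is measured over columns where both rows have a residue, which differs from BLAST's own pident (that divides by the full alignment length including gaps). The sketch does not compute bit scores or E-values, which require substitution-matrix statistics. The function names and the 0.8/0.9 cutoffs are placeholders, not calibrated values.

```python
from typing import Tuple


def identity_and_coverage(ref_aln: str, query_aln: str) -> Tuple[float, float]:
    """Return (identity, coverage) for one pairwise alignment.

    identity: fraction of gap-free columns where the residues match
              (case-insensitive).
    coverage: fraction of query residues aligned to a reference residue.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    aligned = [(r, q) for r, q in zip(ref_aln, query_aln) if r != "-" and q != "-"]
    if not aligned:
        return 0.0, 0.0
    matches = sum(1 for r, q in aligned if r.upper() == q.upper())
    query_len = sum(1 for q in query_aln if q != "-")
    return matches / len(aligned), len(aligned) / query_len


def looks_like_subtype(
    ref_aln: str, query_aln: str, min_identity: float = 0.8, min_coverage: float = 0.9
) -> bool:
    """Hypothetical pass/fail filter; the default cutoffs are placeholders."""
    identity, coverage = identity_and_coverage(ref_aln, query_aln)
    return identity >= min_identity and coverage >= min_coverage
```

Reporting both numbers, rather than only a boolean, would let a user see why a sequence was flagged before choosing cutoffs.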

@tkanderson
Copy link
Member Author

I like that approach, with the same options (--annotation-tables, --join-annotations) as aadiff.

I think you may be misreading my subtype query. The most common use will probably be someone who wants H3 numbering on their H1/H5 sequences (or people with H3 sequences who want H3 numbering plus annotations). So the --subtype flag may need to be renamed to --output-numbering.
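Once positions carry labels in whichever scheme the user requests (the proposed --output-numbering), attaching annotation tables becomes a lookup keyed on those labels. This sketch makes three assumptions, since the thread doesn't define how aadiff's --annotation-tables or --join-annotations actually work: each table is a headerless two-column TSV (label, annotation), and "joining" means concatenating every matching annotation with ';'. The annotation strings in the usage note are made-up placeholders.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_annotation_table(text: str) -> Dict[str, List[str]]:
    """Parse a headerless two-column TSV (label, annotation) into label -> annotations."""
    table: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed lines
            table[row[0]].append(row[1])
    return table


def annotate(
    rows: Iterable[Tuple[str, str]],
    tables: List[Dict[str, List[str]]],
    sep: str = ";",
) -> List[Tuple[str, str, str]]:
    """Append one annotation column per row, merging hits from every table in order."""
    out = []
    for label, residue in rows:
        # .get() does not insert missing keys into a defaultdict
        notes = [note for table in tables for note in table.get(label, [])]
        out.append((label, residue, sep.join(notes)))
    return out
```

For example, with one table mapping `H3:2` to `epitope_x` and a second mapping `H3:2` to `note_y`, the row `("H3:2", "K")` becomes `("H3:2", "K", "epitope_x;note_y")`, while unannotated positions get an empty column.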

@arendsee
Copy link
Collaborator

arendsee commented Dec 7, 2020

That makes sense. I still want some sort of "crap filter" or diagnostic function in flutile that helps identify and/or remove bad data. But maybe that can be a flutile diagnose function. So this is a different topic for a different issue.

Adding --annotation-tables and --join-annotations makes sense. Actually, I can keep exactly the same API as aadiff and just change the output format. This would reuse all the functions I've already implemented and simplify maintenance.

2 participants