Genome class to handle fasta files and chromsizes throughout package #76
base: main
Looks good to me.
Thanks for this; I was just hoping we'd get something like this! I'll give it a spin to see whether I run into anything, but it already looks very good. One thing I was considering is that we should maybe add a
Yes, good point. We already had crested.utils.fetch_sequences (which appeared only in the API docs, not in any tutorial), but now it makes more sense to have it only as a method of the Genome class.
Updates
Added a Genome class and a register_genome(...) function to make working with genomes easier.
Updated all functions in the package to accept a Genome instance as input while keeping backward compatibility (you can still pass a path if you prefer).
Added unit tests.
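To make the pattern above concrete (one object bundling a FASTA file with its chromosome sizes, a way to set a session-wide genome, and functions that still accept a plain path), here is a minimal, self-contained sketch in plain Python. `Genome`, `set_default_genome` and `resolve_genome` are hypothetical stand-ins, not crested's actual classes or signatures. The sketch assumes that registering a genome means making it the default whenever no genome argument is passed, and it reads the whole FASTA into memory, which a real implementation backed by an indexed FASTA reader would avoid.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


class Genome:
    """Hypothetical genome wrapper: a FASTA file plus chromosome sizes (illustration only)."""

    def __init__(self, fasta: str | os.PathLike, chrom_sizes: dict[str, int] | None = None):
        self.fasta = Path(fasta)
        self._seqs = self._read_fasta(self.fasta)
        # Derive chromosome sizes from the FASTA when none are supplied.
        self.chrom_sizes = chrom_sizes or {name: len(seq) for name, seq in self._seqs.items()}
        self.annotations = None  # placeholder, mirroring the not-yet-used attribute

    @staticmethod
    def _read_fasta(path: Path) -> dict[str, str]:
        """Naively parse a FASTA file into {sequence name: sequence}."""
        seqs: dict[str, list[str]] = {}
        name = None
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    name = line[1:].split()[0]  # keep only the ID before any description
                    seqs[name] = []
                elif name is not None:
                    seqs[name].append(line)
        return {k: "".join(v) for k, v in seqs.items()}

    def fetch(self, chrom: str, start: int, end: int) -> str:
        """Return the 0-based, half-open region chrom[start:end]."""
        return self._seqs[chrom][start:end]


# Hypothetical session-wide default genome.
_DEFAULT: Genome | None = None


def set_default_genome(genome: Genome) -> None:
    global _DEFAULT
    _DEFAULT = genome


def resolve_genome(genome: Genome | str | os.PathLike | None = None) -> Genome:
    """Accept a Genome, a FASTA path (backward compatible), or None (use the default)."""
    if isinstance(genome, Genome):
        return genome
    if genome is not None:
        return Genome(genome)
    if _DEFAULT is None:
        raise ValueError("No genome was given and no default genome is set.")
    return _DEFAULT


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as fh:
        fh.write(">chr1\nACGTACGT\n>chr2\nGGCC\n")
    g = Genome(fh.name)
    set_default_genome(g)
    print(resolve_genome().chrom_sizes)  # {'chr1': 8, 'chr2': 4}
    print(resolve_genome(fh.name).fetch("chr1", 2, 6))  # GTAC
```

Resolving the argument in a single helper lets every public function accept either form without duplicating path-handling logic, which is the kind of backward compatibility described above.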
The advantages of this are:
@casblaauw The Genome class also has an "annotations" attribute that is currently unused and unimplemented; we should start using it when working with genes.
Haven't updated the tutorial yet, will do so when I finish the functional refactor.
There's one breaking change in this PR: crested.tl.data.AnnDataset now expects a crested.Genome object instead of a genome_path and chromsizes. However, since this is backend functionality that a typical user is unlikely to have used directly, I don't think this is a big deal.
Example usage
Genome class
Registering