Make it possible for RNAStructure to get initialized with a file #129

Open
valentin994 opened this issue Jun 26, 2022 · 3 comments

@valentin994

Hi, I'm working on an API where I need to get the sequence out of a .pdb file. With the code base as it is now, you need a local copy of the file, which means handling it twice in an API environment: first the user uploads it to the server, and then I have to save it to disk so that I can pass a path to the RNAStructure class. I extracted the parsing part of the code and used it on its own, but if you'd like, I could create a PR that supports this functionality, along with a bit of cleanup.
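
A minimal sketch of how loading could accept in-memory content as well as a path, assuming the web framework hands the upload over as `bytes` or as a binary file-like object. The helper `read_pdb_lines` is hypothetical and not part of the package; a constructor could call it and pass the resulting lines to the existing parsing code instead of opening a path itself.

```python
import io
import os
from typing import BinaryIO, List, Union

# Hypothetical input type: a path, raw bytes, or an open binary stream.
PdbSource = Union[str, os.PathLike, bytes, BinaryIO]


def read_pdb_lines(source: PdbSource, encoding: str = "utf-8") -> List[str]:
    """Return the lines of a PDB file given a path, raw bytes, or a binary stream."""
    if isinstance(source, (str, os.PathLike)):
        # Path on disk: the current behaviour, which needs a local copy.
        with open(source, "rb") as fh:
            data = fh.read()
    elif isinstance(source, bytes):
        # Content already in memory, e.g. the body of an upload request.
        data = source
    elif hasattr(source, "read"):
        # Binary file-like object (e.g. an uploaded file); nothing is written to disk.
        data = source.read()
    else:
        raise TypeError(f"unsupported PDB source: {type(source).__name__}")
    return data.decode(encoding).splitlines()


if __name__ == "__main__":
    # Simulate an uploaded file that only exists in memory.
    upload = io.BytesIO(b"HEADER    RNA\nEND\n")
    print(read_pdb_lines(upload))
```

Treating `str` as a path and `bytes` as content keeps the existing path-based call working unchanged while avoiding the ambiguity of guessing whether a string is a filename or file contents.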

@mmagnus
Owner

mmagnus commented Jul 8, 2022

Dear @valentin994, it's great that you found the code useful. Let me know if you need any help.

And yeah, let me see what you have so we can improve the package here.

@valentin994
Author

In the end, I wrote my own parser. It might be useful here too, or in Biopython, so let me know what you think.

The problem I ran into when using RNAStructure().get_sequence is that I didn't always get the expected sequence. I can't really explain it in biological terms since I'm a developer, but the examples below might give you some insight.

For example, for the PDB files "2l8h", "6b14" and "6las", the output of get_sequence() was:

  • GGCAGAUCUGAGCCUGGGAGCUCUCUGCCRh
  • GACGCGACCGAAAUGGUGAAGGACGGGUCCAGUGCGAAACACGCACUGUUGAGUAGAGUGUGAGCUCCGUAACUGGUCGCGUCghhhhhhhhhhhhhhhhhhhhhhhhh
  • GUUGAUAUGGAUUUACUCCGAGGAGACGAACUACCACGAACAGGGGAAACUCUACCCGUGGCGUCUCCGUUUGACGAGUAAGUCCUAAGUCAACAggggooooghhhhhhhh
  • GGCAUUGUGCCUCGCAUUGCACUCCGCGGGGCGAUAAGUCCUGAAAAGGGAUGUCmhhhhhhhhh GGCAUUGUGCCUCGCAUUGCACUCCGCGGGGCGAUAAGUCCUGAAAAGGGAUGUChh RPNHTIYINNLNEKIKKDELKKSLHAIFSRFGQILDILVKRSLKeRGQAFVIFKEVSSATNALRSeQGFPFYDKPeRIQYAKTDSDIIAKehh TRPNHTIYINNLNEKIKKDELKKSLHAIFSRFGQILDILVKRSLKeRGQAFVIFKEVSSATNALRSeQGFPFYDKPeRIQYAKTDSDIIAKeAhhhhh RPNHTIYINNLNEKIKKDELKKSLHAIFSRFGQILDILVKRSLKeRGQAFVIFKEVSSATNALRSeQGFPFYDKPeRIQYAKTDSDIhhhh

Instead, I would expect:

  • GGCAGAUCUGAGCCUGGGAGCUCUCUGCC
  • GACGCGACCGAAAUGGUGAAGGACGGGUCCAGUGCGAAACACGCACUGUUGAGUAGAGUGUGAGCUCCGUAACUGGUCGCGUC
  • GUUGAUAUGGAUUUACUCCGAGGAGACGAACUACCACGAACAGGGGAAACUCUACCCGUGGCGUCUCCGUUUGACGAGUAAGUCCUAAGUCAACA
  • GGCAUUGUGCCUCGCAUUGCACUCCGCGGGGCGAUAAGUCCUGAAAAGGGAUGUC
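
The extra trailing characters (such as `Rh` or the runs of `h`) look like they come from non-polymer residues. Assuming they are HETATM records such as ligands, ions or waters (this should be confirmed against the package's parser), a minimal sketch that builds per-chain sequences from ATOM records only could look like this. The function name `rna_sequences_from_atoms` is hypothetical, the column slices follow the fixed-width PDB format, and only the standard RNA residue names A, C, G and U are kept, so protein chains and modified nucleotides are dropped.

```python
from typing import Dict, List, Set, Tuple

# Standard RNA residue names mapped to one-letter codes. Anything else
# (amino acids, modified nucleotides, ligands) is skipped in this sketch.
RNA_CODES = {"A": "A", "C": "C", "G": "G", "U": "U"}


def rna_sequences_from_atoms(lines: List[str]) -> Dict[str, str]:
    """Return {chain_id: sequence} built from ATOM records only.

    HETATM records (ligands, ions, waters) are ignored, each residue is
    counted once although it has many atoms, and reading stops at the
    first ENDMDL so only the first model of a multi-model file is used.
    """
    sequences: Dict[str, List[str]] = {}
    seen: Set[Tuple[str, str, str]] = set()
    for line in lines:
        record = line[0:6].strip()
        if record == "ENDMDL":
            break
        if record != "ATOM":
            continue
        res_name = line[17:20].strip()  # residue name, columns 18-20
        if res_name not in RNA_CODES:
            continue
        chain = line[21:22]  # chain ID, column 22
        # Chain + residue number (cols 23-26) + insertion code (col 27)
        # identify one residue.
        res_id = (chain, line[22:26], line[26:27])
        if res_id in seen:
            continue
        seen.add(res_id)
        sequences.setdefault(chain, []).append(RNA_CODES[res_name])
    return {chain: "".join(codes) for chain, codes in sequences.items()}
```

The function takes lines rather than a path, so it also works on content that is already in memory.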

The expected sequences are what you get if you download the FASTA version of these entries and extract the sequence with the SeqIO parser. I'm not sure whether the sequences returned by this package are the intended behaviour, or whether there is any need to parse them out the way I do now. If you find it useful, I can create a PR or show you in more depth how it would look. I hope I managed to explain it well enough 😓
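
The comparison above relies on Biopython's `SeqIO` reading the FASTA files. To show what that extraction amounts to, here is a dependency-free sketch (not Biopython's implementation) that collects each record's sequence lines under its header; `read_fasta` is a hypothetical name. With Biopython installed, `SeqIO.parse(handle, "fasta")` yields `SeqRecord` objects whose `.seq` holds the same concatenated sequence.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    A header line starts with '>'; the following non-empty lines are
    stripped and concatenated until the next header.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence line of the current record
    if header is not None:
        records[header] = "".join(chunks)  # close the last record
    return records
```

Lines before the first header are ignored, and a repeated header overwrites the earlier record; both are simplifications compared with a full FASTA reader.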

@valentin994
Author

Oh, I got sidetracked from the original question. But yes, initializing with bytes can also be done.
