Hybrid spades assembly question #206

Open
aleuQUT opened this issue May 30, 2024 · 2 comments
Comments

@aleuQUT
aleuQUT commented May 30, 2024

Hey Rhys

I was wondering why the same long reads used for the metaflye assembly are also being used for the spades hybrid assembly, alongside the short reads that did not map to the long read assembly. Wouldn't it make sense to use only low quality long reads?

Cheers,
Andy

@rhysnewell
Owner

Hey Andy,

It's been a while since I've looked at that section, but I've thought about this a fair bit in the past, and there are a couple of reasons:

  1. To try and maximise any connections within the very fragmented secondary spades assembly:
  • This assembly usually has a very small number of short reads (comparatively) being thrown into it, and including all long reads helps to bridge any easily fixed gaps that might exist in it.
  • This poses the risk of "doubling up" usage of long reads, which I think is probably reason enough to look at an alternative.
  • A smarter method would recognise when a long read has been used twice and then try to re-incorporate that back into the main assembly:
    • Like, if a gap is bridged by a long read but that long read is known to have been used previously, then we try to intelligently align the newly bridged contig back into the long read assembly.
  2. This is how the OG slamM assembly pipeline was laid out. It would use all the long reads in the spades assembly. It would then follow this up with unicycler, which I've made optional because it is slow and doesn't improve results. Unicycler also doesn't fix the "doubling up" usage problem, if it did occur.
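As a rough illustration of recognising when a long read has been used twice, here is a minimal sketch. It assumes the long reads have already been mapped to both assemblies and that the alignments have been reduced to `(read_id, contig_id)` pairs; the function names and the contig/read ids are hypothetical and not part of the pipeline.

```python
from collections import defaultdict


def contigs_by_read(alignments):
    """Group (read_id, contig_id) pairs into read_id -> set of contig ids."""
    hits = defaultdict(set)
    for read_id, contig_id in alignments:
        hits[read_id].add(contig_id)
    return hits


def find_reused_reads(flye_alignments, spades_alignments):
    """Return reads that align to contigs in both assemblies.

    The result maps read_id -> (flye contigs, spades contigs).
    """
    flye = contigs_by_read(flye_alignments)
    spades = contigs_by_read(spades_alignments)
    return {r: (flye[r], spades[r]) for r in flye.keys() & spades.keys()}


# Hypothetical alignments, purely for illustration.
flye_alignments = [("read1", "flye_1"), ("read2", "flye_1"), ("read3", "flye_2")]
spades_alignments = [("read2", "spades_7"), ("read4", "spades_9")]
reused = find_reused_reads(flye_alignments, spades_alignments)
```

The contig pairs that come out of this (here `flye_1` and `spades_7`) would be the candidates for aligning a bridged contig back into the long read assembly.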

I don't think either of these are good enough reasons to keep it as is. It would be trivial to filter the long reads down to just those that haven't been used in assembly or polishing (I think?) without mapping again, but it might be better long term to try and thread re-used reads back into the main assembly.
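A minimal sketch of that filtering step, assuming the ids of the long reads already used in assembly or polishing can be collected into a set from existing outputs (how that set is built is not shown and is an assumption). The helper names are hypothetical, and the parser assumes well-formed four-line FASTQ records.

```python
def read_fastq(lines):
    """Yield (header, sequence, plus, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, plus, qual


def unused_reads(lines, used_ids):
    """Yield only the records whose read id is not in used_ids."""
    for header, seq, plus, qual in read_fastq(lines):
        read_id = header[1:].split()[0]  # drop '@' and any description
        if read_id not in used_ids:
            yield header, seq, plus, qual


# Hypothetical in-memory FASTQ; a real run would iterate over an open file.
fastq_lines = [
    "@r1 some description\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "GGCC\n", "+\n", "IIII\n",
]
kept = list(unused_reads(fastq_lines, used_ids={"r1"}))
```

Because this only compares read ids, no additional mapping is needed once the set of used ids exists.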

I guess I was never sure what the best method was and never got around to a better solution. If you think that the current method should be done differently then I'm happy to figure out how to get that implemented. I don't think we should limit it to just "low quality" long reads though; I'm guessing you just mean "previously unused" long reads? Keen to hear your and others' thoughts on the matter.

Cheers,
Rhys

@AroneyS
Collaborator

AroneyS commented May 30, 2024

I guess alignment of the relevant contigs (those with reused long reads) might work. But what do you do if they disagree? Like if the same long read is present in incongruent contigs from metaflye and metaspades? Preferentially dump metaspades?
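One way the "preferentially dump metaspades" option could look, as a sketch only. It assumes each pair of contigs sharing a long read has already been aligned and summarised as a single identity value; the function name, the input layout, and the 0.95 threshold are all hypothetical choices for illustration, not anything the pipeline does.

```python
def resolve_shared_contigs(pairs, min_identity=0.95):
    """Split contig pairs into ones to reconcile and metaspades contigs to drop.

    pairs: iterable of (flye_contig, spades_contig, identity), where identity
    is an assumed alignment identity between the two contigs (0.0 to 1.0).
    """
    to_reconcile = []  # congruent pairs: candidates for merging into the main assembly
    to_drop = []       # incongruent pairs: keep the metaflye contig, drop metaspades
    for flye_contig, spades_contig, identity in pairs:
        if identity >= min_identity:
            to_reconcile.append((flye_contig, spades_contig))
        else:
            to_drop.append(spades_contig)
    return to_reconcile, to_drop


# Hypothetical pairs, purely for illustration.
pairs = [("flye_1", "spades_7", 0.99), ("flye_2", "spades_9", 0.80)]
to_reconcile, to_drop = resolve_shared_contigs(pairs)
```

Whether dropping the metaspades contig is actually the right call when the two disagree is exactly the open question here; the sketch only shows where such a policy would sit.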
