result from "modkit pileup" command #288

Open
Flower9618 opened this issue Oct 22, 2024 · 5 comments
Labels
question Looking for clarification on inputs and/or outputs

Comments

@Flower9618

Hello, thank you so much for such a helpful tool.
I would like to ask whether the output of the "modkit pileup" command can tell us which CpG sites come from the same molecule.

@ArtRand
Contributor

ArtRand commented Oct 22, 2024

Hello @Flower9618,

You'll have to use modkit extract calls (documentation) to determine the co-occurrence of base modifications at the individual read level.
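
To illustrate the read-level grouping suggested above, here is a minimal Python sketch that collects per-read calls from a tab-separated table such as the one modkit extract calls writes. The column names used here (`read_id`, `chrom`, `ref_position`, `call_code`) and the convention that `-` marks a canonical call are assumptions; check them against the header of your own output file. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def calls_by_read(tsv_handle):
    """Group per-position calls by read so that sites sharing a molecule stay together.

    Assumes a tab-separated table with a header row containing the columns
    read_id, chrom, ref_position and call_code (assumed names, verify locally).
    """
    per_read = defaultdict(list)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        per_read[row["read_id"]].append(
            (row["chrom"], int(row["ref_position"]), row["call_code"])
        )
    # Sort each read's calls by reference coordinate for easier inspection.
    return {read_id: sorted(calls) for read_id, calls in per_read.items()}


# Made-up example rows standing in for a real extract table.
example_tsv = (
    "read_id\tchrom\tref_position\tcall_code\n"
    "read1\tchr1\t120\t-\n"
    "read2\tchr1\t100\tm\n"
    "read1\tchr1\t100\tm\n"
)

grouped = calls_by_read(io.StringIO(example_tsv))
for read_id, calls in grouped.items():
    print(read_id, calls)
```

Each dictionary entry then lists every called site on one molecule, which is the information the pileup output aggregates away.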

@ArtRand ArtRand added the question Looking for clarification on inputs and/or outputs label Oct 22, 2024
@Flower9618
Author

I see. Thank you so much. I will try this command.

In addition, for the ‘modkit repair’ command, both the input and output files must be in BAM format. Is there an easy way to handle data processing when different tools require different input formats? For example, after sequencing, the FASTQ file is used to trim adapters, which also outputs a FASTQ file. To repair the MM and ML tags, I need to convert the FASTQ file to a BAM file. Then, for mapping to the reference genome with Minimap2, I have to convert the BAM file back to a FASTQ file.

@ArtRand
Contributor

ArtRand commented Oct 24, 2024

Hello @Flower9618,

The easiest way is to minimize the number of conversions. I recommend staying in (mod)BAM as much as possible. dorado will perform adapter trimming and mapping (find the docs here). The team on that project has made a special effort to maintain the modified base tags, so you shouldn't have to use modkit repair except in special cases.

@Flower9618
Author

Hello @ArtRand,

Thank you so much for your reply. I have reads that were basecalled with modified bases by Dorado and saved in a BAM file (with MM and ML tags). If I now use the 'dorado trim' command to trim the adapters from these reads and choose BAM as the output format, will the MM and ML tags in the trimmed.bam file be updated to reflect the trimming?

@ArtRand
Contributor

ArtRand commented Oct 31, 2024

Hello @Flower9618,

Yes, they should be. I'm going to add a "check" command to a future version of Modkit that will help make sure the tags are correct. In the meantime, running modkit summary with --log-filepath will do a quick version of the same check: if some reads have incorrect tags, they will be reported in the log.
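
For readers wondering what "incorrect tags" can mean after trimming, the following Python sketch checks two basic consistency rules from the SAM tags specification: the ML array must hold one value per modification call listed in MM, and the skip counts in MM must not run past the number of matching bases in the read. This is only an illustration and not how modkit implements its checks. The function name `check_mm_ml` is hypothetical, and the sketch assumes the sequence is given in the original sequenced orientation.

```python
import re

# MM entry header: base, strand, modification code(s), optional '.' or '?' flag.
MM_HEADER = re.compile(r"([ACGTUN])([+-])([a-z]+|\d+)([.?]?)")


def check_mm_ml(seq, mm, ml):
    """Return a list of problems; an empty list means MM/ML look consistent.

    seq: read sequence in the original sequenced orientation (assumption)
    mm:  MM tag value, e.g. "C+m,0,1;"
    ml:  list of ML probabilities (integers 0-255)
    """
    problems = []
    expected_ml = 0
    for entry in filter(None, mm.split(";")):
        fields = entry.split(",")
        match = MM_HEADER.fullmatch(fields[0])
        if match is None:
            problems.append(f"unparseable MM entry header: {fields[0]!r}")
            continue
        base, _strand, codes, _flag = match.groups()
        # "C+mh" carries two codes per position; a numeric ChEBI id counts as one.
        n_codes = 1 if codes.isdigit() else len(codes)
        skips = [int(x) for x in fields[1:]]
        expected_ml += n_codes * len(skips)
        # 'N' counts every base; otherwise count occurrences of the named base.
        available = len(seq) if base == "N" else seq.upper().count(base)
        needed = sum(skips) + len(skips)
        if needed > available:
            problems.append(
                f"{fields[0]}: needs {needed} '{base}' bases, read has {available}"
            )
    if expected_ml != len(ml):
        problems.append(f"ML has {len(ml)} values, MM implies {expected_ml}")
    return problems


# Untrimmed read: three C bases, calls on the 1st and 3rd.
ok = check_mm_ml("ACGTCCGA", "C+m,0,1;", [200, 10])
# Same tags left unchanged after trimming the read to its first four bases.
stale = check_mm_ml("ACGT", "C+m,0,1;", [200, 10])
print(ok)
print(stale)
```

The second call shows the failure mode the question is about: if a read is shortened but its MM and ML tags are carried over unchanged, the tags refer to bases that no longer exist.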
