Request for Guidance on Converting NOTALs to Presence in Syri Outputs #246

Open
Lancer-sudo-png opened this issue Apr 23, 2024 · 3 comments

Comments

@Lancer-sudo-png

Dear @mnshgl0110,
I hope this message finds you well. I am reaching out to you because I am currently utilizing Syri, which I find to be an exceptionally useful tool for calling structural variants (SVs). However, I have encountered some challenges with presence/absence variations (PAVs) that I hope you can help clarify.

After reviewing the discussion in issue #107, I agree that NOTALs could be considered PAVs. My specific question concerns how to convert a NOTAL found in the query sequence into a "Presence" status. In the Syri output file (syri.out), such a NOTAL has coordinates only for the query genome, with no corresponding coordinates for the reference genome, like this:

NOTAL-query pic

Could you please advise on how to appropriately define a NOTAL-query as "Presence" in a VCF file format? Any guidance or suggestions you could provide would be greatly appreciated.

Thank you for your time and assistance.

Best regards,
Baoyue

@mnshgl0110
Member

Consider the following example, where the syntenic regions (between the reference and query genomes) are in blue, the translocation is in red, and the Notal region is shown as a green circle.

image

Now, compared to the reference, the notal could correspond to either of the two locations marked with arrows. As there is no obvious answer as to which position "mutated" to generate the notal, we do not assign a reference coordinate to notals in query.

But, I guess for your task, it might be ok to assign one of the two values (as long as it is clearly described). You will need to fetch the neighboring blocks in the query genome and then get the corresponding breakpoints in the reference.

@Lancer-sudo-png
Author

Lancer-sudo-png commented Apr 26, 2024

Thank you for your response! @mnshgl0110

Indeed, I have identified a pattern based on adjacent annotation blocks:

Subtracting 1 from the left breakpoint of a NOTAL-query yields the right breakpoint of the adjacent block on its left. Using that right breakpoint, I can locate the block's corresponding segment on the reference genome. Similarly, adding 1 to the right breakpoint of a NOTAL-query yields the left breakpoint of the adjacent block on its right, and I can locate that block's corresponding segment on the reference genome using this left breakpoint.
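The adjacency rule above can be sketched in Python roughly as follows. This is only an illustration, not part of Syri: `Block`, `parse_line` and `find_flanks` are hypothetical names, the column order assumes the usual 12-column syri.out layout, and only top-level blocks (parent column equal to `-`) are considered as neighbors. The input is the three records quoted in this comment.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    ref_chr: str
    ref_start: Optional[int]
    ref_end: Optional[int]
    qry_chr: str
    qry_start: Optional[int]
    qry_end: Optional[int]
    sv_id: str
    parent: str
    sv_type: str


def to_int(field: str) -> Optional[int]:
    # syri.out uses "-" for coordinates that are not defined.
    return None if field == "-" else int(field)


def parse_line(line: str) -> Block:
    # Assumed columns: refChr refStart refEnd refSeq qrySeq
    #                  qryChr qryStart qryEnd ID parent type copyStatus
    f = line.split()
    return Block(f[0], to_int(f[1]), to_int(f[2]),
                 f[5], to_int(f[6]), to_int(f[7]), f[8], f[9], f[10])


def find_flanks(notal: Block, blocks: Iterable[Block]) -> Tuple[Optional[Block], Optional[Block]]:
    """Return the (left, right) aligned blocks that touch the NOTAL in the query."""
    left = right = None
    for b in blocks:
        if b.sv_id == notal.sv_id or b.parent != "-" or b.qry_chr != notal.qry_chr:
            continue
        if b.ref_start is None or b.qry_start is None:
            continue  # skip blocks lacking reference or query coordinates
        lo, hi = sorted((b.qry_start, b.qry_end))  # tolerate reversed query coordinates
        if hi == notal.qry_start - 1:
            left = b
        elif lo == notal.qry_end + 1:
            right = b
    return left, right


records = [parse_line(r) for r in (
    "- - - - - Chr1 36665 37065 NOTAL27497 - NOTAL -",
    "Chr1 20046 29279 - - Chr1 27494 36664 SYN3 - SYN -",
    "Chr3 4532378 4532617 - - Chr1 37066 37306 INVDP47167 - INVDP copygain",
)]
notal = records[0]
left, right = find_flanks(notal, records)
print(left.sv_id, left.ref_chr, left.ref_start, left.ref_end)
print(right.sv_id, right.ref_chr, right.ref_start, right.ref_end)
```

The two returned blocks carry the reference segments that flank the NOTAL; whether they agree on a single reference location is a separate question.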

However, I found that the segments on the reference genome identified using the adjacent block breakpoints may be located on different chromosomes. In such cases, how should I describe this NOTAL-query (i.e., what we consider as Presence) using the coordinates of the reference genome? Here is my variant information and a schematic diagram:
image

NOTAL-query: - - - - - Chr1 36665 37065 NOTAL27497 - NOTAL -
The left block of the NOTAL-query: Chr1 20046 29279 - - Chr1 27494 36664 SYN3 - SYN -
The right block of the NOTAL-query: Chr3 4532378 4532617 - - Chr1 37066 37306 INVDP47167 - INVDP copygain
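For illustration, one possible convention for this case could look like the sketch below. It is an assumption, not a Syri feature or a recommended standard: prefer the flank annotated as `SYN` as the anchor, report the NOTAL as a symbolic `<INS>` at that reference position, and give up when neither flank is syntenic. The names `parse`, `anchor_from_flanks` and `notal_to_vcf_line` are hypothetical, `N` stands in for the reference base because the sequence is not read here, and the `SVTYPE`, `SVLEN` and `END` INFO keys would still need to be declared in the VCF header.

```python
from typing import Dict, Optional, Tuple


def parse(line: str) -> Dict[str, str]:
    # Assumed 12-column syri.out layout; only the fields used below are kept.
    f = line.split()
    return {"ref_chr": f[0], "ref_start": f[1], "ref_end": f[2],
            "qry_start": f[6], "qry_end": f[7], "id": f[8], "type": f[10]}


def anchor_from_flanks(left: Optional[dict], right: Optional[dict]) -> Optional[Tuple[str, int]]:
    """Pick the reference base just before the insertion, preferring a SYN flank."""
    if left is not None and left["type"] == "SYN":
        return left["ref_chr"], int(left["ref_end"])
    if right is not None and right["type"] == "SYN":
        return right["ref_chr"], int(right["ref_start"]) - 1
    return None  # no syntenic flank: this sketch leaves the NOTAL unplaced


def notal_to_vcf_line(notal: dict, left: Optional[dict], right: Optional[dict]) -> Optional[str]:
    anchor = anchor_from_flanks(left, right)
    if anchor is None:
        return None
    chrom, pos = anchor
    svlen = abs(int(notal["qry_end"]) - int(notal["qry_start"])) + 1
    info = f"SVTYPE=INS;SVLEN={svlen};END={pos}"
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
    return "\t".join([chrom, str(pos), notal["id"], "N", "<INS>", ".", "PASS", info])


notal = parse("- - - - - Chr1 36665 37065 NOTAL27497 - NOTAL -")
left = parse("Chr1 20046 29279 - - Chr1 27494 36664 SYN3 - SYN -")
right = parse("Chr3 4532378 4532617 - - Chr1 37066 37306 INVDP47167 - INVDP copygain")
vcf_line = notal_to_vcf_line(notal, left, right)
print(vcf_line)
```

With these three records the syntenic left block wins, so the NOTAL is anchored at Chr1:29279 with a length of 401 bp; the Chr3 flank is ignored under this convention. Whatever rule is chosen, it should be stated explicitly alongside the results.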

@mnshgl0110
Member

Indeed, that is the complication in describing Notals using reference coordinates: sometimes there is no obvious answer. I guess you can pick one convention for your analysis and then apply it consistently. Unfortunately, I do not have a more helpful answer.
