Incorporating known TEs #6
Hello @davidaray, thank you for your interest in HiTE. If I understand correctly, you are looking to analyze which parts are shared and which differ between the TEs identified by HiTE and your curated TEs. You might consider following the benchmarking method of RepeatModeler2 for this analysis. You can run the following commands:
This will generate several useful files in the current directory, such as
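For a quick check outside the benchmarking commands, a widely used criterion for calling a de novo family "shared" with a curated one is the 80-80-80 rule: at least 80% identity over at least 80% of the sequence, in an alignment of at least 80 bp. The Python sketch below applies that rule to pre-computed alignment hits and then flags matched families whose top-level class disagrees. It is not HiTE's or RepeatModeler2's benchmarking code. The `Hit` record is hypothetical, measuring coverage on the HiTE consensus is an assumption, labels are assumed to be RepeatMasker-style `Class/Superfamily` strings, and each hit's single alignment block is treated as the covered length (fragmented hits are not merged).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment between a HiTE consensus (query) and a curated consensus (subject).

    Hypothetical record: fill it from whatever aligner output you use.
    """
    query: str
    subject: str
    identity: float  # percent identity of the aligned region (0-100)
    aln_len: int     # aligned length in bp
    query_len: int   # full length of the HiTE consensus in bp


def passes_80_80_80(hit, min_ident=80.0, min_cov=0.80, min_len=80):
    """80-80-80 test, with coverage measured on the HiTE consensus (an assumption)."""
    coverage = hit.aln_len / hit.query_len
    return hit.identity >= min_ident and coverage >= min_cov and hit.aln_len >= min_len


def match_families(hits):
    """Map each HiTE family to the set of curated families it matches."""
    matches = {}
    for hit in hits:
        if passes_80_80_80(hit):
            matches.setdefault(hit.query, set()).add(hit.subject)
    return matches


def top_class(label):
    """Class part of a 'Class/Superfamily' label, e.g. 'LINE' from 'LINE/L1'."""
    return label.split("/", 1)[0]


def classification_conflicts(matches, hite_labels, curated_labels):
    """HiTE families whose class differs from every curated family they match."""
    conflicts = []
    for hite_fam, curated_fams in sorted(matches.items()):
        hite_class = top_class(hite_labels[hite_fam])
        curated_classes = sorted({top_class(curated_labels[c]) for c in curated_fams})
        if hite_class not in curated_classes:
            conflicts.append((hite_fam, hite_labels[hite_fam], curated_classes))
    return conflicts


if __name__ == "__main__":
    hits = [
        Hit("HiTE_1", "L1_Curated", 92.5, 950, 1000),   # passes all three thresholds
        Hit("HiTE_2", "hAT_Curated", 85.0, 300, 1200),  # fails coverage (25%)
        Hit("HiTE_3", "Tc1_Curated", 78.0, 900, 1000),  # fails identity
    ]
    matches = match_families(hits)
    print(matches)  # {'HiTE_1': {'L1_Curated'}}
    print(classification_conflicts(matches, {"HiTE_1": "DNA/TIR"}, {"L1_Curated": "LINE/L1"}))
    # [('HiTE_1', 'DNA/TIR', ['LINE'])]
```

Families absent from `matches` are candidates for novel HiTE families; entries returned by `classification_conflicts` are the ones worth inspecting by hand.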
Sorry for not replying sooner. I was comparing the methods you described to my own established pipelines, and I have some rather concerning results. While many of the classifications proposed by HiTE are excellent, and I find good correlations between what HiTE proposes and previous manual curations, I'm finding many misclassifications. I include one example here.
In the image, I analyzed a sequence that HiTE labeled as a TIR DNA transposon. However, when I examined it using the methods available through TE-Aid (https://github.com/clemgoub/TE-Aid), I got quite a different result. As you can see from the image, this is quite obviously a fragment of a LINE element. It shows the characteristic reduction in the number of copies as you move from 3' to 5', it has a very good match to a known L1 polymerase, and it has a repetitive tail typical of LINEs. Furthermore, it does not harbor any terminal inverted repeats that I can find. As you can see from the upper left box, there are well over 34,000 of these in this genome assembly. Were I to continue calling this a TIR DNA transposon, I would be mislabeling over 34,000,000 bp of the assembly. This is a little disturbing because I'm finding this to be the case for many of the elements discovered in this species. I've been characterizing TEs for over 20 years and know the importance of getting a good characterization of the TEs in a genome assembly.
Another potential issue is that the software appears to generate quite a few potential false positives. In this image, you can see that this sequence, labeled as an LTR INT by HiTE, probably does contain the coding sequences of an LTR retrotransposon, but it is represented in the genome by only a single instance. The other issue is that it appears to be cobbled together from several different fragments scattered throughout the genome rather than derived from a single element.
Over the past few days, I've written some pipelines to correct these problems for my own analyses, but this is problematic for others who may not be as experienced as I am. I strongly recommend that you include some sort of warning about misclassification in your GitHub repository.
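One of the checks described in this comment, whether a consensus labeled as a TIR element actually carries terminal inverted repeats, can be screened crudely in code. The Python sketch below compares the first `k` bases of a consensus with the reverse complement of its last `k` bases, without gaps. It is an illustrative assumption, not TE-Aid's or HiTE's method: real TIRs may be offset from the consensus ends or contain indels, so a self-dot-plot like the one TE-Aid draws remains the better diagnostic. The default `k=20` and the 0.8 identity cutoff are arbitrary choices.

```python
# Complement table for upper- and lower-case nucleotides; other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_identity(seq, k=20):
    """Fraction of identical bases between the first k bp and the reverse
    complement of the last k bp (ungapped). A genuine TIR at the very ends
    of the consensus should score close to 1.0."""
    seq = seq.upper()
    if len(seq) < 2 * k:
        raise ValueError("sequence is shorter than 2 * k")
    left = seq[:k]
    right_rc = reverse_complement(seq[-k:])
    matches = sum(a == b for a, b in zip(left, right_rc))
    return matches / k


def has_tir(seq, k=20, min_identity=0.8):
    """Crude screen: True if the consensus ends look like inverted repeats."""
    return terminal_identity(seq, k) >= min_identity


if __name__ == "__main__":
    tir = "CAGGGTGTAAAAACCTTAAA"  # made-up 20 bp terminal repeat
    tir_like = tir + "GATTACA" * 10 + reverse_complement(tir)
    line_like = "GGGGGCCCCCGGGGGCCCCC" + "ACGT" * 20 + "A" * 30  # ends in a poly(A) tail
    print(has_tir(tir_like), terminal_identity(tir_like))    # True 1.0
    print(has_tir(line_like), terminal_identity(line_like))  # False 0.0
```

A consensus that fails this screen is not proof of misclassification, but combined with protein hits (for example, to an L1 ORF) and a 5'-truncated coverage profile it points toward a LINE rather than a TIR element.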
Dear Professor David A. Ray, I had the pleasure of reading your outstanding paper, "Insights into mammalian TE diversity through the curation of 248 genome assemblies" (https://www.science.org/doi/10.1126/science.abn1430), and found it incredibly insightful. Thank you for your remarkable contributions to the TE field. I look forward to learning more from your work. While HiTE has shown promise, there is still room for improvement. I am currently working on enhancing the detection modules for all types of TEs, including LTR, TIR, Helitron, and non-LTR, to achieve higher performance. Could you kindly provide me with the DNA sequences and genomes of the two examples mentioned in your response above? This would greatly assist me in further improving HiTE. Best regards, Kang Hu
Dear Professor David A. Ray, I am very pleased to hear that you found HiTE helpful, and I am grateful for the data you provided. I will carefully study your suggestions to improve the tool further. Additionally, I am eagerly looking forward to your next publication. Wishing you continued success in your research. Best regards, Kang Hu
I'm curious whether one can use a library of already-curated TEs to enhance the analysis and avoid duplicating previous library work.
I have several species that we have manually curated, and I'm hoping to use HiTE. I plan to compare the HiTE libraries to our curated TEs using any of several tools, but was wondering if there is a built-in mechanism that would allow me to do this automatically.
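Whether HiTE itself offers a built-in option for this is not answered in this thread. As an external workaround, once you know which HiTE families already match a curated family (for example, by an 80-80-80 criterion), a combined library can be built by keeping every curated family and appending only the unmatched HiTE families. The Python sketch below assumes plain FASTA libraries and keys each record on the first word of its header; it does not re-check classifications, and the function names are hypothetical.

```python
def read_fasta(path):
    """Return a dict (insertion-ordered) of header ID -> sequence.

    The ID is the first whitespace-delimited word after '>'.
    """
    records = {}
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records[header] = "".join(chunks)
                fields = line[1:].split()
                header = fields[0] if fields else ""
                chunks = []
            else:
                chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def merge_libraries(curated, hite, matched_hite_ids):
    """Keep every curated family; add only HiTE families with no curated match."""
    merged = dict(curated)
    for name, seq in hite.items():
        if name not in matched_hite_ids and name not in merged:
            merged[name] = seq
    return merged


def write_fasta(records, path, width=60):
    """Write records as FASTA, wrapping sequences at `width` characters."""
    with open(path, "w") as out:
        for name, seq in records.items():
            out.write(f">{name}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
```

For example, `write_fasta(merge_libraries(read_fasta("curated.fa"), read_fasta("hite.fa"), matched), "combined.fa")`, where the file names are placeholders and `matched` is the set of HiTE IDs you decided are already represented. Keeping the curated sequence whenever both exist preserves the manual curation and its labels.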