Versions of the json profiles #43

Open
karynne7 opened this issue Aug 3, 2021 · 1 comment

karynne7 commented Aug 3, 2021

I ran a large batch of samples on an older version of EHdn (0.6.2), and have since downloaded the current version (0.9.0) and been running the outlier analyses with it. However, I didn't immediately notice that the profile step, which creates the individual JSONs, produces files that are quite different from the older version's. I still have results, but I'm worried that the older JSON data may have a bug, or that there is some major reason for the differences between the files. Many of the BAMs I've used are archived, so it isn't trivial for me to reprocess them individually. Any explanation of what changed between the releases would be helpful. Thanks!

Example differences from runs on the same BAM, with the same settings and reference:
Older version -
"AAAAAAAAAAAAAAAAAAT": {
"AnchoredIrrCount": 6,
"IrrPairCount": 0,
"RegionsWithIrrAnchors": {
"12:47693944-47693945": 1,
"13:36128685-36128686": 1,
"2:200884347-200884348": 1,
"4:85302549-85302550": 1,
"5:19171777-19171778": 1,
"9:74481855-74481856": 1
},
"RepeatUnit": "AAAAAAAAAAAAAAAAAAT"

Newest version -
"AAAAAAAAAAAAAAAAAAT": {
"AnchoredIrrCount": 1,
"IrrPairCount": 0,
"RegionsWithIrrAnchors": {
"12:47693944-47693945": 1
},
"RegionsWithIrrs": {
"2:97886466-97886467": 1
},
"RepeatUnit": "AAAAAAAAAAAAAAAAAAT"

egor-dolzhenko (Contributor) commented

Thanks for the question Karynne.

The algorithm for detecting in-repeat reads has changed between versions 0.6.2 and 0.9.0. So, it would be best to limit the analysis to STR profiles generated by the same version of EHdn. If you run the outlier analysis on a mixed dataset where most profiles were generated by, say, the older version of EHdn, the program may produce incorrect results.

I hope this answer is helpful. Please let me know if you have any other questions.

Best wishes,
Egor
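
To follow the advice above, it can help to check whether a set of profiles is mixed before running the outlier analysis. The following is a minimal sketch, not part of EHdn. It relies on one assumption drawn only from the two example records in this issue: the 0.9.0 record contains a "RegionsWithIrrs" field and the 0.6.2 record does not. Whether that holds for every profile is not confirmed here, so treat the grouping as a heuristic. The function names and sample names are hypothetical.

```python
import json
from pathlib import Path

# Assumption: this field appears in the 0.9.0 example above but not in the
# 0.6.2 example. It is used here only as a heuristic marker.
MARKER = "RegionsWithIrrs"


def contains_key(node, key):
    """Return True if `key` occurs anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(v, key) for v in node)
    return False


def group_by_marker(profiles, marker=MARKER):
    """Split {name: parsed_json} into names with and without `marker`."""
    groups = {"with_marker": [], "without_marker": []}
    for name, data in sorted(profiles.items()):
        bucket = "with_marker" if contains_key(data, marker) else "without_marker"
        groups[bucket].append(name)
    return groups


def load_profiles(paths):
    """Read profile JSON files from disk into {path: parsed_json}."""
    return {str(p): json.loads(Path(p).read_text()) for p in paths}


# The two records quoted in this issue, used as in-memory test data.
old_record = {
    "AAAAAAAAAAAAAAAAAAT": {
        "AnchoredIrrCount": 6,
        "IrrPairCount": 0,
        "RegionsWithIrrAnchors": {
            "12:47693944-47693945": 1,
            "13:36128685-36128686": 1,
            "2:200884347-200884348": 1,
            "4:85302549-85302550": 1,
            "5:19171777-19171778": 1,
            "9:74481855-74481856": 1,
        },
        "RepeatUnit": "AAAAAAAAAAAAAAAAAAT",
    }
}
new_record = {
    "AAAAAAAAAAAAAAAAAAT": {
        "AnchoredIrrCount": 1,
        "IrrPairCount": 0,
        "RegionsWithIrrAnchors": {"12:47693944-47693945": 1},
        "RegionsWithIrrs": {"2:97886466-97886467": 1},
        "RepeatUnit": "AAAAAAAAAAAAAAAAAAT",
    }
}

groups = group_by_marker({"sample_old": old_record, "sample_new": new_record})
print(groups)
```

For files on disk, `group_by_marker(load_profiles(paths))` gives the same split. If both groups are non-empty, the dataset is likely mixed, and the profiles without the marker are candidates for reprocessing with the newer version. A profile landing in "without_marker" is only a hint, not proof, that it came from the older release.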
