Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle overlapping deletions #62

Merged
merged 11 commits into from
Aug 27, 2024
Merged

Conversation

alkaZeltser
Copy link
Collaborator

@alkaZeltser alkaZeltser commented Aug 23, 2024

#61 Outlines a bug: the notation used to indicate an overlapping deletion in a genotype call is not recognized by convert.alleles.to.pgs.dosage.

More info here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele

The asterisk * is used in VCF format to denote such an allele, and it is not a recognized allele format by the dosage calculator function convert.alleles.to.pgs.dosage.

I have decided to add * to the allowed list of sample alleles but not to the list of allowed risk alleles. This means that an * will always be treated as non-risk and will contribute a dosage of 0.

@RoniHaas would you mind weighing in on whether you've encountered this notation in your research and how you think they should be handled?

  • Updated convert.alleles.to.pgs.dosage to recognize * as a non-risk allele.
  • Small bug fixes discovered during testing in apply.polygenic.score and create.pgs.with.continuous.phenotype
  • Added new test cases where needed.
  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added the changes included in this pull request to NEWS under the next release version or unreleased, and updated the date.

  • I have updated the version number in metadata.yaml and DESCRIPTION.

  • Both R CMD build and R CMD check run successfully.

Closes #61

Testing Results

All tests PASS

@@ -29,5 +29,5 @@ Config/testthat/edition: 3
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason why this incremented?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was done automatically by devtools::document(), possibly I updated Roxygen since the last time, not sure though.

@@ -345,7 +345,12 @@ apply.polygenic.score <- function(
sample.coordinate.to.row.dict.hash <- new.env(hash = TRUE, parent = emptyenv());

for (i in 1:nrow(merged.vcf.with.pgs.data)) {
# skip if the variant is missing from all samples
if (is.na(merged.vcf.with.pgs.data[i, 'Indiv'])) {
next;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to include a message about the skipped variant.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deep in the multiallelic site handling algorithm, it's not useful to the user. Missing sites are reported in the final output.

Copy link
Contributor

@dan-knight dan-knight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me! Just make sure to update the date/version and add the functional changes to the changelog.

@alkaZeltser
Copy link
Collaborator Author

Code looks good to me! Just make sure to update the date/version and add the functional changes to the changelog.

Should I do that with this PR or wait until the release PR?

Copy link

@forbiddenpersimmon forbiddenpersimmon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dan-knight
Copy link
Contributor

Should I do that with this PR or wait until the release PR?

I usually just keep it up to date as I go along, but the other PR is up already so I'm fine with just handling it there.

@alkaZeltser
Copy link
Collaborator Author

Should I do that with this PR or wait until the release PR?

I usually just keep it up to date as I go along, but the other PR is up already so I'm fine with just handling it there.

I guess what I'm confused about is I already did update NEWS/changelog with my changes under "unreleased" but it sounds like you want me to also increment the version even when I am not ready to make another release. I didn't think the version should be incremented after every PR?

@dan-knight
Copy link
Contributor

I guess what I'm confused about is I already did update NEWS/changelog with my changes under "unreleased" but it sounds like you want me to also increment the version even when I am not ready to make another release.

Ah, sorry for the confusion. You did it right, I just didn't see the NEWS.md changes in this PR. All good!

Copy link

@RoniHaas RoniHaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about how you handled the * as a non-risk allele!

A small unrelated suggestion is to consider testing strand flip.

@alkaZeltser
Copy link
Collaborator Author

I agree about how you handled the * as a non-risk allele!

A small unrelated suggestion is to consider testing strand flip.

Good point, I've started an issue for potential new feature in the future.

@alkaZeltser alkaZeltser merged commit b681d82 into main Aug 27, 2024
6 checks passed
@alkaZeltser alkaZeltser deleted the nzeltser-handle-overlapping-deletions branch August 27, 2024 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add handling of overlapping deletion alleles ("*")
4 participants