copy_number -> copy_call bins inconsistent #268

ymahlich · 2024-12-12T18:25:02Z

There seem to be differences in how copy_number is converted to copy_call for HCMI compared with all other datasets.
Is there a specific reason for this, or is it an oversight / bug? If this is intentional, does that mean the copy_number value (not copy_call) won't be comparable between HCMI and the other datasets?

Maybe I am deeply misunderstanding something here but it seems odd to me.

I attached the code for each conversion below, with references to the source file and line numbers:

Broad Sanger (02-broadSangerOmics.R: lines 119-122):

dplyr::mutate(IMPROVE=ifelse(copy_number<0.5210507,'deep del',
                               ifelse(copy_number<0.7311832,'het loss',
                                      ifelse(copy_number<1.214125,'diploid',
                                             ifelse(copy_number<1.422233,'gain','amp')))))|>

CPTAC (getCptacData.py: lines 213-224):

    for a in arr:
        a = 2**float(a)
        if float(a) < 0.5210507:
            b = 'deep del'
        elif float(a) < 0.7311832:
            b = 'het loss'
        elif float(a) < 1.214125:
            b = 'diploid'
        elif float(a) <1.42233:
            b = 'gain'
        else:
            b = 'amp'

HCMI (02-getHCMIData.py: lines 479-488):

a_val = math.log2(float(a)+0.000001) ###this should not be exponent, should be log!!! 2**float(a)
if a_val < 0.0: #0.5210507:
    return 'deep del'
elif a_val < 0.7311832:
    return 'het loss'
elif a_val < 1.214125:
    return 'diploid'
elif a_val < 1.731183:
    return 'gain'
else:
    return 'amp'
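
To make the difference concrete, here is a minimal sketch that re-implements the Broad Sanger binning and the HCMI binning as quoted above and applies both to the same numbers. The function names `call_linear` and `call_hcmi` are made up for illustration, and it assumes the HCMI input `a` is on the same scale as copy_number in the other datasets, which may not be the case.

```python
import math


def call_linear(copy_number):
    """Binning as quoted for Broad Sanger: thresholds applied to copy_number directly."""
    copy_number = float(copy_number)
    if copy_number < 0.5210507:
        return 'deep del'
    elif copy_number < 0.7311832:
        return 'het loss'
    elif copy_number < 1.214125:
        return 'diploid'
    elif copy_number < 1.422233:
        return 'gain'
    return 'amp'


def call_hcmi(a):
    """Binning as quoted for HCMI: thresholds applied after a log2 transform."""
    a_val = math.log2(float(a) + 0.000001)
    if a_val < 0.0:
        return 'deep del'
    elif a_val < 0.7311832:
        return 'het loss'
    elif a_val < 1.214125:
        return 'diploid'
    elif a_val < 1.731183:
        return 'gain'
    return 'amp'


# Same input (assumed to be on the same scale), two different calls
for value in (0.4, 1.0, 2.0, 4.0):
    print(value, call_linear(value), call_hcmi(value))
```

Under that assumption, an input of 1.0 is called 'diploid' by the Broad Sanger rule but 'het loss' by the HCMI rule, and 2.0 is called 'amp' versus 'diploid'.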

MPNST (01_mpnst_get_omics.R: lines 173-176):

dplyr::mutate(copy_call=ifelse(copy_number<0.5210507,'deep del',
                               ifelse(copy_number<0.7311832,'het loss',
                                      ifelse(copy_number<1.214125,'diploid',
                                             ifelse(copy_number<1.422233,'gain','amp')))))|>
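
If the bins are meant to be identical everywhere, one option would be a single shared helper. The sketch below is only a suggestion, not existing code in the repository: `copy_call`, `THRESHOLDS`, `LABELS` and the `is_log2` flag are hypothetical names, and the thresholds are the ones from the Broad Sanger and MPNST snippets. The `is_log2` branch mirrors the `2**float(a)` step in the CPTAC snippet.

```python
from bisect import bisect_right

# Upper bounds copied from the Broad Sanger / MPNST snippets (assumed canonical)
THRESHOLDS = (0.5210507, 0.7311832, 1.214125, 1.422233)
LABELS = ('deep del', 'het loss', 'diploid', 'gain', 'amp')


def copy_call(copy_number, is_log2=False):
    """Map a copy_number value to a copy_call label.

    If is_log2 is True, the input is first converted back with 2**x,
    as the CPTAC snippet does, so the same thresholds apply.
    """
    value = 2 ** float(copy_number) if is_log2 else float(copy_number)
    # bisect_right counts thresholds <= value, which matches the strict
    # '<' comparisons in the original if/elif chains.
    return LABELS[bisect_right(THRESHOLDS, value)]


print(copy_call(1.0))                 # linear input
print(copy_call(0.0, is_log2=True))   # log2 input equivalent to 1.0
```

Keeping the thresholds in one place would also avoid small drifts such as 1.42233 vs 1.422233 between files.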


sgosline self-assigned this Dec 18, 2024

sgosline added this to CoderData Dec 18, 2024

sgosline assigned jjacobson95 and unassigned sgosline Dec 18, 2024

jjacobson95 mentioned this issue Dec 18, 2024

Fix Copy Calls in HCMI #276

Merged
