In ggcorr, why are 0s replaced with NA? #504

Open
winterstat opened this issue Jun 18, 2024 · 1 comment

Hello,

I use the ggcorr function extensively and generally love it, so thank you! However, I recently noticed that when a correlation is exactly 0, its label is removed from the plot (i.e., that box is empty). In the example below, the correlation between X1 and X2 is exactly 0 and is omitted, while the correlation between X1 and X3 is .001 and is shown as 0 because of the label_round = 1 default:

library(reprex)
library(GGally)
#> Loading required package: ggplot2
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2

cors <- matrix(c(1, 0, .001,
                 0, 1, .2,
                 .001, .2, 1), nrow = 3, byrow = TRUE)

row.names(cors) <- colnames(cors) <- c("X1", "X2", "X3")

ggcorr(data = NULL, cor_matrix = cors, label = TRUE)

Created on 2024-06-18 with reprex v2.0.2

I found this in the ggcorr function file, which is where I think this is being done:

m_long$coefficient[m_long$coefficient == 0] <- NA

Here is a link to the location of this line: https://github.com/ggobi/ggally/blob/9d954c1731d481028f0c6609e7152aef7e526677/R/ggcorr.R#L219C1-L232C52

Would it be possible to add an argument to the ggcorr function that allows users to decide if they want to include exact 0s or not? Showing the zeroes is very important in communicating my results (and it doesn't make sense to tell readers "when you see an empty space, that is actually a zero").

winterstat commented Jun 19, 2024

After simply commenting out that one line in the ggcorr function, I now see that 0s are replaced with NA so that the upper triangle of the correlation plot is empty. As a workaround for myself, I tried the following:

  # original line, which zeroes out the upper triangle and diagonal:
  # m <- data.frame(m * lower.tri(m))
  # replace the above with this, so those cells are NA rather than 0:
  m[upper.tri(m, diag = TRUE)] <- NA
  rownames(m) <- colnames(m)

  # need to make it a data frame:
  m <- data.frame(m)
  m$.ggally_ggcorr_row_names <- rownames(m)
  # m = reshape::melt(m, id.vars = ".ggally_ggcorr_row_names")
  # names(m) = c("x", "y", "coefficient")
  m_long <- m %>%
    tidyr::pivot_longer(
      cols = -.ggally_ggcorr_row_names,
      names_to = "y",
      values_to = "coefficient"
    ) %>%
    dplyr::rename(x = .ggally_ggcorr_row_names) %>%
    dplyr::mutate(y = factor(y, levels = rownames(m)))

  # no longer needed, since the upper triangle is already NA:
  # m_long$coefficient[m_long$coefficient == 0] <- NA

This works for me, but I'm not a great R programmer, so if someone else has a more elegant solution please use that instead.
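
To make the difference between the two masking approaches concrete, here is a minimal base-R sketch that runs outside of ggcorr. It reuses the matrix from the report above; the names zero_masked and na_masked are just illustrative and do not appear in GGally.

```r
# Same correlation matrix as in the report above.
cors <- matrix(c(1,    0,  .001,
                 0,    1,  .2,
                 .001, .2, 1), nrow = 3, byrow = TRUE)
rownames(cors) <- colnames(cors) <- c("X1", "X2", "X3")

# Approach 1: zero out the upper triangle and diagonal by multiplication,
# then turn every 0 into NA. Genuine zeros in the lower triangle are
# indistinguishable from the masked cells and get removed as well.
zero_masked <- cors * lower.tri(cors)
zero_masked[zero_masked == 0] <- NA

# Approach 2: set the upper triangle and diagonal to NA directly.
# Values in the lower triangle are never touched.
na_masked <- cors
na_masked[upper.tri(na_masked, diag = TRUE)] <- NA

zero_masked["X2", "X1"]  # NA: the genuine zero is lost
na_masked["X2", "X1"]    # 0: the genuine zero survives
```

With the second approach, the later "== 0" replacement is no longer needed, because the cells that should be empty are already NA.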

Edited to add:

Could I also suggest using label = format(label, nsmall = label_round) in several places in the ggcorr function to ensure that rounding is consistent (e.g., "0.00" shows up as "0.00" not "0")?
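
As a rough illustration of that suggestion, the sketch below compares plain rounding with format(..., nsmall = ...) in base R. It assumes labels come from rounding the coefficient and converting it to text; format_label is a hypothetical helper, not part of GGally. It keeps NA as NA because format() would otherwise turn missing values into the string "NA".

```r
label_round <- 2
coefficient <- c(0, 0.001, 0.2)

# Plain rounding drops trailing zeros once the value becomes text.
as.character(round(coefficient, label_round))
# "0" "0" "0.2"

# format() with nsmall pads to a fixed number of decimals.
format(round(coefficient, label_round), nsmall = label_round)
# "0.00" "0.00" "0.20"

# Hypothetical helper: fixed decimals, with missing values kept as NA
# so that masked cells are not labelled with the text "NA".
format_label <- function(x, digits) {
  out <- trimws(format(round(x, digits), nsmall = digits))
  ifelse(is.na(x), NA_character_, out)
}

format_label(c(0, NA, 0.2), label_round)
```

Whether this can be dropped into ggcorr unchanged depends on how the function handles NA cells before drawing the labels, which I have not checked.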
