Data treatment/tidying #7

ajstewartlang · 2019-04-18T09:23:22Z

It looks like we need to do a fair bit of data tidying at the start. For example, the only values in the dataset for the variables comp_int_bed_16 and comp_noint_bed_16 are "Yes" and NA. Presumably NA corresponds to "No" here rather than reflecting genuinely missing data:

> unique(maps_synthetic_data$comp_int_bed_16)
[1] NA "Yes"

> unique(maps_synthetic_data$comp_noint_bed_16)
[1] NA "Yes"
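Other variables may follow the same "Yes"/NA pattern. A minimal base-R sketch for spotting them, assuming a data frame shaped like maps_synthetic_data; the helper yes_na_columns() and the toy data are hypothetical, not part of the project:

```r
# Hypothetical helper: names of columns whose only non-missing value
# is "Yes" and that also contain at least one NA.
yes_na_columns <- function(df) {
  is_yes_na <- vapply(df, function(x) {
    vals <- unique(x[!is.na(x)])
    anyNA(x) && length(vals) == 1 && identical(as.character(vals), "Yes")
  }, logical(1))
  names(df)[is_yes_na]
}

# Made-up toy data standing in for maps_synthetic_data
toy <- data.frame(
  comp_int_bed_16   = c(NA, "Yes", NA),
  comp_noint_bed_16 = c("Yes", NA, NA),
  anx_band_15       = c("~0.5%", NA, "~3%"),
  stringsAsFactors  = FALSE
)

yes_na_columns(toy)  # c("comp_int_bed_16", "comp_noint_bed_16")
```

Any column it returns would still need checking by hand before assuming its NAs mean "No".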

So I'm thinking we replace the NAs in these two variables with "No" and then turn them into 2-level factors:

# Treat NA as "No" in both variables
maps_synthetic_data$comp_int_bed_16[is.na(maps_synthetic_data$comp_int_bed_16)] <- "No"
maps_synthetic_data$comp_noint_bed_16[is.na(maps_synthetic_data$comp_noint_bed_16)] <- "No"
# Explicit levels so both are always 2-level factors with "No" as the reference level
maps_synthetic_data$comp_int_bed_16 <- factor(maps_synthetic_data$comp_int_bed_16, levels = c("No", "Yes"))
maps_synthetic_data$comp_noint_bed_16 <- factor(maps_synthetic_data$comp_noint_bed_16, levels = c("No", "Yes"))
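The same recode can be written as a loop over the column names, which scales if more "Yes"/NA variables turn up. A minimal base-R sketch on a made-up toy data frame standing in for maps_synthetic_data, assuming NA really does mean "No" for these columns:

```r
# Made-up toy data standing in for maps_synthetic_data
toy <- data.frame(
  comp_int_bed_16   = c(NA, "Yes", NA, "Yes"),
  comp_noint_bed_16 = c("Yes", NA, NA, NA),
  stringsAsFactors  = FALSE
)

# Assumption: NA in these two columns means "No", not genuinely missing
for (col in c("comp_int_bed_16", "comp_noint_bed_16")) {
  x <- toy[[col]]
  x[is.na(x)] <- "No"                               # fill in the implicit "No"s
  toy[[col]] <- factor(x, levels = c("No", "Yes"))  # fixed, explicit level order
}

table(toy$comp_int_bed_16)
```

Passing levels explicitly keeps each variable a 2-level factor even if a column happens to contain no NAs, where factor() alone would produce a single-level factor.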

For the anxiety measure at age 15 (anx_band_15), it looks like the NAs correspond to 0 rather than missing data:

> unique(maps_synthetic_data$anx_band_15)
[1] "~0.5%" NA "~3%" "~15%" "~50%" "<0.1%"

So we might want to replace the NAs there with zeros.

# anx_band_15 is a character column, so the zero is stored as the string "0"
maps_synthetic_data$anx_band_15[is.na(maps_synthetic_data$anx_band_15)] <- "0"

But what about the other values? Should we make this an ordered factor or treat it as numeric? If treating it as numeric, we could recode as:

library(dplyr)
maps_synthetic_data <- maps_synthetic_data %>% mutate(anx_band_15 = as.numeric(recode(anx_band_15, "~0.5%" = ".5", "~3%" = "3", "~15%" = "15", "~50%" = "50", "<0.1%" = "0")))  # as.numeric, not as.integer, so ".5" is not truncated to 0

Although that forces the <0.1% values to be 0, making them indistinguishable from the recoded NAs. Do you think an ordered factor would be better? @wjchulme, @jspickering, @OliJimbo
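For the numeric option, a named lookup vector is one way to map each band label to a number without collapsing <0.1% into 0. A minimal base-R sketch on made-up labels; mapping "<0.1%" to 0.1 (its upper bound) is an illustrative assumption, not an agreed coding:

```r
# Made-up band labels, matching those seen in anx_band_15
anx <- c("~0.5%", NA, "~3%", "~15%", "~50%", "<0.1%")

# Assumption: NA means no anxiety band, so it becomes "0"
anx[is.na(anx)] <- "0"

# Lookup vector: band label -> approximate percentage.
# Mapping "<0.1%" to 0.1 is an illustrative choice.
band_to_pct <- c("0" = 0, "<0.1%" = 0.1, "~0.5%" = 0.5,
                 "~3%" = 3, "~15%" = 15, "~50%" = 50)

# Unknown labels come back as NA, which flags typos in the data.
# Values stay double: as.integer() would truncate 0.5 to 0.
anx_num <- unname(band_to_pct[anx])
anx_num  # c(0.5, 0, 3, 15, 50, 0.1)
```

Whether 0.1 is a sensible stand-in for "<0.1%" is a modelling decision; the ordered-factor route avoids having to pick a number at all.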


ajstewartlang · 2019-04-18T10:09:04Z

Perhaps recoding as an ordered factor is better, as it seems to capture the differences between the discrete bands more faithfully:

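# .ordered = TRUE makes the result an ordered factor; levels follow the order listed here
maps_synthetic_data <- maps_synthetic_data %>% mutate(anx_band_15 = recode_factor(anx_band_15, "0" = "0", "<0.1%" = "0.1", "~0.5%" = ".5", "~3%" = "3", "~15%" = "15", "~50%" = "50", .ordered = TRUE))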
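For comparison, base R's factor() can build the same ordered factor without dplyr. A minimal sketch on made-up labels, assuming the NAs have already been replaced by "0"; the labels mirror the recode_factor() call above:

```r
# Made-up band labels; NAs already replaced by "0"
anx <- c("~0.5%", "0", "~3%", "~15%", "~50%", "<0.1%", "0")

# Levels listed from lowest to highest band
anx_ord <- factor(anx,
                  levels  = c("0", "<0.1%", "~0.5%", "~3%", "~15%", "~50%"),
                  labels  = c("0", "0.1", ".5", "3", "15", "50"),
                  ordered = TRUE)

anx_ord < "3"        # comparisons respect the band order
table(anx_ord)       # counts per band, lowest band first
as.integer(anx_ord)  # rank codes 1 to 6, if a numeric score is needed later
```

The ordered factor keeps every band distinct; as.integer() gives a rank rather than a percentage, so any numeric analysis would treat the bands as equally spaced.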
