-
Notifications
You must be signed in to change notification settings - Fork 0
/
Lecture 4 Subsetting.Rmd
141 lines (101 loc) · 2.7 KB
/
Lecture 4 Subsetting.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
title: 'Lecture 4: Subsetting vectors'
author: "David McAllister"
date: "Monday 31st October, 2016"
output:
ioslides_presentation: default
---
## Objective of section
* To be able to use different types of indexing ([, [[ and $)
* To be able to use integer, negative integer and logical values within these
## Much of the following is taken from
Advanced R by Hadley Wickham
"http://adv-r.had.co.nz/"
## The little operator that could `[`
R does all of this
* Filtering
* Deleting
* Changing values
* Merging
## Why subsetting is hard
* three subsetting operators, [, [[ and $
* four types of subsetting, +ve integer, -ve integer, logical and character
* Different for different objects (e.g. vectors,lists, factors, matrices and data frames)
* The dreaded drop, subsetting changes object types
## Atomic vectors
```{r}
x <- c(2.1, 4.2, 3.3, 5.4)
```
The number after the decimal point gives the original position in the vector.
## __Positive integers__ return elements at the specified positions:
```{r}
x[c(3, 1)]
x[order(x)]
# Duplicated indices yield duplicated values
x[c(1, 1)]
# Real numbers are silently truncated to integers
x[c(2.1, 2.9)]
```
## __Negative integers__ omit elements at the specified positions:
```{r}
x[-c(3, 1)]
```
You can't mix positive and negative integers in a single subset:
```{r, error = TRUE}
x[c(-1, 2)]
```
## __Logical vectors__
select elements where the corresponding logical value is `TRUE`
Most useful type of subsetting
```{r}
x[c(TRUE, TRUE, FALSE, FALSE)]
x > 3
x[x > 3]
```
## Logical versus integer vectors
For logical vector, length of indexing vector = length of vector
```{r}
x[c(TRUE, FALSE)]
# Equivalent to due to vector recycling
x[c(TRUE, FALSE, TRUE, FALSE)]
```
## Missing indexing
Missing value in indexing vector yields missing value in output
```{r}
x[c(TRUE, TRUE, NA, FALSE)]
```
## __Nothing__ returns the original vector.
```{r}
x[]
```
## __Character vectors__
Used to return elements with matching names.
```{r}
(y <- setNames(x, letters[1:4]))
y[c("d", "c", "a")]
# Like integer indices, you can repeat indices
y[c("a", "a", "a")]
# When subsetting with [ names are always matched exactly
z <- c(abc = 1, def = 2)
z[c("a", "d")]
```
# Assigning
## Typical data cleaning exercise
```{r, echo = c(-1,-2,-3, -10)}
x <- round(rnorm(10, mean = 120, sd = 20), 0)
x[c(1,4,8)] <- -99
x
# Integer subsetting with assignment
x[1] <- NA
# Logical subsetting with assignment
x[x == -99] <- NA
x
```
## Typical data modifying exercise
```{r, echo = -5}
# More complex logical subsetting with assignment
sex <- rep(c("Male", "Female"), each = 5)
x_mod <- x
x_mod [sex == "Male"] <- x_mod[sex == "Male"] * 1.2
data.frame (sex = sex, x = x, x_mod = x_mod)
```