-
Notifications
You must be signed in to change notification settings - Fork 0
/
Week6-Lecture2.qmd
168 lines (108 loc) · 7.27 KB
/
Week6-Lecture2.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
---
date: "`r (lubridate::ymd('20241002')+lubridate::dweeks(5))`"
title: "Class 12 A/B Testing"
---
# Randomized Controlled Trials
## Randomized Controlled Trials
::: block
### Randomized Controlled Trials {.unnumbered}
A **randomized controlled trial** (RCT) is an experimental form of impact evaluation in which the population receiving the intervention is chosen *at random* from the eligible population, and a control group is also chosen *at random* from the same eligible population.
:::
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics("images/Week 6/RCT.jpg")
```
## Types of RCTs: Based on Location
| | **Lab Experiment** | **Field Experiment** |
|---------------------|---------------------------------------------|---------------------------|
| **Location** | In a controlled, laboratory environment | In the field |
| **Internal validity** | High | Low |
| **External validity** | Low (Hawthorne effect) | High |
- Internal validity refers to the extent to which the experiment is free from selection bias.
- External validity refers to the extent to which the results can be generalized to the real world or other contexts.
## Types of RCTs: Based on Treatment Design
- A/B testing (treatment group + control group)
1. Loyalty program
2. No loyalty program
- A/B/N testing (multiple treatment groups + control group)
1. Point-based loyalty program; points can be redeemed for price vouchers
2. Point-based loyalty program; points can be redeemed for gifts
3. Point-based loyalty program; points can be redeemed for free top ups
4. No loyalty program
- Factorial design
- more than 2 dimensions of treatments, used if we care about the interaction effects
# Procedures of A/B Testings
## Motivating Example of Tom's Loyalty Program
- Should we introduce a loyalty program for our customers?
- Cons: increased costs due to free drinks
- Pros: it may increase spending and retention rate, and hence future CLV
- How to estimate the causal effect of introducing a loyalty program?
## Step 1: Decide on the Unit of Randomization
We decide **the granularity of randomization unit** based on the research question at hand.
- **individual**
- household
- store
- other levels more granular (e.g., device level) or even less granular (e.g., city level)
## Step 1: Decide on the Unit of Randomization
**Proposal 1**: Randomize the treatment based on West London and East London.
- Do you expect the "randomize" to be true randomization?^[All answers to questions in the slides are on the webpage version of lecture notes.]
::: {.content-hidden when-format="beamer"}
> No, because East London and West London are intrinsically different. Randomization can only work well when the number of randomization units are large enough.
:::
**Proposal 2**: Randomize the treatment among individual customers.
- Is this true randomization?
::: {.content-hidden when-format="beamer"}
> Yes, as long as we have a large number of individual customers, after randomization, we should see the treatment group and control group customers to have balanced characteristics.
:::
- What problems can we still have?
> ::: {.content-hidden when-format="beamer"}
> 1. Spillover: Customers may talk to each other, so individual customers in the control group (who are not supposed to see the loyalty program) may also become aware of the loyalty program.
> 2. Crossover: the same individual may accidentally receive different treatments.
> :::
## Step 1: Pros and Cons of Granularity
**Disadvantages of granularity:**
- Costs and logistics
- Spillovers and crossovers
**Advantages of granularity:**
- Increase the chance of successful randomization, thereby mitigating any systematic unbalance of covariates before the experiment.
**Exercise:**
- If we would like to randomize prices, how can we randomize individualized price discounts to customers?
::: {.content-hidden when-format="beamer"}
> We can send individually personalized coupons to customers, e.g., Co-op weekly offers, Uber's dynamic pricing.
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 6/coopcoupon.jpeg')
```
:::
## Step 2: Mitigate Spillover and Crossover Effects
- **Crossover Effects**: A crossover occurs when an individual who was supposed to be assigned to one treatment is accidentally exposed to another or more treatments.
- e.g., For online A/B testing, a notorious crossover effect is that when browsers reset the cookies, the same individual customer may be treated as a different new customer.
- **Spillover effects**: The behavior of the treatment group can affect control group as well
- e.g., customers may share the promotions with family members and friends.
**Question**: How should we mitigate spillover and crossover effects?
> ::: {.content-hidden when-format="beamer"}
> - Make sure that the same unit receives the same treatment throughout the experiment, e.g., forcing the customers to log into the website using their user accounts. User IDs should be unique.
>
> - Randomize at the level of plausibly isolated social networks such as a community, rather than individual level.
>
> - However, we must acknowledge that, it is really challenging to implement an A/B testing without any crossover or spillover in the field.
> :::
## Step 3: Decide on Randomization Allocation Scheme
- Individuals (or the relevant unit of randomization) are allocated at random into a treatment condition based on some decision rules.
- Due to the high costs and potential risks of A/B testing, we often select a small percentage of customers into the treatment condition, while the remaining customer should do "business-as-usual".
## Step 4: Collect Data
- Any field experiment should be aware of the need for a sufficiently large sample size, or sufficient statistical power.
- The larger sample size, the higher statistical power for the experiment; meanwhile, larger sample size brings higher costs and risks.
- Run a [power calculation in R](https://cran.r-project.org/web/packages/pwrss/vignettes/examples.html)
- Collect both data on the outcome variables of interest and consumer characteristics data
**Proposal:** We need to collect customers' spending and retention data and link the data with their treatment assignment.
## Step 5: Interpreting Results from a Field Experiment
**Step 5.1: Randomization check**
- We need to check if the treatment group and control group are well-balanced in terms of their **pre-treatment** characteristics, especially the outcome variables.
**Step 5.2: Analyze the data and estimate the ATE**
- **t-test** to examine the difference in the average outcome between the treatment group and control group. In R, we can use `t.test()`
- **Regression analysis** when analyzing A/B/N testing or multivariate experiments.
## After-Class Readings
- (optional) [Varian, Hal R. 'Causal Inference in Economics and Marketing'. Proceedings of the National Academy of Sciences 113, no. 27 (5 July 2016): 7310--15.](https://ucl.primo.exlibrisgroup.com/permalink/44UCL_INST/18kagqf/cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4941501)