---
date: "`r (lubridate::ymd('20241002')+lubridate::dweeks(9))`"
title: "Class 20 Looking Back & Moving Forward"
suppress-bibliography: true
---
# Causal Machine Learning
## When Machine Learning Meets Causal Inference
- **Causal Machine Learning** (CML) is one of the most active recent developments in the field of data science.
- Conventional machine learning excels at finding patterns and making predictions, but it often falls short in understanding causation.
- Conventional causal inference techniques (instrumental variables, DiD, RDD) estimate average treatment effects; they mostly rely on linear regressions and are not well suited to estimating heterogeneous treatment effects across individuals.
- This is where causal machine learning steps in: it aims to uncover causal relationships, including how effects differ across individuals, by borrowing the predictive power of machine learning tools.
- Microsoft Research has developed [`EconML`](https://www.microsoft.com/en-us/research/project/econml/overview/), one of the pioneering industry toolkits for CML.
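To make "average" versus "heterogeneous" treatment effects concrete, here is a short sketch in potential-outcomes notation. The symbols are assumptions introduced for illustration: $Y_i(1)$ and $Y_i(0)$ are individual $i$'s outcomes with and without the treatment, and $X_i$ collects pre-treatment characteristics.

$$
\text{ATE} = E\left[Y_i(1) - Y_i(0)\right], \qquad \tau(x) = E\left[Y_i(1) - Y_i(0) \mid X_i = x\right]
$$

The ATE is a single number for the whole population, while the conditional average treatment effect (CATE) $\tau(x)$ is a function of individual characteristics. CML methods aim to estimate this function.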
## Causal Forest: A Powerhouse in Causal Machine Learning
- Causal Forest, developed by @atheyGeneralizedRandomForests2019 as part of the Generalized Random Forest framework, is a CML tool that extends the original random forest algorithm.
- The core idea of Causal Forest is to estimate the causal effect of a treatment using recursive binary splitting, similar to the decision trees in a random forest. It does so by building a large number of *causal trees*, each based on a subset of the data and features.
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics("images/Week 10/causal forest.png")
```
- [`grf`](https://grf-labs.github.io/grf/) is the R package that implements causal forest. The Stanford YouTube channel also provides comprehensive [tutorial videos](https://youtube.com/playlist?list=PLxq_lXOUlvQAoWZEqhRqHNezS30lI49G-&si=evF_BJyNkymdvhow) on causal forest.
```{r}
#| message: false
#| warning: false
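# Load grf (causal forest), fixest (base_did data), and plotting/wrangling helpers
pacman::p_load(grf, fixest, dplyr, ggplot2, ggthemes)

## Use the DiD data to illustrate the causal forest
data("base_did")

# Outcome Y: each unit's post-period mean of y minus its pre-period mean.
# summarise() returns rows sorted by id then Post, so [1] is pre and [2] is post.
data_Y <- base_did %>%
  mutate(Post = ifelse(period >= 6, 1, 0)) %>%
  group_by(id, Post) %>%
  summarise(avg_outcome = mean(y), .groups = "drop") %>%
  group_by(id) %>%
  summarise(first_diff = avg_outcome[2] - avg_outcome[1]) %>%
  ungroup()

# Treatment W: one row per unit, sorted by id so rows line up with data_Y and data_X
data_W <- base_did %>%
  select(id, treat) %>%
  distinct() %>%
  arrange(id)

# Covariate X: each unit's pre-period mean of x1
data_X <- base_did %>%
  filter(period < 6) %>%
  group_by(id) %>%
  summarise(avg_x = mean(x1)) %>%
  ungroup()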
```
## Application of Causal Forest in Causal Inference
- We can use causal forest to estimate the treatment effects for each individual and plot the histogram.
::: {.columns}
::: {.column width='50%'}
\tiny
```{r}
#| echo: true
#| eval: false
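# Fit a causal forest: X = covariates, Y = outcome, W = treatment indicator
cf <- causal_forest(
  X = data.matrix(data_X$avg_x),
  Y = data.matrix(data_Y$first_diff),
  W = data_W$treat
)
# Without new data, predict() returns out-of-bag CATE estimates
predicted_CATE <- predict(cf)
# Histogram of the estimated individual treatment effects
ggplot() +
  geom_histogram(
    data = predicted_CATE,
    aes(x = predictions),
    color = "black", fill = "white"
  ) +
  theme_stata()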
```
:::
::: {.column width='50%'}
```{r}
#| message: false
#| warning: false
#| fig-align: center
#| echo: false
cf <- causal_forest(
  X = data.matrix(data_X$avg_x),
  Y = data.matrix(data_Y$first_diff),
  W = data_W$treat
)
predicted_CATE <- predict(cf)
ggplot() +
  geom_histogram(
    data = predicted_CATE,
    aes(x = predictions),
    color = "black", fill = "white"
  ) +
  theme_stata()
```
:::
:::
- Once we have estimated the treatment effect for each individual, we can use these estimates to automate the targeting decision, for example by treating only the individuals whose estimated effect justifies the cost. This is called policy learning in the causal machine learning field.
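A minimal sketch of a targeting rule built on estimated treatment effects. Everything here is an assumption for illustration: the CATE values are invented (in practice they could come from the `predictions` column of `predict(cf)`), and `value_per_unit` and `cost_per_treatment` are hypothetical business parameters. Dedicated policy learning methods are more refined than this simple threshold rule.

```r
# Hypothetical CATE estimates for 8 customers; values invented for illustration.
cate_hat <- c(-0.5, 0.2, 1.4, 3.1, 0.9, 2.2, -1.0, 4.0)

# Assumed economics: each unit of outcome is worth 10, treating one customer costs 12.
value_per_unit <- 10
cost_per_treatment <- 12

# Policy rule: treat a customer only if the expected gain exceeds the cost.
expected_gain <- cate_hat * value_per_unit
treat_policy <- expected_gain > cost_per_treatment

policy <- data.frame(cate_hat, expected_gain, treat_policy)
policy

# Expected net benefit of the targeted policy versus treating everyone.
net_targeted <- sum(expected_gain[treat_policy] - cost_per_treatment)
net_treat_all <- sum(expected_gain - cost_per_treatment)
c(targeted = net_targeted, treat_all = net_treat_all)
```

The rule skips customers with small or negative estimated effects, which is why the targeted policy can beat treating everyone even though it reaches fewer people.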
# NLP and LLM
## Text Mining in Marketing Analytics
- Natural language processing (NLP) and text mining are powerful tools for analyzing unstructured text data in marketing analytics. Refer to the book [Text Mining with R](https://www.tidytextmining.com/) for more details.
- **Sentiment analysis** is the process of determining the sentiment (positive, negative, or neutral) of a piece of text. It is widely used in social media monitoring, customer feedback analysis, and brand reputation management.
- In R, the [`tidytext`](https://cran.r-project.org/web/packages/tidytext/vignettes/tidytext.html) package implements sentiment analysis.
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 10/sentimentanalysis.png')
```
- Application: Use sentiment analysis to analyze customer reviews, social media posts, and other text data to understand customer sentiment.
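To show the mechanics behind lexicon-based sentiment analysis, here is a minimal base R sketch. The reviews and the four-word lexicon are invented for illustration; the `tidytext` workflow follows the same logic at scale, using `unnest_tokens()` for tokenization and a full lexicon from `get_sentiments()`.

```r
# Toy reviews and a tiny hand-made lexicon (both invented for illustration).
reviews <- c(
  "Great tea and friendly staff",
  "Terrible service and slow delivery",
  "The shop is on the high street"
)
lexicon <- data.frame(
  word = c("great", "friendly", "terrible", "slow"),
  score = c(1, 1, -1, -1)
)

# Tokenize: lower-case, strip everything except letters and spaces, split on spaces.
tokenize <- function(text) {
  clean <- gsub("[^a-z ]", "", tolower(text))
  strsplit(clean, " +")[[1]]
}

# Sentiment score = sum of the lexicon scores of the matched words.
score_review <- function(text) {
  tokens <- tokenize(text)
  sum(lexicon$score[match(tokens, lexicon$word)], na.rm = TRUE)
}

sentiment <- sapply(reviews, score_review, USE.NAMES = FALSE)
data.frame(
  review = reviews,
  sentiment = sentiment,
  label = ifelse(sentiment > 0, "positive",
                 ifelse(sentiment < 0, "negative", "neutral"))
)
```

Words that are not in the lexicon contribute nothing, so a purely descriptive review ends up with a score of zero and is labeled neutral.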
## Topic Modeling
- **Topic modeling** is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling algorithms.
- In R, the [`topicmodels`](https://cran.r-project.org/web/packages/topicmodels/vignettes/topicmodels.pdf) package implements LDA topic modeling.
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 10/topicmodeling.png')
```
- Application: Use topic modeling to analyze customer feedback, social media posts, and other text data to identify key topics and themes.
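For reference, the standard LDA generative story can be sketched as follows. The notation is an assumption chosen for illustration: $K$ topics, $D$ documents, $w_{d,n}$ the $n$-th word of document $d$, $z_{d,n}$ its topic assignment, and $\alpha$, $\eta$ the Dirichlet hyperparameters.

$$
\begin{aligned}
\beta_k &\sim \text{Dirichlet}(\eta), && k = 1, \dots, K \\
\theta_d &\sim \text{Dirichlet}(\alpha), && d = 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \text{Multinomial}(1, \theta_d) \\
w_{d,n} \mid z_{d,n}, \beta &\sim \text{Multinomial}(1, \beta_{z_{d,n}})
\end{aligned}
$$

Each topic $\beta_k$ is a distribution over words and each document has its own topic mixture $\theta_d$. Fitting LDA means recovering these two sets of distributions from the observed words alone.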
## Transformers, BERT, GPT, and LLM
- BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are two of the most popular models in the field of natural language processing (NLP) at the moment.
- BERT is designed to understand the context of words in a sentence, while GPT is designed to generate human-like text.
- A large language model (LLM) is a transformer-based model trained on very large text corpora. GPT is itself a family of LLMs, and today's LLMs can both understand the context of words in a sentence and generate human-like text.
- Applications in marketing analytics:
    - Use GPT as a copilot for human managers when drafting product descriptions, email messages, or social media posts.
    - Use an LLM to generate survey responses for your term 3 dissertation project. Follow this [guide](https://www.sciencedirect.com/science/article/pii/S2949719123000171).
- More applications of LLM in marketing: [Generative AI in innovation and marketing processes: A roadmap of research opportunities](https://link.springer.com/article/10.1007/s11747-024-01044-7)
# Marketing Analytics: Our Journey
- Let’s reflect on our journey this term and see how you can apply what you have learned in your dissertation project and future career.
```{r}
#| echo: false
#| fig-align: 'center'
#| out-width: "10cm"
knitr::include_graphics('images/CourseStructure2024.png')
```
## Week 1: Marketing Process
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics("images/Week 1/marketingprocess.png")
```
\vspace{0.5cm}
::: {.callout-tip}
Conduct a situation analysis in the Introduction section of your dissertation.
:::
## Week 1-2: Profitability Analysis
- Break-even analysis is essential to any business activity
- For business campaigns: Break-even quantity (BEQ) and Net present value (NPV)
- For customers: Customer lifetime value (CLV)
- **Case study**:
    - CLV Analysis for M&S’s Delivery Pass (Week 2)
    - CLV for Tom’s Bubble Tea Shop (1st assignment)
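The three profitability measures above can be sketched as small R functions. This is a simplified illustration under stated assumptions, not necessarily the exact template used in the case studies: the margin per period and the retention rate are constant, the first cash flow occurs today, and all numbers are hypothetical.

```r
# Break-even quantity: units needed for the contribution margin to cover fixed cost.
break_even_quantity <- function(fixed_cost, price, variable_cost) {
  fixed_cost / (price - variable_cost)
}

# Net present value: cash_flows[1] occurs today (t = 0); later ones are discounted.
npv <- function(cash_flows, discount_rate) {
  t <- seq_along(cash_flows) - 1
  sum(cash_flows / (1 + discount_rate)^t)
}

# Simple CLV: constant margin per period, constant retention rate, and a one-off
# acquisition cost paid up front. The customer is active in period 0 and is
# still a customer in period t with probability retention^t.
clv <- function(margin, retention, discount_rate, n_periods, acquisition_cost = 0) {
  t <- seq_len(n_periods) - 1
  sum(margin * retention^t / (1 + discount_rate)^t) - acquisition_cost
}

# Hypothetical numbers for illustration only.
break_even_quantity(fixed_cost = 1000, price = 5, variable_cost = 3)
npv(cash_flows = c(-100, 60, 60), discount_rate = 0.1)
clv(margin = 100, retention = 0.8, discount_rate = 0.1,
    n_periods = 3, acquisition_cost = 50)
```

A campaign is worth running when its NPV is positive, and a customer is worth acquiring when the CLV net of acquisition cost is positive.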
\vspace{0.5cm}
::: {.callout-tip}
- Fulton: "Calculate the predictive lifetime value of customers"
- Lebara: "building a tenure prediction model that will feed into and enhance our CLTV model."
- Economist: "goes through an A/B test to assess its impact on key metrics such as Customer Lifetime Value (CLTV)"
:::
## Week 1: Hey, I'm Wei, and I'm a Youtuber!
```{r}
#| echo: false
#| fig-align: 'center'
#| out-width: "8cm"
knitr::include_graphics('images/Week 10/Youtube2024.png')
```
## Week 3: Data Wrangling and Descriptive Analytics with `dplyr`
- Data wrangling with `dplyr`
    - basic operations: `filter`, `mutate`, `select`, `arrange`
    - group aggregation: `group_by`
    - multi-data joining: `left_join`
- Descriptive analytics with `ggplot2` (visualization), `modelsummary` (summary statistics), and `dplyr` (data wrangling).
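As a quick refresher, the verbs listed above can be chained in a single pipeline. The two data frames below are hypothetical toy data invented for illustration.

```r
library(dplyr)

# Hypothetical toy data: one row per order, plus a customer lookup table.
orders <- data.frame(
  customer_id = c(1, 1, 2, 3, 3, 3),
  amount = c(20, 35, 15, 50, 10, 40)
)
customers <- data.frame(
  customer_id = c(1, 2, 3),
  city = c("London", "Leeds", "London")
)

customer_summary <- orders %>%
  filter(amount >= 15) %>%                       # keep orders of at least 15
  mutate(large_order = amount >= 30) %>%         # flag large orders
  group_by(customer_id) %>%                      # aggregate per customer
  summarise(
    n_orders = n(),
    n_large = sum(large_order),
    total_spend = sum(amount),
    .groups = "drop"
  ) %>%
  left_join(customers, by = "customer_id") %>%   # attach customer attributes
  select(customer_id, city, n_orders, n_large, total_spend) %>%
  arrange(desc(total_spend))                     # biggest spenders first

customer_summary
```

Reading the pipeline top to bottom mirrors the analysis plan: filter rows, derive variables, aggregate by group, join extra information, then order the result.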
\vspace{0.5cm}
::: {.callout-tip}
- You will need to submit data and code for your dissertation. Therefore, it's important to use version control tools like Git and GitHub.
:::
## Week 3: Hey, I'm Wei, and I'm a musician!
::: columns
::: {.column width="50%"}
```{r}
#| echo: false
#| fig-align: 'center'
#| out-width: "2cm"
knitr::include_graphics('images/Week 10/ShapeOfYou.png')
```
:::
::: {.column width="50%"}
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 10/survey.png')
```
:::
:::
## Week 4: Unsupervised Learning for Customer Segmentation
- Unsupervised learning methods such as K-means clustering help group individuals into different segments. We then decide which segment(s) to serve based on our business objective.
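A minimal segmentation sketch with base R's `kmeans()`. The customer data are simulated and the variable names (`monthly_spend`, `monthly_visits`) are hypothetical. In a real project the number of clusters would need to be chosen and justified.

```r
set.seed(123)

# Simulated customers: 50 light users and 50 heavy users (invented data).
customers <- data.frame(
  monthly_spend = c(rnorm(50, mean = 20, sd = 5), rnorm(50, mean = 80, sd = 5)),
  monthly_visits = c(rnorm(50, mean = 2, sd = 1), rnorm(50, mean = 10, sd = 1))
)

# Standardize so both variables contribute equally to the distance.
features <- scale(customers)

# K-means with K = 2; nstart reruns from several random starts and keeps the best.
km <- kmeans(features, centers = 2, nstart = 20)
customers$segment <- km$cluster

# Profile each segment by its average behavior.
aggregate(cbind(monthly_spend, monthly_visits) ~ segment,
          data = customers, FUN = mean)
```

The segment profiles, not the cluster labels themselves, are what feed the decision about which segment(s) to serve.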
\vspace{0.5cm}
::: {.callout-tip}
- ITV: "a summary of key points for each campaign test based on free text fields, using topic modelling (e.g. K-means, LDA) to identify thematic trends"
- DataVisionServices: "... we will build a framework of site selection for the client. To do so we see the student using multiple techniques, such as location analysis, clustering and building algorithms..."
:::
## Week 5: Supervised Learning for Customer Targeting
- Supervised learning models learn the relationship between an outcome $Y$ and predictors $X$ and can make **individualized** predictions.
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 5/supervisedlearning.png')
```
- Fundamental concepts in supervised learning:
    - Bias-variance trade-off
    - Overfitting and underfitting
    - Accuracy-interpretability trade-off
- Decision tree and random forest
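Overfitting and the bias-variance trade-off can be illustrated with a small simulation. This sketch uses polynomial regression rather than trees so that it runs in base R; the data-generating process and the chosen polynomial degrees are assumptions for illustration.

```r
set.seed(42)

# Simulated data: a smooth signal plus noise, split into training and test sets.
n <- 60
x <- runif(n, -2, 2)
y <- sin(x) + rnorm(n, sd = 0.3)
train <- data.frame(x = x[1:40], y = y[1:40])
test <- data.frame(x = x[41:60], y = y[41:60])

# Mean squared prediction error of a fitted model on a given data set.
mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)

# Fit polynomials of increasing flexibility and record both errors.
degrees <- c(1, 3, 10)
results <- data.frame(degree = degrees, train_mse = NA_real_, test_mse = NA_real_)
for (i in seq_along(degrees)) {
  fit <- lm(y ~ poly(x, degrees[i]), data = train)
  results$train_mse[i] <- mse(fit, train)
  results$test_mse[i] <- mse(fit, test)
}
results
```

Training error can only fall as the model becomes more flexible, whereas test error typically falls and then rises again. The gap between the two is the signature of overfitting, which is why models are compared on held-out data.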
## Week 5: Application in Marketing: Personalized Targeting
- By targeting only the customers that a supervised learning model predicts are likely to respond, we can reduce marketing costs and boost the ROI.
    - Improving Marketing Efficiency Using Predictive Analytics for M&S (Week 5)
    - 2nd assignment: Amazon Prime case
\vspace{0.5cm}
::: {.callout-tip}
- British Transport Police: "use ML to predict victims of crime"
- Economist: "which picture to attach to social media post to increase engagement"
- Barclay: "use predictive analytics to predict house prices"
- etc.
:::
## Week 6: Why Causal Inference Matters
- Managers easily make costly mistakes if they do not understand causal inference.
::: {.columns}
::: {.column width='30%'}
```{r}
#| echo: false
#| fig-align: 'center'
#| out-width: "3cm"
knitr::include_graphics('images/Week 6/Bubble tea ads.png')
```
:::
::: {.column width='30%'}
```{r}
#| echo: false
#| fig-align: 'center'
#| out-width: "2cm"
knitr::include_graphics('images/Week 6/pricesales.png')
```
:::
::: {.column width='30%'}
```{r}
#| echo: false
#| fig-align: 'center'
#| out-width: "3cm"
knitr::include_graphics('images/Week 6/survivalbias.png')
```
:::
:::
## Week 6: I'm Wei and I'm from Hogwarts!
::: {.columns}
::: {.column width='50%'}
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 10/magic2024.png')
```
:::
::: {.column width='50%'}
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 10/costume.jpeg')
```
:::
:::
## Week 6: Potential Outcomes and A/B Testing
- The gold standard for causal inference in marketing analytics is A/B testing.
- Basic Identity of Causal Inference
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 6/BasicIdentityofCausalInference.png')
```
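In potential-outcomes notation, the basic identity is usually written as the decomposition below. The symbols are assumptions for illustration and may differ from those in the figure: $D_i$ indicates treatment, and $Y_i(1)$, $Y_i(0)$ are individual $i$'s outcomes with and without treatment.

$$
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference in means}}
= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{average treatment effect on the treated}}
+ \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
$$

Randomization makes the treated and control groups comparable, so the selection bias term is zero in expectation and the simple difference in means identifies the causal effect.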
\vspace{0.5cm}
::: {.callout-tip}
- Designing or analyzing A/B testing data: Economist/Fehmida/4thewords/etc.
- Running a new A/B test for your dissertation is generally not recommended because of the higher risks. Analyzing data from previous A/B tests is a better choice.
:::
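The simulation below sketches why randomization matters. All quantities are invented for illustration: `loyal` is a hypothetical customer trait that raises spending and, under self-selection, also raises the chance of being treated.

```r
set.seed(2024)
n <- 4000
true_effect <- 2

# Potential outcomes: loyal customers spend more even without the treatment.
loyal <- rbinom(n, 1, 0.5)
y0 <- 10 + 5 * loyal + rnorm(n)
y1 <- y0 + true_effect

# (a) Self-selection: loyal customers are far more likely to take the treatment.
d_selected <- rbinom(n, 1, ifelse(loyal == 1, 0.8, 0.2))
y_selected <- ifelse(d_selected == 1, y1, y0)
naive_diff <- mean(y_selected[d_selected == 1]) - mean(y_selected[d_selected == 0])

# (b) A/B test: a coin flip decides who is treated, so the groups are comparable.
d_random <- rbinom(n, 1, 0.5)
y_random <- ifelse(d_random == 1, y1, y0)
ab_test <- t.test(y_random[d_random == 1], y_random[d_random == 0])
ab_diff <- unname(ab_test$estimate[1] - ab_test$estimate[2])

c(true_effect = true_effect, naive = naive_diff, ab_test = ab_diff)
ab_test$conf.int
```

The naive comparison mixes the treatment effect with the pre-existing gap between loyal and non-loyal customers, while the randomized comparison recovers an estimate close to the true effect.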
## Week 7 & Week 8: Linear Regression, Endogeneity, and Instrumental Variables
- Linear regression on secondary data generally **cannot give causal inference** on its own because of endogeneity problems.
- Endogeneity arises from: (1) omitted variable bias; (2) reverse causality/simultaneity; (3) measurement error.
- An instrumental variable can give causal inference if it satisfies (1) exogeneity, (2) the exclusion restriction, (3) relevance, and (4) observability (implicit).
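A small simulation can show both the omitted variable problem and how an instrument addresses it. This is a sketch under stated assumptions: the data are simulated, `u` is an unobserved confounder, and `z` is constructed to satisfy the instrument conditions by design. The two stages are run by hand with `lm()` to show the logic; the standard errors from a manual second stage are not valid, so a dedicated IV routine should be used in practice.

```r
set.seed(1)
n <- 5000
true_effect <- 2

# u is an unobserved confounder that drives both x and y (omitted variable).
u <- rnorm(n)
# z is the instrument: it shifts x (relevance), is unrelated to u (exogeneity),
# and affects y only through x (exclusion restriction).
z <- rnorm(n)
x <- z + u + rnorm(n)
y <- true_effect * x + 3 * u + rnorm(n)

# Naive OLS: biased because x is correlated with the error term (3 * u + noise).
ols <- coef(lm(y ~ x))["x"]

# Two-stage least squares by hand.
stage1 <- lm(x ~ z)                 # stage 1: isolate the variation in x due to z
x_hat <- fitted(stage1)
iv <- coef(lm(y ~ x_hat))["x_hat"]  # stage 2: regress y on the predicted x

c(true_effect = true_effect, ols = unname(ols), iv_2sls = unname(iv))
```

OLS overstates the effect because part of the confounder's influence is attributed to `x`, while the IV estimate uses only the variation in `x` that comes from `z`.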
\vspace{0.5cm}
::: {.callout-tip}
- Marketing Mix Modelling (MMM) is a common application of linear regression in marketing analytics. Many dissertation companies require students to build MMM models.
:::
## Week 9: Regression Discontinuity Design
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 9/RDD.png')
```
- Effect of receiving a "Distinction" on students' salaries: 69.9 versus 70
- Effect of review stars on sales on e-commerce platforms: 4.49 versus 4.5
- Effect of surge pricing on demand: 1.249 versus 1.250
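The idea behind these examples can be sketched with a simulated sharp RDD, loosely modeled on the "Distinction" case. All numbers are invented, and the bandwidth of 5 marks is an arbitrary assumption; in practice the bandwidth is chosen with data-driven methods.

```r
set.seed(7)
n <- 5000
cutoff <- 70
true_jump <- 5

# Running variable (e.g. a final mark) and treatment assigned by the cutoff.
mark <- runif(n, min = 50, max = 90)
distinction <- as.numeric(mark >= cutoff)

# Outcome rises smoothly with the mark and jumps by true_jump at the cutoff.
salary <- 30 + 0.5 * (mark - cutoff) + true_jump * distinction + rnorm(n)

# Local linear regression: keep observations within a bandwidth of the cutoff
# and allow a different slope on each side.
bandwidth <- 5
local <- data.frame(salary, distinction, centered = mark - cutoff)
local <- local[abs(local$centered) <= bandwidth, ]
rdd_fit <- lm(salary ~ distinction * centered, data = local)

# The coefficient on `distinction` estimates the jump at the cutoff.
rdd_estimate <- unname(coef(rdd_fit)["distinction"])
c(true_jump = true_jump, rdd_estimate = rdd_estimate)
```

Individuals just below and just above the cutoff are assumed to be otherwise comparable, so the jump in the outcome at the cutoff is attributed to the treatment.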
## Week 10: Difference-in-Differences Design
```{r}
#| echo: false
#| fig-align: 'center'
knitr::include_graphics('images/Week 10/DiDGraph.png')
```
- Setting: a new policy or regulation (GDPR, lockdown, etc.) takes effect, and we have a control group that remains unaffected.
- If the parallel trends assumption is violated, we can use the synthetic difference-in-differences method.
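A minimal two-group, two-period DiD sketch on simulated data. The numbers are invented for illustration, and parallel trends hold by construction.

```r
set.seed(10)
n_units <- 200
true_effect <- 3

# Two-group, two-period panel: half the units are treated in the post period.
panel <- expand.grid(id = 1:n_units, post = c(0, 1))
panel$treat <- as.numeric(panel$id <= n_units / 2)

# Outcome: group gap (2) + common time trend (1.5) + effect for treated x post.
panel$y <- 5 + 2 * panel$treat + 1.5 * panel$post +
  true_effect * panel$treat * panel$post + rnorm(nrow(panel))

# DiD by hand: (treated post - treated pre) - (control post - control pre).
cell_mean <- function(tr, po) mean(panel$y[panel$treat == tr & panel$post == po])
did_means <- (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Same estimate from a regression: the interaction coefficient is the DiD.
did_fit <- lm(y ~ treat * post, data = panel)
did_reg <- unname(coef(did_fit)["treat:post"])

c(true_effect = true_effect, did_means = did_means, did_regression = did_reg)
```

The first difference removes the fixed gap between the groups and the second removes the common time trend, leaving the treatment effect.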
# Concluding Remarks
## 10 Weeks Not Enough?
- More learning materials on marketing analytics
- Optional reading materials in each week
- I will keep uploading R tutorials/data analytics tools tutorials on my YouTube channel. **It's never too late to subscribe!**
- I love new challenges, so my door is always open even after the module is over. You are welcome to talk to me about your dissertation ideas; I'm more than happy to help with your dissertation project.
## What I learned
- Impressed with your perseverance and willingness to learn
- My bestie predicted you would chase me out of the classroom for making you learn Marketing, R, and many new models at the same time
- You've given me a lot of inspiration and motivation to keep innovating, learning and improving (b^_^)b
- It gives me a huge sense of achievement to see that you are able to apply the tools learned in various scenarios!
- It gives me a weird sense of achievement to receive questions for other modules (´・_・`)
## **Thank you for being the BEST Students I can ever dream of!!**
```{r}
library(ggplot2)
t <- seq(0, 2*pi, length.out = 1000)
x <- 16 * sin(t)^3
y <- 13 * cos(t) - 5 * cos(2*t) - 2 * cos(3*t) - cos(4*t)
data_of_love <- data.frame(love_x = x,
love_y = y)
ggplot(data_of_love, aes(x = love_x, y = love_y)) +
geom_point(color = "red", size = 0.5) +
theme_minimal() +
annotate("text", x = 0, y = 0, label = "To My Lovely BA Students", color = "red", size = 5, hjust = 0.5, vjust = 0.5)
```
In 5 years you may forget the lectures and remember only the following:
- A module leader who is crazy about bubble teas and makes lousy weekly pre-class videos; he wants to be a good musician, Youtuber, magician, and lecturer
- A lame senior named Tom, who messed up everything because he spent too much time on Python (I forgive you, Tom)
## One Last Thing...
- I owe you one ...
```{r}
#| echo: false
#| fig-align: 'center'
#| out-width: "3cm"
knitr::include_graphics('images/Week 10/HeyTom.png')
```