---
title: "Crypto Stock Analysis"
output:
  html_document:
    df_print: paged
    theme: paper
    toc: yes
    toc_float:
      collapsed: no
      smooth_scroll: yes
---
- BRETON Arthur
- GAMBOA Vinchi
- DORANGE Romain
- HU Clement
- LONGO Giuliano
- NATH VARMA Vitten
# Objectives
**Can we issue a buy/sell recommendation for a cryptocurrency over a 7-day holding period?**
![](./images/recommendations.png)
# What is the relevant data for our model?
- Price
- Volume
- Correlation
- Social Media
- Google Trends
- Utility Indicator
# Data Preparation
```{r include=FALSE}
rm(list=ls(all=TRUE)) # Remove everything from environment
# Automatically install the required packages
if (!require(DBI)) install.packages("DBI")
if (!require(RSQLite)) install.packages("RSQLite")
if (!require(ggplot2)) install.packages("ggplot2")
if (!require(grid)) install.packages("grid")
if (!require(corrplot)) install.packages("corrplot")
if (!require(zoo)) install.packages("zoo")
if (!require(magrittr)) install.packages("magrittr")
# Check whether the pacman package manager is available; install it if not
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}
# devtools::install_github("PMassicotte/gtrendsR")
# Check for, and if needed install, the necessary packages
pacman::p_load("TTR","xts","gtrendsR","caret","ROCR","lift","glmnet","MASS", "partykit", "tidyverse", "scales", "grid", "gridExtra", "smooth", "Mcomp", "psych", "plyr","ggplot2", "forecast","knitr","kableExtra","rpart","e1071","lubridate", "magrittr", "DBI","corrplot", "zoo","gtable")
# Use an identical seed for reproducible results
set.seed(1234)
source("tools.R")
source("scrapper.R")
```
We used an external Python script to scrape the website www.coinmarketcap.com.
![](./images/coinmarketcap.png)
Besides technical analysis of the prices, we also wanted to include a trend factor in our data, so we looked at Google Trends.
![](./images/googletrends.png)
Our hypothesis was that there is a strong correlation between Google searches and prices.
### Feature Overview
- Volume
- Momentum
- Volatility
- Trend
- Buy / Sell classifier
### Downloading Data
After scraping the coinmarketcap website, we had the following data:
```{r echo=FALSE, warning=FALSE}
# Import from sqlite
con <- dbConnect(RSQLite::SQLite(), dbname='database.db') # Database connection
currencies <- dbGetQuery(con, "SELECT * FROM currency") # Import currencies
vals <- dbGetQuery(con, "SELECT * FROM val") # Import values
dbDisconnect(con) # Close database connection
google.trends = read.csv("gtrends.csv")
google.trends$datetime = as.Date(google.trends$datetime)
currencies
# knitr::kable(currencies, caption = "Currencies", align = 'c') %>%
# kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
# column_spec(1, bold = T, border_right = T, width = "200px") %>%
# row_spec(11:11, bold = T, color = "white", background = "#D7261E")
head(vals, 3)
```
We also downloaded Google Trends data using the gtrendsR library.
```{r eval = FALSE}
scrapGTrendsForKeywords(c("BTC","ETH","XRP","EOS","LTC"), "gtrends.csv")
```
```{r echo=FALSE}
head(google.trends)
```
At the moment we are unable to sort our currencies by market cap.
### Cleanup data
- Remove unused IDs
- Format Dates
- Interpolate missing data
- Locate missing dates, insert row, and interpolate values
```{r include=FALSE}
# Clean and prepare data
vals$id = NULL # Drop database IDs
currencies$id = NULL # Drop database IDs
vals$datetime <- as.Date(vals$datetime) # Format dates
vals <- vals[!duplicated(vals[,6:7]),] # Remove duplicates/one price per day
vals <- interpolate.missing.data(vals) # For missing dates, insert fields and interpolate values (takes some time)
```
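The `interpolate.missing.data` helper lives in `tools.R` and is not shown here. As a minimal sketch of the "locate missing dates, insert row, interpolate" step, assuming a single currency and a data frame with `datetime` and `price_usd` columns (the name `fill_missing_days` and the toy data are hypothetical), the idea could look like this:

```r
# Hypothetical stand-in for the gap-filling step, for ONE currency only.
# Assumes columns `datetime` (Date) and `price_usd` (numeric).
fill_missing_days <- function(df) {
  # Build the complete daily calendar between the first and last observation
  all_days <- data.frame(
    datetime = seq(min(df$datetime), max(df$datetime), by = "day")
  )
  # Left join: missing days appear as rows with NA prices
  full <- merge(all_days, df, by = "datetime", all.x = TRUE)
  # Linear interpolation between the known prices
  known <- !is.na(full$price_usd)
  full$price_usd <- approx(
    x = as.numeric(full$datetime[known]),
    y = full$price_usd[known],
    xout = as.numeric(full$datetime)
  )$y
  full
}

# Toy data with a two-day gap (illustrative values, not scraped data)
toy <- data.frame(
  datetime = as.Date(c("2018-01-01", "2018-01-02", "2018-01-05")),
  price_usd = c(100, 110, 140)
)
filled <- fill_missing_days(toy)
```

Linear interpolation is only one possible choice; the real helper may treat volume and market cap differently.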
### Visualize initial Data
```{r}
plotGTrends(google.trends)
```
# Feature Engineering
We build each feature over several lag periods:
- 7 days
- 14 days
- 21 days
### Create Overall Market statistics
The first step of the analysis is to build a new data frame that contains the full history of the market along with indicators useful for technical analysis:
- Total daily market cap
- Volatility over 7, 30 and 90 days
- Returns and log returns
- Volume
```{r echo=FALSE, warning=FALSE}
### Calculate overall market statistics
market <- market.data(vals)
plot.market(market)
# Fetch latest market capitalisation per currency
latestMarketCapPerCurrency = function(x) {
  vals[vals$currency_slug==x & vals$datetime==max(vals[vals$currency_slug==x,]$datetime),]$market_cap_usd
}
# Sort the currencies by market value
currencies$mcap = NULL
currencies$mcap <- sapply(currencies$slug, FUN=latestMarketCapPerCurrency)
currencies <- currencies[order(currencies$mcap,currencies$slug, decreasing=TRUE),]
rownames(currencies) <- 1:nrow(currencies) # Renumber rows after sorting
# Calculate returns for all values
vals$return <- Reduce(c,sapply(unique(vals$currency_slug), FUN=function(x) c(0,diff(vals[vals$currency_slug==x,]$price_usd)/(vals[vals$currency_slug==x,]$price_usd)[-length(vals[vals$currency_slug==x,]$price_usd)])))
vals$logreturn <- Reduce(c,sapply(unique(vals$currency_slug), FUN=function(x) c(0,log(vals[vals$currency_slug==x,]$price_usd[-1]/vals[vals$currency_slug==x,]$price_usd[-length(vals[vals$currency_slug==x,]$price_usd)]))))
# Compute betas
currencies$beta <- sapply(currencies$slug, FUN=currency.beta, vals[vals$datetime>as.Date("2016-12-31"),], market)
```
```{r echo=FALSE, message=FALSE, warning=FALSE}
knitr::kable(currencies, caption = "", align = 'c') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
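The return and log-return columns are computed per currency in a fairly dense one-liner. As a small worked example on a single price vector (the prices are made-up illustrative numbers), the same two quantities can be written as:

```r
# Illustrative prices for one currency on three consecutive days
prices <- c(100, 110, 99)

# Simple return: (p_t - p_{t-1}) / p_{t-1}, with 0 for the first day
simple_return <- c(0, diff(prices) / head(prices, -1))

# Log return: log(p_t / p_{t-1}), with 0 for the first day
log_return <- c(0, diff(log(prices)))
```

The two are linked by `exp(log_return) - 1 == simple_return`, and log returns have the convenient property that they add up across days.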
### "Scaled" Google Trends
- Google only provides daily data for ranges of up to 3 months (it automatically switches to weekly data for longer ranges)
- The data provided is "scaled" relative to the requested period (values run from 1 to 100 within each period rather than being absolute)
```{r}
plotGTrendsIssue(google.trends)
```
![](./images/scrapping_googletrends.png)
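One common workaround for the per-period scaling, which is not necessarily what `scrapper.R` does, is to download overlapping windows and rescale each new window so that it agrees with the previous one on the days they share. A minimal sketch, assuming each window is a named numeric vector keyed by date string (the names `rescale_window`, `prev` and `nxt` are hypothetical, and the numbers are illustrative):

```r
# Rescale `nxt` so that its mean on the overlapping days matches `prev`.
# Assumes the windows overlap and that the overlap mean of `nxt` is non-zero.
rescale_window <- function(prev, nxt) {
  common <- intersect(names(prev), names(nxt))
  ratio <- mean(prev[common]) / mean(nxt[common])
  nxt * ratio
}

# Two toy windows sharing one day
prev <- c("2018-01-01" = 50, "2018-01-02" = 100)
nxt  <- c("2018-01-02" = 50, "2018-01-03" = 25)

scaled <- rescale_window(prev, nxt)
# Append only the days that `prev` does not already cover
stitched <- c(prev, scaled[setdiff(names(scaled), names(prev))])
```

After stitching, values are no longer bounded by 100, so the series should be read as a relative index only.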
### Volume
Average trading volume over a period:
- volume.7d
- volume.14d
- volume.21d
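As a sketch of how such a trailing average could be computed in base R (the helper name `rolling_mean` and the volume figures are hypothetical; the actual implementation is in `tools.R`):

```r
# Trailing k-day mean; the first k - 1 entries are NA because the
# window is not yet full. stats::filter is called explicitly because
# dplyr masks filter() when the tidyverse is loaded.
rolling_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

volume <- c(10, 20, 30, 40, 50, 60, 70, 80)  # illustrative daily volumes
volume.7d <- rolling_mean(volume, 7)
```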
### Volatility
Standard deviation of the returns for a period:
- volatility.7d
- volatility.14d
- volatility.21d
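A minimal sketch of a trailing standard deviation of returns, assuming daily returns in a plain numeric vector (the helper name `rolling_sd` and the return values are hypothetical):

```r
# Trailing k-day standard deviation; NA until the window is full
rolling_sd <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= k) {
    for (i in k:n) {
      out[i] <- sd(x[(i - k + 1):i])
    }
  }
  out
}

returns <- c(0.01, -0.02, 0.03, 0.00, -0.01, 0.02, 0.01, -0.03)  # illustrative
volatility.7d <- rolling_sd(returns, 7)
```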
### Momentum
Score representing the difference between upward and downward returns over a period:
- momentum.7d
- momentum.14d
- momentum.21d
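The exact scoring rule is defined in `tools.R`. One plausible reading, shown here purely as an assumption, is the number of up days minus the number of down days in the window (the helper name `momentum_score` and the returns are hypothetical):

```r
# ASSUMED definition: (# days with positive return) - (# days with negative return)
# over a trailing window of k days; NA until the window is full.
momentum_score <- function(returns, k) {
  n <- length(returns)
  out <- rep(NA_real_, n)
  if (n >= k) {
    for (i in k:n) {
      w <- returns[(i - k + 1):i]
      out[i] <- sum(w > 0) - sum(w < 0)
    }
  }
  out
}

returns <- c(0.01, -0.02, 0.03, 0.00, -0.01, 0.02, 0.01, -0.03)  # illustrative
momentum.7d <- momentum_score(returns, 7)
```

With these numbers the first full window has four up days and two down days, so its score is 2; the next window has three of each, so its score is 0.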
### Trends
Linear regression of the Google Trends series over a period, from which we extract the coefficient:
- trends.7d
- trends.14d
- trends.21d
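As a sketch of extracting such a coefficient, assuming the regressor is simply the day index within the window (the helper name `trend_slope` and the search-interest values are hypothetical):

```r
# Slope of a least-squares line fitted to the window, using the
# position in the window (1, 2, ..., k) as the regressor.
trend_slope <- function(y) {
  t <- seq_along(y)
  unname(coef(lm(y ~ t))[2])
}

hits <- c(10, 12, 14, 16, 18, 20, 22)  # illustrative 7-day search interest
trends.7d <- trend_slope(hits)
```

A positive slope means search interest rose over the window; here the series grows by exactly 2 per day, so the fitted slope is 2.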
### Buy / Sell Classifier
Starting from a specific day, we look back at the returns over a period to determine whether we should have issued a buy or a sell. These labels allow us to test our model.
- buy.7d
- buy.14d
- buy.21d
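A minimal sketch of such a label, assuming that "buy" on day t means the price k days later is higher than the price on day t (this rule, the helper name `buy_label` and the prices are assumptions, not the definition used in `tools.R`):

```r
# ASSUMED rule: label day t as 1 (buy) if price[t + k] > price[t], else 0 (sell).
# The last k days have no known outcome yet and are left as NA.
buy_label <- function(price, k) {
  future <- price[seq_along(price) + k]  # indexing past the end yields NA
  as.integer(future > price)
}

price <- c(100, 101, 102, 103, 104, 105, 106, 99, 120)  # illustrative
buy.7d <- buy_label(price, 7)
```

Because the label uses future prices, it can only serve as the response variable; it must never be used as a predictor.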
## Putting it together
We can describe the state of the market by combining features. For example, we can gauge how bullish or bearish a market is by multiplying our trend coefficient by the momentum.
```{r warning=FALSE}
btcValues = coinDataEngineering("BTC")
ethValues = coinDataEngineering("ETH")
xrpValues = coinDataEngineering("XRP")
ltcValues = coinDataEngineering("LTC")
eosValues = coinDataEngineering("EOS")
```
# Technical Analysis
```{r echo=FALSE, message=FALSE, warning=FALSE}
corrplot(cor(analysis.return.data(currencies[1:25,]$slug,vals[vals$datetime>as.Date("2016-12-31"),])[,-1],
use = "pairwise.complete.obs"), method="ellipse")
plot.beta.vs.mcap.num(20, currencies)
```
```{r echo=FALSE, warning=FALSE}
slugs = c("bitcoin","ethereum", "ripple", "litecoin", "eos")
plot.currencies(vals, slugs)
plot.beta.timeline(slugs, 30, 90, vals, market)
```
```{r warning=FALSE, include=FALSE}
btcResults = doLogisticReg(btcValues)
ethResults = doLogisticReg(ethValues)
xrpResults = doLogisticReg(xrpValues)
ltcResults = doLogisticReg(ltcValues)
eosResults = doLogisticReg(eosValues)
```
## Logistic Regression
**buy.7 ~ volume.7 + volume.14 + volume.21 + volatility.7 + volatility.14 + volatility.21 + momentum.7 + momentum.14 + momentum.21 + gtrend.7 + gtrend.14 + gtrend.21**
```{r eval = FALSE}
model = buy.7 ~ volume.7 + volume.14 + volume.21 + volatility.7 + volatility.14 + volatility.21 + momentum.7 + momentum.14 + momentum.21 + gtrend.7 + gtrend.14 + gtrend.21
# Train our model first
logistic_reg = glm(model, data=training, family=binomial(link="logit"))
btcResults = doLogisticReg(btcValues)
ethResults = doLogisticReg(ethValues)
xrpResults = doLogisticReg(xrpValues)
ltcResults = doLogisticReg(ltcValues)
eosResults = doLogisticReg(eosValues)
```
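The internals of `doLogisticReg` are not shown in this report. As a self-contained sketch of the general fit-then-predict pattern on simulated data (the data, the 150/50 split, the 0.5 cutoff and the reduced two-feature formula are all assumptions made for illustration):

```r
set.seed(1234)

# Simulated stand-in for one coin's engineered features (NOT real market data)
n <- 200
toy <- data.frame(momentum.7 = rnorm(n), volatility.7 = runif(n))
toy$buy.7 <- rbinom(n, 1, plogis(1.5 * toy$momentum.7))

# Chronological split: the first 150 rows train, the last 50 test
train_idx <- seq_len(150)
training <- toy[train_idx, ]
testing <- toy[-train_idx, ]

# Fit the logistic regression on the training rows
fit <- glm(buy.7 ~ momentum.7 + volatility.7,
           data = training, family = binomial(link = "logit"))

# Predicted probability of "buy" on unseen rows, then a hard 0/1 call
prob <- predict(fit, newdata = testing, type = "response")
pred <- as.integer(prob > 0.5)
accuracy <- mean(pred == testing$buy.7)
```

For time series a chronological split like this one is usually safer than a random split, since it avoids training on observations that come after the test period.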
### Bitcoin
```{r echo=FALSE, warning=FALSE}
plotCoinData(btcValues)
plotLogisticReg(btcResults)
```
### Ethereum
```{r echo=FALSE, warning=FALSE}
plotCoinData(ethValues)
plotLogisticReg(ethResults)
```
### Ripple
```{r echo=FALSE, warning=FALSE}
plotCoinData(xrpValues)
plotLogisticReg(xrpResults)
```
### EOS
```{r echo=FALSE, warning=FALSE}
plotCoinData(eosValues)
plotLogisticReg(eosResults)
```
### Litecoin
```{r echo=FALSE, warning=FALSE}
plotCoinData(ltcValues)
plotLogisticReg(ltcResults)
```
# Summary Results
```{r echo=FALSE}
results = list(btcResults, ethResults, xrpResults, ltcResults, eosResults)
df = compareResults(results)
knitr::kable(df, caption = "", align = 'c') %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
column_spec(1, bold = T, border_right = T)
```
# Going Further
We have built a solid base for a more thorough analysis in the future. Here is a list of topics we can investigate, building on the current work:
+ Use the predicted probabilities to issue graded recommendations (strong buy, strong sell, neutral, ...)
+ Portfolio Management / Optimization
    - Instead of choosing the top 5 by market cap, the analysis can be updated daily/hourly on all top currencies
    - Portfolio rebalancing
+ Identify arbitrage opportunities
+ Dynamic horizons (already in place for 7, 14 and 21 days)
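
As a sketch of the first idea, predicted probabilities could be binned into graded recommendations with `cut()`. The thresholds, the five labels and the helper name `recommend` are hypothetical choices, not values derived from our results:

```r
# Map a predicted "buy" probability to a graded recommendation.
# The break points are arbitrary and would need tuning on real results.
recommend <- function(prob) {
  cut(prob,
      breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
      labels = c("Strong sell", "Sell", "Neutral", "Buy", "Strong buy"),
      include.lowest = TRUE)
}

grades <- as.character(recommend(c(0.05, 0.5, 0.95)))
```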