forked from DataSlingers/clustRviz
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
132 lines (97 loc) · 4.53 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
[![TravisCI Build Status](https://travis-ci.com/DataSlingers/clustRviz.svg?branch=develop)](https://travis-ci.com/DataSlingers/clustRviz) [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/DataSlingers/clustRviz?branch=develop&svg=true)](https://ci.appveyor.com/project/michaelweylandt/clustRviz) [![codecov Coverage
Status](https://codecov.io/gh/DataSlingers/clustRviz/branch/develop/graph/badge.svg)](https://codecov.io/gh/DataSlingers/clustRviz/branch/develop)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/clustRviz)](https://cran.r-project.org/package=clustRviz)
[![Project Status: Active – The project has reached a stable, usable
state and is being actively
developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
# clustRviz
`clustRviz` aims to enable fast computation and easy visualization of Convex Clustering
solution paths.
## Installation
You can install `clustRviz` from github with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("DataSlingers/clustRviz")
```
Note that `RcppEigen` (which `clustRviz` internally) triggers many compiler warnings
(which cannot be suppressed per
[CRAN policies](http://cran.r-project.org/web/packages/policies.html#Source-packages)).
Many of these warnings can be locally suppressed by adding the line `CXX11FLAGS+=-Wno-ignored-attributes`
to your `~/.R/Makevars` file.
`clustRviz` also depends on the development version of the `gganimate` package,
which is not yet on CRAN, but may be found [here](https://github.com/thomasp85/gganimate).
## Quick-Start
There are two main entry points to the `clustRviz` package, the `CARP` and `CBASS`
functions, which perform convex clustering and convex biclustering respectively.
We demonstrate the use of these two functions on a text minining data set,
`presidential_speech`, which measures how often the 44 U.S. presidents used certain
words in their public addresses.
```{r load_data}
library(clustRviz)
data(presidential_speech)
presidential_speech[1:6, 1:6]
```
### Clustering
We begin by clustering this data set, grouping the rows (presidents) into clusters:
```{r carp_example}
carp_fit <- CARP(presidential_speech)
print(carp_fit)
```
The algorithmic regularization technique employed by `CARP` makes computation of
the whole solution path almost immediate.
We can examine the result of `CARP` graphically. We begin with a standard dendrogram,
with three clusters highlighted:
```{r carp_dendro}
plot(carp_fit, type = "dendrogram", k = 3)
```
Examing the dendrogram, we see two clear clusters, consisting of pre-WWII and post-WWII
presidents and Warren G. Harding as a possible outlier. Harding is generally considered
one of the worst US presidents of all time, so this is perhaps not too surprising.
A more interesting visualization is the dynamic path visualization, whereby we can
watch the clusters fuse as the regularization level is increased:
```{r carp_dynamic, eval = FALSE}
plot(carp_fit, type = "dynamic_path")
```
```{r carp_dynamic_impl, echo = FALSE, results = "hide", message=FALSE}
if (!dir.exists("inst")) {
dir.create("inst")
}
saveviz(carp_fit, file.name = "man/figures/path_dyn.gif", type = "path", dynamic = TRUE)
```
<img src="./man/figures/path_dyn.gif" width = "70%">
`clustRviz` also provides interactive dendrograms using the
[`heatmaply`](https://cran.r-project.org/package=heatmaply) package:
```{r carp_js}
plot(carp_fit, type = "js")
```
### BiClustering
The use of `CBASS` for convex biclustering is similar, and we demonstrate it here
with a cluster heatmap, with the regularization set to give 3 observation clusters:
```{r cbass}
cbass_fit <- CBASS(presidential_speech)
plot(cbass_fit, k.row = 3)
```
Setting `type = "js"` gives the traditional cluster heatmap:
```{r cbass_js}
plot(cbass_fit, type = "js", k.row = 3)
```
By default, if a regularization level is specified, all plotting functions in `clustRviz`
will plot the clustered data. If the regularization level is not specified, the
raw data will be plotted instead:
```{r cbass_js2}
plot(cbass_fit, type = "js")
```
More details about the use and mathematical formulation of `CARP` and `CBASS`
may be found in the package documentation.