Synopsis: This is a COINSTAC computation which approximates a linear regression analysis run on data from multiple sites using multiple rounds of communication.
Analytical Description:
Suppose there are
This computation uses an interactive protocol for computing the least-squares regression coefficients corresponding to a single aggregated data set:
$$\beta = \mathop{\mathrm{argmin}}_{\beta} \sum_{s=1}^{S} \sum_{j=1}^{n_s} (y^{(s)}_j - \beta^{\top} x^{(s)}_j )^2$$
The protocol solves this problem by emulating gradient descent (GD) at the aggregator. The aggregator sends an initialized
Each site
The aggregator sends
In a linear regression model, we are given covariate-response pairs
using least squares. To simplify the problem, we append a 1 to the covariate vector and define $$\begin{bmatrix} x_j(1) \\ x_j(2) \\ \vdots \\ x_j(d) \end{bmatrix} = \begin{bmatrix} v_j(1) \\ v_j(2) \\ \vdots \\ v_j(d-1) \\ 1 \end{bmatrix}$$
so that the model is
where
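The augmentation step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each raw covariate vector $v_j$ is held as a plain list of $d-1$ numbers; the helper name `augment` and the sample values are hypothetical and not taken from the computation's code.

```python
def augment(v):
    """Append a constant 1 to a covariate vector v (length d-1), giving x (length d)."""
    return list(v) + [1.0]


# Two made-up covariate vectors with d-1 = 2 entries each.
covariates = [[0.5, 2.0], [1.5, -1.0]]
X = [augment(v) for v in covariates]
print(X)  # [[0.5, 2.0, 1.0], [1.5, -1.0, 1.0]]
```

With the constant entry appended, the intercept becomes the last coordinate of $\beta$ and needs no separate handling.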
The local script at site $s$ does the following:
- Reads a local data set $\{ (x^{(s)}_j, y^{(s)}_j) : j = 1, 2, \ldots, n_s \}$ from disk.

At each iteration $t$, it:
- Receives $\beta_t$ from the aggregator.
- Computes $g_{s,t} = \nabla F_s(\beta_t)$.
- Sends $g_{s,t}$ to the aggregator.
- Deletes the previous iterate $\beta_t$.
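The gradient computation at a site can be sketched as follows. This is only an illustration under an assumption: it takes the local objective to be $F_s(\beta) = \sum_{j=1}^{n_s} (y^{(s)}_j - \beta^{\top} x^{(s)}_j)^2$, the site's share of the pooled least-squares objective, so that $\nabla F_s(\beta) = -2 \sum_j (y^{(s)}_j - \beta^{\top} x^{(s)}_j)\, x^{(s)}_j$. The function name `local_gradient` and the toy data are hypothetical.

```python
def local_gradient(beta, xs, ys):
    """Gradient at beta of F_s(beta) = sum_j (y_j - beta^T x_j)^2 (assumed form of F_s).

    xs: list of augmented covariate vectors, ys: list of responses.
    """
    grad = [0.0] * len(beta)
    for x, y in zip(xs, ys):
        residual = y - sum(b * xi for b, xi in zip(beta, x))
        for k, xk in enumerate(x):
            grad[k] -= 2.0 * residual * xk
    return grad


# Toy site data generated by y = 2*v + 1; the last entry of each x is the appended 1.
xs = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [1.0, 3.0, 5.0]
print(local_gradient([2.0, 1.0], xs, ys))  # [0.0, 0.0] at the exact fit
print(local_gradient([0.0, 0.0], xs, ys))  # [-26.0, -18.0]
```

Only the gradient leaves the site in this sketch; the raw pairs stay local, which matches the data flow listed above.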
The aggregator does the following:
- Receives $\{ g_{s,t} : s = 1, 2, \ldots, S \}$ from the $S$ sites.
- Updates the coefficients $\beta_{t+1} = \beta_{t} - \eta_t \sum_{s=1}^{S} g_{s,t}$.
- Sends $\beta_{t+1}$ to all sites.
- Deletes the messages $\{ g_{s,t} : s = 1, 2, \ldots, S \}$.
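The exchange between the sites and the aggregator can be simulated in a single process, as in the sketch below. It is not the COINSTAC implementation: it assumes the squared-error local objective $F_s(\beta) = \sum_j (y^{(s)}_j - \beta^{\top} x^{(s)}_j)^2$, a constant step size `eta` in place of the schedule $\eta_t$, a zero initial iterate, and a fixed iteration count. The names `local_gradient`, `aggregator_step`, and `sites` are hypothetical.

```python
def local_gradient(beta, xs, ys):
    """Site side: gradient of sum_j (y_j - beta^T x_j)^2 (assumed local objective)."""
    grad = [0.0] * len(beta)
    for x, y in zip(xs, ys):
        residual = y - sum(b * xi for b, xi in zip(beta, x))
        for k, xk in enumerate(x):
            grad[k] -= 2.0 * residual * xk
    return grad


def aggregator_step(beta, gradients, eta):
    """Aggregator side: beta_{t+1} = beta_t - eta * sum_s g_{s,t}."""
    total = [sum(g[k] for g in gradients) for k in range(len(beta))]
    return [b - eta * gk for b, gk in zip(beta, total)]


# Two simulated sites whose data all follow y = 2*v + 1 (covariates already augmented).
sites = [
    ([[0.0, 1.0], [1.0, 1.0]], [1.0, 3.0]),
    ([[2.0, 1.0], [3.0, 1.0]], [5.0, 7.0]),
]

beta = [0.0, 0.0]  # assumed initialization
eta = 0.02         # assumed constant step size, small enough for this toy problem
for t in range(500):
    gradients = [local_gradient(beta, xs, ys) for xs, ys in sites]  # sent by the sites
    beta = aggregator_step(beta, gradients, eta)                    # sent back to the sites
    # The per-iteration gradients are discarded here, mirroring the deletion step.

print([round(b, 4) for b in beta])  # close to [2.0, 1.0]
```

Because the summed site gradients equal the gradient of the pooled objective, the iterates follow the same path as gradient descent on the aggregated data set. The step size has to be small relative to the curvature of the pooled objective, otherwise the iterates diverge.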
Let
What data must sites provide?
- Each site $s$ needs to provide access to its covariate-response pairs $\{ (x^{(s)}_j, y^{(s)}_j) : j = 1, 2, \ldots, n_s \}$.
What is shared from the sites to the aggregator?
- The site ID
- The local gradients $\{ g_{s,t} : t = 1, 2, \ldots, T \}$ from each iteration.
What intermediate results are stored locally at the sites?
- The sites receive the sequence of coefficient updates $\{ \beta_t : t = 1, 2, \ldots, T \}$.
What intermediate results are stored at the aggregator?
- The aggregator deletes $\{ g_{s,t} : s = 1, 2, \ldots, S \}$ after computing $\beta_{t+1}$ at each iteration.
What is the output from the computation?
This computation produces a single output file:
- format: CSV
- content:
  - $d$-dimensional vector $\beta_{T+1}$
  - ...
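Writing and reading back such a file could look like the sketch below, which uses Python's standard `csv` module. The single-row layout, the file name `beta.csv`, and the helper names are assumptions made for illustration; the exact layout of the output file is not specified here.

```python
import csv
import os
import tempfile


def write_coefficients(path, beta):
    """Write the coefficient vector as one CSV row (assumed layout)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(beta)


def read_coefficients(path):
    """Read the first CSV row back as a list of floats."""
    with open(path, newline="") as f:
        return [float(v) for v in next(csv.reader(f))]


# Hypothetical output location and made-up coefficient values.
out_path = os.path.join(tempfile.mkdtemp(), "beta.csv")
write_coefficients(out_path, [2.0, 1.0, -0.5])
print(read_coefficients(out_path))  # [2.0, 1.0, -0.5]
```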