From 8231e2b4352f8a6c228d13666cc0b519446c596d Mon Sep 17 00:00:00 2001
From: "Jeff L."
Date: Thu, 14 Nov 2013 10:00:16 -0500
Subject: [PATCH] Update README.md

---
 README.md | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 376cb1e7..09854153 100644
--- a/README.md
+++ b/README.md
@@ -7,4 +7,35 @@ This is a guide for anyone who needs to share data with a statistician. The targ
 * Students or postdocs in scientific disciplines looking for consulting advice
 * Junior statistics students whose job it is to collate/clean data sets
 
-The goal of this guide is to ensure the most reproducible and the most
+The goals of this guide are to provide some instruction on the best way to share data to avoid the most common pitfalls
+and sources of delay in the transition from data collection to data analysis. The Leek group works with a large
+number of collaborators and the number one source of variation in the speed to results is the status of the data
+when they arrive at the Leek group. Based on my conversations with other statisticians this is true nearly universally.
+
+My strong feeling is that statisticians should be able to handle the data in whatever state they arrive. It is important
+to see the raw data, understand the steps in the processing pipeline, and be able to incorporate hidden sources of
+variability in one's data analysis. On the other hand, for many data types, the processing steps are well documented
+and standardized. So the work of converting the data from raw form to directly analyzable form can be performed
+before calling on a statistician. This can dramatically speed the turnaround time, since the statistician doesn't
+have to work through all the pre-processing steps first.
+
+
+What you should deliver to the statistician
+====================
+
+For maximum speed in the analysis this is the information you should pass to a statistician:
+
+1. The raw data.
+2. A [tidy data set](http://vita.had.co.nz/papers/tidy-data.pdf)
+3. An explicit and exact recipe you used to go from 1 -> 2
+
+Let's look at each part of the data package you will transfer.
+
+
+
+What you should expect from a statistician
+====================
+
+
+
+