Data Science Specialization Community Mentor Content Repository

Author: Len Greski

This repository contains content developed during my time as either a student or Community Mentor in the Data Science Specialization from Johns Hopkins University, which is offered on Coursera. A number of people have developed content to help students work through the ten courses in the specialization. The main index for this content is datasciencespecialization.github.io.

Repository Contents

As a participant and Community Mentor in courses in the curriculum, I have seen students run into the same issues again and again. Migrating the content to GitHub makes it easier to repost in new runs of courses within the curriculum. Students get access to the experiences of prior students without my having to regularly cut and paste content from past sessions into the Discussion Forums, which are the primary mechanism for communication between students and Community Mentors.

File	Description
/markdown	Directory containing markdown files, the primary form of documentation for the content in the repository.
/markdown/images	Directory containing Portable Network Graphics (PNG) files, which are used to illustrate the narrative content in other documentation.
README.md	File explaining the purpose and contents of the repository, with links to specific content listed by course.

The remainder of this document serves as a directory of the content, aligning individual documents with the course(s) for which the content is relevant.

Course 1: Data Scientist's Toolbox

Course Prerequisites and Difficulty Levels. Provides an overview of the Data Science Specialization courses, explaining from a practical perspective which courses a student needs as prerequisites for other courses. While students may take more than one class at a time, it's important to know how information from earlier courses is used in subsequent ones. The article also ranks the courses from most to least difficult, based on the author's experience in the curriculum as well as Discussion Forum feedback contributed by other students.
Configuring RStudio to work with git / github - Mac OSX
Configuring RStudio to work with git / github - Windows 7, 8, and 10
Using Editor Modes in Discussion Forum Posts
Buying a Computer for Data Science
R and RStudio on Chromebook

Issue: Students Struggle to find URLs in Lecture Slides

If you're looking for the URLs in the lecture slides, they are available in the Data Science Specialization Courses GitHub repository. Each course is stored in a subdirectory within the repository, and the slides are written in R Markdown, a technique you'll learn in Developing Data Products.

Course 2: R Programming

START HERE

If you're new to the course and trying to figure out what to do in what order, start with these articles.

Resources for R Programming. Provides a summary of student-generated content to support the course, some of which is indexed on the Data Science Specialization's github.io site.
References for R Programming. Provides a list of references for R programming, ranging from beginning to advanced topics.
Data Science Specialization: what is the value? Addresses a common question raised by students in R Programming who are frustrated by the amount of work they have to do on their own to complete quizzes and assignments.
R versus Python. Roundup of articles and surveys comparing R and Python, including usage, history, and pros / cons.

The next set of articles includes general commentary about the course, R programming in general, and R in relationship to other statistics packages.

Commercial Statistics Packages: An Historical Perspective
Configuring RStudio to work with git / github - Mac OSX
A Data Frame is Also a List
Forms of the Assignment Operator
Forms of the Extract Operator
S Objects, R Objects, and Lexical Scoping
Thinking in R versus Thinking in SAS
Strategy for the Programming Assignments
Why is R More Difficult than SAS?
R Onboarding for SAS Users
References for R Programming. Provides a list of references for R programming, ranging from beginning to advanced topics.
Object Oriented Programming and R. Explains how object oriented programming concepts are implemented in R, in response to a student question about accessing the output of R's linear model function, lm().
Scoping in C/C++ vs. R. Compares variable scoping in R versus C/C++.

Posts regarding specifics of programming assignments

Miscellaneous Code Examples and Instructions

Permanently Setting R Working Directory. Link to an R-bloggers.com article that explains how to set your working directory permanently in R (instead of RStudio).
Tutorial: Downloading Files
Creative Use of R: Downloading Course Lectures. Article illustrating how to use R to automate the download of lectures from Data Science Specialization courses, such as R Programming. Techniques used in this article are helpful to make research reproducible, as required for courses like Getting and Cleaning Data and Reproducible Research.
How to Upgrade R without Losing Your Packages. Article by Kris Eberwein on datascienceriot.com that includes code to save your list of packages to an rds file and reinstall any packages that don't make it through the upgrade process.
Common R Mistakes: Overwriting R Functions with Output Variables

Interesting R News and Blog Articles

R vs. Python: 2016 Survey of Software used for Data Science. Overview of results from a 2016 KDNuggets Software Poll, written by Gregory Piatetsky. The follow-up article with expanded analysis is "What Big Data, Data Science, Deep Learning software goes together", also on kdnuggets.com.
Scaling R for Data Science. August 2016 article by Federico Castanedo explaining three ways to scale R.
Lexical Scoping and Statistical Computing. Article by Robert Gentleman and Ross Ihaka at the University of Auckland describing how lexical scoping works, and why it is valuable in statistical computing.

Course 3: Getting and Cleaning Data

Real World Example: Reading American Community Survey data. Illustrates concepts covered in Getting and Cleaning Data with U.S. Census data, including how to process a hierarchical file format in R, as well as using an electronic codebook to generate the parameters required to read the data file into a data frame.
Common Problems: Quiz 1 - Missing Java Runtime. Explains how to solve the problem of a missing Java Runtime for the question that requires students to process a Microsoft Excel spreadsheet.
Strategy for Reading Files & APIs / Quiz 2
Common Problems: Quiz 2 - sqldf() driver fails to connect

Course 5: Reproducible Research

Assignment 2 Checklist

Course 6: Statistical Inference

Reference Materials for Statistical Inference. Start here if you're looking for help on the statistical techniques taught in this course.
Using MathJax with Discussion Forums, R Markdown, and Github Pages
Power Calculations: Optimal Sample size
Permutation Tests Explained

Articles Related to the Course Project

Exponential Distribution / Central Limit Theorem - Assignment Checklist
ToothGrowth Analysis - Assignment Checklist
Exploratory Data Analysis in ToothGrowth Assignment. Explains the exploratory data analysis requirement for students who have not taken the Exploratory Data Analysis course before taking Statistical Inference.
Accessing R Code from an Appendix in Knitr
Theoretical Variance of Sampling Distribution of the Mean
Kable Tables with Data Frames. Illustrates how to display a custom table in a knitr document by creating a data frame to contain the information to be rendered with kable().
Installing MiKTeX on Windows 10 / Generating a PDF with knitr
Commentary on Factorial Design in Toothgrowth Analysis. Illustrates how to conduct a full factorial analysis of variance with the ToothGrowth data, comparing it to the techniques used in the course project for Statistical Inference.

Course 7: Regression Models

Course 8: Practical Machine Learning

Course 9: Developing Data Products

Configuring shinyapps.io Application Timeout. A walkthrough on how to configure a Shiny application so it doesn't waste the free monthly server processing time.

Course 10: Capstone

Speech and Language Processing, 3rd Edition. Working version of the book by Jurafsky et al. on natural language processing, whose content on n-grams is helpful for the capstone.
n-gram Computations and Computer Capacity. Explains the amount of memory required to convert the text files for the course project into n-grams, using the quanteda package.

Content for Community Mentors

Tips for New Community Mentors. A list of tips for new mentors supporting the Data Science Specialization, ranging from when to direct students to paid / professional resources such as the Coursera Learner Help Center, to how to optimize the value of content that is posted by mentors.
