JSC370: Data Science II (Winter 2022), University of Toronto

Where and When

Weekly Course Schedule

Topics/Weekly Activities    Due Dates (by 11:59 pm Thursdays)
Week 1
January 11 (lecture)
January 13 (lab)
Introduction to Data Science tools: R, Markdown
Lab 1
Week 2
January 17/18 (guest speaker/lecture)
January 20 (lab)
Aaron Sonabend (Google), 1/17 at 1 pm on Zoom
Version Control & Reproducible Research, Git
Lab 2, Reflection
Week 3
January 25 (guest speaker/lecture)
January 27 (lab)
Stefanie Nickels (Verily)
Exploratory Data Analysis
Lab 3, Reflection
Week 4
January 31/February 1 (guest speaker/lecture)
February 3 (lab)
Kathy Evans (NBA)
Data visualization
HW1, Lab 4, Reflection
Week 5
February 9 (guest speaker/lecture)
February 10 (lab)
Graduate student panel (U of T)
Data cleaning and wrangling
Lab 5, Reflection
Week 6
February 14/15 (guest speaker/lecture)
February 17 (lab)
Paul Varghese (Verily)
Regular Expressions, Big Data, Data scraping, using APIs
HW2, Lab 6, Reflection
Week 7
February 22/24
Reading Week
Week 8
March 1 (guest speaker/lecture)
March 3 (lab)
Lisa Strug (U of T)
Text mining
Midterm, Lab 8, Reflection
Week 9
March 7/8 (guest speaker/lecture)
March 10 (lab)
Alistair Johnson (SickKids)
High-performance computing, cloud computing
HW3, Lab 9, Reflection
Week 10
March 15 (guest speaker/lecture)
March 17 (lab)
Ellen Stephenson (U of T)
Machine learning (elastic net, XGBoost)
Lab 10, Reflection
Week 11
March 22 (guest speaker/lecture)
March 24 (lab)
Amy Braverman (NASA)
Interactive visualization and effective data communication I
HW4, Lab 11, Reflection
Week 12
March 29 (guest speaker/lecture)
March 31 (lab)
Sofia Ruiz (National University of Rosario) and Yunyi Shen (University of Wisconsin-Madison)
Interactive visualization and effective data communication II
Lab 12, Reflection
Week 13
April 4/5 (guest speaker/lecture)
April 6 (lab)
Radu Craiu (U of T)
Final Presentations
HW5, Lab 13, Reflection, Final Project

Grading Breakdown

Task                          % of Grade
Labs                          10
Guest speaker reflections     5
Homework (5 assignments)      50
Midterm report                10
Final project                 25
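To show how the weights above combine, here is a minimal Python sketch of a weighted final grade. It assumes (hypothetically, since the syllabus does not say) that each component is first reduced to a single percentage, for example by averaging the five homework marks equally. The names WEIGHTS and final_grade are illustrative only.

```python
# Weights copied from the Grading Breakdown (percent of the final grade).
WEIGHTS = {
    "labs": 10,
    "guest_speaker_reflections": 5,
    "homework": 50,
    "midterm_report": 10,
    "final_project": 25,
}

# Sanity check: the weights should account for the whole grade.
if sum(WEIGHTS.values()) != 100:
    raise ValueError("grading weights must sum to 100")


def final_grade(scores: dict) -> float:
    """Return the weighted final grade on a 0-100 scale.

    `scores` maps every component in WEIGHTS to a percentage in [0, 100].
    Hypothetical assumption: each component has already been collapsed to
    one percentage (e.g. the five homework marks averaged equally).
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("need exactly one score per grading component")
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score {score} is outside 0-100")
    # Weighted sum of percentages, divided by the total weight of 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Example: perfect labs and reflections, 80% homework, 90% midterm, 85% project.
print(final_grade({
    "labs": 100,
    "guest_speaker_reflections": 100,
    "homework": 80,
    "midterm_report": 90,
    "final_project": 85,
}))  # → 85.25
```

Because homework carries half the weight, a 10-point change in the homework average moves the final grade by 5 points.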

Website reference: https://github.com/JSC370/jsc370.github.io

Resources

Markdown

Helpers and Templates

  • RMarkdown Cheatsheet. An overview of Markdown and RMarkdown conventions.
  • RStudio Cheatsheets. Other quick guides, including a more comprehensive RMarkdown reference, information about using the RStudio IDE, and some of the main tools in R.

Guides

Tools

  • Apple's Developer Tools. The Unix toolchain for macOS. Install directly with xcode-select --install, or just try to run, e.g., git from the terminal and let macOS prompt you to install the tools.
  • Homebrew. A package manager for macOS and a convenient way to install several of the tools here, including Emacs and Pandoc.
  • R. A platform for statistical computing.
  • knitr. Reproducible plain-text documents from within R.
  • Python and SciPy. Python is a general-purpose programming language increasingly used in data manipulation and analysis.
  • RStudio. An IDE for R. The most straightforward way to get started with R and RMarkdown.
  • TeX and LaTeX. A typesetting and document preparation system. You can write files in .tex format directly, but it is more useful to just have it available in the background for other tools to use. The MacTeX Distribution is the one to install for macOS.
  • Pandoc. Converts plain-text documents to and from a wide variety of formats. Can be installed with Homebrew. Pandoc 2.11 and later process citations and bibliographies natively (via the --citeproc option); older versions need the separate pandoc-citeproc filter. Also install pandoc-crossref for producing cross-references and labels.
  • Git. Version control system. Installs with Apple's Developer Tools, or get the latest version via Homebrew.
  • GNU Make. You tell make what the steps are to create the pieces of a document or program. As you edit and change the various pieces, it automatically figures out which pieces need to be updated and recompiled, and issues the commands to do that. See Karl Broman's Minimal Make for a short introduction. Make will be installed automatically with Apple's developer tools.
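  • lintr and flycheck. Tools that nudge you to write neater code: lintr checks R code for style and possible errors, and flycheck does on-the-fly syntax checking in Emacs.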
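The out-of-date rule that Make applies can be sketched in a few lines of Python. This is a simplified illustration, not how GNU Make is implemented: it assumes a target must be rebuilt when it is missing or older than any of its prerequisites, and it ignores phony targets, pattern rules, and cycle detection. The rules dictionary format and the function names are hypothetical.

```python
import os
from typing import Callable, Dict, List, Tuple

# Hypothetical rule format: target -> (list of prerequisites, command to run).
Rules = Dict[str, Tuple[List[str], str]]


def is_out_of_date(target: str, prerequisites: List[str]) -> bool:
    """True if `target` is missing or any prerequisite has a newer mtime."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # If a prerequisite still does not exist here, getmtime raises
    # FileNotFoundError (Make would instead report a "No rule to make
    # target" error).
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def build(target: str, rules: Rules, run: Callable[[str], None]) -> None:
    """Bring `target` up to date, building its prerequisites first.

    Files without a rule are treated as source files and never rebuilt.
    `run` receives the command string whenever a rebuild is needed.
    """
    if target not in rules:
        return
    prerequisites, command = rules[target]
    for prerequisite in prerequisites:
        build(prerequisite, rules, run)  # depth-first, prerequisites first
    if is_out_of_date(target, prerequisites):
        run(command)
```

For example, with a rule saying report.html depends on report.Rmd and data.csv, editing data.csv makes it newer than report.html, so the next build runs the render command; if nothing changed, no command runs.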

Other Applications and Services

  • Backblaze. Secure off-site backup.
  • GitHub. Hosts public and private Git repositories, both available on the free plan. Also a source for publicly available code (e.g., R packages and utilities) written by other people.
  • Marked 2. Live HTML previewing of Markdown documents. macOS only.
  • Sublime Text. A cross-platform text editor with a Python plugin API.
  • Zotero, Mendeley, and Papers are citation managers that incorporate PDF storage, annotation, and other features. Zotero is free to use. Mendeley has a premium tier. Papers is a paid application after a trial period. I don't use these tools much, but that's not for any strong principled reason; it's mostly just inertia. If you use one and want to integrate it with the material here, just make sure it can export to BibTeX/BibLaTeX files. Papers, which I've used most recently, can handily output citation keys in Pandoc's format, among several others.

Data

Many of these websites provide APIs for downloading their data, and we recommend using the APIs to get data.
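As a starting point for pulling data from an API, here is a minimal Python sketch that uses only the standard library. The endpoint and parameter names are hypothetical placeholders; real APIs differ in their URLs, authentication (often an API key), paging, and rate limits, so check each provider's documentation before adapting it.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to an API endpoint."""
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON response body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters, for illustration only.
url = build_query_url("https://api.example.org/v1/records", {"q": "air quality", "limit": 10})
print(url)  # → https://api.example.org/v1/records?q=air+quality&limit=10
# records = fetch_json(url)  # needs network access and a real endpoint
```

Building the query with urlencode, rather than by string concatenation, takes care of escaping spaces and special characters in parameter values.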

Health and Biological data

Academic Publications and related

Other data

Social Networks