Fall 2022
BIOL 792 - 1028
Prof: Dr. Thomas Parchman; SFB 209; [email protected]
Class: Tuesday/Thursday 6:00 – 7:10 pm FH 106
Office Hours: By appointment
Modern Biology is increasingly shaped by data sets that are orders of magnitude larger than most biologists are trained to work with. This is especially the case for the fields of genetics and genomics where major advances in DNA sequencing technology have cut the time and cost investment of DNA sequencing more than 100,000 fold. Spreadsheet software and graphical user interface statistical analysis packages (e.g., Excel, Statistica, JMP) are useless for this scale of data for any scientific discipline. This course will introduce students to basic computational tools (focusing on Unix and Python) to enhance data competency for modern biology. The course will involve hands on approaches and the manipulation and analysis of data collected for graduate student research projects as well as examples from genomic data sets generated in my lab. The course should provide a starting point for students to gain further expertise in simple programming and efficient manipulation and analysis of large-scale data. The tools to be learned in this class are not unique to genomics, and will be of value to research in any scientific field. This course serves as a prerequisite for Data Science for Biology II, taught by Dr. Julie Allen during the spring. Prerequisites: must be enrolled as M.S. or Ph.D. student.
By the end of the semester, students should have learned enough to start working more confidently with Unix and Python. We will emphasize tools often used in genomics and bioinformatics, but the applications learned will apply generally to data science. This course should also prepare students for the more advanced sister course, Data Science for Biology II, taught during spring semesters.
-
Students will be able to operate in a Unix computing environment, and will understand the basic use of Unix computing clusters for research.
-
Students will be able to write basic programs in Python in order to efficiently manipulate and work with large scale data.
-
Students will be able to use basic Unix and Python to manipulate large genomic data sets, and to conduct basic analyses of genome level DNA sequencing data.
-
Practical computing for biologists Haddock, S.H.D. and Dunn, C.W., 2011. Sunderland, MA, USA: Sinauer Associates. The book provides an excellent guide to the much of content of the course, is filled with excellent examples and problems, and will also be utilized in Data Science for Biology II during spring semesters.
-
Computer with Unix/Linux operating system Students with Mac computers already have machines running Unix and are ready to go. Same goes for students running Linux (Ubuntu, Centos, etc.). Students using computers running a windows OS or will need to figure out how to install Linux or a Linux emulator on their machine.
-
Supplemental primers, readings and assignments will be announced during the class and provided on the course github page.
We will meet twice a week (Tues/Thurs 6:00-7:15) but I typically reserve two hours of time during each window to allow discusion/troubleshooting on coding related to assignments and independent projects. At the beginning of each class, I will introduce new concepts and material that will form the basis of the exercises, assignments, or projects we will work through that week. We will cover questions regarding previous material, and then you will spend at least half of each class working independently, or in small groups, on writing code. All students should come to class having thoroughly read the assigned material and prepared to try new coding exercises. We plan to hold class in person this semester, but will have contingency plans in place for covid19 scenarios that will involve using zoom for students who need to miss class for covid19 related illness. In other words, if you have been exposed and might be coming down with symptoms of covid19 or have tested positive, there will be no cost to participating remotely, rather than in person.
All readings, primers, problem set instructions, datasets, as well as ample supplemental materials will be available on the course github page. This will include primers for Unix and Python content, and additional resources for learning more about Python, Unix, and genomic workflows, and information and data sets related to assignments.
Your grade in this course will be based on the following:
-
Weekly assignments (50%) Assignments will involve working in the Unix environment, writing simple Bash and Python scripts, and working with a variety of large data sets that will be provided over the course of the semester. Assignments will be evaluated based on completion and effort. You can work in teams of 2 or 3 but will turn in your own Python.py or Bash.sh scripts for each assignment. Code should be annotated, step by step, to explain what you did to complete the task. More guidelines on these files and each specific assignment will be available on the course website. Assignments will be due before class on Tuesdays unless otherwise specified.
-
Participation (30%) This is a graduate course, with full attendance and participation expected. Participation entails showing up for class prepared and doing your best to work through assigned tasks and programming example problems. Some of the material we cover might be easy and quick to figure out. Other material and tasks will present roadblocks that will be difficult to figure out. No questions will be stupid questions.
-
Independent project (20%) Everyone will be responsible for an independent project which can be organized either individually, or as a group. This will involve identifying a task or problem in your research (or the research of the group you work within) that either requires, or can be made much more efficient, using Python or Unix scripting. Each group (or individual) will need to turn in a one to two page description of the task and how they will solve it by week 5. By week 12, each group will need to turn in final scripts and a one to three page description of how the problem was solved, and how the code works. In addition, each group will give a short (5-10 minute) presentation that describes the data, the problem, and how their scripting tools work. For those without data or ideas, I can supply some options.
Grading scale as follows:
Percentage | Grade |
---|---|
90-100% | A |
80-89% | B |
70-79% | C |
60-69% | D |
Last day to drop a class and receive a full refund: Sep. 9, 2022
Final day to withdrawal from classes (W, no refund): Nov. 2, 2022
A student may request an "I" if he/she has made satisfactory progress in the majority of the work in the course, but for unavoidable absences or other conditions beyond his/her control, is unable to complete the course. Non-attendance, poor performance or requests to repeat the course are unacceptable reasons for issuance of the "I" mark.
Academic dishonesty (cheating, plagiarism or other dishonest behavior related to grades and performance) will not be tolerated under any circumstances.
Surreptitious or covert video-taping of class or unauthorized audio recording of class is prohibited by law and by Board of Regents policy. This class may be videotaped or audio recorded only with the written permission of the instructor. In order to accommodate students with disabilities, some students may have been given permission to record class lectures and discussions. Therefore, students should understand that their comments during class may be recorded.
Qualified, self-identified students with documented physical and learning disabilities have the right to accommodations to ensure equal access to educational opportunities. For assistance, contact the Disability Resource Center (DRC) at 784-6000 to determine eligibility and appropriate accommodations.
Statement on COVID-19, COVID-19 Like Symptoms, and Contact with Someone Testing Positive for COVID-19
Students testing positive for COVID 19, exhibiting COVID 19 symptoms regardless of vaccination status will not be allowed to attend in-person instructional activities and must leave the venue immediately. Students should contact the Student Health Center or their health care provider to receive care and who can provide the latest direction on quarantine and self-isolation. Contact your instructor immediately to make instructional and learning arrangements.
For students who are quarantining or self-isolating due to 1) COVID 19 infection or 2) exposure while not vaccinated, instructors must provide opportunities to make-up missed course work, including assignments, quizzes or exams. In courses with mandatory attendance policies, instructors must not penalize students for missing classes while quarantined.
*Tentative Course Schedule. All contents are subject to change.
Week | Date | Class | Due |
---|---|---|---|
Week 1 | Aug. 30, Sep. 1 | Course introduction, Unix I | |
Week 2 | Sep. 6, Sep. 8 | Unix II | |
Week 3 | Sep. 13, 15 | Unix III | Homework 1 |
Week 4 | Sep. 20, 22 | Unix IV | Homework 2 |
Week 5 | Sep. 27, 29 | Python I | Homework 3 |
Week 6 | Oct. 4, 6 | Python II | Homework 4 |
Week 7 | Oct. 11, 13 | Python III | Homework 5; *1-2 page project description |
Week 8 | Oct. 18, 20 | Python IV | Homework 6 |
Week 9 | Oct. 25, 27 | Python V, Data science careers | Homework 7 |
Week 10 | Nov. 1, 3 | Python VI | Homework 8 |
Week 11 | Nov. 8, 10 | Python VII (pandas/numpy) | Homework 9 |
Week 12 | Nov. 15, 17 | Jupyter, Rmarkdown | nothing |
Week 13 | Nov. 22 | Python VIII | nothing |
Week 14 | Nov. 29, Dec. 1 | HPC/Pronghorn/Project prep | |
Week 15 | Dec. 6, 8 | Project prep/presentation | projects due |
Week 16 | Dec. 13 | Present Projects | *projects due |