From 481c2c39d614470c34e52bf86a0e6d130ce4007d Mon Sep 17 00:00:00 2001
From: Anton Nekrutenko
Date: Tue, 23 Apr 2024 10:23:27 -0400
Subject: [PATCH] Update project.md

---
 2024/project.md | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/2024/project.md b/2024/project.md
index 3f1a9a4..959bf55 100644
--- a/2024/project.md
+++ b/2024/project.md
@@ -14,7 +14,7 @@ I have subdivided this class into the following groups:
 
 These numbers correspond to colony labels in Fig S3 of the [supplement](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5534434/bin/NIHMS874162-supplement-Supplemental_Methods_And_Figures.pdf).
 
-## Task 1
+## Task 1: Prep (due March 26, 2024 in class)
 
 1. Go to NCBI SRA
 
@@ -34,3 +34,39 @@ Here:
 **Start point** - point with the number that was assigned to your group (e.g., Point 4 for Manifold4)
 
 Now all numbers were sequenced. So you objective is to find adaptation trajectory which has the data in SRA.
+
+## Task 2: Analysis
+(due May 1st, 2024 by email)
+
+### Assumptions
+
+1. You have a final variant dataset for your samples (see below for an example).
+2. You have created a mapping between your samples and accession numbers as [described here](https://github.com/nekrut/BMMB554/blob/master/2024/assessimg_variants.md#establish-the-relationship-between-samples-and-accessions) and downloaded it as a .csv file named `names.csv`.
+
+Example of a variant dataset:
+
+```
+Sample     CHROM    POS     REF ALT AF       DP DP4       EFF[*].GENE EFF[*].CODON EFF[*].FUNCLASS
+SRR3722117 CP009273 360103  C   T   0.075949 79 30,43,0,6 .           .            NONE
+SRR3722117 CP009273 870516  G   T   0.157895 19 10,6,2,1  yliE        ctG/ctT      SILENT
+SRR3722117 CP009273 1330682 G   T   0.363636 11 4,3,2,2   acnA        Ggt/Tgt      MISSENSE
+SRR3722117 CP009273 1631797 C   A   0.25     16 4,8,2,2   ydfJ        .            NONE
+```
+### What to do
+
+1. Create a copy of this notebook -> https://colab.research.google.com/drive/1hnoNGQx7MEORWv7KcQAMzpFIsRxvdYVM?usp=sharing
+2. Upload your `names.csv` file to the notebook disk
+3. Run your samples through the notebook
+
+#### Group 1 (Sym)
+
+1. Identify which FIXED mutations are shared within clusters.
+2. Plot a comparison of these mutations across clusters
+
+#### Groups 2 - 4 (Manifold)
+
+1. Identify all fixed mutations that are present in the terminal points but are absent at the start
+2. Trace their trajectories through the time points from beginning to end
+3. Plot the change in frequencies
+
+
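The Manifold steps above (find mutations that are fixed at the terminal point but absent at the start, then follow their frequencies over time) can be sketched in plain Python as shown below. This is only an illustration under several assumptions that are not part of the assignment: the variant table is tab-separated, a mutation counts as "fixed" at `AF >= 0.95` and as "absent" at `AF < 0.05`, and the sample order would normally be derived from `names.csv` (whose column layout is not shown here). The sample labels `S0`..`S2` and all rows in `TOY_VARIANTS` are made up.

```python
import csv
import io
from collections import defaultdict

FIXED_AF = 0.95   # assumed cutoff for "fixed"; not specified in the assignment
ABSENT_AF = 0.05  # assumed cutoff for "absent"; not specified in the assignment

# Toy tab-separated table using the leading columns of the variant dataset.
# Every row here is invented for illustration only.
TOY_VARIANTS = (
    "Sample\tCHROM\tPOS\tREF\tALT\tAF\n"
    "S0\tchr\t100\tC\tT\t0.02\n"
    "S1\tchr\t100\tC\tT\t0.40\n"
    "S2\tchr\t100\tC\tT\t0.99\n"
    "S0\tchr\t200\tG\tA\t1.00\n"
    "S1\tchr\t200\tG\tA\t1.00\n"
    "S2\tchr\t200\tG\tA\t1.00\n"
    "S1\tchr\t300\tA\tG\t0.10\n"
)

# Hypothetical sample order from start point to terminal point.
TIME_ORDER = ["S0", "S1", "S2"]


def load_frequencies(handle):
    """Return {(CHROM, POS, REF, ALT): {sample: AF}} from a tab-separated table."""
    freqs = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["CHROM"], int(row["POS"]), row["REF"], row["ALT"])
        freqs[key][row["Sample"]] = float(row["AF"])
    return freqs


def new_fixed_mutations(freqs, start, end, fixed=FIXED_AF, absent=ABSENT_AF):
    """Mutations fixed in the terminal sample and absent in the start sample.

    A mutation with no row for a sample is treated as AF = 0.0 there.
    """
    return sorted(
        key
        for key, by_sample in freqs.items()
        if by_sample.get(end, 0.0) >= fixed and by_sample.get(start, 0.0) < absent
    )


def trajectories(freqs, mutations, order):
    """AF of each mutation at every time point, in the given sample order."""
    return {m: [freqs[m].get(s, 0.0) for s in order] for m in mutations}


freqs = load_frequencies(io.StringIO(TOY_VARIANTS))
fixed = new_fixed_mutations(freqs, TIME_ORDER[0], TIME_ORDER[-1])
traj = trajectories(freqs, fixed, TIME_ORDER)
for mutation, afs in traj.items():
    print(mutation, afs)
```

Each list in `traj` is one trajectory, so drawing one line per mutation with a plotting library of your choice would cover the final step. The two cutoffs are deliberately exposed as parameters so they can be changed to whatever definition of "fixed" and "absent" your group settles on.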