This repository has been archived by the owner on Apr 16, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
/
07-HumanDraftPangenome.Rmd
126 lines (75 loc) · 3.43 KB
/
07-HumanDraftPangenome.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
# Draft Human Pangenome Reference
![https://www.nature.com/articles/s41586-023-05896-x](./Figures/HumanPangenomePaper.png){width=60%}
## Samples
47 phased, diploid genomes (aim for 350)
* 29 lymphoblastoid cell lines \
+ “limiting selection to those lines classified as karyotypically normal and with low passage (to avoid artefacts from cell culture)”
* 18 sequenced by others (some supplemented) \
+ Aimed for “genetic and biogeographic diversity”
![](./Figures/diversity.png){width=70%}
**Population_Code Description Super_Population_Code**
ASW American's of African Ancestry in SW USA AFR\
ACB African Carribean in Barbados AFR\
PUR Puerto Rican from Puerto Rica AMR\
CLM Colombian from Medellian, Colombia AMR\
PEL Peruvian from Lima, Peru AMR\
MSL Mende in Sierra Leone AFR\
GWD Gambian in Western Division AFR\
YRI Yoruba in Ibadan, Nigera AFR\
ESN Esan in Nigera AFR\
MKK Maasai in Kinyawa, Kenya AFR\
PJL Punjabi in Lahore, Pakistan SAS\
CHS Southern Han Chinese EAS\
KHV Kinh in Ho Chi Minh City, Vietnam EAS\
**Super-Populations**
AFR, African\
AMR, Ad Mixed American\
EAS, East Asian\
EUR, European\
SAS, South Asian\
## Strategy
**Sequencing** \
PacBio HiFi \
Oxford Nanopore \
Bionano optical maps \
High-coverage Hi-C \
Illumina short-read sequencing \
High-coverage Illumina sequencing data for both parents
**Assembly**\
Trio-HiFiasm
**Graphs** \
<span style="color: green;">*Minigraph*</span> \
* Fast pangenome graph builder based on the minimap2 aligner \
* Only structural variation >=50nt
<span style="color: green;">*Minigraph-Cactus (MC)*</span> \
* Refines minigraph output to include SNPs and other small variants \
* Rewrote minigraph to write chains of minimizers \
* Rewrote cactus to be able to read in minigraph output
<span style="color: green;">*PanGenome Graph Builder (PGGB)*</span> \
* All pairwise genome assembly alignments -> graph \
* Uses graph normalization to make sure that chromosome paths are linear \
* Allows for cyclic graph structures that capture structural variation.
![](./Figures/graphcompare.png){width=80%}
## Results
**More Genetic Variation Captured**
<span style="color: green;">*The pangenome captures more polymorphic sequences*</span>
* 119 Mb of euchromatic polymorphic sequences
* 90 MB = structural variation
* 1,115 gene duplications
<span style="color: green;">*We can align more (short) reads to the pangenome*</span>
![](./Figures/morevariation2.png){width=80%}
<!-- If we had captured all human diversity in the pangenome, the regression lines wouldn't slope downward; the most diverse samples are still not finding perfect matches for all their reads-->
<span style="color: green;">*We can call more variants more accurately*</span>
Aligning short reads to the pangenome
* lowered error in small variants by 34%\
* increased structural variants calls per haplotype by 104% (“vast majority”)
<span style="color: green;">*We can call variants across a broad set of populations*</span>
<!-- "Small variants" -->
![](./Figures/morevariation.png){width=80%}
<span style="color: green;">*Variation in complex, medically-relevant Regions*</span>
HLA region (helps the immune system distinguish between self and invader)
![HLA genes](./Figures/hla_genes.png){width=80%}
![HLA haplotypes](./Figures/hla_haplotypes.png){width=80%}
Rh region (involved in Rh blood type)
![Rh genes](./Figures/rh_genes.png){width=80%}
![Rh haplotypes](./Figures/rh_haplotypes.png){width=80%}