Commit
Merge pull request #1 from Hannah-Doerpholz/master
Fixing typos, missing words, harmonizing some descriptions
ataulhaleem authored Sep 11, 2024
2 parents 63f8134 + c0821ca commit 024acf1
Showing 8 changed files with 52 additions and 52 deletions.
10 changes: 5 additions & 5 deletions pages/datasets.mdx
@@ -4,10 +4,10 @@ import PhenotypeData from '../components/phenotypeData'
## Datasets Overview

Welcome to the datasets page. Here you will find detailed information about the various types of data available, including phenotype data, genotype data, RNAseq data, metabolomics data, and sequencing data.
The datasets are mainly collected as part of untwist project. The hub also enables exploring publickly available datasets curated from open source publications.
The datasets are mainly collected as part of the untwist project. The hub also enables exploring publicly available datasets curated from open source publications.


<Tabs items={['Phenotype', 'Genotype', 'Sequencing Data', 'Transcriptomics', 'Metabolomics', 'Lipidomics']}>
<Tabs items={['Phenotype', 'Genotype', 'Sequencing', 'Transcriptomics', 'Metabolomics', 'Lipidomics']}>



@@ -64,11 +64,11 @@ The datasets are mainly collected as part of untwist project. The hub also enabl
| Untwist | Illumina | 54 | Fastq | -- | -- |
| | Nanopore | 5 | Fastq | -- | -- |
| | PacBio | -- | -- | -- | -- |
| | HiC | -- | -- | -- | -- |
| | Hi-C | -- | -- | -- | -- |
| Public | Illumina | -- | -- | -- | -- |
| | Nanopore | -- | -- | -- | -- |
| | PacBio | -- | -- | -- | -- |
| | HiC | -- | -- | -- | -- |
| | Hi-C | -- | -- | -- | -- |

</Tabs.Tab>

@@ -137,4 +137,4 @@ The datasets are mainly collected as part of untwist project. The hub also enabl
</Tabs>


For specific details on individual dataset or accessing and using these datasets, please refer to the [Downloads](https://www.camelina-hub.org/router?component=downloads) page on camelina-hub or contact our support team at [ib4@fz-juelich.com]([email protected]).
For specific details on individual datasets or accessing and using these datasets, please refer to the [Downloads](https://www.camelina-hub.org/router?component=downloads) page on camelina-hub or contact our support team at [ibg4@fz-juelich.com]([email protected]).
4 changes: 2 additions & 2 deletions pages/index.mdx
@@ -6,15 +6,15 @@ import { Bleed } from 'nextra-theme-docs'
# Introduction

The Camelina Knowledge Hub is a comprehensive web-based platform designed to empower scientists, farmers, and the general public with advanced tools and resources for Camelina research, farming, and exploration.
This documentation provides an abstract knowledge of the key features which include
This documentation provides an overview of the key features, which include:

- the ability to perform genome-wide association studies (GWAS)
- identify significant genetic variations
- annotate genomic targets
- visualization of genetic diversity (PCA and MDS)
- visualization of phenotypic diversity

Please refer to Functional modules on the left side column of this documentation for detailed exploration.
Please refer to Functional Modules on the left side column of this documentation for detailed exploration.

<div style={{ marginTop: '2em', marginBottom: '2em', textAlign: 'left' }}>
#### Quick overview of the core functionality
27 changes: 13 additions & 14 deletions pages/modules/GWAS/Analysis.mdx
@@ -10,14 +10,14 @@ import Image from '../../../components/Image'
WebAssembly is a web technology that allows compiling code written in languages like C++ to run efficiently within web browsers.
This enables complex computations like GWAS to be performed directly in the browser without requiring users to download and install software.
This WebAssembly module leverages functionalities from PLINK 1.07, a widely used open-source software suite for whole-genome association studies.
PLINK provides tools for data management, quality control, and various GWAS analysis methods. For further details, please refer to [Plink documentation](https://zzz.bwh.harvard.edu/plink/)
PLINK provides tools for data management, quality control, and various GWAS analysis methods. For further details, please refer to the [Plink documentation](https://zzz.bwh.harvard.edu/plink/).

#### GWAS analysis without correction for population structure:

This approach utilizes the `--assoc` option in PLINK, which likely performs a simple chi-square test for each SNP (single nucleotide polymorphism) to assess its association with the chosen camelina phenotype (trait) data.
This approach utilizes the `--assoc` option in PLINK, which performs a simple chi-square test for each SNP (single nucleotide polymorphism) to assess its association with the chosen camelina phenotype (trait) data.
No explicit correction for population stratification is applied in this method. While this can be faster, it is susceptible to identifying false positives due to ancestry differences.

To perform the GWAS analysis without correction for population structure, the WebAssembly module runs the following command in the background of your browser
To perform the GWAS analysis without correction for population structure, the WebAssembly module runs the following command in the background of your browser:

```bash
plink \
@@ -32,7 +32,7 @@ Here is an explanation of each flag used in the command:
| Flag | Value | Description |
| :--------: | :------------: | :------------ |
| --bfile | plink | This flag specifies the base name of the binary fileset. In PLINK, a binary fileset typically consists of three files: .bed (binary genotype file), .bim (binary SNP information file), and .fam (family information file). By providing the base name plink, the module knows to look for plink.bed, plink.bim, and plink.fam|
| --assoc | -- | This flag tells PLINK to perform a basic case/control association test, which is a chi-square test for each SNP. This test examines whether allele frequencies at each SNP differ significantly between cases (individuals with the phenotype) and controls (individuals without the phenotype). If the phenotype is quantitative PLINK will automatically treat the analysis as a quantitative trait analysis and apply regression model.|
| --assoc | -- | This flag tells PLINK to perform a basic case/control association test, which is a chi-square test for each SNP. This test examines whether allele frequencies at each SNP differ significantly between cases (individuals with the phenotype) and controls (individuals without the phenotype). If the phenotype is quantitative, PLINK will automatically treat the analysis as a quantitative trait analysis and apply a regression model.|
| --allow-no-sex | -- | This flag allows PLINK to proceed with the analysis even if some individuals have unknown sex information. In genetic studies, sex is often a critical covariate, but for some datasets or specific analyses, it may be permissible to ignore this information. |


@@ -43,7 +43,7 @@ Here is an explanation of each flag used in the command:
Therefore, any significant associations found may be influenced by underlying population structure.
This means that some associations could be false positives, resulting from differences in ancestry rather than a true genetic association with the trait.

<b> Data Quality: </b> The is preprocessed for minor allele frequency (>= 0.05), Missingness per SNP ( < 0.1), quality score at SNP site ( >= 20) and a min depth ( >= 3).
<b> Data Quality: </b> The data is preprocessed for minor allele frequency (>= 0.05), missingness per SNP ( < 0.1), quality score at SNP site ( >= 20), and a minimum depth ( >= 3).

<b> Interpretation of Results: </b> Always be cautious when interpreting GWAS results without population structure correction.
It is recommended to validate significant findings using independent datasets or additional methods that account for population stratification.
@@ -58,7 +58,7 @@ Here is an explanation of each flag used in the command:
This method is more robust and helps to reduce spurious associations arising from population stratification.


To perform the GWAS analysis with correction for population structure, the WebAssembly module runs the following command in the background of your browser
To perform the GWAS analysis with correction for population structure, the WebAssembly module runs the following command in the background of your browser:


```bash
@@ -79,7 +79,7 @@ Here is an explanation of each flag used in the command:
|--bfile | plink | This flag specifies the base name of the binary fileset. In PLINK, a binary fileset typically consists of three files: .bed (binary genotype file), .bim (binary SNP information file), and .fam (family information file). By providing the base name plink, the module knows to look for plink.bed, plink.bim, and plink.fam.|
|--linear | -- | This flag tells PLINK to perform a linear regression analysis, which models the relationship between each SNP and the phenotype while adjusting for covariates. This approach helps in controlling for confounding variables.|
|--covar | plink.cov | This flag specifies the file containing covariates to be included in the regression model. In this case, plink.cov is the file that contains the first two principal components (PCs) derived from a separate analysis.|
|--covar-name | COV1,COV2 | This flag specifies the names of the covariates in the plink.cov file that should be included in the analysis. Here, COV1 and COV2 are the first two principal components used to correct for population structure.|
|--covar-name | COV1,COV2 | This flag specifies the names of the covariates in the plink.cov file which should be included in the analysis. Here, COV1 and COV2 are the first two principal components used to correct for population structure.|
|--allow-no-sex | -- | This flag allows PLINK to proceed with the analysis even if some individuals have unknown sex information. In genetic studies, sex is often a critical covariate, but for some datasets or specific analyses, it may be permissible to ignore this information.|
|--standard-beta | -- | This flag outputs standardized regression coefficients, which can be useful for comparing the effects of different SNPs on the phenotype.|
|--hide-covar | -- | This flag suppresses the output of covariate effects in the results, focusing the output on the SNP associations.|
@@ -92,8 +92,7 @@ This helps reduce false positives due to underlying population structure, leadin

<b> Data Quality: </b> It follows the same quality measures for the data as outlined above.

<b> Interpretation of Results: </b>
Always interpret GWAS results cautiously.
<b> Interpretation of Results: </b> Always interpret GWAS results cautiously.
Even with population structure correction, it is recommended to validate significant findings using independent datasets or additional methods.


@@ -107,7 +106,7 @@ Even with population structure correction, it is recommended to validate signifi

#### How to perform GWAS in your browser:

- Choose either of the with or without correction option
- Choose one of the GWAS options: with or without correction
- Press run


@@ -121,7 +120,7 @@ import { Steps } from 'nextra/components'
### Select a trait
<Image src="/GWAS_2_select_a_trait.png" alt="GWAS_2_select_a_trait"/>

### Choose one of the with or without correction option
### Choose one of the GWAS options: with or without correction
<Image src="/GWAS_3_SelectCorrection.png" alt="GWAS_3_SelectCorrection"/>

### Click run
@@ -135,11 +134,11 @@ import { Steps } from 'nextra/components'
### Visualizations/Results
After the completion of GWAS analysis, the following plots are created based on the selected phenotype.

- [x] Manhattan Plot
- [x] Manhattan plot
- [x] Quantile-quantile plot
- [x] Functional gene Annotation
- [x] Functional gene annotation
- [x] Multiomics data analysis with genome browser

Click [Results](/modules/GWAS/Results) to read more on GWAS results.

</Steps>
</Steps>
26 changes: 13 additions & 13 deletions pages/modules/GWAS/Results.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,13 @@ import Image from '../../../components/Image'
A Manhattan plot is a type of scatter plot used in genomics to display data, typically the results of a genome-wide association study (GWAS).
Each point on the plot represents a single nucleotide polymorphism (SNP). The x-axis corresponds to the position of the SNPs along the genome,
while the y-axis represents the negative logarithm of the p-value for the association between the SNP and a trait.
After conducting GWAS analysis in Camelina Hub a Manhattan plot is autogenerated, Below is a description of how it is calculated and created.
After conducting GWAS analysis in Camelina Hub a Manhattan plot is autogenerated. Below is a description of how it is calculated and created.

1. Plink WebAssembly module performs the GWAS analysis and returns unadjusted pvalue.
1. The Plink WebAssembly module performs the GWAS analysis and returns unadjusted p-values.
2. P-values are converted to the negative logarithm base 10 scale: $-\log_{10}(p\text{-value})$.
3. The is then sorted based on Chromosome and Position.
3. The converted p-values are then sorted based on chromosome and position.
4. Alternating colors are assigned to different chromosomes for visual clarity.
5. Custom javascript component is employed to render interactive Manhattan plot having genomic positions on x-axis and $−log⁡10(p_-value)$ on the y-axis.
5. A custom JavaScript component is employed to render an interactive Manhattan plot with genomic positions on the x-axis and $-\log_{10}(p\text{-value})$ on the y-axis.
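
The data preparation in steps 2 to 4 can be sketched as follows. This is a minimal illustration in Python, not the hub's JavaScript component: the function name `manhattan_points`, the tuple layout of the input, the two-color palette, and the use of integer chromosome identifiers are all assumptions made for the example.

```python
import math


def manhattan_points(results):
    """Prepare GWAS results for a Manhattan plot.

    `results` is an iterable of (chromosome, position, p_value) tuples.
    The layout is hypothetical; chromosomes are assumed to be integers
    so that they sort in genomic order.
    """
    points = []
    for chrom, pos, p in results:
        # Skip missing or out-of-range p-values (log10 is undefined for p <= 0).
        if p is None or not (0.0 < p <= 1.0):
            continue
        points.append({"chrom": chrom, "pos": pos, "neg_log10_p": -math.log10(p)})

    # Step 3: sort by chromosome, then by position within the chromosome.
    points.sort(key=lambda pt: (pt["chrom"], pt["pos"]))

    # Step 4: alternate between two colors from one chromosome to the next.
    palette = ("steelblue", "darkorange")  # illustrative colors only
    chrom_index = {}
    for pt in points:
        idx = chrom_index.setdefault(pt["chrom"], len(chrom_index))
        pt["color"] = palette[idx % 2]
    return points
```

Each returned dictionary carries what a renderer would need for one point: the genomic position for the x-axis, the transformed p-value for the y-axis, and a per-chromosome color.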

<Image src="/GWAS_7_res_Manhattan.png" alt="image of GWAS_7_res_Manhattan"/>

@@ -23,14 +23,14 @@ The determination of significance thresholds can vary depending on the study des
By not including a fixed threshold, we allow for flexibility in interpretation and encourage viewers to consider multiple criteria for assessing the significance of associations.
This approach respects the various schools of thought regarding the identification of significant SNPs in genome-wide studies.

## 2. Qunatile quantile plot
## 2. Quantile-quantile plot

A QQ plot (Quantile-Quantile plot) is a graphical tool used to compare two probability distributions by plotting their quantiles against each other.
In the context of genome-wide association studies (GWAS), a QQ plot is commonly used to compare the distribution of observed p-values with the expected
distribution under the null hypothesis of no association. This helps to assess whether there are any deviations from the null hypothesis,
indicating potential associations between SNPs and the trait of interest.

Using the same p-values as of Manhattan plot
Using the same p-values as for the Manhattan plot:

1. The Plink WebAssembly module performs the GWAS analysis and returns unadjusted p-values.
2. Expected p-values are generated under the null hypothesis. These expected p-values follow a uniform distribution between 0 and 1.
@@ -39,36 +39,36 @@ Using the same p-values as of Manhattan plot
5. Expected p-values are also sorted to match the order of the observed p-values.
6. The sorted observed p-values are plotted on the y-axis and the sorted expected p-values are plotted on the x-axis.
7. A reference line is drawn with a slope of 1, representing the null hypothesis where observed and expected p-values follow the same distribution.
8. custom JavaScript component is employed to render an interactive QQ plot, allowing users to explore the distribution of p-values.
8. A custom JavaScript component is employed to render an interactive QQ plot, allowing users to explore the distribution of p-values.
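
The pairing of observed and expected p-values can be sketched as below. This is a hedged Python illustration, not the hub's implementation: the function name `qq_points` is hypothetical, the plotting positions `(i + 0.5) / n` are one common way to obtain evenly spaced uniform quantiles, and plotting both axes on the $-\log_{10}$ scale is an assumption based on common GWAS practice, since not all steps of the procedure are shown here.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale for a QQ plot.

    Assumptions: invalid p-values are dropped, and expected quantiles under
    the null (uniform on 0..1) are taken as (i + 0.5) / n for i = 0..n-1.
    """
    # Sort the observed p-values in ascending order.
    observed = sorted(p for p in p_values if p is not None and 0.0 < p <= 1.0)
    n = len(observed)

    # Evenly spaced uniform quantiles; already in ascending order,
    # so they line up with the sorted observed values.
    expected = [(i + 0.5) / n for i in range(n)]

    # x = expected, y = observed; points on the slope-1 line match the null.
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]
```

Points that rise above the slope-1 reference line at the upper end of the x-axis correspond to observed p-values smaller than expected under the null hypothesis.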

#### QQ-plot for GWAS analyses conducted without correction for population structure


<Image src="/GWAS_9_res_QQplot.png" alt="image of GWAS_9_res_QQplot"/>

#### QQ-plot for GWAS analyses when corrected for population structure
#### QQ-plot for GWAS analyses with correction for population structure


<Image src="/GWAS_14_qq_corrected.png" alt="image of GWAS_14_qq_corrected"/>

## 3. Functional gene Annotation
## 3. Functional gene annotation

Following associationsion analysis functional gene annotation is a crucial step in the interpretation of genome-wide association study (GWAS) results.
Following association analysis, functional gene annotation is a crucial step in the interpretation of genome-wide association study (GWAS) results.
It helps researchers identify and understand the genes associated with significant single nucleotide polymorphisms (SNPs), facilitating further studies on the biological mechanisms underlying traits.
The gene annotation component developed in Camelina Hub allows users to find genes associated with SNP positions.
It provides the flexibility to filter genes based on user-defined p-value thresholds and window sizes around SNP positions.
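
One way to picture the p-value and window filtering is the sketch below. It is an assumption-laden Python illustration rather than the hub's component: the function name, the tuple layouts for SNPs and genes, and the gene identifiers used in the usage note are all made up for the example.

```python
def genes_near_significant_snps(snps, genes, p_threshold, window):
    """List genes overlapping a window around each SNP that passes the threshold.

    snps:  iterable of (chromosome, position, p_value)      (hypothetical layout)
    genes: iterable of (gene_id, chromosome, start, end)    (hypothetical layout)
    window: number of base pairs to extend on each side of the SNP position.
    """
    hits = []
    for chrom, pos, p in snps:
        if p > p_threshold:
            continue  # SNP does not pass the user-defined p-value threshold
        lo, hi = pos - window, pos + window
        for gene_id, g_chrom, start, end in genes:
            # A gene is reported if it lies on the same chromosome and its
            # interval [start, end] overlaps the window [lo, hi].
            if g_chrom == chrom and start <= hi and end >= lo:
                hits.append((gene_id, chrom, pos, p))
    return hits
```

For example, with a SNP at position 1000 on chromosome "1" and a window of 500, a gene spanning 1400 to 1800 on the same chromosome is reported, while a gene on another chromosome is not. A wider window returns more candidate genes per SNP at the cost of more unrelated hits.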

<Image src="/GWAS_10_res_Annotations.png" alt="image of GWAS_10_res_Annotations"/>

## 4. Multiomics integration
## 4. Multi-omics integration

The integration of multi-omics data provides a comprehensive view of the biological processes underlying various traits and conditions.
In Camelina Hub, we have integrated RNA-seq data along with variant data and ensembl gene model, facilitating an in-depth analysis of gene expression across
In Camelina Hub, we have integrated RNA-seq data along with variant data and Ensembl gene model data, facilitating an in-depth analysis of gene expression across
different conditions and tissues.

The RNA-seq data integrated into Camelina Hub is displayed in a genome browser, providing users with an intuitive way to explore gene expression patterns alongside genomic variants and gene models.

<Image src="/RNAseq_Data.png" alt="image of RNAseq_Data"/>

For data sources and experiments see datasets
For data sources and experiments see datasets.

12 changes: 6 additions & 6 deletions pages/modules/Phenology/vispheno.mdx
@@ -46,27 +46,27 @@ The Phenotype Data Component supports the generation of various types of plots t

### 1. All Possible Comparisons or Variables

- The component allows users to generate plots comparing all possible combinations of variables, providing a comprehensive view of the data.
The component allows users to generate plots comparing all possible combinations of variables, providing a comprehensive view of the data.

### 2. Filtering Options

- Users can filter data based on specific criteria to focus on relevant subsets of the data. Filters can be applied to any variable.
Users can filter data based on specific criteria to focus on relevant subsets of the data. Filters can be applied to any variable.

### 3. Complex Queries with Logical Operators

- The component supports complex queries using logical operators (AND, OR, NOT) to refine the data selection process.
The component supports complex queries using logical operators (AND, OR, NOT) to refine the data selection process.

### 4. Hover Info

- Interactive plots provide additional information when users hover over data points, making it easier to understand the details of each observation.
Interactive plots provide additional information when users hover over data points, making it easier to understand the details of each observation.

### 5. Real-Time Update

- The component updates plots in real-time as users interact with the data, apply filters, or change settings, ensuring the latest information is always displayed.
The component updates plots in real-time as users interact with the data, apply filters, or change settings, ensuring the latest information is always displayed.

### 6. Splitting Data Based on Experimental Factors

- Users can split data into groups based on experimental factors such as treatment, genotype, or environmental conditions, allowing for comparative analysis.
Users can split data into groups based on experimental factors such as treatment, genotype, or environmental conditions, allowing for comparative analysis.

## Importance for Stakeholders
