update:new additions

ataulhaleem · Oct 21, 2024 · 44b6d6d
1 parent be1e79c
commit 44b6d6d
Showing 6 changed files with 161 additions and 1 deletion.
diff --git a/pages/_meta.json b/pages/_meta.json
@@ -2,6 +2,7 @@
   "index": "Home Page",
   "datasets": "Datasets",
   "modules":  "Functional Modules",
+  "downloads" : "Downloads",
   "SFTP" : "UNTWIST SFTP",
   "about": {
     "title": "About",

diff --git a/pages/downloads.mdx b/pages/downloads.mdx
@@ -0,0 +1,49 @@
+# Downloads Module
+
+The **Downloads Module** allows users to download study-specific data from the database. It is aimed at researchers and data analysts who work with raw or curated experimental data related to *Camelina sativa*. The module supports downloading data in multiple formats and delivers the data together with its associated metadata.
+
+## Key Features
+
+- **Study-Specific Data Downloads**: Users can download data for specific studies based on the selected experiment and assay.
+- **Multiple Formats Supported**: Data can be downloaded in TSV (tab-separated values) format or as ARC-RO Crates for comprehensive and context-rich research object management.
+- **Metadata Display**: Provides detailed metadata for each study, including the project title, description, duration, species, treatments, plant anatomical entity, and study identifier.
+
+## How to Use
+
+### 1. **Select Study**
+   - From the dropdown, users can select a study title they are interested in. Each study contains multiple experiments with associated assays, locations, treatments, and plant anatomical entities.
+
+### 2. **Select Experiment and Assay**
+   - Once a study is selected, the module displays available **Experiments**. For each experiment, users can view relevant details, including:
+     - **Locations**: Geographical locations where the experiment was conducted.
+     - **Treatments**: Experimental conditions such as control, drought, or heat.
+     - **Plant Anatomical Entity**: The plant part or tissue used in the experiment (e.g., aerial tissue, leaf).
+     - **Time Points**: Time points for data collection (if available).
+
+   - Users can choose the experiments they want by checking the relevant boxes in the **Select Assay** column.
+
+### 3. **View Study Metadata**
+   - On the right side of the interface, detailed metadata for the selected study is shown under **Study Overview**. This includes:
+     - **Project**: The name of the project.
+     - **Title**: The full title of the study.
+     - **Description**: A brief description of the study’s objectives.
+     - **Duration**: Start and end dates for the study.
+     - **Species**: The species under investigation, such as *Camelina sativa*.
+     - **Locations**: Where the study was conducted.
+     - **Treatments**: The conditions applied to the experimental entities.
+     - **Institutes Involved**: Institutions or research organizations responsible for the study.
+     - **Plant Anatomical Entity**: The specific plant part used in the experiments.
+     - **Study Identifier**: A unique identifier for the study (e.g., UNTWiST2.1).
+
+### 4. **Download Data**
+   - After selecting the desired experiments, users can download the data in one of two formats:
+     - **TSV**: A tab-separated file for easy use in spreadsheets and statistical software.
+     - **ARC-RO Crate**: A format based on the Annotated Research Context (ARC), which bundles data with metadata for richer data preservation and reproducibility.
+
+   The download options are located at the bottom of the interface, allowing users to select the format they prefer.
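Once downloaded, a TSV file can be read with Python's standard `csv` module. The sketch below is illustrative only: the column names (`accession`, `treatment`, `value`) and the sample rows are hypothetical, since the actual columns depend on the selected study and assay.

```python
import csv
import io


def read_tsv(handle):
    """Parse a tab-separated file handle into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical content standing in for a downloaded file; real column
# names depend on the selected study and assay.
sample = io.StringIO(
    "accession\ttreatment\tvalue\n"
    "A1\tcontrol\t1.5\n"
    "A2\tdrought\t0.9\n"
)
rows = read_tsv(sample)
drought_rows = [row for row in rows if row["treatment"] == "drought"]
print(len(rows), drought_rows[0]["accession"])  # prints "2 A2"
```

For a real download, pass a handle opened with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.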
+
+## Importance for Researchers
+
+This module provides a centralized and user-friendly interface for downloading experimental data. By offering detailed metadata alongside the data, it ensures that researchers can easily integrate the data into their own analyses. The ARC-RO Crate option allows for the preservation of experimental context, making it a valuable tool for data sharing and collaborative research. 
+
+By supporting multiple download formats and providing structured, metadata-rich information, the **Downloads Module** significantly streamlines the process of accessing and utilizing Camelina-specific experimental data.
diff --git a/pages/modules/_meta.json b/pages/modules/_meta.json
@@ -1,5 +1,6 @@
 {
   "GWAS": "GWAS",
   "Phenology": "Phenology",
-  "Stratification":  "Population Stratification"
+  "Stratification":  "Population Stratification",
+  "ginfo" : "Genome Informatics"
 }
diff --git a/pages/modules/ginfo/_meta.json b/pages/modules/ginfo/_meta.json
@@ -0,0 +1,4 @@
+{
+    "assembly_stats": "Genome Assembly Overview",
+    "assembly_stats_comp": "Compare Genome Assemblies"
+}
diff --git a/pages/modules/ginfo/assembly_stats.mdx b/pages/modules/ginfo/assembly_stats.mdx
@@ -0,0 +1,47 @@
+# Genome Assembly Overview 
+
+The Genome Assembly Overview module provides a comprehensive summary of the quality and characteristics of a selected genome assembly. It integrates various statistics and metrics to help users assess the completeness, quality, and structural features of genome assemblies. Below is a detailed explanation of the key components presented by the module.
+
+## Overview of Features
+
+### 1. **General Information**
+   - **Name**: The scientific name of the organism (e.g., *Camelina sativa*).
+   - **Common Name**: A more familiar or widely used name (e.g., DH55).
+   - **Assembly Accession**: A unique identifier for the genome assembly version (e.g., GCF_000633955.1).
+   - **Assembly Level**: Describes the status of the assembly (e.g., Chromosome).
+   - **Assembly Method**: The method or tool used for genome assembly (e.g., SOAPdenovo v. 2.01).
+   - **Sequencing Technology**: Describes the technology used to sequence the genome (e.g., Illumina HiSeq, 454).
+
+### 2. **BUSCO Completeness (v4.0.2)**
+   - **BUSCO Categories**: The assembly's completeness is evaluated using BUSCO (Benchmarking Universal Single-Copy Orthologs). The analysis provides four categories:
+     - **Single Copy**: Genes that are found as a single copy in the genome.
+     - **Duplicated**: Genes that are present in more than one copy.
+     - **Fragmented**: Genes that are partially present.
+     - **Missing**: Genes that are absent from the assembly.
+   - **BUSCO Completeness**: Overall genome completeness based on the percentage of conserved genes present (e.g., 99.84769%).
+   - **BUSCO Lineage**: The database used for comparison (e.g., *brassicales_odb10* with 4596 BUSCOs).
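The completeness percentage can be related to the four categories with a short calculation. This is a minimal sketch, assuming the standard BUSCO convention that "complete" means single-copy plus duplicated; the split of counts below is hypothetical and was chosen only so that 4589 of the 4596 *brassicales_odb10* BUSCOs are complete.

```python
def busco_completeness(single_copy, duplicated, fragmented, missing):
    """Return the percentage of complete BUSCOs (single-copy + duplicated)."""
    total = single_copy + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("BUSCO counts must not all be zero")
    return 100.0 * (single_copy + duplicated) / total


# Hypothetical category counts that sum to 4596 BUSCOs.
pct = busco_completeness(single_copy=389, duplicated=4200, fragmented=3, missing=4)
print(f"{pct:.5f}%")  # prints "99.84769%"
```

Under this assumption, 7 fragmented or missing BUSCOs out of 4596 reproduce the example value of 99.84769%.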
+
+### 3. **Assembly Stats**
+   - **GC Percent**: The percentage of the genome made up of guanine (G) and cytosine (C) nucleotides (e.g., 36.5%).
+   - **Total Sequence Length**: The total length of the assembled genome in base pairs (e.g., 641356059 bp).
+   - **Genome Coverage**: The sequencing coverage, indicating how many times the genome was sequenced on average (e.g., 100x).
+   - **Contig N50**: The length of the shortest contig such that contigs of this length or longer contain at least 50% of the total assembly length (e.g., 32728 bp).
+   - **Scaffold N50**: The length of the shortest scaffold such that scaffolds of this length or longer contain at least 50% of the total assembly length (e.g., 30099736 bp).
+   - **Scaffold PN50**: An N50-style metric that considers only pseudo-chromosomes; it is reported as a ratio rather than a length (e.g., 0.9386279455106855).
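The N50 definition used for contigs and scaffolds can be sketched as follows. This is an illustrative implementation of the textbook definition, not the module's own code, and the lengths in the example are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence such that sequences of
    that length or longer cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (total 300, so half is 150).
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # prints 70, because 80 + 70 reaches 150
```

The same function applies to scaffold lengths; only the input list changes.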
+
+### 4. **Structural Features**
+   - **Number of Genes**: Total number of genes identified in the assembly (e.g., 98741.0).
+   - **Number of Protein Coding Genes**: The number of genes that code for proteins (e.g., 82569.0).
+   - **Number of Pseudo Genes**: The number of pseudogenes in the assembly (e.g., 7930.0).
+   - **Number of Noncoding Genes**: The number of genes that do not encode proteins but may perform other functions (e.g., 8242.0).
+
+## Usage
+
+This module allows researchers, plant breeders, and bioinformaticians to assess the quality of genome assemblies at a glance. It provides critical information about the completeness and structure of the genome, helping users make informed decisions about the reliability of the data for downstream analyses such as trait mapping, genome-wide association studies (GWAS), and functional genomics.
+
+The user-friendly interface and visual representation of BUSCO completeness make it easy to evaluate the assembly's reliability and its suitability for various applications. With integrated structural features and assembly stats, users gain a deeper understanding of the genome's architecture, aiding in tasks such as gene annotation, comparative genomics, and evolutionary studies.
+
+## Conclusion
+
+The Genome Assembly Overview module is an essential tool for evaluating the completeness, quality, and structural features of genome assemblies. It empowers users by providing key statistics in a clear, concise format, ensuring that genomic data is of sufficient quality for further analysis and research.
+
diff --git a/pages/modules/ginfo/assembly_stats_comp.mdx b/pages/modules/ginfo/assembly_stats_comp.mdx
@@ -0,0 +1,58 @@
+# Compare Genome Assemblies
+
+The Compare Genome Assemblies module enables users to compare key statistics across different genome assemblies. It is an essential tool for researchers and scientists who want to evaluate and contrast different assemblies of the same or related species. Below is an overview of the main features and metrics that can be compared using this module.
+
+## Key Features
+
+- **Multiple Genome Assemblies**: Users can select multiple genome assemblies for comparison. The module provides a user-friendly interface where assemblies can be added or removed easily.
+
+- **Custom Metric Selection**: Users can choose the metric they want to compare across the selected genome assemblies. This allows for flexible and targeted analysis of genome quality and characteristics.
+
+- **Visual Comparison**: The module generates bar charts that provide a clear and visual representation of the chosen metric for each genome assembly. This helps users quickly identify differences and trends among the assemblies.
+
+## Available Metrics for Comparison
+
+The module supports the comparison of the following metrics across different genome assemblies:
+
+### 1. **GC Percentage**
+   - The percentage of the genome that is composed of guanine (G) and cytosine (C) nucleotides. This can give insights into the genome's stability and structure.
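As a small illustration, GC percentage can be computed directly from a sequence string. This sketch assumes that ambiguous bases such as `N` are excluded from the denominator; the module may use a different convention.

```python
def gc_percent(sequence):
    """Return the percentage of G and C among the A, C, G, and T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / acgt


# Toy sequence: 4 G/C bases out of 8 unambiguous bases (the two Ns are ignored).
print(gc_percent("ATGCGCNNAT"))  # prints 50.0
```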
+
+### 2. **Total Sequence Length**
+   - The total length of the assembled genome, measured in base pairs. It reflects the overall size of the genome assembly.
+
+### 3. **Genome Coverage**
+   - Indicates the average number of times a nucleotide in the genome has been sequenced. Higher coverage generally supports more accurate and complete assemblies.
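Average coverage is commonly estimated as the total number of sequenced bases divided by the genome length. The sketch below uses the example total sequence length quoted on the Genome Assembly Overview page (641356059 bp); the read count and read length are hypothetical values chosen to give 100x.

```python
def mean_coverage(read_count, read_length, genome_length):
    """Estimate average coverage as total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_count * read_length / genome_length


genome_length = 641_356_059  # example assembly length in bp
read_count = 427_570_706     # hypothetical number of reads
read_length = 150            # hypothetical read length in bp

coverage = mean_coverage(read_count, read_length, genome_length)
print(f"{coverage:.1f}x")  # prints "100.0x"
```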
+
+### 4. **Scaffold N50**
+   - The length of the shortest scaffold such that 50% of the total assembly length is in scaffolds of this length or longer. A higher scaffold N50 value indicates a more contiguous assembly.
+
+### 5. **Contig N50**
+   - The length of the shortest contig such that 50% of the total assembly length is in contigs of this length or longer. Similar to scaffold N50 but focusing on contigs, it reflects the quality of the assembly at the contig level.
+
+### 6. **PN50 Ratio**
+   - The PN50 ratio is a metric for pseudochromosome assemblies. It compares scaffold lengths and is particularly useful in assemblies that aim to represent the genome at a chromosome level.
+
+## Example Comparison
+
+In the provided image, the selected metric for comparison is the **PN50 ratio**. The bar chart shows the PN50 ratio for four genome assemblies: 
+- **USDA_CsJoelle**
+- **ASM3068613v1**
+- **CO46V2.0**
+- **Cs**
+
+This visual comparison helps users quickly assess which assemblies have higher or lower PN50 ratios, providing insight into the quality and contiguity of the genome assemblies at the chromosome level.
+
+## How to Use
+
+1. **Select Genome Assemblies**: Use the dropdown menu to choose the genome assemblies you want to compare.
+
+2. **Choose a Metric**: Select the metric you wish to compare from the available options: GC Percentage, Total Sequence Length, Genome Coverage, Scaffold N50, Contig N50, or PN50 Ratio.
+
+3. **View Results**: The module generates a bar chart that visually represents the chosen metric for each genome assembly, enabling a quick comparison.
+
+## Importance for Researchers
+
+This module is highly beneficial for plant breeders, bioinformaticians, and researchers working on genomic studies, especially for species such as *Camelina sativa*. The ability to compare multiple assemblies across a variety of metrics helps users choose the best assembly for downstream analysis. It also provides insights into the assembly process, allowing for improvements in future assembly efforts.
+
+The visual format of the comparisons ensures that data can be easily interpreted and acted upon, making the Compare Genome Assemblies module a vital tool for comparative genomics.
+