Merge pull request #13 from Illumina/GT-683
GT-683 v2.1 release
traxexx authored Dec 31, 2018
2 parents 8e5c307 + 6af6532 commit c486e32
Showing 49 changed files with 888 additions and 721 deletions.
53 changes: 22 additions & 31 deletions README.md
100755 → 100644
@@ -156,26 +156,6 @@ to give the following genotypes for each event:
| swap2 | chrB | S1/S1 (homalt) |
| swap1 | chrC | REF/S1 (heterozygous) |

We can extract these genotypes from the output file using the Python script at bin/paragraph-to-csv.py:

```
bin/paragraph-to-csv.py /tmp/paragraph-test/genotype.json.gz --genotype-only
```

The output will be:

```
#FORMAT=GT
#ID SWAPS
chrA:1500-1509 REF/REF
chrB:1500-1509 S1/S1
chrC:1500-1699 REF/S1
```

The output JSON file contains more information which can be used to link
events back to the original VCF file, genotype likelihoods, and also to get the
genotypes of the individual breakpoints.

If the input is a VCF file, then the output folder will contain an updated
VCF file which allows us to quickly compare the original genotypes from the
input VCF, and those obtained by grmpy:
@@ -189,6 +169,8 @@ bcftools query -f '[%GT\t%OLD_GT]\n' /tmp/paragraph-test/genotypes.vcf.gz
0/1 0/1
```

Note that if the input VCF doesn't contain any sample information, the output VCF will not contain the **OLD_GT** field.

We can see that the genotypes match in our use case, as expected.
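
The comparison above can also be done programmatically. The following is a minimal sketch, assuming a single-sample VCF so that each line of the `bcftools query` output holds one tab-separated `GT` / `OLD_GT` pair. The helper names `normalize_gt` and `compare_genotypes` and the example input are hypothetical and not part of Paragraph.

```python
def normalize_gt(gt):
    """Return the alleles of a GT string as a sorted tuple, ignoring phasing."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def compare_genotypes(query_output):
    """Split GT/OLD_GT pairs into matching and mismatching lists.

    Lines without a usable OLD_GT value (missing, empty or ".") are skipped.
    """
    matches, mismatches = [], []
    for line in query_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or fields[1] in ("", "."):
            continue
        new_gt, old_gt = fields[0], fields[1]
        same = normalize_gt(new_gt) == normalize_gt(old_gt)
        (matches if same else mismatches).append((new_gt, old_gt))
    return matches, mismatches


# Illustrative input only; in practice this text would come from bcftools query.
example = "0/0\t0/0\n1/1\t1/1\n0/1\t0/1\n"
matches, mismatches = compare_genotypes(example)
print(len(matches), "matching,", len(mismatches), "mismatching")
```

Treating `0/1` and `1/0` as the same genotype is an assumption of this sketch; drop the sorting in `normalize_gt` if allele order matters for your comparison.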

In [doc/multiple-samples.md](doc/multiple-samples.md), we show how ParaGRAPH can be run on multiple samples with snakemake.
@@ -233,11 +215,10 @@ The complete list of requirements can be found in [requirements.txt](requirements
[http://www.boost.org](http://www.boost.org) and is available under the Boost license:
[http://www.boost.org/users/license.html](http://www.boost.org/users/license.html).

You may use your system Boost version. On Ubuntu, you can install the required versions
of Boost as follows:
You may use your system Boost version. On Ubuntu, you can install the required versions of Boost as follows:
```bash
sudo apt install libboost-dev libboost-iostreams-dev libboost-program-options-dev \
libboost-math-dev libboost-system-dev libboost-filesystem-dev
libboost-math-dev libboost-system-dev libboost-filesystem-dev
```

Paragraph includes a copy of Boost 1.61 which can be built automatically during the
@@ -400,12 +381,23 @@ Alternatively, candidate SV events can be specified as vcf.
chr1 161 test-del TC T . . .
```

* **samples.txt**: Manifest that specifies some test BAM files. Required columns: ID, path, depth, read length. Optional column: sex. Tab delimited.
* **samples.txt**: Manifest that specifies some test BAM files. Tab delimited.

Required columns: ID, path, depth, read length.

Optional columns:

- depth sd: Specify standard deviation for genome depth. Used for the normal test of breakpoint read depth. Default is sqrt(5*depth).

- depth variance: Square of depth sd.

- sex: Affects chrX and chrY genotyping. Allowed values are "male", "female" and "unknown". If not specified, all samples will be treated as female.

```
id path depth read length sex
sample1 sample1.bam 1 50 male
sample2 sample2.bam 1 50 female
sample3 sample2.bam 1 50 unknown
id path depth read length depth sd sex
sample1 sample1.bam 1 50 20 male
sample2 sample2.bam 1 50 20 female
sample3 sample2.bam 1 50 20 unknown
```
* **dummy.fa** a short dummy reference which only contains `chr1`
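
The manifest rules above can be sketched in Python as follows. This is only an illustration, not the parser Paragraph uses: the function name `load_manifest` is hypothetical, and preferring `depth sd` over `depth variance` when both columns are present is an assumption. The defaults (`sqrt(5*depth)` for the depth standard deviation, "female" for a missing sex) follow the column descriptions.

```python
import csv
import io
import math

REQUIRED_COLUMNS = ("id", "path", "depth", "read length")
ALLOWED_SEX = ("male", "female", "unknown")


def load_manifest(text):
    """Parse a tab-delimited manifest and fill in the documented defaults."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    # Assumption: column names are matched case-insensitively.
    header = [name.strip().lower() for name in next(rows)]
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))

    samples = []
    for values in rows:
        if not values:
            continue  # skip blank lines
        row = dict(zip(header, values))
        depth = float(row["depth"])
        if row.get("depth sd"):
            depth_sd = float(row["depth sd"])
        elif row.get("depth variance"):
            depth_sd = math.sqrt(float(row["depth variance"]))
        else:
            depth_sd = math.sqrt(5 * depth)  # documented default
        sex = (row.get("sex") or "female").lower()  # documented default
        if sex not in ALLOWED_SEX:
            raise ValueError("unsupported sex value: " + sex)
        samples.append({
            "id": row["id"],
            "path": row["path"],
            "depth": depth,
            "read_length": int(row["read length"]),
            "depth_sd": depth_sd,
            "sex": sex,
        })
    return samples


manifest = "id\tpath\tdepth\tread length\nsample1\tsample1.bam\t20\t50\n"
print(load_manifest(manifest))
```

With a depth of 20 and no `depth sd` column, the sketch falls back to sqrt(5*20) = 10 as the standard deviation.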
@@ -518,8 +510,7 @@ We also have the paths induced by the edge labels (this was added by `vcf2paragr
Each node, edge, and path has reads associated with it. We provide read counts for forward
and reverse strands (`:READS`, `:FWD`, `:REV`) and fragment counts (these counts are corrected
for the same reads possibly originating from the same sequence fragment in the case of
paired-end sequencing data).
for the same reads possibly originating from the same sequence fragment in the case of paired-end sequencing data).
```javascript
"read_counts_by_edge": {
@@ -631,7 +622,7 @@ It is extracted and re-organized from [an expected output](share/test-data/multi

* In [doc/graph-models.md](doc/graph-models.md) we describe the graph and genotyping
models we implement.

* [Doc/graphs-ashg-2017.pdf](doc/graphs-ashg-2017.pdf) contains the poster about this method we showed at
[ASHG 2017](http://www.ashg.org/2017meeting/)

11 changes: 11 additions & 0 deletions RELEASES.md
@@ -1,5 +1,16 @@
# Paragraph Release Notes / Change Log

# Version 2.1

| Date Y-m-d | Ticket | Description |
|------------|---------|----------------------------------------------------------------------|
| 2018-12-06 | GT-675 | Fix filters and alignment stats. Change depth test threshold on lower end |
| 2018-11-08 | GT-660 | Optimize GQ for variant genotypes |
| 2018-11-02 | GT-656 | Improvement for simple SV genotyping |
| 2018-07-19 | GT-501 | Breakpoint depth test based on normal distribution |
| 2018-07-16 | GT-539 | VCF now outputs genotypes for all samples in manifest and input VCF |
| 2018-06-28 | GT-527 | --graph-sequence-matching yes fails with boost 1.63 |

# Version 2.0

| Date Y-m-d | Ticket | Description |
32 changes: 18 additions & 14 deletions doc/filter-scheme.md
Original file line number Diff line number Diff line change
@@ -1,29 +1,33 @@
# Filters used in genotyper output

## Variant level filters
## Breakpoint level filters

* **PASS**
* **GQ**

Variant passes all filters
Low genotype quality for this breakpoint

* **CONFLICT**
* **NO_READS**

Variant has genotype conflicts in one or more breakpoints
No reads in this breakpoint

* **BP_DEPTH**

* **EXIST_BAD_BP**
Total number of reads on this breakpoint (from all alleles) fails the coverage test

Variant has one or more breakpoints that fail the breakpoint-level filter
## Variant level filters

* **ALL_BAD_BP**
* **PASS**

All breakpoints in this variant fail breakpoint-level filter
Variant passes all filters

* **MISSING**
* **CONFLICT**

Variant has one or more breakpoints with no spanning read
Variant has genotype conflicts in one or more breakpoints

## Breakpoint level filters
* **BP_NO_GT**

One or more breakpoints have missing genotypes

* **DEPTH**
* **NO_VALID_GT**

Total number of reads on this breakpoint (from all alleles) fails the coverage test
All breakpoints have missing genotypes
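
To illustrate how the variant level filters in the new scheme relate to per-breakpoint genotypes, here is a rough sketch. It is not the grmpy implementation: the function name `variant_filters` is hypothetical, missing breakpoint genotypes are represented as `None`, a "conflict" is taken to mean that called breakpoints disagree, and reporting `BP_NO_GT` and `CONFLICT` together is an assumption.

```python
def variant_filters(breakpoint_genotypes):
    """Derive variant level filter names from a list of breakpoint genotypes.

    Each entry is a genotype string such as "0/1", or None when the
    breakpoint has no genotype.
    """
    if not breakpoint_genotypes:
        raise ValueError("a variant needs at least one breakpoint")

    called = [gt for gt in breakpoint_genotypes if gt is not None]
    if not called:
        # All breakpoints have missing genotypes.
        return ["NO_VALID_GT"]

    filters = []
    if len(called) < len(breakpoint_genotypes):
        # One or more breakpoints have missing genotypes.
        filters.append("BP_NO_GT")
    if len(set(called)) > 1:
        # Called breakpoints disagree with each other.
        filters.append("CONFLICT")
    return filters or ["PASS"]


for genotypes in (["0/1", "0/1"], ["0/1", None], ["0/1", "1/1"], [None, None]):
    print(genotypes, "->", variant_filters(genotypes))
```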
2 changes: 1 addition & 1 deletion doc/genotyping-parameters.md
@@ -21,7 +21,7 @@ Below we show all allowed parameter fields:
// such as 0.00001 for a more conservative callset which only
// includes genotypes for calls which have read counts that are
// close to the median read depth in the BAM file.
"coverage_test_cutoff": -1.0,
"coverage_test_cutoff": 0.0001,

// Allele names in graph(s).
// If other alleles were observed in graph, they will be excluded from analysis.
Empty file modified external/graph-tools.tar.gz
100644 → 100755
Empty file.