Merge pull request #16 from Illumina/GT-743
GT-743 v2.2 release
traxexx authored May 14, 2019
2 parents c486e32 + f4667dd commit bab03b8
Showing 51 changed files with 1,190 additions and 19,963 deletions.
10 changes: 9 additions & 1 deletion CMakeLists.txt
@@ -27,7 +27,6 @@ include(cxx)
include(configureFiles)

include(FindZLIB)
include(GetBoost)
include(GetHtslib)
include(GetGoogleTest)
include(GetGraphTools)
@@ -49,9 +48,18 @@ add_custom_command(TARGET copy_data

find_package (Threads REQUIRED)

set(Boost_USE_STATIC_LIBS ON) # only find static libs
set(Boost_USE_MULTITHREADED ON)
set(Boost_USE_STATIC_RUNTIME ON)
find_package(Boost 1.5 COMPONENTS iostreams program_options filesystem system REQUIRED)

include_directories(${ZLIB_INCLUDE_DIR})
include_directories(${BZIP2_INCLUDE_DIR})

# boost sometimes generates warnings; we won't patch them so let's disable them using SYSTEM
include_directories(SYSTEM ${Boost_INCLUDE_DIR})
link_directories(${Boost_LIBRARY_DIRS})

# make libraries first
add_subdirectory (src/c++/lib)
add_subdirectory (external)
4 changes: 2 additions & 2 deletions Dockerfile
@@ -1,13 +1,13 @@
FROM ubuntu:16.04
FROM ubuntu:17.04

RUN apt-get -qq update && apt-get install -yq \
autoconf \
automake \
build-essential \
cmake \
libboost-all-dev \
libfreetype6-dev \
liblzma-dev \
libboost-all-dev \
libpng-dev \
libtool \
m4 \
599 changes: 115 additions & 484 deletions README.md

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions RELEASES.md
@@ -1,5 +1,15 @@
# Paragraph Release Notes / Change Log

| Date Y-m-d | Ticket | Description |
|------------|---------|----------------------------------------------------------------------|

# Version 2.2

| Date Y-m-d | Ticket | Description |
|------------|---------|----------------------------------------------------------------------|
| 2019-05-11 | GT-743 | Update interface and error handling |
| 2018-12-11 | GT-696 | Fix newlines in validation scripts (public repo already fixed) |

# Version 2.1

| Date Y-m-d | Ticket | Description |
16 changes: 16 additions & 0 deletions data/download-instructions.txt
@@ -0,0 +1,16 @@
Please use the following S3 links to download the output data from the Paragraph manuscript:

Genotypes of HG002 Long-read ground truth (LRGT) SVs on the Illumina HiSeq X 34.5x bam (VCF format):
https://s3-us-west-1.amazonaws.com/paragraph-paper-data/hg002_sniffles_ccs.paragraph.vcf.gz


HG002 Long-read ground truth (LRGT) SVs on 100 individuals from Polaris (JSON format):
Site only:
https://s3-us-west-1.amazonaws.com/paragraph-paper-data/sniffles_ccs_polaris.filtered.autosome.del_ins.json.gz

Genotypes included:
https://s3-us-west-1.amazonaws.com/paragraph-paper-data/sniffles_ccs_polaris.json.gz

Sample name map (S3 ID to regular ID):
https://s3-us-west-1.amazonaws.com/paragraph-paper-data/sample_map.txt
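
The links above can be fetched in one pass. The following is a minimal Python sketch, assuming the S3 objects are still publicly reachable; the helper names `local_name` and `download_all` and the default target directory are hypothetical and not part of the repository.

```python
import posixpath
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit

# Base URL and file names are copied from the instructions above.
BASE = "https://s3-us-west-1.amazonaws.com/paragraph-paper-data/"
FILES = [
    "hg002_sniffles_ccs.paragraph.vcf.gz",
    "sniffles_ccs_polaris.filtered.autosome.del_ins.json.gz",
    "sniffles_ccs_polaris.json.gz",
    "sample_map.txt",
]


def local_name(url):
    """Return the last path component of a URL, used as the local file name."""
    return posixpath.basename(urlsplit(url).path)


def download_all(dest_dir="paragraph-paper-data"):
    """Download every listed file into dest_dir and return the local paths.

    Hypothetical helper: the directory name is an assumption for illustration.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in FILES:
        url = BASE + name
        target = dest / local_name(url)
        urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

Calling `download_all()` performs the network requests; nothing is downloaded on import.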

2 changes: 2 additions & 0 deletions doc/filter-scheme.md
@@ -16,6 +16,8 @@ Total number of reads on this breakpoint (from all alleles) fail the coverage te

## Variant level filters

All breakpoint level filters can be included in variant level filters as well.

* **PASS**

Variant passes all filters
2 changes: 1 addition & 1 deletion doc/graph-tools.md
@@ -174,7 +174,7 @@ The script supports standard sequence alleles (REF/ALT), as well a subset of sym
* Deletions: explicit deletions or symbolic `<DEL>` alleles segment the reference sequence into three parts. The middle part can be bypassed using an edge. The `END` INFO field can be used to indicate the length of the deletion for long deletions.
* Forward breakends on the same chromosome, e.g. REF: `A`; ALT: `AC[chr1:10000[`, can be used to encode
long deletions with short inserted sequence (swaps / substitutions).
* Insertions with symbolic `<INS>` allele: the `SEQ` INFO field must be non-empty and contain the reference sequence. The `END` INFO field may contain the end of the reference sequence that is replaced with the inserted sequence. Note that our convention is to assume that the first reference base in the record is a padding base, and that this base is not present in the value of `INFO/SEQ`.
* Insertions with symbolic `<INS>` allele: the record must have an INFO field that holds the inserted sequence without the padding base (default key: `SEQ`). The `END` INFO field may contain the end of the reference sequence that is replaced with the inserted sequence. Note that our convention is to assume that the first reference base in the record is a padding base, and that this base is not present in the value of `INFO/SEQ`.
* When variants have an ID value then it must be unique and will be used to label the variant edges in the graph. A VCF file with non-unique IDs or other problematic INFO/FORMAT fields can be cleaned up using bcftools:
```bash
bcftools annotate --set-id '.' -x 'INFO,FORMAT,^FORMAT/GT' \
Binary file modified external/graph-tools.tar.gz
100755 → 100644
Binary file not shown.
Binary file removed external/htslib-1.8.tar.gz
Binary file not shown.
Binary file added external/htslib-1.9.tar.gz
Binary file not shown.
60 changes: 30 additions & 30 deletions share/test-data/genotyping_test_2/chrA.json
@@ -2,40 +2,40 @@
"edges": [
{
"from": "source",
"name": "source_ref-chrA:1350-1499",
"to": "ref-chrA:1350-1499"
"name": "source_ref-chrA:1350-1502",
"to": "ref-chrA:1350-1502"
},
{
"from": "ref-chrA:1350-1499",
"name": "ref-chrA:1350-1499_ref-chrA:1500-1506",
"from": "ref-chrA:1350-1502",
"name": "ref-chrA:1350-1502_chrA:1503-1506:AGTAA",
"sequences": [
"REF",
"swap1:0"
"swap1:1"
],
"to": "ref-chrA:1500-1506"
"to": "chrA:1503-1506:AGTAA"
},
{
"from": "ref-chrA:1350-1499",
"name": "ref-chrA:1350-1499_chrA:1500-1506:CTAGTAA",
"from": "ref-chrA:1350-1502",
"name": "ref-chrA:1350-1502_ref-chrA:1503-1506",
"sequences": [
"swap1:1"
"REF",
"swap1:0"
],
"to": "chrA:1500-1506:CTAGTAA"
"to": "ref-chrA:1503-1506"
},
{
"from": "ref-chrA:1500-1506",
"name": "ref-chrA:1500-1506_ref-chrA:1507-1509",
"from": "chrA:1503-1506:AGTAA",
"name": "chrA:1503-1506:AGTAA_ref-chrA:1507-1509",
"sequences": [
"REF",
"swap1:0"
"swap1:1"
],
"to": "ref-chrA:1507-1509"
},
{
"from": "chrA:1500-1506:CTAGTAA",
"name": "chrA:1500-1506:CTAGTAA_ref-chrA:1507-1509",
"from": "ref-chrA:1503-1506",
"name": "ref-chrA:1503-1506_ref-chrA:1507-1509",
"sequences": [
"swap1:1"
"REF",
"swap1:0"
],
"to": "ref-chrA:1507-1509"
},
@@ -55,26 +55,26 @@
"to": "sink"
}
],
"model_name": "Graph from ../genotyping_test_2/chrA.vcf",
"model_name": "Graph from /Users/schen6/Documents/paragraph-tools/share/test-data/genotyping_test_2/chrA.vcf",
"nodes": [
{
"name": "source",
"sequence": "NNNNNNNNNN"
},
{
"name": "ref-chrA:1350-1499",
"reference": "chrA:1350-1499",
"reference_sequence": "GAGCGTCAAAGGCCCGTCCTATAGACCTGAAGAGTGTTGAGAGTTCTATGGAAAAATTTATAGAGTACTTCCCAGTGTTTTAACTGTAACTGACTTCCATGATCATTTCAGTAATGCAGCCCACATGCATGTTCAAGACTAACACACCGT"
"name": "ref-chrA:1350-1502",
"reference": "chrA:1350-1502",
"reference_sequence": "GAGCGTCAAAGGCCCGTCCTATAGACCTGAAGAGTGTTGAGAGTTCTATGGAAAAATTTATAGAGTACTTCCCAGTGTTTTAACTGTAACTGACTTCCATGATCATTTCAGTAATGCAGCCCACATGCATGTTCAAGACTAACACACCGTGCT"
},
{
"name": "ref-chrA:1500-1506",
"reference": "chrA:1500-1506",
"reference_sequence": "GCTGCCC"
"name": "chrA:1503-1506:AGTAA",
"position": "chrA:1503-1506",
"sequence": "AGTAA"
},
{
"name": "chrA:1500-1506:CTAGTAA",
"position": "chrA:1500-1506",
"sequence": "CTAGTAA"
"name": "ref-chrA:1503-1506",
"reference": "chrA:1503-1506",
"reference_sequence": "GCCC"
},
{
"name": "ref-chrA:1507-1509",
@@ -94,8 +94,8 @@
"paths": [
{
"nodes": [
"ref-chrA:1350-1499",
"ref-chrA:1500-1506",
"ref-chrA:1350-1502",
"ref-chrA:1503-1506",
"ref-chrA:1507-1509",
"ref-chrA:1510-1659"
],
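
Because this hunk renames nodes and rewires edges together, a quick consistency check can catch a stale name. The following Python sketch assumes only the field names visible in the diff (`nodes`, `edges`, `from`, `to`, `name`, `paths`); the function `check_graph` is hypothetical, and the example graph is a trimmed copy of the new chrA graph, not the full file.

```python
def check_graph(graph):
    """Return a list of problems found in a graph dictionary.

    Checks that every edge endpoint is a declared node and that each
    consecutive pair of nodes on a path is joined by an edge.
    """
    node_names = {node["name"] for node in graph["nodes"]}
    edge_pairs = set()
    problems = []
    for edge in graph["edges"]:
        for end in ("from", "to"):
            if edge[end] not in node_names:
                problems.append(f"edge endpoint {edge[end]!r} is not a node")
        edge_pairs.add((edge["from"], edge["to"]))
    for path in graph.get("paths", []):
        nodes = path["nodes"]
        for a, b in zip(nodes, nodes[1:]):
            if (a, b) not in edge_pairs:
                problems.append(f"path step {a!r} -> {b!r} has no edge")
    return problems


# Trimmed example using node names from the updated chrA.json.
EXAMPLE = {
    "nodes": [
        {"name": "source"},
        {"name": "ref-chrA:1350-1502"},
        {"name": "chrA:1503-1506:AGTAA"},
        {"name": "ref-chrA:1503-1506"},
        {"name": "ref-chrA:1507-1509"},
        {"name": "sink"},
    ],
    "edges": [
        {"from": "source", "to": "ref-chrA:1350-1502"},
        {"from": "ref-chrA:1350-1502", "to": "chrA:1503-1506:AGTAA"},
        {"from": "ref-chrA:1350-1502", "to": "ref-chrA:1503-1506"},
        {"from": "chrA:1503-1506:AGTAA", "to": "ref-chrA:1507-1509"},
        {"from": "ref-chrA:1503-1506", "to": "ref-chrA:1507-1509"},
        {"from": "ref-chrA:1507-1509", "to": "sink"},
    ],
    "paths": [
        {"nodes": ["ref-chrA:1350-1502", "ref-chrA:1503-1506", "ref-chrA:1507-1509"]},
    ],
}

print(check_graph(EXAMPLE))  # prints []
```

For a real file, load it with `json.load` and pass the resulting dictionary to `check_graph`.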
2 changes: 1 addition & 1 deletion share/test-data/genotyping_test_2/chrA.vcf
@@ -5,4 +5,4 @@
##contig=<ID=chrB,length=3000>
##contig=<ID=chrC,length=3000>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT swaps
chrA 1500 swap1 GCTGCCCCTT CTAGTAACTT . PASS . GT 0/0
chrA 1500 swap1 GCTGCCCCTT GCTAGTAACTT . PASS . GT 0/0
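
The new ALT allele starts with the same base as REF, which is why the node boundaries in `chrA.json` move from 1500 to 1503. The Python sketch below reproduces those coordinates by trimming the bases that REF and ALT share at both ends. It is an illustration only: the function `trim_shared_flanks` is hypothetical, and this is not claimed to be the algorithm graph-tools uses internally.

```python
def trim_shared_flanks(pos, ref, alt):
    """Trim the common prefix and suffix of REF and ALT.

    Returns (start, end, ref_middle, alt_middle), where start and end are
    1-based inclusive reference coordinates of the remaining REF bases.
    """
    limit = min(len(ref), len(alt))
    prefix = 0
    while prefix < limit and ref[prefix] == alt[prefix]:
        prefix += 1
    suffix = 0
    # Do not let the suffix overlap the bases already counted as prefix.
    while suffix < limit - prefix and ref[-1 - suffix] == alt[-1 - suffix]:
        suffix += 1
    start = pos + prefix
    end = pos + len(ref) - suffix - 1
    return start, end, ref[prefix:len(ref) - suffix], alt[prefix:len(alt) - suffix]


# New chrA record: matches nodes ref-chrA:1503-1506 and chrA:1503-1506:AGTAA.
print(trim_shared_flanks(1500, "GCTGCCCCTT", "GCTAGTAACTT"))
# prints (1503, 1506, 'GCCC', 'AGTAA')

# Old chrA record: matches the removed nodes at chrA:1500-1506.
print(trim_shared_flanks(1500, "GCTGCCCCTT", "CTAGTAACTT"))
# prints (1500, 1506, 'GCTGCCC', 'CTAGTAA')
```

The same call on the updated chrB record yields 1501-1509, which agrees with the renamed nodes in `chrB.json`.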
60 changes: 30 additions & 30 deletions share/test-data/genotyping_test_2/chrB.json
100644 → 100755
@@ -2,40 +2,40 @@
"edges": [
{
"from": "source",
"name": "source_ref-chrB:1350-1499",
"to": "ref-chrB:1350-1499"
"name": "source_ref-chrB:1350-1500",
"to": "ref-chrB:1350-1500"
},
{
"from": "ref-chrB:1350-1499",
"name": "ref-chrB:1350-1499_ref-chrB:1500-1509",
"from": "ref-chrB:1350-1500",
"name": "ref-chrB:1350-1500_chrB:1501-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT",
"sequences": [
"REF",
"swap2:0"
"swap2:1"
],
"to": "ref-chrB:1500-1509"
"to": "chrB:1501-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT"
},
{
"from": "ref-chrB:1350-1499",
"name": "ref-chrB:1350-1499_chrB:1500-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT",
"from": "ref-chrB:1350-1500",
"name": "ref-chrB:1350-1500_ref-chrB:1501-1509",
"sequences": [
"swap2:1"
"REF",
"swap2:0"
],
"to": "chrB:1500-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT"
"to": "ref-chrB:1501-1509"
},
{
"from": "ref-chrB:1500-1509",
"name": "ref-chrB:1500-1509_ref-chrB:1510-1659",
"from": "chrB:1501-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT",
"name": "chrB:1501-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT_ref-chrB:1510-1659",
"sequences": [
"REF",
"swap2:0"
"swap2:1"
],
"to": "ref-chrB:1510-1659"
},
{
"from": "chrB:1500-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT",
"name": "chrB:1500-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT_ref-chrB:1510-1659",
"from": "ref-chrB:1501-1509",
"name": "ref-chrB:1501-1509_ref-chrB:1510-1659",
"sequences": [
"swap2:1"
"REF",
"swap2:0"
],
"to": "ref-chrB:1510-1659"
},
@@ -45,26 +45,26 @@
"to": "sink"
}
],
"model_name": "Graph from ../genotyping_test_2/chrB.vcf",
"model_name": "Graph from /Users/schen6/Documents/paragraph-tools/share/test-data/genotyping_test_2/chrB.vcf",
"nodes": [
{
"name": "source",
"sequence": "NNNNNNNNNN"
},
{
"name": "ref-chrB:1350-1499",
"reference": "chrB:1350-1499",
"reference_sequence": "TCTAACAATTACGGAGGAGAACCGGTCTCCCCCCACGGTGGAGCGTATACGTCGTTAATGGTGGGGAAGTCAATACGACCGACTGTCCATGGTCATGCCCCTATCGTCGGATTCACGCTGCTTTACCTCAAAGTCAACCCAGCCGTAGTT"
"name": "ref-chrB:1350-1500",
"reference": "chrB:1350-1500",
"reference_sequence": "TCTAACAATTACGGAGGAGAACCGGTCTCCCCCCACGGTGGAGCGTATACGTCGTTAATGGTGGGGAAGTCAATACGACCGACTGTCCATGGTCATGCCCCTATCGTCGGATTCACGCTGCTTTACCTCAAAGTCAACCCAGCCGTAGTTA"
},
{
"name": "ref-chrB:1500-1509",
"reference": "chrB:1500-1509",
"reference_sequence": "AGGCCATACG"
"name": "chrB:1501-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT",
"position": "chrB:1501-1509",
"sequence": "TCAGGTTGTCTTATGCTTGGCATCGTTCTT"
},
{
"name": "chrB:1500-1509:TCAGGTTGTCTTATGCTTGGCATCGTTCTT",
"position": "chrB:1500-1509",
"sequence": "TCAGGTTGTCTTATGCTTGGCATCGTTCTT"
"name": "ref-chrB:1501-1509",
"reference": "chrB:1501-1509",
"reference_sequence": "GGCCATACG"
},
{
"name": "ref-chrB:1510-1659",
@@ -79,8 +79,8 @@
"paths": [
{
"nodes": [
"ref-chrB:1350-1499",
"ref-chrB:1500-1509",
"ref-chrB:1350-1500",
"ref-chrB:1501-1509",
"ref-chrB:1510-1659"
],
"path_id": "REF|1",
2 changes: 1 addition & 1 deletion share/test-data/genotyping_test_2/chrB.vcf
@@ -5,4 +5,4 @@
##contig=<ID=chrB,length=3000>
##contig=<ID=chrC,length=3000>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT swaps
chrB 1500 swap2 AGGCCATACG TCAGGTTGTCTTATGCTTGGCATCGTTCTT . PASS . GT 1/1
chrB 1500 swap2 AGGCCATACG ATCAGGTTGTCTTATGCTTGGCATCGTTCTT . PASS . GT 1/1
6 changes: 3 additions & 3 deletions share/test-data/genotyping_test_2/expected-genotypes.vcf
@@ -1,3 +1,3 @@
chrA 1500 swap1 GCTGCCCCTT CTAGTAACTT . PASS GRMPY_ID=swaps.vcf@5a0b775f60ed1cd0b938ae09b753ad0207c5ba9f83679f894f17d3d1fd352b6f:1 GT:OLD_GT:DP:FT:AD:ADF:ADR:PL 0/0:0/0:1076:PASS:604,4:604,4:0,0:0,3281,32454
chrB 1500 swap2 AGGCCATACG TCAGGTTGTCTTATGCTTGGCATCGTTCTT . PASS GRMPY_ID=swaps.vcf@5a0b775f60ed1cd0b938ae09b753ad0207c5ba9f83679f894f17d3d1fd352b6f:2 GT:OLD_GT:DP:FT:AD:ADF:ADR:PL 1/1:1/1:952:PASS:0,596:0,596:0,0:32425,3031,0
chrC 1500 swap3 CGCGTTGTAAGCTACCATATTCAATCTGTGCCAGGGATCGAGCCACAGGCACCGCTCAATCTCGCGGGAGATTGTGCAAAGAGTCTTACCTTTCGTCGACCTCCGCCTCGCTCGTGAATCTTGCGATCGATTGAAAGTCACGGGTAGAGTGATGTTCGGGCGAATCAGACAGGCAGATGCAATGGAGGTTCCCGGATAGT CCGGAGATACCCTCTGTCTCCGCTAACATTTCCCCGCGGACAAAATTTGTCGGCTGGGAGGAATAGGTGCAAACGCATAATATACCCCTCTTACTTTTTGTTAGGGTCTAGTCCGAATCTAAAAAATGACTAAGGACTCTCAGAGTGATGGATATATGCCTCGCGACGCCGATCTGTGCTTATGTCGCAGCTTTGGCATCAAACCAGTTTCACATACCCTGCCTAAAAGATTCCCATACTGCGAAATCGCAAGATTGTACAAGTTGTAGTCTGTGCGCCAGCGTGAGCACGGCACTCGGT . PASS GRMPY_ID=swaps.vcf@5a0b775f60ed1cd0b938ae09b753ad0207c5ba9f83679f894f17d3d1fd352b6f:3 GT:OLD_GT:DP:FT:AD:ADF:ADR:PL 0/1:0/1:1008:PASS:538,538:538,538:0,0:4132,0,4132
chrA 1500 swap1 GCTGCCCCTT GCTAGTAACTT . PASS GRMPY_ID=swaps.vcf@42527ba8a8840f1c955f8e6879b567988bbf858febd25ba5b4555895dbbcfef7:1 GT:OLD_GT:DP:FT:AD:ADF:ADR:PL 0/0:0/0:1100:PASS:604,4:604,4:0,0:0,3364,32458
chrB 1499 swap2 TAGGCCATACG TTCAGGTTGTCTTATGCTTGGCATCGTTCTT . PASS GRMPY_ID=swaps.vcf@42527ba8a8840f1c955f8e6879b567988bbf858febd25ba5b4555895dbbcfef7:2 GT:OLD_GT:DP:FT:AD:ADF:ADR:PL 1/1:1/1:952:PASS:0,596:0,596:0,0:32425,3031,0
chrC 1500 swap3 CGCGTTGTAAGCTACCATATTCAATCTGTGCCAGGGATCGAGCCACAGGCACCGCTCAATCTCGCGGGAGATTGTGCAAAGAGTCTTACCTTTCGTCGACCTCCGCCTCGCTCGTGAATCTTGCGATCGATTGAAAGTCACGGGTAGAGTGATGTTCGGGCGAATCAGACAGGCAGATGCAATGGAGGTTCCCGGATAGT CCGGAGATACCCTCTGTCTCCGCTAACATTTCCCCGCGGACAAAATTTGTCGGCTGGGAGGAATAGGTGCAAACGCATAATATACCCCTCTTACTTTTTGTTAGGGTCTAGTCCGAATCTAAAAAATGACTAAGGACTCTCAGAGTGATGGATATATGCCTCGCGACGCCGATCTGTGCTTATGTCGCAGCTTTGGCATCAAACCAGTTTCACATACCCTGCCTAAAAGATTCCCATACTGCGAAATCGCAAGATTGTACAAGTTGTAGTCTGTGCGCCAGCGTGAGCACGGCACTCGGT . PASS GRMPY_ID=swaps.vcf@42527ba8a8840f1c955f8e6879b567988bbf858febd25ba5b4555895dbbcfef7:3 GT:OLD_GT:DP:FT:AD:ADF:ADR:PL 0/1:0/1:1008:PASS:538,538:538,538:0,0:4132,0,4132
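
To compare expected and observed genotypes field by field, the FORMAT and sample columns can be paired up. The Python sketch below assumes the colon-separated layout shown in the records above; `parse_sample` is a hypothetical helper, and treating `DP`, `AD`, `ADF`, `ADR` and `PL` as integers is an assumption based on how those values appear here.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with sample values and convert numeric fields."""

    def to_int(text):
        # "." marks a missing value in VCF.
        return None if text == "." else int(text)

    record = dict(zip(format_field.split(":"), sample_field.split(":")))
    if "DP" in record:
        record["DP"] = to_int(record["DP"])
    for key in ("AD", "ADF", "ADR", "PL"):
        if key in record:
            record[key] = [to_int(x) for x in record[key].split(",")]
    return record


# FORMAT and sample columns of the updated swap1 record.
sample = parse_sample(
    "GT:OLD_GT:DP:FT:AD:ADF:ADR:PL",
    "0/0:0/0:1100:PASS:604,4:604,4:0,0:0,3364,32458",
)
print(sample["GT"], sample["DP"], sample["AD"], sample["PL"])
# prints 0/0 1100 [604, 4] [0, 3364, 32458]
```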