NeoSV

A computational workflow to identify Neoantigens from Structural Variations.

New in NeoSV v0.0.4

Support BEDPE format as input
Fix bugs related to NetMHCpan 4.1 (NetMHCpan 4.0 is no longer supported by NeoSV)
Add a new parameter, `-erc`, which enables users to filter neoantigens by EL (eluted ligand) rank

Background

Neoantigens are considered ideal targets for immunotherapies because they are tumor-specific and not subject to immune tolerance. Previous studies have focused on single nucleotide variations (SNVs) and insertions and deletions (indels), leaving neoantigens derived from structural variations (SVs) poorly characterized.

We developed a Python package, NeoSV, to annotate the effects of SVs on proteins and to predict potential neoantigens created by SVs. We have successfully applied NeoSV to thousands of tumor genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) and constructed a comprehensive repertoire of SV-derived neoantigens. For more details, please read our paper:

Shi, Y., Jing, B. & Xi, R. Comprehensive analysis of neoantigens derived from structural variation across whole genomes from 2528 tumors. Genome Biol 24, 169 (2023)

Install

Prerequisites

Python (>3.6). NeoSV should work with all versions of Python 3, but it has only been tested on Python > 3.6.
NetMHCpan (4.1). After you register and receive the download link, you will also get accompanying instructions on how to configure NetMHCpan.

Download

PyPI: if you already have Python and pip, you can install NeoSV directly via pip install neosv
Source code: we have noticed that pip sometimes does not install the neosv executable. In that case, download the package and install it with python setup.py install. Please install biopython and pyensembl using pip beforehand.

Usage

Input

NeoSV requires 3 types of inputs:

Variant file: a file in VCF format or BEDPE format which lists all SVs you want to analyze. Template files: test.sv.vcf and test.sv.bedpe
HLA file: a file listing the HLA alleles, one per line. This usually includes six HLA alleles for an individual. Alleles should be in 4-digit format, such as HLA-A*02:01. Template file: test.hla.txt
Reference file: NeoSV utilizes pyensembl for SV annotation, thus a reference for pyensembl is needed. There are 3 ways to prepare it:
- Pre-download by pyensembl (recommended): When you install NeoSV using pip or conda, pyensembl will be automatically installed as well. Then you can download the reference:
```
export PYENSEMBL_CACHE_DIR=/custom/cache/dir # specify the location for storing reference
pyensembl install --release <list of Ensembl release numbers> --species <species-name> # download; for hg19 use release 75, for hg38 use release 96
```
- Automatically download by NeoSV: If NeoSV does not detect a valid reference in --pyensembl-cache-dir, it will automatically download one to that folder. This requires internet access, so please make sure your system is online; some high-performance computing nodes have no network connection.
- Prepare the reference file manually: This is useful if your data are not from human or mouse, in which case you need to prepare the reference yourself. A FASTA file and a GTF file are sufficient. For more details, please see the pyensembl guidance. You also need to confirm that the MHC alleles of that species are supported by NetMHCpan.
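As an illustration of the HLA file format, the sketch below writes a six-allele file and checks that every line has the `HLA-<gene>*NN:NN` shape described above. The alleles are placeholders chosen for the example (they are not the contents of test.hla.txt), the helper name check_hla_file is hypothetical, and the pattern only checks the format; it does not confirm that NetMHCpan supports a given allele.

```shell
# Write a six-allele HLA file (two alleles each for HLA-A, -B and -C).
# These alleles are illustrative placeholders, not a real individual's typing.
cat > sample.hla.txt <<'EOF'
HLA-A*02:01
HLA-A*11:01
HLA-B*40:01
HLA-B*46:01
HLA-C*01:02
HLA-C*07:02
EOF

# Hypothetical helper: print (with line numbers) any line that is not a
# two-field ("4-digit") allele name such as HLA-A*02:01, and fail if one exists.
# Blank lines and Windows line endings are reported as well.
check_hla_file() {
  if grep -Evn '^HLA-[A-Z]+\*[0-9]{2,3}:[0-9]{2,3}$' "$1"; then
    echo "unexpected allele format in $1 (offending lines listed above)" >&2
    return 1
  fi
}

check_hla_file sample.hla.txt
```

The check prints nothing and succeeds when the file is well formed, so it can be placed in front of a NeoSV run in a pipeline script.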

Run

Quick start: suppose you have a variant file named test.sv.vcf and an HLA file named test.hla.txt, and your pyensembl reference is Homo sapiens release 75, located at /pyensembl/. A typical NeoSV command is then:
```
neosv -sf test.sv.vcf -hf test.hla.txt -np /path/to/netmhcpan -o test -p test -r 75 -pd /pyensembl/
```
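When many samples need to be processed (as in the PCAWG analysis), a small wrapper can print one NeoSV command per sample. This is a sketch under several assumptions: a hypothetical layout in which each sample S has S.sv.vcf and S.hla.txt in the same directory, NetMHCpan on your PATH (so `-np` is omitted), and paths without spaces. The function name neosv_commands is ours, not part of NeoSV, and it only uses the documented flags `-sf`, `-hf`, `-o`, `-p` and `-r`.

```shell
# Hypothetical helper: print one NeoSV command per sample.
# Usage: neosv_commands <input_dir> <output_root> <ensembl_release>
# Assumes each sample S has <input_dir>/S.sv.vcf and <input_dir>/S.hla.txt.
neosv_commands() {
  indir=$1
  outdir=$2
  release=$3
  for sv in "$indir"/*.sv.vcf; do
    [ -e "$sv" ] || continue              # no VCF files: the glob stays literal
    sample=$(basename "$sv" .sv.vcf)
    hla="$indir/$sample.hla.txt"
    if [ ! -f "$hla" ]; then
      echo "skipping $sample: missing $hla" >&2
      continue
    fi
    # One output folder and file prefix per sample.
    echo neosv -sf "$sv" -hf "$hla" -o "$outdir/$sample" -p "$sample" -r "$release"
  done
}
```

For example, `neosv_commands /data/svs /data/neosv_out 75 | sh` would run the samples one after another (the directories here are placeholders). Printing the commands instead of running them lets you inspect them first, append options such as `-pd`, or hand them to a job scheduler.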

Below is a detailed description of each parameter:

Argument	Description
`-h`, `--help`	show the help message
`-sf`, `--sv-file`	Structural variants in VCF or BEDPE format. NeoSV will automatically identify the format according to the file suffix.
`-hf`, `--hla-file`	HLA alleles (resolution: 4 digit), with one allele per line.
`-np`, `--netmhc-path`	Absolute path to the NetMHCpan executable. Skip this argument if NetMHCpan has been added to your PATH.
`-o`, `--out`	Folder for all result files. A new folder will be created if it does not exist.
`-p`, `--prefix`	This prefix will be added to all output files.
`-r`, `--release`	The Ensembl release to use. Valid release numbers are listed in the Ensembl archive. The Ensembl releases for hg19/GRCh37 and hg38/GRCh38 are 75 and 96, respectively.
`-gf`, `--gtf-file`	GTF file for the reference; only needed if you prepare the Ensembl reference yourself.
`-cf`, `--cdna-file`	cDNA file for the reference; only needed if you prepare the Ensembl reference yourself.
`-pd`, `--pyensembl-cache-dir`	Directory for pyensembl cache files. If not specified, the platform-specific cache folder will be used.
`-l`, `--epitope-lengths`	Lengths of the neoepitopes for which MHC binding is predicted. Default: 8-11.
`-ic`, `--ic50-cutoff`	Filter out neoepitopes with IC50 (nM) above this value. Default: 500.
`-brc`, `--ba-rank-cutoff`	Filter out neoepitopes with BA rank above this value. Default: 2.
`-erc`, `--el-rank-cutoff`	Filter out neoepitopes with EL rank above this value. Default: 2.
`-ct`, `--complete-transcript`	Only complete transcripts will be considered for SV annotation. Default: True.
`--anno-only`	Only annotate SVs without predicting neoantigens. If this argument is given, --hla-file is not required and only the annotation results are produced.

Output

Several files will be generated in the output directory. The most useful ones are those suffixed neoantigen.filtered.txt and anno.filtered.txt.

{prefix}.neoantigen.filtered.txt stores all information on the candidate neoantigens:

Column index	Column name	Content
1	chrom1	Chromosome of the 1st breakpoint
2	pos1	Genomic position of the 1st breakpoint
3	gene1	Gene name of the 1st breakpoint
4	transcript_id1	Ensembl transcript ID of the 1st breakpoint
5	chrom2	Chromosome of the 2nd breakpoint
6	pos2	Genomic position of the 2nd breakpoint
7	gene2	Gene name of the 2nd breakpoint
8	transcript_id2	Ensembl transcript ID of the 2nd breakpoint
9	svpattern
10	svtype	SV types according to the orientation of junction read. Values: DUP, DEL, TRA, t2tINV, or h2hINV.
11	frameshift	The effect on open reading frame. Values: In-frame, Stop-gain, Stop-loss, Start-loss.
12	neoantigen	Amino acid sequence of the neoantigen
13	allele	HLA allele that binds to the neoantigen
14	affinity	Binding affinity (nM) provided by NetMHCpan
15	BA_rank	BA rank of the binding provided by NetMHCpan
16	EL_rank	EL rank of the binding provided by NetMHCpan. Since NetMHCpan 4.0, EL rank has been the recommended feature for filtering neoantigens.
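Because EL_rank is stored in column 16, a stricter threshold can also be applied afterwards to an existing result file. This is a minimal sketch that assumes the file is tab-separated with a single header line; the function name filter_el_rank is hypothetical and not part of NeoSV.

```shell
# Hypothetical helper: keep the header line plus every neoantigen row whose
# EL_rank (column 16) is at or below the given threshold.
# Usage: filter_el_rank <neoantigen.filtered.txt> <max_el_rank>
# Assumes tab-separated input whose first line is a header.
filter_el_rank() {
  awk -F'\t' -v max="$2" 'NR == 1 || ($16 + 0) <= (max + 0)' "$1"
}
```

With the quick-start settings (`-o test -p test`), `filter_el_rank test/test.neoantigen.filtered.txt 0.5 > test/test.el_rank_0.5.txt` would keep only the rows with EL rank at or below 0.5.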

{prefix}.anno.filtered.txt stores all annotations of the SVs:

Column index	Column name	Content
1	chrom1	Chromosome of the 1st breakpoint.
2	pos1	Genomic position of the 1st breakpoint.
3	function1	Location of the 1st breakpoint relative to a gene. Values: Intergenic, Intron, Exon.
4	gene1	Gene name of the 1st breakpoint.
5	transcript_id1	Ensembl transcript ID of the 1st breakpoint
6	strand1	Coding strand of the 1st gene. Values: +, -, None (if intergenic)
7	transcript_retain1	The retained part of the transcript: I/i indicates an intron, E/e an exon. Upper case means the exon/intron is intact, while lower case means it is truncated by this SV.
8	chrom2	Chromosome of the 2nd breakpoint
9	pos2	Genomic position of the 2nd breakpoint
10	function2	Location of the 2nd breakpoint relative to a gene. Values: Intergenic, Intron, Exon.
11	gene2	Gene name of the 2nd breakpoint
12	transcript_id2	Ensembl transcript ID of the 2nd breakpoint
13	strand2	Coding strand of the 2nd gene. Values: +, -, None (if intergenic)
14	transcript_retain2
15	svpattern
16	svtype	SV types according to the orientation of junction read. Values: DUP, DEL, TRA, t2tINV, or h2hINV.
17	fusion	Whether this SV can lead to a functional gene fusion. It should be noted that the fusion is not restricted to two-gene fusion.
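For a quick overview of an annotation file, rows can be tallied by svtype (column 16). This is a sketch assuming tab-separated input with one header line; the function name count_svtypes is hypothetical. If NeoSV reports an SV once per affected transcript, these are row counts rather than numbers of distinct SVs.

```shell
# Hypothetical helper: count annotation rows per svtype (column 16),
# printing "<svtype><TAB><count>" lines sorted by svtype.
# Assumes tab-separated input with a single header line, which is skipped.
count_svtypes() {
  awk -F'\t' 'NR > 1 { n[$16]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}
```

For example, `count_svtypes test/test.anno.filtered.txt` summarizes the annotation produced by the quick-start command.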

{prefix}.net.in.txt stores the peptides fed to NetMHCpan.
{prefix}.net.out.txt stores the raw output from NetMHCpan.

License

NeoSV is licensed under the terms of MIT license.
