harp

Haplotype-Assisted Read Parsing

what

This script performs allele-specific alignment of a set of reads to two reference genomes. It outputs SAM files for each genome that can be used in downstream analyses.

how

The script in this package takes a set of single- or paired-end fastq files, aligns them to two reference genomes using bowtie2, and parses each read according to the genome it aligns to best. If a read aligns to both genomes, but not equally well, its alignments to both genomes must be unique (have a high alignment score in both genomes) in order to avoid mismappings due to SNPs.
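The parsing rule above can be sketched as a small awk filter. This is not harp's implementation, just a minimal illustration under one reading of the rule: a read is parsed only when it has a high-scoring alignment to both genomes and the two scores differ. The input format (one tab-separated line per read: read name, bowtie2 AS:i score against genome1, AS:i score against genome2, with NA when unaligned), the function name classify_reads, and the default cut-off of -10 are all hypothetical.

```shell
# Hypothetical sketch, not harp's code: classify reads from per-genome scores.
# stdin:  read_name <TAB> AS_genome1 <TAB> AS_genome2   ("NA" = not aligned)
# $1:     assumed minimum score for a "high" alignment (default -10)
# stdout: read_name <TAB> unmapped | genome1 | genome2 | ambiguous
classify_reads() {
  awk -v min="${1:--10}" '
    BEGIN { FS = OFS = "\t"; m = min + 0 }
    {
      # an alignment counts as good if it exists and meets the cut-off
      ok1 = ($2 != "NA" && $2 + 0 >= m)
      ok2 = ($3 != "NA" && $3 + 0 >= m)
      if (!ok1 && !ok2)
        cls = "unmapped"     # no good alignment to either genome
      else if (ok1 && ok2 && $2 + 0 > $3 + 0)
        cls = "genome1"      # good in both, better in genome1
      else if (ok1 && ok2 && $3 + 0 > $2 + 0)
        cls = "genome2"      # good in both, better in genome2
      else
        cls = "ambiguous"    # tied scores, or good in only one genome
      print $1, cls
    }'
}
```

For example, with the default cut-off a read scoring -2 against genome1 and -8 against genome2 is parsed to genome1, while a read scoring -5 against both is reported as ambiguous.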

why

This was originally written to perform allele-specific alignment of Repli-seq data in castaneus-musculus hybrid mouse cells. It has also been used to parse Hi-C, RNA-seq, and ChIP-seq data. This R package contains the implementation of the read-parsing algorithm used in the manuscript:

Allele-specific control of replication timing and genome organization during development. Juan Carlos Rivera-Mulia, Andrew Dimond, Daniel Vera, Claudia Trevilla-Garcia, Takayo Sasaki, Jared Zimmerman, Catherine Dupont, Joost Gribnau, Peter Fraser and David M. Gilbert

who

This software was written by Daniel Vera.

software requirements

This software has only been tested on CentOS 7 and Ubuntu 14.04 (Trusty), but is expected to work on most modern Linux systems with the following software installed and in your $PATH:

bowtie2
samtools >1.3
gawk
GNU coreutils
R >3

The following R package should also be installed:

devtools >1.13
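Before running harp, it can help to confirm that the tools above are on your $PATH and that samtools is recent enough. The two shell functions below are a hypothetical convenience, not part of harp: check_tools uses the POSIX command -v builtin, and version_ge relies on sort -V from GNU coreutils.

```shell
# Report (on stderr) each command not found on $PATH; return 1 if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 if version $1 is at least version $2 (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (commented out so the functions can be sourced on their own):
# check_tools bowtie2 bowtie2-build samtools gawk R
# sv=$(samtools --version | head -n 1 | cut -d ' ' -f 2)
# version_ge "$sv" 1.3 || echo "samtools $sv is older than 1.3" >&2
```

Using sort -V rather than a plain string comparison matters for versions such as 1.10, which must sort after 1.3.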

input requirements

fastq files to parse
bowtie2 indices for each haplotype
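bowtie2-build writes six index files per prefix (PREFIX.1.bt2 through PREFIX.4.bt2, plus PREFIX.rev.1.bt2 and PREFIX.rev.2.bt2; large genomes get the .bt2l extension instead). A quick way to confirm that an index is complete before running harp is sketched below. The function name has_bt2_index is hypothetical, and the check only tests that the files exist, not that they are valid.

```shell
# Hypothetical helper: return 0 if a complete bowtie2 index exists for prefix $1,
# using either the small (.bt2) or the large (.bt2l) index file names.
has_bt2_index() {
  prefix=$1
  for ext in bt2 bt2l; do
    complete=1
    for part in 1 2 3 4 rev.1 rev.2; do
      [ -f "$prefix.$part.$ext" ] || { complete=0; break; }
    done
    [ "$complete" -eq 1 ] && return 0
  done
  return 1
}

# Example, with the index prefixes used in the usage section:
# has_bt2_index /path/to/bowtie2index/genome1 || echo "genome1 index missing" >&2
# has_bt2_index /path/to/bowtie2index/genome2 || echo "genome2 index missing" >&2
```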

installation

```r
# in R:
devtools::install_github("dvera/harp")
```

usage

```shell
# make bowtie2 indices for each haplotype, assuming you have a fasta file for each haplotype and that the two differ only by SNPs:
mkdir bowtie2index && cd bowtie2index
bowtie2-build /path/to/genome1.fa genome1
bowtie2-build /path/to/genome2.fa genome2

# navigate to a directory with your fastq files
cd /path/to/fastqFiles

# open R
R
```

in R:

```r
library(harp)

# define a vector of fastq files to parse (in base R, Sys.glob("*.fastq") gives the same vector)
f <- files("*.fastq")

# prefixes of the bowtie2 indices built above
ref1 <- "/path/to/bowtie2index/genome1"
ref2 <- "/path/to/bowtie2index/genome2"

harp( f, index1prefix=ref1, index2prefix=ref2 )
```

output

The script generates the following files for each input fastq file:

*_unmapped.sam (did not map well to either genome)
*_parsed_genome1.sam (parsed to genome1)
*_parsed_genome2.sam (parsed to genome2)
*_ambiguous.sam (mapped well to at least one genome, but could not be confidently parsed)
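To get a quick per-category tally after a run, you can count the alignment records in each output file. A minimal sketch, assuming the outputs are plain-text SAM files named as above; count_sam_records is a hypothetical helper that counts every non-header line (SAM header lines start with @, which a read name cannot), so both mates of a pair and any secondary records are counted separately. samtools view -c FILE gives an equivalent count.

```shell
# Hypothetical helper: print "file <TAB> number of alignment records" for each SAM file.
count_sam_records() {
  for sam in "$@"; do
    # skip header lines (starting with @); n + 0 prints 0 for empty files
    printf '%s\t%s\n' "$sam" "$(awk '!/^@/ { n++ } END { print n + 0 }' "$sam")"
  done
}

# Example:
# count_sam_records *_parsed_genome1.sam *_parsed_genome2.sam *_ambiguous.sam *_unmapped.sam
```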
