It is recommended to launch assemblies from molecule .bnx files via the command line if the assembly intermediate files from the alignmolvref step are needed as input for downstream bioinformatics analysis (e.g., the molecule distance script). Assemblies downloaded from Access will likely not contain the alignmolvref intermediate files.
For quick reference, please proceed to the Local Guided Assembly (GA-Local) via Command Line Usage section.
A Conda environment and Solve need to be installed in order to run the pipelines. The following describes Solve installation on a Linux operating system.
In general, the latest *.tar.gz of Solve can be downloaded from Bionano Software Download. An installation guide for Linux operating systems can be found in 30182 Bionano Solve Installation Guide, page 5.
Alternatively, if using Solve 3.7.2:
- Acquire the *.tar.gz archive of Solve 3.7.2.
- Install the bionano_python3.0 conda environment from its YAML file.
conda env create --file bionano_python3.0.yml
- Activate the bionano_python3.0 conda environment.
conda activate bionano_python3.0
- Untar the installation archive.
- Navigate into /bionano_packages.
- Install the following R and Python packages:
mkdir ~/Rpackages
export R_LIBS_USER=~/Rpackages
~/install/Solve3.7_09022021_137/bionano_packages$ R CMD INSTALL ./BionanoR
~/install/Solve3.7_09022021_137/bionano_packages$ R CMD INSTALL ./FractCNV
~/install/Solve3.7_09022021_137/bionano_packages$ python3 -m pip install ./pybionano
~/install/Solve3.7_09022021_137/bionano_packages$ python3 -m pip install ./svconfmodels
~/install/Solve3.7_09022021_137/bionano_packages$ y
- Run assemblies, e.g. local_guided_assembly.sh
local_guided_assembly.sh can only be run on a machine with a Linux-based operating system that has Solve 3.7.2 and the bionano_python3.0 conda environment installed. The paths to the pipeline, the repeat coordinate .csv, and the seed files referenced in the script need to be checked so that they point to the user's Solve installation. The script passes the hg38_DLE1_0kb_0labels.cmap reference and the optArguments_haplotype_DLE1_saphyr_human_D4Z4.xml file as input for GA; both are preinstalled with the command-line Solve installation.
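Because the script depends on the active conda environment and on several file paths, it can help to check them before launching a long run. The following is a minimal sketch, not part of the repository: the function names check_conda_env and check_files are hypothetical, the /path/to/... entries are placeholders for the user's own Solve installation, and the check relies on the CONDA_DEFAULT_ENV variable that conda sets on activation.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helpers, not part of Solve or this repository).

# Succeeds if the active conda environment has the expected name.
# CONDA_DEFAULT_ENV is set by `conda activate`.
check_conda_env() {
    expected="$1"
    [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]
}

# Prints every argument that is not an existing regular file.
# Returns 1 if at least one file is missing, 0 otherwise.
check_files() {
    missing=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example use (placeholder paths, adjust to the local installation):
#   check_conda_env bionano_python3.0 || echo "activate bionano_python3.0 first"
#   check_files /path/to/hg38_DLE1_0kb_0labels.cmap \
#               /path/to/optArguments_haplotype_DLE1_saphyr_human_D4Z4.xml
```

The functions only report problems; they do not modify the script or the installation.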
Once Guided Assembly completes, the second portion of the script launches a custom Enfocus pipeline that assembles maps along the repeat expansion gene coordinates supplied as an argument; the coordinate files can be found in coo_csvs/*.csv.
The script call automatically references the necessary repeat coordinate file for the custom Enfocus pipeline. Any repeat locus with an expected repeat size >300-500 bp can be analyzed with this script by changing the -coo parameter. In our study, only the known repeat loci in CNBP, DMPK, RFC1, FMR1, STARD7, ATXN10, FXN, NOP56, and C9orf72 were analyzed, and the -coo coordinate files for these loci are provided on this GitHub page. To analyze a different repeat locus, run the script with a user-provided custom -coo coordinate file.
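Before running the script for a given locus, it may be useful to confirm that a coordinate file for that gene exists. The sketch below is an illustration only and does not reproduce how local_guided_assembly.sh resolves the file: find_coo_csv is a hypothetical helper, and it assumes that the files in coo_csvs are named <GENE>_repeat_coords.csv. It uses the -maxdepth and -iname options of GNU find to match the gene name case-insensitively.

```shell
#!/bin/sh
# Hypothetical helper: locate the coordinate file for a gene.
# Assumption: files are named <GENE>_repeat_coords.csv inside the given folder.
find_coo_csv() {
    coo_dir="$1"
    gene="$2"
    # Case-insensitive match on the file name, first hit only.
    match=$(find "$coo_dir" -maxdepth 1 -type f \
        -iname "${gene}_repeat_coords.csv" | head -n 1)
    if [ -z "$match" ]; then
        echo "no coordinate file for $gene in $coo_dir" >&2
        return 1
    fi
    echo "$match"
}

# Example use:
#   find_coo_csv ./coo_csvs DMPK
```

If the helper reports no match, a custom coordinate file for that locus has to be supplied by the user.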
Example script call
# For running GA-Local in a single call.
sh ./scripts/run_local_guided_assembly.sh ./bnx_input/RUMC_DMPK_HvB_01.bnx DMPK
# For running GA-Local in a loop, given the path to a folder containing molecules
GA_PATH="$1/"
for BNX in "$GA_PATH"*.bnx;
do
    # Skip the unexpanded pattern when the folder contains no .bnx files
    [ -e "$BNX" ] || continue
    echo "$BNX"
    # echo "$(dirname "$BNX")"
    echo "$(basename "$BNX")"
    sh ./scripts/local_guided_assembly.sh "$BNX" dmpk
done
The output file is similar to that of an Enfocus Fragile X assembly, except that it estimates repeat expansion distances along the locus specified by the .csv file instead of FMR1. Furthermore, the estimated number of repeat units depends on the repeat unit length (e.g., trinucleotide, pentanucleotide), which is included in the ~/coo_csvs/*_repeat_coords.csv file for the genes FMR1, RFC1, DMPK, STARD7, CNBP, ATXN10, FXN, NOP56, and C9orf72.
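To illustrate how the repeat unit length affects the reported number of repeat units, the sketch below divides an expansion size in bp by the unit length and rounds to the nearest integer. This is an assumption made for illustration; the pipeline's actual calculation may differ, and repeat_units is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: estimated repeat units = expansion size (bp) / unit length (bp),
# rounded to the nearest integer. Both arguments must be integers.
repeat_units() {
    size_bp="$1"
    unit_bp="$2"
    if [ "$unit_bp" -le 0 ]; then
        echo "unit length must be positive" >&2
        return 1
    fi
    # Adding half the unit length before integer division rounds to nearest.
    echo $(( ($size_bp + $unit_bp / 2) / $unit_bp ))
}

# Example use: the same 1500 bp expansion gives different unit counts.
#   repeat_units 1500 3   # trinucleotide repeat   -> 500
#   repeat_units 1500 5   # pentanucleotide repeat -> 300
```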
More information can be found in the following documentation:
- 30182 Bionano Solve Installation Guide.
- 30205 Guidelines for Running Solve on the Command Line.
- 30194 How to Align a BNX to Reference.
The molecule distance script is authored by Joyce Lee, Syukri Shukor, Jillian Burke, and Andy Pang. For questions or issues, please reach out to Syukri Shukor ([email protected]) or Andy Pang ([email protected]).