ferrojm/python3-scripts

Python3-scripts

General Python scripts and tools for working with NGS data.

AnalyzeLandscapes.py - plots repeat landscapes from a RepeatMasker divsum output (run AnalyzeLandscapes.py -h to see all options). IMPORTANT: use Python 3.6 or earlier and pandas 1.1 or earlier. For example, to create a suitable miniforge3 environment: mamba create -n alandscapes -c bioconda python=3.6 pandas=1.1 biopython numpy matplotlib

divsum_splitter.py - searches for the 'Div' string in a divsum file produced by RepeatMasker and exports a new CSV table for plotting landscapes. Usage: divsum_splitter.py file.divsum
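The idea behind this step can be sketched as follows. This is only an illustration, not the script's actual code: it assumes the table starts at the first line whose first field is 'Div', that columns are whitespace-separated, and that a blank line ends the table. The function names and the sample data are hypothetical.

```python
import csv
import io


def divsum_table_rows(lines):
    """Collect the rows of the table whose header line starts with 'Div'."""
    rows = []
    in_table = False
    for line in lines:
        fields = line.split()
        if not in_table:
            # Assumption: the header is the first line whose first field is 'Div'.
            if fields and fields[0] == "Div":
                in_table = True
                rows.append(fields)
        elif fields:
            rows.append(fields)
        else:
            break  # Assumption: a blank line ends the table.
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample in the general shape of a divsum divergence table.
example = """Coverage for each repeat class and divergence (Kimura)
Div DNA/hAT LINE/L1
0 120 45
1 300 80
"""

print(rows_to_csv(divsum_table_rows(example.splitlines())))
```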

extract_contigs_RExp.py CLN contigs.fa - extracts specific contigs from a RepeatExplorer contigs.fa file using seqtk. Useful in loops over a list of clusters, e.g.: cat list | while read line; do extract_contigs_RExp.py $line contigs.fa; done

pick_fasta_lenght.py -f <file.fasta> -l <int_length> - filters sequences, keeping those longer than the desired length.
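A minimal sketch of this kind of length filter, using only the standard library. It is not the script's own implementation: the function names are hypothetical, and it assumes "above" means strictly longer than the threshold.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length):
    """Keep records whose sequence is strictly longer than min_length (assumption)."""
    return [(h, s) for h, s in records if len(s) > min_length]


fasta_text = ">a\nACGT\nAC\n>b\nAC\n"
kept = filter_by_length(read_fasta(fasta_text.splitlines()), 3)
print(kept)  # [('a', 'ACGTAC')]
```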

read_reads.py -i <input.fasta/fastq> -t <file type: fasta/fastq> - takes a FASTA/FASTQ file as input and plots a histogram of the read length distribution.
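The counting that sits behind such a histogram might look like the sketch below. It covers only the FASTQ case and leaves out the plotting. It assumes plain 4-line FASTQ records (no wrapped sequences), and the function names and bin width are hypothetical, not taken from the script.

```python
from collections import Counter


def fastq_read_lengths(lines):
    """Return read lengths, assuming 4-line records with the sequence on line 2."""
    return [len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1]


def length_histogram(lengths, bin_width):
    """Map each bin start to the number of reads whose length falls in that bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))


fastq_text = (
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    "@r3\nACGTACGTACGTACG\n+\nIIIIIIIIIIIIIII\n"
)
lengths = fastq_read_lengths(fastq_text.splitlines())
print(length_histogram(lengths, 10))  # {0: 1, 10: 2}
```

The resulting bin counts could then be passed to a plotting library such as matplotlib.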

rename_header_rexp.py -i <file.fasta> - renames the header IDs of a FASTA file to "name#sat/name" for use as a custom library for RepeatExplorer (Rmasker); also checks header length (< 50) and possible misspellings.
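As an illustration of the header format, the renaming could be sketched like this. The function name is hypothetical, and it is an assumption here that the length limit applies to the new header rather than the original one. The misspelling check is not covered.

```python
def repeat_library_header(name, max_length=50):
    """Build a 'name#sat/name' header and enforce a length below max_length."""
    header = f"{name}#sat/{name}"
    # Assumption: the '< 50' limit is checked on the renamed header.
    if len(header) >= max_length:
        raise ValueError(f"header too long ({len(header)} characters): {header}")
    return header


print(repeat_library_header("CL1_sat178"))  # CL1_sat178#sat/CL1_sat178
```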

pcr_main.py -t <target.fasta> -f <forward_primer_string> -r <reverse_primer_string> -m <mode: pcr, mpcr, N> - in silico PCR from a template using seqkit. Extracts a PCR product from one file (pcr mode) or multiple files (mpcr mode), or searches for an exact match in a target and replaces the matched nucleotides with N (N mode).
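The N mode can be pictured with a short sketch. This is not how pcr_main.py is known to work internally: the function name is hypothetical, and the sketch assumes case-sensitive, forward-strand, non-overlapping matches only.

```python
def mask_exact_matches(sequence, query):
    """Replace every exact occurrence of query in sequence with N characters."""
    # str.replace scans left to right and does not match overlapping occurrences.
    return sequence.replace(query, "N" * len(query))


print(mask_exact_matches("AACGTTACGT", "ACGT"))  # ANNNNTNNNN
```

Masking keeps the sequence length unchanged, so coordinates downstream of a match remain valid.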

count_bases.py - counts the number of bases across all the sequences of a FASTA file. Made by @mylena-s.
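A minimal sketch of such a count, assuming the result is a single total over all sequences (the script may instead report per-sequence or per-base counts). The function name is hypothetical.

```python
def count_bases(lines):
    """Sum the lengths of all sequence lines, skipping FASTA header lines."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


fasta_lines = [">a", "ACGT", "AC", ">b", "GG"]
print(count_bases(fasta_lines))  # 8
```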
