hrluo93/GenomeAnnotation

GenomeAnnotation

This project aims to create an easy way to annotate genomes.

I personally suggest feeding transcripts from HISAT2 + TransDecoder (cufflinks_gtf_to_alignment_gff3.pl) into PASA.

###########update#################

V2.sh: simpler, faster, and free of frameshifts

RNA-based: HISAT2 + TransDecoder

Ab initio: BRAKER3

Homology: complete gene structures from miniprot

########### Post PASA ############

This script (https://github.com/hrluo93/python4bio/blob/main/false-gene-model.py) checks whether the annotation contains false gene models.

###A soft-masked genome can leave transposable elements (TEs) in the annotation. We used OrthoFinder to filter the annotation results: genes with orthologs are retained, and non-orthologous genes with only 1 or 2 exons are removed.

##In Orthogroups.GeneCount.tsv, the target species (gene IDs like NNYC0000010.1) is column $3 and the reference species are columns $2 and $4.

orthofinder -f orthof -og -M msa -t 12 -S blast_gz

cd orthof/OrthoFinder/Results_*/Orthogroups/

awk '($2 > 0 || $4 > 0) && $3 > 0' Orthogroups.GeneCount.tsv > nny.allortho.count.tsv

awk 'FNR==NR {a[$1]=$0;next} $1 in a {print a[$1],$0}' nny.allortho.count.tsv Orthogroups.tsv > nny.merge.tsv

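#The grep pattern must match the gene IDs of the target species; the pattern below is inferred from the example ID NNYC0000010.1, so adjust it to your own IDs.

grep -oE 'NNYC[0-9]+\.[0-9]+' nny.merge.tsv | cut -f1 -d "." > nny.orthogene.list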
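
To make the column logic concrete, here is a small sketch of the same filtering idea run on made-up data. The species names (refA, target, refB), the orthogroup rows, the reference gene IDs, and the toy.* file names are all hypothetical; only the column layout ($2 and $4 reference, $3 target) and the NNYC-style IDs come from the notes above. Instead of pattern-matching gene IDs, this variant reads the target column of Orthogroups.tsv directly, which is an alternative to the commands above, not a copy of them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so no real OrthoFinder output is touched.
tmp=$(mktemp -d)
cd "$tmp"

# Toy Orthogroups.GeneCount.tsv: Orthogroup, refA ($2), target ($3), refB ($4), Total.
printf 'Orthogroup\trefA\ttarget\trefB\tTotal\n' >  Orthogroups.GeneCount.tsv
printf 'OG0000000\t1\t2\t0\t3\n'                 >> Orthogroups.GeneCount.tsv
printf 'OG0000001\t0\t1\t0\t1\n'                 >> Orthogroups.GeneCount.tsv
printf 'OG0000002\t1\t0\t1\t2\n'                 >> Orthogroups.GeneCount.tsv
printf 'OG0000003\t0\t1\t2\t3\n'                 >> Orthogroups.GeneCount.tsv

# Toy Orthogroups.tsv: same column order, gene IDs instead of counts.
printf 'Orthogroup\trefA\ttarget\trefB\n'                  >  Orthogroups.tsv
printf 'OG0000000\tA1.1\tNNYC0000010.1, NNYC0000020.1\t\n' >> Orthogroups.tsv
printf 'OG0000001\t\tNNYC0000030.1\t\n'                    >> Orthogroups.tsv
printf 'OG0000002\tA2.1\t\tB1.1\n'                         >> Orthogroups.tsv
printf 'OG0000003\t\tNNYC0000040.1\tB2.1, B3.1\n'          >> Orthogroups.tsv

# Step 1: keep orthogroups with at least one target gene and at least one
# gene in either reference species. NR > 1 skips the header explicitly.
awk -F'\t' 'NR > 1 && ($2 > 0 || $4 > 0) && $3 > 0' \
    Orthogroups.GeneCount.tsv > toy.allortho.count.tsv

# Step 2: for the kept orthogroups, print the target column of Orthogroups.tsv,
# one gene per line, with the transcript suffix (".1") removed.
awk -F'\t' 'FNR == NR {keep[$1]; next} ($1 in keep) {print $3}' \
    toy.allortho.count.tsv Orthogroups.tsv \
  | tr -d ' ' | tr ',' '\n' | cut -d. -f1 | sort -u > toy.orthogene.list

cat toy.orthogene.list
```

On this toy input, OG0000001 is dropped because only the target species has a gene in it, and OG0000002 is dropped because the target species has none, so the list ends up with NNYC0000010, NNYC0000020, and NNYC0000040.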

#nny.orthogene.list contains all orthologous genes that should be kept. Non-orthologous genes with 1 or 2 exons can be found with TBtools GXF Stat or any other method you prefer.
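
As one possible "other method", a short awk pass over a GFF3 file can list the removal candidates. This is only a sketch under several assumptions: exon features carry a single Parent=<transcript ID> attribute, the gene ID is the transcript ID up to the first dot (as in NNYC0000010.1), and a gene counts as "1 or 2 exons" when its transcript with the most exons has at most 2. The toy.gff3 records and the toy.remove.list file name are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so no real files are overwritten.
tmp=$(mktemp -d)
cd "$tmp"

# Toy inputs: one orthologous gene to keep, and a minimal GFF3 with exon rows only.
printf 'NNYC0000010\n' > nny.orthogene.list
{
  printf 'chr1\tPASA\texon\t100\t200\t.\t+\t.\tID=e1;Parent=NNYC0000010.1\n'
  printf 'chr1\tPASA\texon\t900\t990\t.\t+\t.\tID=e2;Parent=NNYC0000050.1\n'
  printf 'chr1\tPASA\texon\t1200\t1300\t.\t+\t.\tID=e3;Parent=NNYC0000050.1\n'
  printf 'chr2\tPASA\texon\t10\t50\t.\t-\t.\tID=e4;Parent=NNYC0000060.1\n'
  printf 'chr2\tPASA\texon\t80\t120\t.\t-\t.\tID=e5;Parent=NNYC0000060.1\n'
  printf 'chr2\tPASA\texon\t200\t260\t.\t-\t.\tID=e6;Parent=NNYC0000060.1\n'
} > toy.gff3

# Print genes that are absent from the orthologous list and have <= 2 exons.
awk -F'\t' '
    FNR == NR { ortho[$1]; next }                 # first file: genes to keep
    $3 == "exon" && match($9, /Parent=[^;]+/) {
        tx = substr($9, RSTART + 7, RLENGTH - 7)  # transcript ID after "Parent="
        n[tx]++                                   # exon count per transcript
    }
    END {
        for (tx in n) {
            gene = tx
            sub(/\..*/, "", gene)                 # NNYC0000050.1 -> NNYC0000050
            if (n[tx] > max[gene]) max[gene] = n[tx]
        }
        for (gene in max)
            if (!(gene in ortho) && max[gene] <= 2) print gene
    }
' nny.orthogene.list toy.gff3 | sort > toy.remove.list

cat toy.remove.list
```

Here NNYC0000010 is protected by the orthologous list, NNYC0000060 has three exons, and only NNYC0000050 (two exons, no ortholog) is reported as a removal candidate.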

######################################################

RNA-seq + homology + ab initio based annotation.sh was used for the great bustard genome.

[Figure: annotation1]
