Mei-Ju May Chen edited this page Dec 16, 2016 · 18 revisions

Q: Licensing terms for this project?

This software/database is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the author's official duties as a United States Government employee and thus cannot be copyrighted. This software/database is freely available to the public for use. The National Agricultural Library and the U.S. Government have not placed any restriction on its use or reproduction. (Please see LICENCE.md.)

Detection of GFF3 format errors

Q: What kind of errors can be detected by gff-QC.py?

Currently, ~50 types of formatting errors can be detected. Errors are detected by reviewing three types of feature sets in a GFF3 file, and thus are grouped into three categories (Error category – feature type):

  • Intra-model errors (Ema) – multiple features within a model
  • Inter-model errors (Emr) – multiple features across models
  • Single feature errors (Esf) – each single feature.

Please see the wiki page for the QC phase for the full list of detected error types.

Sort a GFF3 file

Q: Why are the lines of the sorted gff different from the input?

gff3-sort.py automatically drops the lines that begin with a hash sign (#), other than ##gff-version 3 and ###. The output file may therefore contain fewer lines than the input. To check that the feature lines are consistent, compare the counts reported by the following commands:

grep -v "^#" annotations1.gff | wc -l

grep -v "^#" annotations1_sorted.gff | wc -l
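For readers who prefer Python, the same check can be sketched as below. This is an illustration only, not code taken from gff3-sort.py. The function name count_feature_lines and the example lines are hypothetical, and the sketch assumes that every line not starting with # counts as a feature line.

```python
def count_feature_lines(lines):
    """Count the lines that do not start with '#' (comments and directives)."""
    return sum(1 for line in lines if not line.startswith("#"))


# Hypothetical GFF3 content used only for illustration.
example = [
    "##gff-version 3",
    "##sequence-region chr1 1 1000",
    "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\t.\tmRNA\t1\t100\t.\t+\t.\tID=mrna1;Parent=gene1",
    "###",
]
print(count_feature_lines(example))
```

To apply it to real files, pass an open file object for each of the input and the sorted output and compare the two counts.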

In addition, if your input GFF file contains a feature that has two or more parent IDs, the program replicates the feature and lists it under each parent. In that case, the output will have more lines than the input.
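The replication of multi-parent features can be pictured with the following minimal sketch. It is not the implementation used by gff3-sort.py. The function name replicate_multi_parent is hypothetical, and the sketch assumes that parents are given as a comma-separated Parent attribute in column 9 and that each copy keeps the original ID.

```python
def replicate_multi_parent(line):
    """Return one copy of a GFF3 feature line per parent ID."""
    columns = line.rstrip("\n").split("\t")
    attributes = columns[8].split(";")

    # Collect the parent IDs from the Parent attribute, if there is one.
    parents = []
    for attr in attributes:
        if attr.startswith("Parent="):
            parents = attr[len("Parent="):].split(",")

    # Features with zero or one parent are returned unchanged.
    if len(parents) < 2:
        return ["\t".join(columns)]

    copies = []
    for parent in parents:
        # Assumption: each copy differs from the original only in its Parent value.
        new_attrs = [
            "Parent=" + parent if attr.startswith("Parent=") else attr
            for attr in attributes
        ]
        copies.append("\t".join(columns[:8] + [";".join(new_attrs)]))
    return copies


# Hypothetical exon shared by two transcripts.
exon = "chr1\t.\texon\t1\t100\t.\t+\t.\tID=exon1;Parent=mrna1,mrna2"
for copy in replicate_multi_parent(exon):
    print(copy)
```

Here one input line yields two output lines, which is why the sorted file can be longer than the input.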

Generate biological sequences from a GFF3 file

Q: Which codons are considered for translation?

Translation uses the 64 codons of the standard genetic code. Only standard codons and the universal stop codons are considered.
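As a rough illustration of translation with the standard genetic code, the sketch below builds the 64-codon table and translates a coding sequence. It is not the code used by the toolkit. The names CODON_TABLE and translate are hypothetical, and two choices are assumptions: stop codons (TAA, TAG, TGA) are written as "*", and any codon outside the 64 standard ones is written as "X".

```python
from itertools import product

# Standard genetic code, listed in TCAG order with the first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence codon by codon using the standard table."""
    cds = cds.upper().replace("U", "T")
    protein = []
    # Ignore trailing bases that do not form a complete codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        # Assumption: non-standard codons (for example, ones containing N) become "X".
        protein.append(CODON_TABLE.get(cds[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

With this table, ATG maps to M, GCC to A, and TAA to a stop, so the example prints MA*.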