Mei-Ju May Chen edited this page Dec 16, 2016 · 18 revisions

Q: Licensing terms for this project?

This software/database is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the author's official duties as a United States Government employee and thus cannot be copyrighted. This software/database is freely available to the public for use. The National Agricultural Library and the U.S. Government have not placed any restriction on its use or reproduction. (Please see LICENCE.md)

Q: What kind of errors can be detected by gff-QC.py? (Detection of GFF3 format errors: gff-QC.py)

Currently, ~50 types of formatting errors can be detected. Errors are detected by reviewing three types of feature sets in a GFF3 file, and thus are grouped into three categories (Error category – feature type):

  • Intra-model errors (Ema) – multiple features within a model
  • Inter-model errors (Emr) – multiple features across models
  • Single feature errors (Esf) – each single feature

Please see the wiki page of the QC phase for the full list of detected error types.

Q: Why are the lines of the sorted gff different from the input? (Sort a GFF3 file: gff3-sort.py)

gff3-sort.py automatically drops lines that start with a hash sign (#), except for ##gff-version 3 and ###. The output file may therefore contain a different number of lines than the input. To check that the remaining lines are consistent, compare the counts reported by the following commands:

grep -v "^#" input.gff | wc -l

grep -v "^#" sorted.gff | wc -l

In addition, if your input GFF3 file contains a feature that has two or more parent IDs, the program replicates the feature and lists it under each parent. In that case, the output contains more lines than the input.
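
The effect of multi-parent features on the line count can be illustrated with a minimal Python sketch. This is not the code used by gff3-sort.py: the function name replicate_multi_parent is hypothetical, and rewriting each copy so that it names a single parent is an assumption made for illustration only.

```python
def replicate_multi_parent(line):
    """Return one GFF3 line per parent listed in the Parent attribute.

    Illustrative only: lines that are not 9-column feature lines, or that
    have fewer than two parents, are returned unchanged.
    """
    line = line.rstrip("\n")
    columns = line.split("\t")
    if len(columns) != 9:
        return [line]

    attributes = columns[8].split(";")
    parents = []
    for attribute in attributes:
        if attribute.startswith("Parent="):
            parents = attribute[len("Parent="):].split(",")

    if len(parents) < 2:
        return [line]

    copies = []
    for parent in parents:
        # Assumption: each copy keeps all attributes but names one parent.
        new_attributes = [
            "Parent=" + parent if a.startswith("Parent=") else a
            for a in attributes
        ]
        copies.append("\t".join(columns[:8] + [";".join(new_attributes)]))
    return copies


exon = "chr1\t.\texon\t1\t100\t.\t+\t.\tID=exon1;Parent=mRNA1,mRNA2"
for copy in replicate_multi_parent(exon):
    print(copy)
```

In this sketch, one input line with two parents yields two output lines, which is why the second count above can exceed the first.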

Q: Which codons are considered for translation? (Generate biological sequences from a GFF3 file: gff3_to_fasta.py)

Translation uses the 64 codon combinations of the standard genetic code. Only standard codons and the universal stop codons (TAA, TAG, and TGA) are considered.
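
As a rough illustration of what translation with the standard genetic code involves, the following Python sketch builds the 64-codon table and translates a coding sequence. It is not taken from gff3_to_fasta.py: the names CODON_TABLE and translate are hypothetical, and the choices to emit "X" for codons outside the table and to ignore a trailing partial codon are assumptions, not documented behavior of the tool.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G
# at each position ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, unknown="X"):
    """Translate a coding sequence codon by codon with the standard code.

    Assumptions for this sketch: codons not in the table (for example,
    those containing N) become `unknown`, and a trailing partial codon
    is ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable_length = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], unknown)
        for i in range(0, usable_length, 3)
    )


print(translate("ATGGCCTAA"))  # prints "MA*"
```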