Releases: iquasere/reCOGnizer
Fix on database creation from PN files
Databases' names changed to CD-batch search options
The database names passed to the --databases parameter have changed to accommodate the options present at CDD Batch Search. The new options are:
NCBI_Curated
Pfam
SMART
KOG
COG
PRK
TIGR
Domains now follow the lists at the PN files provided by NCBI
Domains related to the NCBI_Curated and PRK databases were not all being considered when building databases. This has been fixed, in accordance with the PN files provided with cdd.tar.gz.
Database construction reimplemented to use the PN files provided by CDD
If those are not available, reCOGnizer will still build the PN files, but with more added domains.
This should fix #19. But let's see.
Also removed deprecated parameters
The --download-resources and --skip-downloaded parameters will now result in an error when specified.
Fix on regex search of EC numbers
re.escape is required for handling the regex search where strings are being concatenated, e.g. to consider the literal ) when searching for (1.1.1.1) in the function in question.
This problem was caused by using the new r"regex" format.
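A minimal sketch of why escaping matters when a pattern is built by concatenation. The sample strings and variable names are hypothetical and are not taken from reCOGnizer's code; only re.escape, re.search and re.compile from the standard library are used.

```python
import re

ec = "(1.1.1.1)"  # EC number wrapped in parentheses, as in the note above

# Unescaped, "(" and ")" delimit a group and each "." matches any character,
# so the pattern also matches text that contains no parentheses at all.
loose = re.search(ec, "EC 1x1y1z1")

# re.escape turns every special character into a literal one.
strict = re.search(re.escape(ec), "EC 1x1y1z1")
literal = re.search(r"EC " + re.escape(ec), "EC (1.1.1.1)")

# Concatenating an unescaped ")" onto a pattern is rejected outright.
try:
    re.compile("1.1.1.1" + ")")
    unbalanced_rejected = False
except re.error:
    unbalanced_rejected = True
```

Here loose finds a match it should not, strict finds none, and literal matches the full parenthesised EC number.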
Simpler download of databases and more robust COG2KO conversion
Much simpler download of databases
reCOGnizer relied on the --download-resources and --skip-downloaded parameters for setting up its databases.
--download-resources instructed reCOGnizer to download the files required for its execution, and --skip-downloaded instructed it to ignore already downloaded files, which helped when a single file had been removed by mistake.
Now, reCOGnizer relies on the recognizer_dwnl.timestamp file to check if databases have already been downloaded. If the file exists, it skips installation. If the file doesn't exist, reCOGnizer removes all available files and downloads everything.
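The timestamp check described above can be sketched roughly as follows. This is an illustration under assumptions, not reCOGnizer's implementation: the function names, the download callable, and the contents written to the timestamp file are all hypothetical; only the file name recognizer_dwnl.timestamp comes from the release note.

```python
from datetime import datetime
from pathlib import Path
import shutil

TIMESTAMP_NAME = "recognizer_dwnl.timestamp"  # file name from the release note


def databases_ready(resources_dir: Path) -> bool:
    """Return True if the timestamp file marks a finished download."""
    return (resources_dir / TIMESTAMP_NAME).is_file()


def prepare_resources(resources_dir: Path, download) -> bool:
    """Download everything unless the timestamp file exists.

    `download` is a hypothetical callable that fetches the files into
    `resources_dir`. Returns True if a download was performed.
    """
    if databases_ready(resources_dir):
        return False  # installation is skipped
    # No timestamp: remove whatever is there and start from scratch.
    if resources_dir.exists():
        shutil.rmtree(resources_dir)
    resources_dir.mkdir(parents=True)
    download(resources_dir)
    # Assumption: the file simply records when the download finished.
    (resources_dir / TIMESTAMP_NAME).write_text(datetime.now().isoformat())
    return True
```

Writing the timestamp only after the download returns means an interrupted download is retried in full on the next run.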
COG2KO conversion more reliable
Previously, reCOGnizer built the cog2ko conversion as a collection of all KOs available for each protein mapping to the specific COG.
Now, reCOGnizer uses a similar approach to cog2ec conversion, where it will only assign a KO to a COG where over half of instances of that COG have that particular KO.
This obtains a more reliable COG2KO conversion, while keeping KOs for a considerable number of COGs.
Also removes the intermediate ssv files outputted during construction of the cog2ko database.
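The "over half of instances" rule for COG2KO can be sketched as below. This is a simplified illustration, not the project's code: the function name is hypothetical, the input is assumed to be one (COG, KO) pair per protein instance with None for proteins without a KO, and the identifiers in the example are made up.

```python
from collections import Counter, defaultdict


def majority_cog2ko(pairs):
    """Map each COG to a KO only if over half of its instances carry that KO.

    `pairs` is an iterable of (cog, ko) tuples, one per protein instance;
    `ko` may be None when the protein has no KO (an assumption of this sketch).
    """
    ko_counts = defaultdict(Counter)
    totals = Counter()
    for cog, ko in pairs:
        totals[cog] += 1
        if ko is not None:
            ko_counts[cog][ko] += 1
    result = {}
    for cog, counts in ko_counts.items():
        ko, n = counts.most_common(1)[0]
        if n > totals[cog] / 2:  # strictly more than half
            result[cog] = ko
    return result


# Made-up identifiers, for illustration only.
example = [
    ("COG0001", "K00001"), ("COG0001", "K00001"), ("COG0001", "K00002"),
    ("COG0002", "K00003"), ("COG0002", "K00004"),
]
mapping = majority_cog2ko(example)
```

In this example COG0001 keeps K00001 (two of three instances), while COG0002 gets no KO because neither candidate exceeds half.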
New parameters --test-run and --output-rpsbproc-columns will usually not be needed
The --test-run parameter had to be implemented as a consequence of the simpler database downloading. When set, reCOGnizer runs in an abnormal fashion, which is required for the tests at GitHub: reCOGnizer will move the cdd.tar.gz file available in the repo, and use it as a valid cdd.tar.gz file.
--output-rpsbproc-columns will output the Superfamilies, Sites and Motifs columns, which are usually empty for almost all annotations.
Removed some unnecessary files
recognizer.log was produced in the working directory. It only included rpsblast outputs, mainly for error assessment. Users can obtain that information by running reCOGnizer with the --debug parameter, and manually running the faulty commands.
taxonomy.rdf was obtained as part of building taxonomy.tsv. Now, reCOGnizer removes it once it is no longer needed.
Some fixes
reCOGnizer was not reporting the download of files when the --quiet
flag was set, except when the files had already been downloaded, and it removed them.
Also updated regexes to the new r'regex' format.
Fixed KOG outputting
rpsbproc doesn't work with the KOG database.
reCOGnizer's KOG report is now made directly from BLAST 6.
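As a rough illustration of building a report directly from BLAST tabular output, the sketch below reads the 12 default columns of BLAST+ "outfmt 6". It assumes the default column set; the columns reCOGnizer actually requests may differ, and the function name and the sample hit line are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast6(handle):
    """Parse tab-separated BLAST 6 lines into a list of dicts."""
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        # Convert the numeric fields most often used for filtering.
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits.append(row)
    return hits


# Made-up hit line, for illustration only.
sample = io.StringIO(
    "prot1\tCDD:12345\t45.2\t120\t60\t2\t1\t118\t5\t124\t1e-30\t110\n"
)
hits = read_blast6(sample)
```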
Fix when only downloading resources
reCOGnizer wasn't properly checking if the --file parameter had been provided. Therefore, reCOGnizer still attempted to perform annotation and searched for annotation outputs when no --file argument was specified.
Now, it's working properly.
Custom databases workflow now multithreaded
Now works multithreaded
Removed the -db parameter. Incorporated into -dbs.
--custom-database changed to --custom-databases to reflect this change.
Added input sanitization for custom/default databases. Either custom or default databases can be used in a run, not both at the same time.
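One way such a check could look is sketched below. It is purely illustrative: the function name is hypothetical, the set of default names is borrowed from the list in the most recent release above (the names accepted at the time of this release may have differed), and treating any other name as a custom database is an assumption of this sketch.

```python
# Assumption: default names as listed in the most recent release notes.
DEFAULT_DATABASES = {"NCBI_Curated", "Pfam", "SMART", "KOG", "COG", "PRK", "TIGR"}


def check_database_kind(names):
    """Return "default" or "custom", rejecting empty or mixed selections."""
    if not names:
        raise ValueError("no databases were specified")
    kinds = {"default" if name in DEFAULT_DATABASES else "custom" for name in names}
    if len(kinds) > 1:
        raise ValueError("custom and default databases cannot be combined")
    return kinds.pop()
```

For example, check_database_kind(["COG", "Pfam"]) returns "default", while a list mixing "COG" with an unrecognised name raises ValueError.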
Also some necessary changes on the tests
The latest image of miniconda is not functional, so the version was fixed at 22.11.1.
Added test for custom-database-workflow.
Tests now simultaneous, instead of one at a time.
Fixed several annoyances
No more need to confirm you don't want to gunzip download resource files
If --skip-downloaded is set, reCOGnizer will skip both the downloading and the gunzipping.
No more FutureWarning when trying to sum COGs
.sum(numeric_only=True) fixed that.
reCOGnizer is called without ".py"
Now called as "recognizer"
reCOGnizer was always called through the shell as recognizer.py. Now, it is called with recognizer.
Now removes intermediate folders
Unused directories (tmp, rpsbproc, et al.), whose files were removed, are now themselves removed.
Also, several fixes
Fixed conversion COG2KO.
Fixed future warning: changed xlsx_report.save() to xlsx_report.close().
Updated documentation
Added a nice interactive krona plot.
Also corrected the parameters, and talked about the taxonomy thing.
Fix on outputting COG categories
Due to reformatting how reCOGnizer outputs information, its capacity for outputting COG categories was damaged.
It is fixed now.