Squashed 'lib/foldseek/' changes from eec10926..33103374
33103374 Add parameter for taxonomy report in easy-search (#389)
3cad1360 Fix cluster reassign + tm-align #383
b43e63d7 Rework residue mapping to combine most gemmi AAs with prev FS AAs #387
8c3e3938 Merge pull request #385 from rachelse/master
7a3a9db3 Merge branch 'steineggerlab:master' into master
214886bb Update citations
d295eec6 Replace rust with rustup in github CI
8485aaf9 Switch mac arm test to github hosted runner
b011b8e4 lddt works when chaintm 0
d6056322 Fix convertalis for FS multimer
d2d09b58 Merge pull request #330 from stromjm/interface_v2.0
4e514a2a Merge branch 'steineggerlab:master' into interface_v2.0
8daecacc except overlap
e1f38a1e Merge pull request #362 from rachelse/master
6b00c5d9 Merge pull request #366 from rachelse/steineggerlab
c18727e4 Deleted search-clust pipeline from README
3d85d5c4 minor
d692f966 there could exist no match against itself
e1f238df error
a078fb9f typo in readme
24c8c92b treat monomer as singleton when scoremultimer uses --monomer-include-mode 1
1d5f9369 fix interface extraction exceptions
c27a629a single chain alignment bug fixed
19c8820c Merge pull request #359 from steineggerlab/multimer
079a5a13 monomer related update done
06275df3 rollback to 43fd26f3d3e043c8f9fd4c2b193a8b68f8781689
4046f00f test rbh filter off
13e9883c Merge branch 'multimer' of https://github.com/steineggerlab/foldseek into multimer
2dadffc0 test rbh filter off
83cc643b update single chain cluster
a17598cd build exceptions for interface mode
43fd26f3 remove tmscore threshold
704c3a82 fix chain cov ratio
1800e6a9 update for single chained alignments
cd26d54c implement complex-tm-threshold
0b1fa423 typo
c74a1a5f replace singlechain mode into  monomer mode
34104148 merged steineggerlab/foldseek
d414d908 merged steineggerlab/foldseek
232a4c43 Merge pull request #353 from rachelse/master
19595fd5 order in LocalParameters
c5412cf7 Merge remote-tracking branch 'upstream/master' into interface_v2.0
c1a6b76c merged steineggerlab/foldseek
88093635 Merge pull request #354 from steineggerlab/test
d267b3d8 bug fixed
7f2c6219 bug fix try4
abff375e bug fix try3
25e9629c bug fix try2
d9b2913b bug fix try1
8711f6b7 minor things
48666e18 typo in Readme
154019b6 Merge remote-tracking branch 'upstream/master'
a2ec51d4 fix parameter explain
306ffb82 add parameter for single chained assignments
b7c58acb Update regression
2c6b809d Update regression and convertalis ALNTMSCORE score
7e6be60a Revise alignment TMscore computation
ab20120e scoremultimer
319144b4 minor
7112fccb minor
e9b0f234 check alinged chain num when interfacelddt
2256e219 Merge pull request #345 from steineggerlab/foldseek_multiple_with_singlechain
9a2b7da4 bug fixed: single elemented vector for single chain alignment
d99d79c1 single chain allowing multimersearch
52029c06 Added BFVD as a foldseek database (#344)
af1e86e5 default parameters
e1d15f64 alnLen seems much better
5b76247b minor
ac0a32b1 minor
06016bdd minor
fd87b160 added parameters in example in Readme
17402748 added coverage in Readme
f1fc6a96 minor
1b39c482 big complex first to prevent big ones to be left and run alone for few hours using 1 thread
08b7e9ce previous version is twice faster
de945b2b simd returns segfault
602ff37b Merge branch ‘steineggerlab:master’ into master
be9fc339 Merge pull request #343 from steineggerlab/foldseek_multimer_bottleneck
3bcdabae checkChainRedundancy with unordered_sets
5a4ad0be foundNeighbors as an unordered_set
b15e236a foldseek-multimer bottleneck solved
b947f688 implement foundNeighbors
89f371bb DBSCAN as non-recursive function
9f603b29 check new scoremultimer
4f70b3f4 Readme
50c1df1b Readme
4f1592a4 Readme
c3093cd3 outputs filtcov too
420038de chaging readme
3fd78777 chaging readme
aa2ced33 all_seqs.fasta not working
d1605fe7 SIMD for tmscore
72f5028c map to vector, complex to multimer, [TODO] check if speed improved
63bca7b2 remove distMap
73e41342 seq3di.clear() in GemmiWrapper
2758b96e Merge branch 'master' into interface_v2.0
9d74a1bf orders of Parameters
94874214 make mergeable
2b731d3c Updated regression
02fb1e58 Updated submodule
17986f4c code styles
6db582ea made filtermultimer to get one argument as output. 'output' and 'outout_info' will be the actual outputs
a6bee293 important issue solved, thread_idx while writing
4bbdca4f minor
aeacb68b changed way to buffer for ustring, tstring
05a80c58 filtcov.tsv to complex_filt_info file. createtsv query query complex_filt_info filtcov.tsv possible
51197117 minor
868bfb1c Update README.md
3ed737c0 Add --tmscore-threshold-mode to allow to switch normalization
552e18dc Fix convertalis and alignment normalization
6b77a4f6 code styles
b40729c1 Fix issue steineggerlab/foldseek#312 alntmscore is now normalized by the backtrace length
0415c37c minor issues
edef0856 two db lines for each interface
9730f059 minor
2639127a Camel
ce4528b1 Use MathUtil squaredist
69d397b2 consistency loss with multithreading solved
a21576a0 mergeable, also only if at least 4 residues
0d82857c mergeable, only if at least 4 residues
aefcffca Addition of interface code
6740f823 merged steineggerlab/foldseek
71b1f38f complex_h
56d3adbd monomer in scoremultimer
aa30bec5 NogridInterface
928984bf Add BFMD database to repository
bde99a74 Add ungappedprefilter to it. profile searches
16dc9150 complex-multimer DBSCAN earlystop with maxClusterNum
04876ca2 Merge commit '97d4c6cfb57bb7f0994015580579f31a18aaf9c5'
97d4c6cf Squashed 'lib/mmseqs/' changes from 804bb2af6d..ffb05619ca
0f6bb3cc Deleted original interfaceLDDT code file
22d24ffc Separated interface retrieving and saving
7b5e7287 Implemented interfaceLDDT but naive
e478a324 Saved aligned coordinates into vector but cannot use SIMD operations
75013627 Sync with master branch
c86d2ce3 solved complex_db_h for monomers
50208e9b Merged commit with review
3d26d2ee commit before pull
27756597 changed order of elements in struct and class for memory
e35f355f setting default parameter collides with existing default values
543db3ad createtsv with --threads 1 to make complex_db_h in order
a8f0a091 The mistake was not a big problem. One stage before putting iLDDT code
3fd0dab7 Corrected targetcomplexid mistake & chain number comparison
6a94924e Corrected mistake: Saved dbKey as target complex id so far..
3123bae1 Removed redundant loops and improved performance
cdf6e786 minor, input->query
e44ea30e Made Complex struct and implementation is in progress
3922544e Inactivated filter-mode param: chainNum & conformation is affected
7e3e4764 Recovery point : saved previous iLDDT implementation
b6943b8e Merge branch 'steineggerlab:master' into master
6494f8a6 minor
bc212bc8 FoldseekBase.cpp update (#306)
3df6bc46 solved everything
7635ea3d minor
a82587c6 minor
c5f59d20 Merge branch 'steineggerlab:master'
ee77f9d7 Merge
4604c238 complex to multimer
25812ffa Try moving to macos 11 in azure pipelines CI
ebfdc666 Revert "Fix GCC 14 warnings"
044806f3 Fix GCC 14 warnings
59d2a253 Fix pymol mmcif files breaking gemmi (upstreamed here project-gemmi/gemmi#325)
e06bc508 octant
1da321c2 not done, but added vector check
e8469df0 Update filtercomplex.cpp
5b10e67f Update filtercomplex.cpp
cb0a43ec minor
b31f2ada reset
8bc07703 reset
cb277387 [MAYBE SOLVED} chainTM
c411e323 DbKey to AlnId/DbId
01a39259 Check if no aligned chain exists
693d723d res.Len seems right
3855d2e8 Look at this. ChainTM goes higher than 1
a8f6588f simple
09b4e410 Solved Multithreading
fe0c9383 [TODO] multithreading segfault
e4abea41 Revert "maybe solved chain TM"
27f9ac86 maybe solved chain TM
0430e9e5 Calculate chain TM everytime
a78bbd5d Set default param as set4final when computing chaintmscore
e333ad48 Made few comments reviewing filtercomplex.cpp
8f2ab715 simplify building complex header
cf28e076 parsing problem solved
55b5338c memcpy error solve
6f3ac2b6 parsing with pdb
d02373d8 parsing
961b8cf0 removing extension
a2ea4743 make filtcov.tsv not db
0bc9d97d minor
f3a9c22b minor
36490d18 minor change
77936ab3 handling monomer & calculate chainTM if complexTM satisfied
a0b426ef Merge branch 'steineggerlab:master' into master
79ad721c Solved weird chain TM-score behavior
81fbfd99 Implemented per-chain-tm but tmscore is suspicious
e44034e6 Implemented realloc function in Coordinate.h
6ef9dc7b modified complex header
94da95c4 complex header make
5e47a2ac still
99251fd6 complexheader, but still issue exists
4e7c3624 Revised code of filter complex
a27efde9 tmthreshold parameter
52f0459a TODO maybe TMthreshold
c1294530 both tm for all cov modes
7367a247 assID, query, target, coverage(1 or 2), tm(1 or 2)
ff5a8e5f filtercomplex tmp coverage.tsv
73c2aa7a Merge branch 'steineggerlab:master' into master
369842e2 Solved argument list too long issue
4329b254 Finalized rmdb
81ac5ef5 Generates comment about rep complex in fastafile
b7a27454 easy-cc description
b25de75f remove tmp files
4ceb9ada Completed to output rep seqs fasta file
560c6e49 temporary Result2repseq
d16ac100 tmp remove
7e5bd089 changed tmp dir
86f5fe9e colsed easycc
412b51d0 header file
fa875fdf making complex header file
fe9865cb small changes
819a75a8 add description
06945aed Parameters
7d34bcbb default parameters
e3defcd4 separated buildCmplDb from filtercomplex
83100b8e Solved complexsearch parameter not applied problem
607c14fc Success command run
9b183905 share status
7e054b6f [DONE] Build successed. [TODO] Default Parameter setting
6ac16224 finally make works
c3a7c959 [TODO] Solve conflicts during make
8dd17b24 Organized shell scripts
a36b8c7e git conflicts
6db40b57 tmp LocalParameters.cpp
dd34d670 still build failed
aebd3fd8 small changes
b9d73150 .cpp files
02a89148 easycc and cc .sh
dbd9b076 To Complexclusterworkflow
f7b9508e Changes
03c635e5 Changed ComplexCluster into FilterComplex
ec234b1d revised parameters for filtercomplex
39b2f062 renamed complexcluster.sh to filtercomplex.sh and finalized
47cfb386 share status
40a0e719 to share status
413faeeb Add filtercomplex parameter for coverage
8667be3d [TODO] Build failed. check localparameters, workflowfiles, etc.
3782b550 Made workflow file
e15a2241 [IN PROGRESS] separated complexcluster and easycomplexcluster but need to organize
84c5279f FoldSeelBase.cpp should be changed though, easy-complexcluster output instruction
017ad0fe Updated LocalParameter files
83a46549 clustered results to flatfiles
38b60958 data/CMakeLists update
46611c48 CMakeLists update
51d29b8c Changed complexcluster.sh to easycomplexcluster.sh
69876616 Merge branch 'master' of https://github.com/rachelse/foldseek
b5c45c37 minor modification
ef00e785 [IN PROGRESS] Draft state complexcluster.sh
5ac175fd erased default -c 0.8
aaf1a6b1 complexclust.sh
e52c527a cleaned code
b7bc37fc -c default 0.8
d81811f9 TODO: select highest aligned alignments among same complex-complex & what if user wants to use -c 0.0?
5adeb999 no errors, not debugged yet
03860d19 has error, but for sharing status. Coverge criteria
47c37e08 Merge branch 'steineggerlab:master' into master
35c5914a Merge branch 'steineggerlab:master' into master
bb7ec93b First version for complex filter

git-subtree-dir: lib/foldseek
git-subtree-split: 3310337471fc46880c245508af6a23adcb192cee
gamcil committed Dec 4, 2024
1 parent 15a2e1e commit 38a7b3c
Showing 58 changed files with 2,586 additions and 386 deletions.
8 changes: 6 additions & 2 deletions .github/workflows/mac-arm64.yml
@@ -8,16 +8,20 @@ on:

jobs:
build:
runs-on: [self-hosted, macOS, ARM64]
runs-on: macos-latest
steps:
- uses: actions/checkout@v3
with:
submodules: true

- name: Dependencies
run: |
brew install -f --overwrite cmake libomp rustup
rustup-init --profile minimal -q -y
- name: Build
run: |
mkdir -p build
rustup update
cd build
LIBOMP=$(brew --prefix libomp)
cmake \
125 changes: 95 additions & 30 deletions README.md
@@ -14,26 +14,45 @@ Foldseek enables fast and sensitive comparisons of large protein structure sets.
# Table of Contents

- [Foldseek](#foldseek)
- [Webserver](#webserver)
- [Installation](#installation)
- [Memory requirements](#memory-requirements)
- [Tutorial Video](#tutorial-video)
- [Documentation](#documentation)
- [Quick Start](#quick-start)
- [Search](#search)
- [Output](#output-search)
- [Important Parameters](#important-search-parameters)
- [Alignment Mode](#alignment-mode)
- [Structure search from FASTA input](#structure-search-from-fasta-input)
- [Databases](#databases)
- [Create Custom Databases and Indexes](#create-custom-databases-and-indexes)
- [Cluster](#cluster)
- [Output](#output-cluster)
- [Important Parameters](#important-cluster-parameters)
- [Multimer](#multimersearch)
- [Output](#multimer-search-output)
- [Main Modules](#main-modules)
- [Examples](#examples)
- [Publications](#publications)
- [Table of Contents](#table-of-contents)
- [Webserver](#webserver)
- [Installation](#installation)
- [Memory requirements](#memory-requirements)
- [Tutorial Video](#tutorial-video)
- [Documentation](#documentation)
- [Quick start](#quick-start)
- [Search](#search)
- [Output Search](#output-search)
- [Tab-separated](#tab-separated)
- [Superpositioned Cα only PDB files](#superpositioned-cα-only-pdb-files)
- [Interactive HTML](#interactive-html)
- [Important search parameters](#important-search-parameters)
- [Alignment Mode](#alignment-mode)
- [Structure search from FASTA input](#structure-search-from-fasta-input)
- [Databases](#databases)
- [Create custom databases and indexes](#create-custom-databases-and-indexes)
- [Cluster](#cluster)
- [Output Cluster](#output-cluster)
- [Tab-separated cluster](#tab-separated-cluster)
- [Representative fasta](#representative-fasta)
- [All member fasta](#all-member-fasta)
- [Important cluster parameters](#important-cluster-parameters)
- [Multimersearch](#multimersearch)
- [Using Multimersearch](#using-multimersearch)
- [Multimer Search Output](#multimer-search-output)
- [Tab-separated-complex](#tab-separated-complex)
- [Complex Report](#complex-report)
- [Multimercluster](#multimercluster)
- [Output MultimerCluster](#output-multimercluster)
- [Tab-separated multimercluster](#tab-separated-multimercluster)
- [Representative multimer fasta](#representative-multimer-fasta)
- [Filtered search result](#filtered-search-result)
- [Important multimer cluster parameters](#important-multimer-cluster-parameters)
- [Main Modules](#main-modules)
- [Examples](#examples)
- [Rescore aligments using TMscore](#rescore-aligments-using-tmscore)
- [Query centered multiple sequence alignment](#query-centered-multiple-sequence-alignment)

## Webserver
Search your protein structures against the [AlphaFoldDB](https://alphafold.ebi.ac.uk/) and [PDB](https://www.rcsb.org/) in seconds using the Foldseek webserver ([code](https://github.com/soedinglab/mmseqs2-app)): [search.foldseek.com](https://search.foldseek.com) 🚀
@@ -238,6 +257,7 @@ MCAR...Q
| --cov-mode | Alignment | 0: coverage of query and target, 1: coverage of target, 2: coverage of query |
| --min-seq-id | Alignment | the minimum sequence identity to be clustered |
| --tmscore-threshold | Alignment | accept alignments with an alignment TMscore > thr |
| --tmscore-threshold-mode | Alignment | normalize TMscore by 0: alignment, 1: representative, 2: member length |
| --lddt-threshold | Alignment | accept alignments with an alignment LDDT score > thr |


@@ -300,9 +320,64 @@ The default output fields are: `query,target,fident,alnlen,mismatch,gapopen,qsta
1tim.pdb.gz 8tim.pdb.gz A,B A,B 0.98941 0.98941 0.999983,0.000332,0.005813,-0.000373,0.999976,0.006884,-0.005811,-0.006886,0.999959 0.298992,0.060047,0.565875 0
```

### Multimercluster
The `easy-multimercluster` module is designed for multimer-level structural clustering (supported input formats: PDB/mmCIF, flat or gzipped). By default, `easy-multimercluster` generates three output files whose names end with the following suffixes: (1) `_cluster.tsv`, (2) `_rep_seq.fasta` and (3) `_cluster_report`. The first file (1) is a [tab-separated](#tab-separated-multimercluster) file describing the mapping from representative multimer to member, while the second file (2) contains only [representative sequences](#representative-multimer-fasta). The third file (3) is also a [tab-separated](#filtered-search-result) file describing filtered alignments.

Make sure chain names in PDB/mmCIF files do not contain underscores (`_`).
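
A minimal sketch of why underscores in chain names cause trouble, assuming FASTA headers of the form `<multimer>_<chain>` as in the representative FASTA output. The `postprocessFasta` function in `data/easymultimercluster.sh` splits each header on `_` and treats everything before the last field as the multimer name. The helper name `multimer_of_header` is hypothetical and not part of Foldseek; it only imitates that split for headers that contain at least one underscore.

```shell
# Hypothetical helper, not part of Foldseek: derive the multimer name from a
# "<multimer>_<chain>" header by dropping the last "_"-delimited field.
multimer_of_header() {
    # ${1%_*} removes the shortest suffix matching "_*", i.e. the chain name.
    printf '%s\n' "${1%_*}"
}

multimer_of_header "5o002_A"     # chain "A" is stripped
multimer_of_header "1abc_v2_B"   # underscores in the multimer name survive
multimer_of_header "5o002_A_1"   # a chain named "A_1" is split in the wrong place
```

In the last call the result is `5o002_A` rather than `5o002`, so chains of one multimer would no longer be grouped under the same name.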

foldseek easy-multimercluster example/ clu tmp --multimer-tm-threshold 0.65 --chain-tm-threshold 0.5 --interface-lddt-threshold 0.65

#### Output MultimerCluster
##### Tab-separated multimercluster
```
5o002 5o002
194l2 194l2
194l2 193l2
10mh121 10mh121
10mh121 10mh114
10mh121 10mh119
```
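
As an illustration of how this two-column mapping can be consumed, the sketch below counts members per representative. It assumes the file is tab-separated with the representative in column 1 and the member in column 2, as described above. The function name `cluster_sizes` is hypothetical and not a Foldseek command.

```shell
# Hypothetical helper: print "<representative><TAB><member count>" for a
# two-column, tab-separated cluster file (representative, member).
cluster_sizes() {
    awk -F'\t' '{ n[$1]++ }
        END { for (r in n) printf "%s\t%d\n", r, n[r] }' "$1" | LC_ALL=C sort
}

# Usage (file name follows the "clu" prefix used in the command above):
#   cluster_sizes clu_cluster.tsv
```

A representative is listed as a member of its own cluster, so a count of 1 indicates a singleton.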
##### Representative multimer fasta
```
#5o002
>5o002_A
SHGK...R
>5o002_B
SHGK...R
#194l2
>194l2_A0
KVFG...L
>194l2_A6
KVFG...L
#10mh121
...
```
##### Filtered search result
The `_cluster_report` file contains the query and target identifiers followed by `qcoverage, tcoverage, multimer qTm, multimer tTm, interface lddt, ustring, tstring` for alignments that pass filtering, before clustering.
```
5o0f2 5o0f2 1.000 1.000 1.000 1.000 1.000 1.000,0.000,0.000,0.000,1.000,0.000,0.000,0.000,1.000 0.000,0.000,0.000
5o0f2 5o0d2 1.000 1.000 0.999 0.992 1.000 0.999,0.000,-0.000,-0.000,0.999,-0.000,0.000,0.000,0.999 -0.004,-0.001,0.084
5o0f2 5o082 1.000 0.990 0.978 0.962 0.921 0.999,-0.025,-0.002,0.025,0.999,-0.001,0.002,0.001,0.999 -0.039,0.000,-0.253
```
The query and target coverages here represent the sum of the coverages of all aligned chains, divided by the total query and target multimer length respectively.
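
The report can be post-filtered with standard tools. The sketch below keeps rows whose interface LDDT is at least a given cutoff. It assumes a tab-separated file with the column order shown in the example rows (query, target, qcoverage, tcoverage, multimer qTm, multimer tTm, interface lddt, ustring, tstring). The function name `filter_by_interface_lddt` is hypothetical and not a Foldseek command.

```shell
# Hypothetical helper: keep report rows whose interface LDDT (column 7,
# assumed from the example rows above) is at least the given cutoff.
filter_by_interface_lddt() {
    # $1 = report file, $2 = minimum interface LDDT
    awk -F'\t' -v min="$2" '($7 + 0) >= (min + 0)' "$1"
}

# Usage (file name follows the "clu" prefix used in the command above):
#   filter_by_interface_lddt clu_cluster_report 0.95
```

Filtering afterwards only narrows what was already reported; to change which alignments enter clustering, use `--interface-lddt-threshold` on the command line instead.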

#### Important multimer cluster parameters

| Option | Category | Description |
|-------------------|-----------------|-----------------------------------------------------------------------------------------------------------|
| -e | Sensitivity | List matches below this E-value (range 0.0-inf, default: 0.001); increasing it reports more distant structures |
| --alignment-type| Alignment | 0: 3Di Gotoh-Smith-Waterman (local, not recommended), 1: TMalign (global, slow), 2: 3Di+AA Gotoh-Smith-Waterman (local, default) |
| -c | Alignment | List matches above this fraction of aligned (covered) residues (see --cov-mode) (default: 0.0); higher coverage = more global alignment |
| --cov-mode | Alignment | 0: coverage of query and target (cluster multimers only with same chain numbers), 1: coverage of target, 2: coverage of query |
| --multimer-tm-threshold | Alignment | accept alignments with multimer alignment TMscore > thr |
| --chain-tm-threshold | Alignment | accept alignments if every single chain TMscore > thr |
| --interface-lddt-threshold | Alignment | accept alignments with an interface LDDT score > thr |

## Main Modules
- `easy-search` fast protein structure search
- `easy-cluster` fast protein structure clustering
- `easy-multimersearch` fast protein multimer-level structure search
- `easy-multimercluster` fast protein multimer-level structure clustering
- `createdb` create a database from protein structures (PDB, mmCIF, mmJSON)
- `databases` download pre-assembled databases

@@ -324,16 +399,6 @@ foldseek createtsv queryDB targetDB aln_tmscore aln_tmscore.tsv

Output format `aln_tmscore.tsv`: query and target identifiers, TMscore, translation(3) and rotation vector=(3x3)

### Cluster search results
The following command performs an all-against-all alignments of the input structures and retains only the alignments, which cover 80% of the sequence (-c 0.8) (read more about alignment coverage options [here](https://github.com/soedinglab/MMseqs2/wiki#how-to-set-the-right-alignment-coverage-to-cluster)). It then clusters the results using a greedy set cover algorithm. The clustering mode can be adjusted using --cluster-mode, read more [here](https://github.com/soedinglab/MMseqs2/wiki#clustering-modes). The clustering output format is described [here](https://github.com/soedinglab/MMseqs2/wiki#cluster-tsv-format).

```
foldseek createdb example/ db
foldseek search db db aln tmpFolder -c 0.8
foldseek clust db aln clu
foldseek createtsv db db clu clu.tsv
```

### Query centered multiple sequence alignment
Foldseek can output multiple sequence alignments in a3m format using the following commands.
To convert a3m to FASTA format, the following script can be used [reformat.pl](https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl) (`reformat.pl in.a3m out.fas`).
8 changes: 4 additions & 4 deletions azure-pipelines.yml
@@ -121,10 +121,10 @@ jobs:
targetPath: $(Build.SourcesDirectory)/build/src/foldseek
artifactName: foldseek-linux-$(SIMD)

- job: build_macos_11
displayName: macOS 11
- job: build_macos
displayName: macOS
pool:
vmImage: 'macos-11'
vmImage: 'macos-12'
steps:
- checkout: self
submodules: true
@@ -153,7 +153,7 @@ jobs:
pool:
vmImage: 'ubuntu-latest'
dependsOn:
- build_macos_11
- build_macos
- build_ubuntu_2004
- build_ubuntu_cross_2004
steps:
2 changes: 2 additions & 0 deletions data/CMakeLists.txt
@@ -15,6 +15,8 @@ set(COMPILED_RESOURCES
vendor.js.zst
multimersearch.sh
easymultimersearch.sh
multimercluster.sh
easymultimercluster.sh
)

set(GENERATED_OUTPUT_HEADERS "")
163 changes: 163 additions & 0 deletions data/easymultimercluster.sh
@@ -0,0 +1,163 @@
#!/bin/sh -e
fail() {
echo "Error: $1"
exit 1
}

notExists() {
[ ! -f "$1" ]
}

exists() {
[ -f "$1" ]
}

abspath() {
if [ -d "$1" ]; then
(cd "$1"; pwd)
elif [ -f "$1" ]; then
if [ -z "${1##*/*}" ]; then
echo "$(cd "${1%/*}"; pwd)/${1##*/}"
else
echo "$(pwd)/$1"
fi
elif [ -d "$(dirname "$1")" ]; then
echo "$(cd "$(dirname "$1")"; pwd)/$(basename "$1")"
fi
}

mapCmplName2ChainKeys() {
awk -F"\t" 'FNR==1 {++fIndex}
fIndex==1 {
repName[$1]=1
if (match($1, /MODEL/)){
tmpName[$1]=1
}else{
tmpName[$1"_MODEL_1"]=1
}
next
}
fIndex==2{
if (match($2, /MODEL/)){
if ($2 in tmpName){
repId[$1]=1
}else{
ho[1]=1
}
}else{
if ($2 in repName){
repId[$1]=1
}
}
next
}
{
if ($3 in repId){
print $1
}
}
' "${1}" "${2}.source" "${2}.lookup" > "${3}"
}

postprocessFasta() {
awk ' BEGIN {FS=">"}
$0 ~/^>/ {
# match($2, /(.*).pdb*/)
split($2,parts,"_")
complex=""
for (j = 1; j < length(parts); j++) {
complex = complex parts[j]
if (j < length(parts)-1){
complex=complex"_"
}
}
if (!(complex in repComplex)) {
print "#"complex
repComplex[complex] = ""
}
}
{print $0}
' "${1}" > "${1}.tmp" && mv "${1}.tmp" "${1}"
}

if notExists "${TMP_PATH}/query.dbtype"; then
# shellcheck disable=SC2086
"$MMSEQS" createdb "${INPUT}" "${TMP_PATH}/query" ${CREATEDB_PAR} \
|| fail "query createdb died"
fi

if notExists "${TMP_PATH}/multimer_clu.dbtype"; then
# shellcheck disable=SC2086
"$MMSEQS" multimercluster "${TMP_PATH}/query" "${TMP_PATH}/multimer_clu" "${TMP_PATH}" ${MULTIMERCLUSTER_PAR} \
|| fail "Multimercluster died"
fi

SOURCE="${TMP_PATH}/query"
INPUT="${TMP_PATH}/latest/multimer_db"
if notExists "${TMP_PATH}/cluster.tsv"; then
# shellcheck disable=SC2086
"$MMSEQS" createtsv "${INPUT}" "${INPUT}" "${TMP_PATH}/multimer_clu" "${TMP_PATH}/cluster.tsv" ${THREADS_PAR} \
|| fail "Convert Alignments died"
# shellcheck disable=SC2086
"$MMSEQS" createtsv "${INPUT}" "${INPUT}" "${TMP_PATH}/multimer_clu_filt_info" "${TMP_PATH}/cluster_report" ${THREADS_PAR} \
|| fail "Convert Alignments died"
fi

if notExists "${TMP_PATH}/multimer_rep_seqs.dbtype"; then
mapCmplName2ChainKeys "${TMP_PATH}/cluster.tsv" "${SOURCE}" "${TMP_PATH}/rep_seqs.list"
# shellcheck disable=SC2086
"$MMSEQS" createsubdb "${TMP_PATH}/rep_seqs.list" "${SOURCE}" "${TMP_PATH}/multimer_rep_seqs" ${CREATESUBDB_PAR} \
|| fail "createsubdb died"
fi

if notExists "${TMP_PATH}/multimer_rep_seq.fasta"; then
# shellcheck disable=SC2086
"$MMSEQS" result2flat "${SOURCE}" "${SOURCE}" "${TMP_PATH}/multimer_rep_seqs" "${TMP_PATH}/multimer_rep_seq.fasta" ${VERBOSITY_PAR} \
|| fail "result2flat died"
postprocessFasta "${TMP_PATH}/multimer_rep_seq.fasta"
fi

#TODO: generate fasta file for all sequences
# if notExists "${TMP_PATH}/multimer_all_seqs.fasta"; then
# # shellcheck disable=SC2086
# "$MMSEQS" createseqfiledb "${INPUT}" "${TMP_PATH}/multimer_clu" "${TMP_PATH}/multimer_clust_seqs" ${THREADS_PAR} \
# || fail "Result2repseq died"

# # shellcheck disable=SC2086
# "$MMSEQS" result2flat "${INPUT}" "${INPUT}" "${TMP_PATH}/multimer_clust_seqs" "${TMP_PATH}/multimer_all_seqs.fasta" ${VERBOSITY_PAR} \
# || fail "result2flat died"
# fi

# mv "${TMP_PATH}/multimer_all_seqs.fasta" "${RESULT}_all_seqs.fasta"
mv "${TMP_PATH}/multimer_rep_seq.fasta" "${RESULT}_rep_seq.fasta"
mv "${TMP_PATH}/cluster.tsv" "${RESULT}_cluster.tsv"
mv "${TMP_PATH}/cluster_report" "${RESULT}_cluster_report"

if [ -n "${REMOVE_TMP}" ]; then
rm "${INPUT}.0"
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${TMP_PATH}/multimer_db" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
# "$MMSEQS" rmdb "${TMP_PATH}/multimer_clu_seqs" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${TMP_PATH}/multimer_rep_seqs" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${TMP_PATH}/multimer_rep_seqs_h" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${TMP_PATH}/complex_clu" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${TMP_PATH}/query" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${TMP_PATH}/query_h" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${INPUT}" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${INPUT}_h" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${TMP_PATH}/query_ca" ${VERBOSITY_PAR}
# shellcheck disable=SC2086
"$MMSEQS" rmdb "${TMP_PATH}/query_ss" ${VERBOSITY_PAR}
rm "${TMP_PATH}/rep_seqs.list"
rm -rf "${TMP_PATH}/latest"
rm -f "${TMP_PATH}/easymultimercluster.sh"
fi
6 changes: 6 additions & 0 deletions data/easystructuresearch.sh
@@ -51,6 +51,12 @@ if notExists "${TMP_PATH}/alis.dbtype"; then
|| fail "Convert Alignments died"
fi

if [ -n "${TAXONOMY}" ]; then
# shellcheck disable=SC2086
"$MMSEQS" taxonomyreport "${TARGET}${INDEXEXT}" "${INTERMEDIATE}" "${RESULTS}_report" ${TAXONOMYREPORT_PAR} \
|| fail "taxonomyreport died"
fi

if [ -n "${REMOVE_TMP}" ]; then
if [ -n "${GREEDY_BEST_HITS}" ]; then
# shellcheck disable=SC2086