Output from meeting with Alexander and Chaoran on 25.11.2024 – the final architecture of V-Pipe & Loculus:
The difficulty: The crux is that V-Pipe's input and output data are much larger than the consensus sequences Loculus was designed for (on the order of 600 MB rather than 14 kb). Thus, there are some caveats to how Loculus can handle V-Pipe data: mainly, big data files are not passed through the Loculus backend but are referenced via S3 buckets.
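To make this concrete, a submission to the backend would then carry only metadata plus an S3 reference, along the lines of the sketch below. All field names, sample values, and the payload shape are illustrative assumptions, not the actual Loculus API:

```python
import json

# Hypothetical submission payload: the Loculus backend stores the
# metadata and an S3 reference, while the raw reads (hundreds of MB)
# stay in the bucket and never pass through the backend.
submission = {
    "metadata": {
        "sample_id": "sample-123",       # illustrative sample name
        "location": "Zurich",
        "sampling_date": "2024-10-08",
    },
    # Reference to the raw data, not the data itself.
    "raw_data_s3_url": "s3://raw-uploads/sample-123/reads.fastq.gz",
}

print(json.dumps(submission, indent=2))
```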
Below is the currently envisioned final setup:
V-Pipe and Loculus:
1. The user uploads raw data and metadata via the web interface / AWS S3 uploader into an S3 bucket.
2. The Loculus backend writes the incoming metadata and raw-data S3 references into PostgreSQL.
3. The Loculus backend dispatches the metadata and S3 references to the pipeline (V-Pipe) via a GET request.
4. V-Pipe fetches the raw data from that S3 bucket and processes it.
5. As V-Pipe's final step, sr2silo activates to process V-Pipe's output `.bam` into an `.ndjson` of merged, paired reads enriched with the metadata (one such record is sketched after this list).
6. sr2silo uploads the `.ndjson` to another S3 bucket and POSTs the metadata and S3 URL to the Loculus backend.
7. The Loculus backend sends the `.ndjson` metadata and S3 URLs to SILO preprocessing.
8. SILO preprocessing enriches the `.ndjson` with the full sequences, fetching the correct reads from the S3 output of step 6.
9. SILO preprocessing indexes all files in that massive `.ndjson`, as it needs all sequences.
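To make step 5 concrete, here is a minimal sketch of what one line of that `.ndjson` could look like. The field names and values are illustrative assumptions; the actual schema is whatever sr2silo and the SILO preprocessing agree on:

```python
import json

# One line of the hypothetical .ndjson emitted by sr2silo: a merged,
# paired read plus the per-sample metadata repeated on every line.
record = {
    "read_id": "read-000001",          # assumed field names
    "nucleotide_sequence": "ACGT...",  # merged, paired read
    "metadata": {
        "sample_id": "sample-123",
        "location": "Zurich",
        "sampling_date": "2024-10-08",
    },
}

# NDJSON: one JSON object per line.
with open("reads.ndjson", "a") as fh:
    fh.write(json.dumps(record) + "\n")
```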
V-Pipe outputs to SILO*
This covers steps 5-6. In the final setup, some wrapper code will implement a GET request in V-Pipe, received from the Loculus backend, upon which it will run V-Pipe, answer the request with a POST to the Loculus backend, and read and write all data except metadata via S3.
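A minimal sketch of such a wrapper, assuming a hypothetical `/run` route on the V-Pipe side, a hypothetical results route on the Loculus backend, and an assumed query parameter (the real interface is still to be defined):

```python
from flask import Flask, request
import requests

app = Flask(__name__)

LOCULUS_BACKEND = "https://backend.example.org"      # assumed URL
RESULTS_ROUTE = f"{LOCULUS_BACKEND}/v-pipe/results"  # hypothetical route


@app.get("/run")
def run_vpipe():
    """Receive the backend's GET carrying metadata and S3 raw-data references."""
    raw_data_s3_url = request.args["raw_data_s3_url"]  # assumed parameter

    # 1. Fetch the raw reads from S3 (elided).
    # 2. Run the V-Pipe workflow on them (invocation elided).
    # 3. Hand the output .bam to sr2silo (see below).

    # Return the request with a POST carrying only references and status,
    # never the large files themselves.
    requests.post(RESULTS_ROUTE, json={"raw_data_s3_url": raw_data_s3_url,
                                       "status": "done"})
    return {"accepted": True}
```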
On the output side of this stands sr2silo, described below. It will be triggered upon completion of V-Pipe, probably as a docker-compose service, and will take a single `.bam` and metadata. It will do the following (steps 4-6 are sketched after the list):
1. [temporary] GET request (received from the Loculus backend with nothing but an index)
2. read-processing (bam -> sam -> pair & merge -> align & translate, i.e. nextclade-like)
3. nextclade-like output to `.ndjson` with Rust code from Fabian
4. enrich the `.ndjson` with metadata per line
5. upload the `.ndjson` to S3
6. POST to the Loculus backend with the S3 URLs of the processed `.ndjson` and the metadata
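Steps 4-6 of the list above could look roughly like this; the bucket, key, endpoint, and field names are assumptions for illustration:

```python
import json
import boto3
import requests

BUCKET = "sr2silo-output"                              # assumed bucket
KEY = "sample-123/reads.ndjson"                        # assumed key
BACKEND_SUBMIT = "https://backend.example.org/submit"  # hypothetical route

metadata = {"sample_id": "sample-123", "sampling_date": "2024-10-08"}

# Step 4: enrich every .ndjson line with the per-sample metadata.
with open("reads.ndjson") as src, open("enriched.ndjson", "w") as dst:
    for line in src:
        record = json.loads(line)
        record["metadata"] = metadata
        dst.write(json.dumps(record) + "\n")

# Step 5: upload the enriched .ndjson to S3.
boto3.client("s3").upload_file("enriched.ndjson", BUCKET, KEY)

# Step 6: POST only the metadata and the S3 URL to the Loculus backend.
requests.post(BACKEND_SUBMIT, json={
    "metadata": metadata,
    "ndjson_s3_url": f"s3://{BUCKET}/{KEY}",
})
```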
The [temporary] GET request will later be implemented by the program that prepares inputs and directories for V-Pipe; for now, we'll artificially import a batch of data with sr2silo alone.
Doubt: What was the reason for processing the `.bam` to `.ndjson` in one place, as opposed to storing the nextclade-like output and processing it in front of SILO?
Answer: So that we only need to store one file, with data and metadata together, in a single place, once. This is specific to wastewater.
The silo_input_transformer will probably later move to the SILO preprocessing, and we will upload BAMs and metadata to the Loculus backend, with the BAMs for nucleotides and amino acids (reads already paired and merged) going to some S3.
For now, this should all live in sr2silo to get going.
Sub-Issues:

- silo-input-transformer into sr2silo
- transform to `ndjson` #46
- `ndjson` with Metadata #47

Open Questions