meta: sr2silo for Loculus Architecture #45

gordonkoehn · 2024-11-25T15:48:08Z

Output from the meeting with Alexander and Chaoran on 25.11.2024, the final architecture of V-Pipe & Loculus:

The difficulty: V-Pipe's input and output data are much larger than the consensus sequences that Loculus was designed for (14 kb << 600 Mb). Loculus can therefore handle V-Pipe data only with some caveats. Mainly, big data files are not passed through the Loculus backend; only references to their location in an S3 bucket are.

Below is the currently envisioned final setup:

V-Pipe and Loculus:

1. The user uploads raw data (via the web interface / AWS S3 uploader, into an S3 bucket) and metadata.
2. The Loculus backend writes the incoming metadata and the S3 references to the raw data into PostgreSQL.
3. The Loculus backend dispatches the metadata and the S3 references to the raw data to the pipeline (V-Pipe) via a GET request.
4. V-Pipe fetches the raw data from that S3 bucket and processes it.
5. As V-Pipe's final step, sr2silo converts V-Pipe's output .bam into an .ndjson of merged, paired reads enriched with the metadata.
6. sr2silo uploads the .ndjson to another S3 bucket and sends the metadata and S3 URL to the Loculus backend in a POST request.
7. The Loculus backend sends the metadata .ndjson and S3 URLs to SILO preprocessing.
8. SILO preprocessing enriches the .ndjson with the full sequences, fetching the correct reads from the S3 output of step 6.
9. SILO preprocessing indexes over everything in that massive .ndjson, as it needs all sequences.
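
The key point of the steps above is that only metadata and S3 references travel through the Loculus backend, never the large files themselves. A minimal sketch of such a reference-only message could look like the following. The class name, field names, and bucket name are all hypothetical; nothing in this issue fixes a schema.

```python
import json
from dataclasses import dataclass, asdict
from urllib.parse import urlparse


@dataclass
class SampleReference:
    """Hypothetical message: metadata plus a pointer to the data, not the data."""

    sample_id: str
    metadata: dict
    s3_uri: str  # assumed form "s3://<bucket>/<key>"

    def bucket_and_key(self) -> tuple[str, str]:
        """Split the S3 URI into (bucket, key) so a worker can fetch the object."""
        parsed = urlparse(self.s3_uri)
        key = parsed.path.lstrip("/")
        if parsed.scheme != "s3" or not parsed.netloc or not key:
            raise ValueError(f"not an S3 URI: {self.s3_uri!r}")
        return parsed.netloc, key

    def to_json(self) -> str:
        """Serialize the small message that would pass through the backend."""
        return json.dumps(asdict(self), sort_keys=True)


# Made-up example values, for illustration only.
ref = SampleReference(
    sample_id="sample-001",
    metadata={"location": "Zurich", "sampling_date": "2024-11-25"},
    s3_uri="s3://example-raw-bucket/sample-001/reads.fastq.gz",
)
bucket, key = ref.bucket_and_key()
message = ref.to_json()
```

The message stays a few hundred bytes regardless of how large the referenced file is, which is the property the architecture relies on.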

V-Pipe outputs to SILO:

This covers steps 5 and 6. In the final setup, some wrapper code in V-Pipe will receive a GET request from the Loculus backend, run V-Pipe in response, and answer with a POST to the Loculus backend. All data except the metadata is read from and written to S3.

On the output side of this stands sr2silo, described below. It will be triggered upon completion of V-Pipe, probably via docker-compose, and take a single .bam and its metadata. It will do:

[temporary] GET request (to be received from Loculus Backend with nothing but an index)
read processing (bam -> sam -> pair & merge -> align & translate, i.e. nextclade-like)
nextclade-like output to ndjson with Rust code from Fabian
enrich ndjson with metadata per line
upload ndjson to S3
POST to Loculus backend with S3 URLs of processed ndjson and metadata
This will later be implemented by the program that prepares inputs and directories for V-Pipe; for now we'll artificially import a batch of data with sr2silo alone.
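
The "enrich ndjson with metadata per line" and "POST to Loculus backend" steps can be sketched roughly as below. This is only an illustration under assumptions: the `metadata` key, the read fields, the payload layout, and the example URL are hypothetical and not taken from sr2silo or the Loculus API.

```python
import json
from typing import Iterable, Iterator


def enrich_ndjson(lines: Iterable[str], metadata: dict) -> Iterator[str]:
    """Yield one JSON line per input record with the sample metadata attached."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input
        record = json.loads(line)
        record["metadata"] = metadata  # hypothetical key name
        yield json.dumps(record, sort_keys=True)


def build_post_payload(metadata: dict, ndjson_s3_url: str) -> dict:
    """Assemble the small body that would be POSTed: metadata and a URL, no reads."""
    return {"metadata": metadata, "ndjson_s3_url": ndjson_s3_url}


# Made-up example input, for illustration only.
sample_metadata = {"sample_id": "sample-001", "location": "Zurich"}
reads = [
    '{"read_id": "r1", "sequence": "ACGT"}',
    "",
    '{"read_id": "r2", "sequence": "TTGA"}',
]
enriched = list(enrich_ndjson(reads, sample_metadata))
payload = build_post_payload(
    sample_metadata, "s3://example-processed-bucket/sample-001.ndjson"
)
```

Processing line by line keeps memory use flat even for very large .ndjson files, since no more than one record is held at a time.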

Sub-Issues:

Open Questions

Do users want to download only SNVs? Do users want to download BAMs as well? If so, the above would need modification.


gordonkoehn · 2024-11-28T09:15:57Z

Doubt: What was the reason for processing the .bam to .ndjson in one place, as opposed to having the nextclade-like output stored and processed in front of SILO?

So that we need to store one file with data and metadata only once, in only one place. This is special to wastewater.

gordonkoehn · 2024-11-29T16:22:11Z

Sync with Alexander:

Probably the silo_input_transformer will later move into SILO preprocessing, and we will upload BAMs and metadata to the Loculus backend, with the BAMs for nucleotides and amino acids, already read-paired and merged, going to some S3 bucket.

For now this should be all in sr2silo to get going.

gordonkoehn · 2024-12-03T09:27:39Z

See the discussion here:

Raw data sharing loculus-project/loculus#3344

gordonkoehn · 2024-12-19T15:39:29Z

I shall remove this from the board for cleanness.

gordonkoehn added the meta Epic task / Overarching issue label Nov 25, 2024

gordonkoehn self-assigned this Nov 25, 2024
