Allow sequences to go to INSDC privately, and then subsequently be released using "release projects" #3436

Open
theosanderson opened this issue Dec 12, 2024 · 3 comments
Labels
deposition related to ENA/INSDC deposition

Comments

@theosanderson
Member

theosanderson commented Dec 12, 2024

Current Situation

  • We are able to move sequences between projects in ENA
  • However, a sequence's visibility changes only at the moment its parent project is released, and each project can be released only once. This rules out our planned model, in which each user has one private and one public project and we simply transfer sequences between them at the point of release.

Proposal

My proposal is that on any day when we want to release sequences, we:

  • create a new "release project" for that day's sequences (potentially across all users)
  • move sequences from their user's private projects to the release project
  • release the release project, triggering release of the sequences
  • some time later, move the sequences to their users' public projects

This could mean creating up to 365 projects per year, but that doesn't seem too bad, and initially it wouldn't be that many.
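To make the proposed flow concrete, here is a minimal in-memory sketch. It does not call ENA at all: `Project`, `move`, `release`, `daily_release`, and `rehome` are hypothetical names used only for illustration, and the one-release-per-project rule is modelled from the description above rather than taken from any ENA API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Project:
    """Toy stand-in for an ENA project (not an ENA API object)."""
    name: str
    released: bool = False
    sequences: set[str] = field(default_factory=set)


def move(seq_id: str, src: Project, dst: Project) -> None:
    """Move a sequence between projects; a move alone never changes visibility."""
    src.sequences.remove(seq_id)
    dst.sequences.add(seq_id)


def release(project: Project, public_ids: set[str]) -> None:
    """Release a project once; only this step makes its sequences public."""
    if project.released:
        raise RuntimeError(f"{project.name} was already released")
    project.released = True
    public_ids.update(project.sequences)


def daily_release(day: date, due: dict[str, Project], public_ids: set[str]) -> Project:
    """Create the day's release project, fill it from private projects, release it.

    `due` maps each sequence id to the private project it currently lives in.
    """
    release_project = Project(name=f"release-{day.isoformat()}")
    for seq_id, private_project in due.items():
        move(seq_id, private_project, release_project)
    release(release_project, public_ids)
    return release_project


def rehome(release_project: Project, homes: dict[str, Project]) -> None:
    """Later step: move each released sequence to its user's public project."""
    # sorted() copies the ids, so the set can be mutated while looping.
    for seq_id in sorted(release_project.sequences):
        move(seq_id, release_project, homes[seq_id])
```

In this model the release project is empty again after `rehome`, so it only ever serves as a one-shot vehicle for flipping visibility. Whether the real ENA behaviour matches is exactly what needs testing.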

@theosanderson theosanderson added this to the Priority 1 (highest) milestone Dec 12, 2024
@anna-parker
Contributor

anna-parker commented Dec 13, 2024

I think this sounds like a potential solution - great idea @theosanderson!

However, as this will be a large code change I think we should do a test before implementing. Sadly it will have to be on ENA production (with our non-broker account) as we can't test well on dev - so we will have to ask them to clean up afterwards.

We should create two private projects and one public project.

  • First, submit the sequence (assembly) to the first private project and wait for it to be accessioned.
  • Move the sequence from the first to the second private project, then release the second project (i.e. make it public).
  • Make sure the sequence has in fact gone public
  • Move the sequence to the initial public project -> check that the correct bioproject is linked on sequence view pages and on NCBI virus
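Two of the test steps above involve waiting (for the accession, and for the sequence to go public). A small polling helper could drive those checks in a test script. This is only a sketch: `wait_until` is a hypothetical name, and the `check` callable is a placeholder for whatever lookup we end up using against ENA or NCBI Virus.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` has elapsed.

    Returns True if the condition was met, False on timeout.
    `sleep` is injectable so the helper can be exercised without real delays.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Each step of the test would then be one `wait_until(...)` call with its own check, and the script can stop at the first step that times out.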

@anna-parker
Contributor

anna-parker commented Dec 13, 2024

As I was thinking about this anyway, these are the steps required to implement the feature:

Update: #2893 will need to also be resolved for this to work.

  1. [minutes] Modify cronjob to also notify us about new private sequences to submit
  2. [1 day - hours if we choose to stay with Slack notifications] Modify cronjob to create PRs on GitHub with private and public sequences (maybe on different pages)
  3. [minutes] Add new public/private status and release date columns to the sequence and project tables in the DB
  4. [hours] Change read in process from github to distinguish between public and private
  5. [hours] Upload private sequences to private ENA repo (code reusable until upload to Loculus, just change project creation to private)
  6. [1+ days - tbd] Do not add the accession to the public Loculus page: either email the submitting group or make it visible only to group members on the sequence details page (if a private sequence is publicly disclosed, ENA has a process for making it public). The specifics should be discussed before starting; this could take much longer.
  7. [hours] If a sequence's release date changes, update the table (I think the cronjob will have to be updated to also send these changes, and the read-in process will also have to be modified)
  8. [1 day] On release date move sequences to different private folder and make folder public
  9. [1 day] Once public (check API if there is a good endpoint for checking this) move to group's public project.
  10. [hours] Update sequence's status to open, change bioproject to new public project and make this data visible on Loculus (with upload external metadata endpoint)
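As a rough illustration of steps 3, 7, and 8, the new status and release date columns and the daily "what is due today" selection might look like the sketch below. All names here (`Visibility`, `SequenceRow`, `due_for_release`) are hypothetical and do not reflect the existing table layout.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class SequenceRow:
    """Hypothetical row shape; the real tables have other columns too."""
    accession: str
    visibility: Visibility
    release_date: Optional[date]  # None means no release is scheduled yet


def due_for_release(rows: list[SequenceRow], today: date) -> list[SequenceRow]:
    """Select private rows whose release date has been reached."""
    return [
        row
        for row in rows
        if row.visibility is Visibility.PRIVATE
        and row.release_date is not None
        and row.release_date <= today
    ]
```

Using `<=` rather than `==` means a sequence whose release date was missed (for example because the job did not run that day) is still picked up on the next run.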

Other thoughts - restricted-use sequences will also be visible in a GitHub repo - do we need to add a banner or something to make sure they are not misused?

@theosanderson
Member Author

Totally agreed that we need to do the tests. I am quite uncertain whether the final move to the correct project will propagate to NCBI Virus. Even if it doesn't, this may still be a useful stopgap compared to the previous behaviour, perhaps with us contacting the ENA helpdesk every month or so to trigger a sync.

@anna-parker anna-parker self-assigned this Dec 14, 2024
@anna-parker anna-parker added the deposition related to ENA/INSDC deposition label Dec 18, 2024