
Set up CI #59

Open
vincerubinetti opened this issue Nov 12, 2024 · 12 comments · Fixed by #124

@vincerubinetti
Contributor

vincerubinetti commented Nov 12, 2024

I believe this is what our GitHub Actions CI workflow should eventually look like:

name: Update data

on:
  pull_request:
  workflow_dispatch:

jobs:
  update:
    runs-on: ubuntu-latest

    steps:
      - name: Debug dump
        uses: crazy-max/ghaction-dump-context@v2

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache-dependency-path: "**/environment.yml"

      - name: Install Python packages
        run: |
          echo Some conda install command

      - name: SSH debug
        if: runner.debug == '1'
        uses: mxschmitt/action-tmate@v3

      - name: Update data
        run: |
          python some-script-1.py
          python some-script-2.py
          python some-script-3.py

      # Optional: might be nice for the site to know when the data was last updated.
      - name: Record data compile time
        run: sed -i "s/PUBLIC_DATA_DATE=.*/PUBLIC_DATA_DATE=$(date -uIseconds)/" site/.env

      - name: Open pull request with updated files
        if: github.event_name == 'workflow_dispatch'
        uses: peter-evans/create-pull-request@v6
        with:
          branch: update-data
          title: Update data

      - name: Commit updated files
        if: github.event_name == 'pull_request'
        uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: Update data

This will allow you to either manually run the workflow and have it open a PR with the updated data, or open a PR manually and have it run the data update automatically on the PR.

As for the site, I'm thinking that we should use Netlify instead of GitHub Pages for the new website. They are both free and easy to use, but Netlify also gives us live deploy previews of PRs, built in. You can of course also set up your Netlify site to use your custom domain. And we'll have it configured such that it just rebuilds and redeploys the site whenever there are changes in the repo (on main or any PR branch), including the /data folder. As such, there's no need to trigger anything site-related in this gh-actions workflow; it will happen automatically. Yes, this means the site will be rebuilt when ineffectual things like the readme change, but the cost is minimal; the site only takes a few seconds to build.

@hdashnow
Member

That all sounds great.

At some point, I'd like to add a few things to this process if appropriate (unless they belong elsewhere)

  • When a new release has been created:
    • version the json data and link to it, like what I did manually here: https://github.com/dashnowlab/STRchive/releases/tag/v1.2.0.
    • This should also appear at the top of the table and somewhere on the locus pages along with an updated date.
    • Run scripts to update locus definitions (probably should generate a PR and be checked)
  • Update literature for existing loci (scheduled, auto PR)
  • Search for new locus literature (scheduled, auto PR)

@vincerubinetti
Contributor Author

vincerubinetti commented Nov 14, 2024

Not sure what you had planned, but it seemed to me like maybe all the data should be updated at the same time, i.e. as part of the same gh-actions workflow. Unless you want to be able to, say, update the literature independently from the other stuff. That might muddy the waters though, like the literature would be on its own version (if it even has a version) separate from the rest of the data.

version the json data and link to it, like what I did manually here:

For reference, here's the "versioning" workflow I have for Lab Website Template. ncipollo/release-action makes it easy to make tags and releases.

This should also appear at the top of the table and somewhere on the locus pages along with an updated date.

If what I said above is what you decide to do, there'd be one version for all of the data, and that version could be displayed in the header perhaps, with a link to the list of releases / changelog.

Instead of the "record data compile time" step I had in my example .yaml file above, it'd actually be better if you made a GitHub CFF citation file for this repo. You probably should have this anyway. But it will also allow me to conveniently get the version and date of the data to display on the website somewhere.
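For illustration, a minimal CITATION.cff could look something like this (all values below are placeholders, not actual STRchive metadata):

```yaml
# Placeholder CFF file; version and date-released are the fields the
# website could read to display the data version and updated date.
cff-version: 1.2.0
message: "If you use STRchive, please cite it."
title: "STRchive"
version: 1.2.0
date-released: "2024-11-14"
authors:
  - family-names: "Doe"
    given-names: "Jane"
repository-code: "https://github.com/dashnowlab/STRchive"
```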

Run scripts to update locus definitions (probably should generate a PR and be checked)
Update literature for existing loci (scheduled, auto PR)
Search for new locus literature (scheduled, auto PR)

I'd imagine all these scripts would run in sequence in this same workflow, and then the workflow could be triggered by workflow_dispatch (running it with a manual button click in the GitHub web interface), pull_request (when you manually open a PR for whatever reason), and schedule (perhaps weekly or monthly). Then it would open a PR with peter-evans/create-pull-request for you to review and merge. We could have it always open a new PR, or give a specific branch name such that if one is already open (e.g. you haven't gotten around to merging last week's update PR yet), it just updates that one.
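Sketched as workflow YAML (the cron string and branch name are just examples, not settled choices):

```yaml
on:
  workflow_dispatch:    # manual button click in the Actions tab
  pull_request:         # runs when you manually open a PR
  schedule:
    - cron: "0 0 * * 1" # example: weekly, Mondays at 00:00 UTC

# ...and in the PR-opening step, a fixed branch name means re-runs
# update the already-open PR instead of opening a new one:
#   uses: peter-evans/create-pull-request@v6
#   with:
#     branch: update-data
```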

@vincerubinetti
Contributor Author

Just a sanity check here, are all the processing scripts and such actually in this repo? I feel like I've run into cases where I ctrl+f the whole repo, looking for a bit of Python script that generated/processed some JSON, and I can't find it.

Because all of that code will need to be in this repo on the same branch for the CI process, or else we'll need some complicated workarounds.

@laurelhiatt
Contributor

Hey Vince, sorry I didn't see this; I thought CI was confidence interval for the plots lol, which is out of my jurisdiction. We can meet about this tomorrow or this week if you'd like.

@vincerubinetti
Contributor Author

[Screenshot attached: 2024-11-26 at 1:09:09 PM]

@hdashnow
Member

#93 should have all the components needed for automation. See the STRchive/README.md and let me know if more details are needed.
Note that the run-manubot.py script runs so long (12+ hours) that I've never actually run it on the full dataset, only a subset. I think we'll need to deal with this before we can use it in CI. In the meantime, you can run everything else using snakemake --config stages="skip-refs" in a few seconds.

@vincerubinetti
Contributor Author

It will probably always take at least a couple of hours if doing every citation, but I would look into running things in parallel to see if it improves times. The manubot script I gave you just runs one at a time. You can also pass multiple IDs to manubot in one command, though I don't know if it parallelizes that.
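For illustration, here's the general shape of parallelizing per-ID fetches with a thread pool (the `fetch_citation` body is a stub standing in for the actual manubot call; the function names here are mine, not from the repo):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_citation(cite_id):
    # Stub: in the real script this would call out to manubot for one
    # citation ID (hypothetical wiring, not STRchive's actual code).
    return {"id": cite_id}

def fetch_all(cite_ids, max_workers=4):
    # Small worker pool; keeping max_workers low helps stay under the
    # rate limits of citation services like Crossref and PubMed.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with cite_ids
        return list(pool.map(fetch_citation, cite_ids))
```

With real network-bound calls, even a handful of workers could cut a multi-hour run substantially, though rate limits are the constraint to watch.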

@hdashnow
Member

hdashnow commented Dec 2, 2024

I thought about doing things in parallel. It could cause rate limit issues if not done carefully. I've got it down to ~2 hours run time now, so let's see how we go and decide if it's worth the effort to find more speed-ups.

@vincerubinetti
Contributor Author

vincerubinetti commented Dec 5, 2024

Here's an updated workflow that is working, except for the Conda activate step, which I don't know enough to debug.

Put this in /.github/workflows/update-data.yaml
name: Update data

on:
  pull_request:
    branches: main
    paths:
      - "data/**"
      - "scripts/**"
  schedule:
    - cron: "0 0 1 * *"
  workflow_dispatch:

jobs:
  update:
    runs-on: ubuntu-latest

    steps:
      - name: Debug dump
        uses: crazy-max/ghaction-dump-context@v2

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache-dependency-path: "**/environment.yml"

      - name: Set up Conda
        uses: s-weigand/setup-conda@v1

      - name: Activate Conda
        run: |
          conda env create --file scripts/environment.yml
          conda init
          source ~/.bashrc
          conda activate strchive

      - name: Update data (short)
        if: ${{ github.event_name == 'pull_request' }}
        run: snakemake --config stages="skip-refs"

      - name: Update data (full)
        if: ${{ github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
        run: snakemake

      - name: Open pull request with updated files
        if: ${{ !(github.event_name == 'pull_request') }}
        uses: peter-evans/create-pull-request@v7
        with:
          branch: update-data
          title: Update data

      - name: Commit updated files to current pull request
        if: ${{ github.event_name == 'pull_request' }}
        uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: Update data
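One possible fix for the Conda activation problem, sketched under the assumption that conda-incubator/setup-miniconda works with this repo's scripts/environment.yml (I haven't run this against STRchive): each run: step gets a fresh shell, so activation has to be re-established per step, which a login-shell default handles.

```yaml
jobs:
  update:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -el {0}  # login shell, so the Conda env stays active in every step
    steps:
      - uses: actions/checkout@v4
      - name: Set up Conda environment
        uses: conda-incubator/setup-miniconda@v3
        with:
          environment-file: scripts/environment.yml
          activate-environment: strchive
      - run: snakemake --config stages="skip-refs"
```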

And here's a working workflow for #96. Take a careful look at the logic that inserts the version string into the file names. I wrote it assuming you would rename the .bed files to start with STRchive-disease-loci. Change it to whatever you want.

Put this in /.github/workflows/make-release.yaml
name: Make release

on:
  push:
    branches:
      - main
    paths:
      - CITATION.cff

jobs:
  release:
    runs-on: ubuntu-latest

    steps:
      - name: Debug dump
        uses: crazy-max/ghaction-dump-context@v2

      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 2

      - name: Get previous version file
        run: git show HEAD~1:CITATION.cff > CITATION-previous.cff

      - name: Install packages
        run: npm install yaml@v2 semver@v7 glob@v11

      - name: Get version
        id: version
        uses: actions/github-script@v7
        with:
          result-encoding: string
          script: |
            const { readFileSync, renameSync } = require("fs");
            const { valid, eq, lt } = require("semver");
            const { parse } = require("yaml");
            const { globSync } = require("glob");

            // load and parse file contents
            const { version: newVersion } = parse(readFileSync("CITATION.cff").toString());
            const { version: oldVersion } = parse(readFileSync("CITATION-previous.cff").toString());

            console.log(`Old version: ${oldVersion}`);
            console.log(`New version: ${newVersion}`);

            // check version
            if (!valid(newVersion) || lt(newVersion, oldVersion))
              throw Error("Version not valid");
            if (eq(oldVersion, newVersion)) {
              console.log("Version unchanged");
              return "";
            }

            // add version to artifact filenames
            for (const file of globSync("**/STRchive-*.json"))
              renameSync(file, file.replace(".json", `_v${newVersion}.json`));
            for (const file of globSync("**/STRchive-disease-loci*.bed"))
              renameSync(file, file.replace("-loci", `-loci_v${newVersion}_`));

            return newVersion;

      - name: SSH debug
        if: runner.debug == '1'
        uses: mxschmitt/action-tmate@v3

      - name: Release
        uses: softprops/action-gh-release@v2
        if: ${{ steps.version.outputs.result }}
        with:
          tag_name: v${{ steps.version.outputs.result }}
          files: |
            **/STRchive-loci*.json
            **/STRchive-citations*.json
            **/STRchive-disease-loci*.bed

Pro-tip: Add this step somewhere under steps, and the workflow will pause there and allow you to SSH into the runner machine and run any commands you want. I.e. it'll give you a command like ssh [email protected] to run in your terminal. Helpful to put right before a command that keeps failing, then you go in and debug.

- name: SSH debug
  uses: mxschmitt/action-tmate@v3

@hdashnow
Member

I'm out of my depth on this CI stuff... This worked on my fork, but I can't get it working in this repository.
https://github.com/dashnowlab/STRchive/actions/runs/12246554105

@vincerubinetti
Contributor Author

Here's the failing run:

https://github.com/dashnowlab/STRchive/actions/runs/12246554105/job/34162663499

And the relevant logs:

remote: Permission to dashnowlab/STRchive.git denied to github-actions[bot].
fatal: unable to access 'https://github.com/dashnowlab/STRchive/': The requested URL returned error: 403

It's failing with a 403 permission denied error code. So the action runner (i.e. the "github actions bot") doesn't have permissions to commit/push to the repo.

I believe you'll need to allow actions read + write permissions in the repo settings:

[Screenshot attached: 2024-12-10 at 12:33:52 PM]

Hopefully if you change that and re-run the failed workflow, it should work.
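Alternatively, GitHub Actions also lets you grant the token write access per workflow via a top-level permissions key, which avoids loosening the repository-wide default (this is a standard Actions feature, not something specific to this repo):

```yaml
# At the top level of the workflow file:
permissions:
  contents: write        # allow the workflow to commit and push
  pull-requests: write   # allow it to open/update pull requests
```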

vincerubinetti linked a pull request Dec 11, 2024 that will close this issue
@hdashnow
Member

Oops, closed prematurely. Still working on the R bit of this.

hdashnow reopened this Dec 13, 2024