make training happen on digital ocean #41

josh-chamberlain · 2024-02-27T22:06:55Z

We have a training dataset and related resources on Hugging Face. We should use cloud computing resources for the training.

mbodeantor · 2024-02-28T16:30:01Z

We already have a droplet provisioned for data-sources-mirror, and I'm happy to share its key so we can experiment with it for this purpose. I'm not sure it will be sufficient; we should discuss if not.

maxachis · 2024-03-21T15:32:50Z

@josh-chamberlain @mbodeantor Would we want to manually trigger this training, set up a cron job to have it occur at regular intervals, or both?

Additionally, which components of the pipeline would we want to use in training? All parts of the pipeline, or only some?

maxachis · 2024-03-21T20:28:07Z

> We have droplet already provisioned for the data-sources-mirror which I'm happy to share the key for experimentation for this purpose. Not sure if this will be sufficient, we should discuss if not.

@mbodeantor @josh-chamberlain Looking at the graphs for the droplet, I see that it is idle much of the time, with the CPU underutilized apart from brief bursts of near-100% activity. From the standpoint of CPU alone, that is promising.

However, I would note that the droplet has the following limitations:

- 512 MB memory (average usage hovers around 65%)
- 10 GB disk (average usage hovers around 42%)

Those two limits, especially memory, are probably not enough for training. At best, training would be quite slow; at worst, the code might simply fail. And even in the best case, we would have to think about how to design the new workload so it doesn't interfere with the existing functionality of data-sources-mirror.

It would probably be simpler, and more viable, to run training on a droplet provisioned specifically for it.
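To put numbers on the memory concern, here is the back-of-the-envelope headroom calculation using only the figures quoted above. It is a rough sketch: it accounts for average usage only, so the CPU-burst periods would likely leave even less room.

```python
# Rough headroom on the existing data-sources-mirror droplet, based on the
# averages quoted in this thread (512 MB RAM at ~65%, 10 GB disk at ~42%).
MEMORY_MB = 512
MEMORY_USED_FRACTION = 0.65
DISK_GB = 10
DISK_USED_FRACTION = 0.42


def headroom(total: float, used_fraction: float) -> float:
    """Capacity left after the droplet's average existing usage."""
    return total * (1 - used_fraction)


free_memory_mb = headroom(MEMORY_MB, MEMORY_USED_FRACTION)
free_disk_gb = headroom(DISK_GB, DISK_USED_FRACTION)

print(f"free memory: ~{free_memory_mb:.0f} MB")  # about 179 MB
print(f"free disk:   ~{free_disk_gb:.1f} GB")    # about 5.8 GB
```

For scale (a hypothetical comparison, not the project's actual model), the fp32 weights of a 110M-parameter model alone take about 110M × 4 bytes ≈ 440 MB, well over the roughly 179 MB left on average.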

josh-chamberlain · 2024-03-26T19:53:11Z

@maxachis OK, let's provision a droplet. We should start with an entry-level one, since droplets appear to be easily resizable, and scale up if we need to.
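A minimal sketch of the "start small, resize later" approach against the DigitalOcean API v2, assuming an API token in a hypothetical `DIGITALOCEAN_TOKEN` environment variable. The droplet name, region, image, and size slugs below are placeholder assumptions, not choices made in this thread; check current slugs and pricing before using them.

```python
import json
import urllib.request

API_BASE = "https://api.digitalocean.com/v2"


def create_droplet_request(name: str, size: str = "s-1vcpu-1gb",
                           region: str = "nyc3",
                           image: str = "ubuntu-22-04-x64") -> dict:
    """Body for POST /v2/droplets. All defaults are placeholder assumptions."""
    return {"name": name, "region": region, "size": size, "image": image}


def resize_droplet_request(size: str, resize_disk: bool = False) -> dict:
    """Body for POST /v2/droplets/{id}/actions.

    With resize_disk=False only CPU and RAM change, so the droplet can later be
    resized back down; resize_disk=True also grows the disk, which is permanent.
    """
    return {"type": "resize", "size": size, "disk": resize_disk}


def send(path: str, body: dict, token: str) -> dict:
    """POST a JSON body to the DigitalOcean API and return the parsed response."""
    req = urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage might look like `send("/droplets", create_droplet_request("training-droplet"), token)`, whose response carries the new droplet under the `"droplet"` key, and later `send(f"/droplets/{droplet_id}/actions", resize_droplet_request("s-2vcpu-4gb"), token)`. The droplet must be powered off before a resize. Leaving `disk` false keeps the option of scaling back down if the larger size turns out to be unnecessary.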

josh-chamberlain added this to Open issues & Roadmap Feb 27, 2024

josh-chamberlain moved this to Needs Refinement in Open issues & Roadmap Feb 27, 2024

bonjarlow assigned bonjarlow and unassigned bonjarlow Feb 27, 2024

mbodeantor moved this from Needs Refinement to Todo in Open issues & Roadmap Feb 28, 2024

maxachis mentioned this issue Mar 5, 2024 in Annotation workflow v2 #19 (Closed, 7 tasks)

This was referenced May 23, 2024:

- Feature: training dataset maintenance #49 (Open)
- automated workflow: update training data from label studio #89 (Open)
