[URGENT] Our GCP credits have run out again #2138

Closed
choldgraf opened this issue Mar 2, 2022 · 28 comments

Comments

@choldgraf
Member

Context

As a part of jupyterhub/team-compass#463 we got a $10,000 batch of credits to keep mybinder.org running while our application for a full year was processed. I just realized two things:

  1. We haven't heard back from Google about those credits yet
  2. Our credits ran out around yesterday

So we are now paying for gke.mybinder.org with cash, not with credits.

Can we significantly reduce the load on gke.mybinder.org?

As a stopgap, I think we need to significantly reduce the load on gke.mybinder.org, or we are going to need to find some way to pay about $4,000 in cloud bills at the end of March.

@choldgraf
Member Author

choldgraf commented Mar 2, 2022

Update: switched billing accounts to a new 2i2c account

After some conversation in our Matrix channel, I have done these things:

  1. Created a new Billing Account called gke.mybinder.org that is connected to 2i2c's organizational credit card (so we can reimburse it on 2i2c grants/funds).
  2. Added @yuvipanda and @minrk as billing account administrators on this account, so that they have power to change things if need be.
  3. Swapped our GCP project to use this billing account

Rationale

We did this because the previous Billing Account was linked to that gift from Google, and it was unclear if or how we could reimburse ourselves on that account. By linking the project to a 2i2c account, we can use 2i2c grants or funds to pay for this as a stopgap. Once we figure out a longer-term solution we can change the Billing Account again, but in the meantime at least we'll be able to reimburse ourselves for the costs we incur.

@choldgraf
Member Author

Update: should we sunset the Binder Development project?

I've gotten some notices that our project "Binder Development" may be suspended. I believe that this is a "sandbox" project that we had created with the goal of doing quick development tasks. However, it doesn't seem to be in use, and rather than migrating this one as well, I suggest that we just sunset Binder Development. That gives us one less thing to worry about for now. Does anybody object to that?

@minrk
Member

minrk commented Mar 3, 2022

I don't think it's been used for anything, so if we have nothing to pay for it with, shutting it down seems fine.

@minrk
Member

minrk commented Mar 3, 2022

I've applied strict pod quotas, which means GKE is currently limited to 400 users and rejects new launches after that (that's currently at least 80% of launch requests during busy times). I can turn it down, too, if need be to keep costs lower.
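
As a rough illustration of what a hard pod quota does, here is a minimal Python sketch. It is not BinderHub's actual code: `LaunchGate`, `try_launch`, and `stop` are hypothetical names, and only the 400-user limit comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class LaunchGate:
    """Hypothetical admission check: reject launches once the quota is reached."""

    pod_quota: int    # maximum number of concurrent user pods
    running: int = 0  # user pods currently running

    def try_launch(self) -> bool:
        """Return True and count the pod if there is room, else reject."""
        if self.running >= self.pod_quota:
            return False
        self.running += 1
        return True

    def stop(self) -> None:
        """Record that one user pod has shut down, freeing a slot."""
        if self.running > 0:
            self.running -= 1


# 400 is the limit mentioned above; lowering it would cap costs further.
gate = LaunchGate(pod_quota=400)
accepted = sum(gate.try_launch() for _ in range(500))
print(f"accepted {accepted} of 500 launch requests")
```

Lowering `pod_quota` trades availability for cost: fewer concurrent pods means fewer user nodes are needed, but more launch requests are turned away at busy times.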

@minrk
Member

minrk commented Mar 3, 2022

I can also cap the autoscale limit, if we have a number to aim for.

@lisamartin72

I'm going to chat to a few people and bring up the directed funding options at our staff meeting this afternoon. I will follow up with you here when I have something helpful to contribute.

@ellisonbg

Hi all, @lisamartin72 told me about this thread. It may be helpful if someone can attend the Friday governance meeting at 9am PT to discuss this from a broader Jupyter funding perspective.

@choldgraf
Member Author

I can try to make it tomorrow - the time generally overlaps with when I take my daughter to day care but if I hustle back then I think I can make it.

@choldgraf
Member Author

@ellisonbg - I put together this short slideshow that explains the current situation, we can use that as a starting point for conversation tomorrow: https://docs.google.com/presentation/d/1DVornW2X88tIg-CFgoysz87Q4XbU0IVEBcOkMQSYIVo/edit?usp=sharing

@ellisonbg

ellisonbg commented Mar 3, 2022 via email

@pzwang

pzwang commented Mar 4, 2022

Hey guys - Happy to help with this. Chris, want to email/DM me to chat about it?

@choldgraf
Member Author

choldgraf commented Mar 4, 2022

Hey @pzwang - thanks for reaching out :-) we discussed this a bit in the governance meeting today, and I think there are a few things that we are going to try:

  1. Use JupyterLite for our try.jupyter.org page, to cut down on the load on mybinder.org that isn't necessary
  2. Use a central Jupyter account to provide stopgap funding for mybinder.org, rather than the account of any one stakeholder for Binder (right now it is 2i2c)
  3. Work with NumFocus to create a donation pathway that is earmarked for Binder.
  4. Confirm with Google whether they will be providing more credits or not, also confirm whether Simula can provide credits for Binder
  5. Explore longer term sustainability efforts and partnerships with cloud companies.

@pzwang I wonder if option 3 is the kind of thing that you had in mind? Either way I would be happy to chat about this.

@minrk
Member

minrk commented Mar 7, 2022

Simula can cover GKE for a month or two, but not permanently

@choldgraf
Member Author

choldgraf commented Mar 14, 2022

Update about current status

Hey all - I wanted to provide an update on this situation and describe ongoing efforts at improving things:

JupyterLite: In jupyter/jupyter.github.io#682 we updated the top two links of try.jupyter.org to use JupyterLite instead of Binder. This has cut Binder's usage by about 60% and will significantly reduce costs.

2i2c covered mybinder.org's bill for about two weeks, and Simula is now footing the bill. We went a few weeks with 2i2c covering the gke.mybinder.org bill, and @minrk found about a month's funding from Simula as well. Simula is currently the primary payer for gke.mybinder.org.

No clear response from Google. I have gotten confusing and somewhat conflicting messages from people inside of Google. The person that originally told us to work with the "Open Source at Google" team now seems to suggest that the original pathway we were taking is what we should be doing now instead. So I'm not sure what to make of that.

NumFocus donation account. I spoke with Lisa at NumFocus, and I think they are setting up a "Donate to Binder" link / page that we could use. This might be one way to get donations from others via mechanisms described in jupyterhub/team-compass#430

@choldgraf
Member Author

choldgraf commented Mar 21, 2022

Update: Still no response from Google

I've pinged the Google group again today about the status of credits, but they still have not responded with a clear answer. I'm not really sure what's going on with Google :-( Right now my goal is just to figure out what stage of the process they are at internally, but I'm having a hard time getting clear answers.

@choldgraf
Member Author

Update: still no response from Google

I was told that they were discussing this and making a decision around April 10th, but I reached out and again was told that they are still discussing. I don't know what's going on with the Google Open Source team but I am thinking we should not count on them moving forward.

@pzwang

pzwang commented Apr 22, 2022

OK that's disappointing but perhaps not unexpected. Let's coordinate offline on a straight-up donation from Anaconda to hold things over while we figure out a longer-term, more sustainable model?

@tnabtaf

tnabtaf commented Apr 22, 2022

@choldgraf I am a community manager at Anaconda. I will work with you on the short term fix, and then longer term ideas. Expect an email shortly.

@choldgraf
Member Author

Thanks @pzwang and @tnabtaf for reaching out again, and for your support of the project!

Two quick thoughts:

  • I would love to chat, though I can't make decisions on behalf of the project on my own, so maybe we can discuss possibilities and then we can open up a team compass issue for comment to make sure there are no objections. That OK?
  • I recently opened up an issue to track setting up donation and sponsorship infrastructure for mybinder.org, and would love any thoughts and suggestions there for ways that we could properly structure the incentives and attribution: "Set up donation infrastructure for mybinder.org" (team-compass#508)

@tnabtaf

tnabtaf commented Apr 22, 2022

  • I would love to chat, though I can't make decisions on behalf of the project on my own, so maybe we can discuss possibilities and then we can open up a team compass issue for comment to make sure there are no objections. That OK?

Absolutely.

I will take a look at this. It's a widespread challenge.

Email sent, and hope to talk soon.

@SylvainCorlay

@choldgraf I think it is time to make OVH the primary cluster of Mybinder.org.

@choldgraf
Member Author

I think an easy first step would be for OVH to provision more node capacity for user sessions. Right now I believe it has about 25% of the capacity of binder. I believe if they added more nodes then we could relatively easily move more traffic their way by redistributing the quotas. That said, I am not sure who at OVH to ask about this. Maybe others here know who has been the latest contact between OVH and the binder/jupyterhub team?

@minrk
Member

minrk commented Apr 25, 2022

Just chiming in here - if OVH contributes significantly more resources, we can easily make it prime. Turing coming back online may make more sense, though, since it's a bigger cluster.

@choldgraf
Member Author

choldgraf commented Apr 25, 2022

@minrk we should also estimate the minimal cost of running the "mothership deployment" of the mybinder.org federation. E.g., if a BinderHub deployment ran no user pods (or some minimal amount), but still ran the redirector and served the main home page, how much would it be?

I feel like it is more important that the "mothership" deployment be extremely stable and reliable, and easy to configure and modify by the team, rather than it having a particularly large capacity. Does that make sense?

@minrk
Member

minrk commented Apr 25, 2022

I'll work on an estimate tomorrow. I think we can keep it under 1k/month, maybe less.

"Prime" member mostly doesn't mean much other than being the one to keep getting traffic when everyone else is full. It's easy to move around, though.

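To make the role of the "prime" member concrete, here is a small Python sketch of one way a federation redirector could choose where to send a launch. It is an assumption-laden illustration, not the mybinder.org redirector itself: `Member`, `pick_member`, and the weights and quotas below are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical federation member with a traffic weight and a pod quota."""

    name: str
    weight: int          # relative share of traffic while capacity remains
    quota: int           # maximum concurrent user pods
    running: int = 0     # user pods currently running
    prime: bool = False  # keeps receiving traffic when every member is full

    @property
    def full(self) -> bool:
        return self.running >= self.quota


def pick_member(members, rng=random):
    """Pick a member with spare capacity by weight; fall back to the prime member."""
    available = [m for m in members if not m.full]
    if not available:
        # Everyone is full: the prime member still gets the request.
        return next(m for m in members if m.prime)
    weights = [m.weight for m in available]
    return rng.choices(available, weights=weights, k=1)[0]


# Illustrative numbers only; they are not the real federation settings.
federation = [
    Member("gke", weight=4, quota=400, prime=True),
    Member("ovh", weight=1, quota=100),
]
print(pick_member(federation).name)
```

In this sketch, moving "prime" to another cluster is a one-flag change, which matches the point above that the role is easy to move around.
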
@minrk
Member

minrk commented Apr 26, 2022

It's hard to be precise, but our GKE bill is currently around $3,200, of which ~$800 is storage that would go to just a few bucks if we stopped running binderhub and cleared the image cache (we'd still have …), and $2,200 is compute for our nodes. Other costs for SQL, GKE (~$150) would mostly stay, or get reduced a tiny bit if we shut down the staging cluster.

Our nodes are

  • staging: 2x n1-standard-4 (8 cpu, 26 GB)
  • prod core: 1x n1-highmem-4 (4 cpu, 24GB)
  • prod user: 4x n1-highmem-8 (32 cpu, 192GB)

If we were only running the redirector and prometheus/grafana, we could fit on a single n1-highmem-4 node, or 2 n1-standard-4. Assuming the compute costs scale purely by these numbers (not quite, but probably close), our compute costs would drop by ~90% if we shut down the user nodes in the prod cluster and the staging cluster altogether. That suggests our barebones bill would be closer to $500, maybe lower, but should comfortably fit under $1k.
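
The arithmetic behind that estimate can be sketched in a few lines of Python. The dollar figures and node counts are the ones quoted above. The assumptions are that compute cost scales purely with CPU count, that "other" costs are whatever remains of the bill after storage and compute, and that residual storage is $5 (a stand-in for "a few bucks"). The function and variable names are illustrative only.

```python
# Monthly figures quoted above (USD); all of them are approximate.
TOTAL_BILL = 3200
STORAGE = 800
COMPUTE = 2200
OTHER = TOTAL_BILL - STORAGE - COMPUTE  # SQL, GKE fee, and similar

# Node pools as listed above: name -> (node count, CPUs per node)
NODE_POOLS = {
    "staging": (2, 4),
    "prod core": (1, 4),
    "prod user": (4, 8),
}


def cpus(pool_names):
    """Total CPU count across the named node pools."""
    return sum(NODE_POOLS[name][0] * NODE_POOLS[name][1] for name in pool_names)


def barebones_estimate(kept_pools, residual_storage=5):
    """Estimate the monthly bill if only `kept_pools` keep running.

    Assumes compute cost is proportional to CPU count, which ignores
    the price difference between standard and highmem machine types.
    """
    fraction_kept = cpus(kept_pools) / cpus(NODE_POOLS)
    compute = COMPUTE * fraction_kept
    return compute + OTHER + residual_storage, 1 - fraction_kept


estimate, reduction = barebones_estimate(["prod core"])
print(f"compute reduction: {reduction:.0%}")
print(f"barebones estimate: ${estimate:.0f}/month")
```

Keeping only the single prod core node leaves 4 of 44 CPUs, which is where the roughly 90% compute reduction comes from. The resulting total lands somewhat below $500, in line with the "closer to $500, maybe lower" figure.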

@minrk
Member

minrk commented Apr 26, 2022

Truly barebones would be to just run the redirector, grafana (not prometheus), and . That would take close to no resources, and arguably k8s isn't the right deployment tool at that point, since it's substantial overkill. But migration to something else might be trickier than we can manage right now.

@choldgraf
Member Author

choldgraf commented Apr 29, 2022

Update: the Google Cloud credits came through!

I finally heard back from Google yesterday evening, and they are offering us $50,000 in credits for gke.mybinder.org for the next year. We can ask again through a more streamlined process for the following year as well. One reason it took so long this time is that we asked them in an "off-cycle" way, before they had figured out their budget situation.

I've asked them to deposit these credits in the billing account that we are currently using (called gke.mybinder.org). They will hopefully land in the next day or two.

I don't think this should change any of our other plans around diversifying and increasing the funding streams that go into the Binder Project, so we should keep pushing forward as we've made a lot of progress there. e.g., @pzwang it would be great to have your input on that as well, even though the "credit crunch" is no longer as dire right now.

IMO, we should announce that Google has provided us these credits when we also announce the new donation / support-the-project infrastructure we've been discussing in jupyterhub/team-compass#508

I'm going to close this one, and we can take conversation about next steps on cloud/donations/etc to:


7 participants