[URGENT] Our GCP credits have run out again #2138
Comments
Update: switched billing accounts to a new 2i2c account
After some conversation in our Matrix channel, I have done these things:
Rationale
We did this because the current Billing Account was linked to that gift from Google, and it was unclear if or how we could reimburse ourselves on that account. By linking it to a 2i2c account, we can use 2i2c grants or funds to pay for this as a stop-gap solution. Once we figure out a more long-term solution, we can change the Billing Account again, but in the meantime at least we'll be able to reimburse ourselves for the costs we incur. |
Update: should we sunset the
|
I don't think it's been used for everything, so if we have nothing to pay for it with, shutting it down seems fine. |
I've applied strict pod quotas, which means GKE is currently limited to 400 users and rejects new launches after that (that's currently at least 80% of launch requests during busy times). I can turn it down, too, if need be to keep costs lower. |
I can also cap the autoscale limit, if we have a number to aim for. |
I'm going to chat to a few people and bring up the directed funding options at our staff meeting this afternoon. I will follow up with you here when I have something helpful to contribute. |
Hi all, @lisamartin72 told me about this thread. It may be helpful if someone can attend the Friday governance meeting at 9am PT to discuss this from a broader Jupyter funding perspective. |
I can try to make it tomorrow - the time generally overlaps with when I take my daughter to day care but if I hustle back then I think I can make it. |
@ellisonbg - I put together this short slideshow that explains the current situation, we can use that as a starting point for conversation tomorrow: https://docs.google.com/presentation/d/1DVornW2X88tIg-CFgoysz87Q4XbU0IVEBcOkMQSYIVo/edit?usp=sharing |
Thanks Chris, this is really helpful!
Brian E. Granger
Senior Principal Technologist, AWS AI/ML
On Leave - Professor of Physics and Data Science, Cal Poly
@ellisonbg on GitHub
|
Hey guys - Happy to help with this. Chris, want to email/DM me to chat about it? |
Hey @pzwang - thanks for reaching out :-) we discussed this a bit in the governance meeting today, and I think there are a few things that we are going to try:
@pzwang I wonder if option 3 is the kind of thing that you had in mind? Either way I would be happy to chat about this. |
Simula can cover GKE for a month or two, but not permanently |
Update about current status
Hey all - I wanted to provide an update on this situation and describe ongoing efforts at improving things:
JupyterLite: In jupyter/jupyter.github.io#682 we updated the top two links of try.jupyter.org to use JupyterLite instead of Binder. This has cut Binder's usage by about 60% and will significantly reduce costs.
2i2c covered mybinder.org's bill for about two weeks, and Simula is now footing the bill. We went a few weeks with 2i2c covering the gke.mybinder.org bill, and @minrk found about a month's funding from Simula as well. Simula is currently the primary payer for gke.mybinder.org.
No clear response from Google. I have gotten confusing and somewhat conflicting messages from people inside of Google. The person that originally told us to work with the "Open Source at Google" team now seems to suggest that the original pathway we were taking is what we should be doing now instead. So I'm not sure what to make of that.
NumFocus donation account. I spoke with Lisa at NumFocus, and I think they are setting up a "Donate to Binder" link / page that we could use. This might be one way to get donations from others via mechanisms described in jupyterhub/team-compass#430 |
Update: Still no response from Google
I've pinged the Google group again today about the status of credits, but they still have not responded with a clear answer. I'm not really sure what's going on with Google :-( Right now my goal is just to figure out what stage of the process they are at internally, but I'm having a hard time getting clear answers. |
Update: still no response from Google
I was told that they were discussing this and making a decision around April 10th, but I reached out and again was told that they are still discussing. I don't know what's going on with the Google Open Source team but I am thinking we should not count on them moving forward. |
OK that's disappointing but perhaps not unexpected. Let's coordinate offline on a straight-up donation from Anaconda to hold things over while we figure out a longer-term, more sustainable model? |
@choldgraf I am a community manager at Anaconda. I will work with you on the short term fix, and then longer term ideas. Expect an email shortly. |
Thanks @pzwang and @tnabtaf for reaching out again, and for your support of the project! Two quick thoughts:
|
Absolutely.
I will take a look at this. It's a widespread challenge. Email sent, and hope to talk soon. |
@choldgraf I think it is time to make OVH the primary cluster of Mybinder.org. |
I think an easy first step would be for OVH to provision more node capacity for user sessions. Right now I believe it has about 25% of the capacity of binder. I believe if they added more nodes then we could relatively easily move more traffic their way by redistributing the quotas. That said, I am not sure who at OVH to ask about this. Maybe others here know who has been the latest contact between OVH and the binder/jupyterhub team? |
Just chiming in here - if OVH contributes significantly more resources, we can easily make it prime. Turing coming back online may make more sense, though, since it's a bigger cluster. |
@minrk we should also estimate the minimal cost of running the "mothership deployment" of the mybinder.org federation. E.g., if a BinderHub deployment ran no user pods (or some minimal amount), but still ran the redirector and served the main home page, how much would it be? I feel like it is more important that the "mothership" deployment be extremely stable and reliable, and easy to configure and modify by the team, rather than it having a particularly large capacity. Does that make sense? |
I'll work on an estimate tomorrow. I think we can keep it under 1k/month, maybe less. "Prime" member mostly doesn't mean much other than being the one to keep getting traffic when everyone else is full. It's easy to move around, though. |
It's hard to be precise, but our GKE bill is currently around $3,200, of which ~$800 is storage that would go to just a few bucks if we stopped running binderhub and cleared the image cache (we'd still have, and $2,200 is compute for our nodes. Other costs for SQL, GKE (~$150) would mostly stay, or get reduced a tiny bit if we shut down the staging cluster. Our nodes are
If we were only running the redirector and prometheus/grafana, we could fit on a single n1-highmem-4 node, or two n1-standard-4 nodes. Assuming the compute costs scale purely by these numbers (not quite, but probably close), our compute costs would drop by ~90% if we shut down the user nodes in the prod cluster and the staging cluster altogether. That suggests our barebones bill would be closer to $500, maybe lower, but should comfortably fit under $1k. |
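For anyone who wants to re-run the back-of-the-envelope estimate above with different numbers, here is a minimal sketch in Python. The bill, storage, and compute figures and the ~90% compute reduction are the rough values quoted in the comment. The $5 residual storage cost (standing in for "a few bucks") and treating the remainder of the bill as unchanged "other" costs are assumptions made only for illustration.

```python
# Rough monthly figures quoted in the comment above (USD).
total_bill = 3200
storage = 800
compute = 2200

# Assumption: whatever is left over (SQL, GKE fee, etc.) stays roughly the same.
other = total_bill - storage - compute

# Assumption: "a few bucks" of storage after clearing the image cache.
storage_after = 5

# From the comment: compute drops by about 90% without user nodes and staging.
compute_cut_pct = 90


def barebones_estimate(storage_after, compute, compute_cut_pct, other):
    """Estimate the monthly bill when only the redirector and monitoring run."""
    compute_after = compute * (100 - compute_cut_pct) / 100
    return storage_after + compute_after + other


estimate = barebones_estimate(storage_after, compute, compute_cut_pct, other)
print(f"Estimated barebones bill: ${estimate:.0f}/month")
```

With these inputs the estimate lands below the "closer to $500" figure and well under $1k. The result is only as good as the rounded inputs, so it should be read as a ballpark rather than a quote.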
Truly barebones would be to just run the redirector, grafana (not prometheus), and . That would take close to no resources, and arguably k8s isn't the right deployment tool at that point, since it's substantial overkill. But migration to something else might be trickier than we can manage right now. |
Update: the Google Cloud credits came through!
I finally heard back from Google yesterday evening, and they are offering us $50,000 in credits for gke.mybinder.org for the next year. We can ask again through a more streamlined process for the next year as well. One reason it took so long this time is that we asked them in an "off-cycle" way before they had figured out their budget situation.
I've asked them to deposit these credits in the billing account that we are currently using (called
I don't think this should change any of our other plans around diversifying and increasing the funding streams that go into the Binder Project, so we should keep pushing forward as we've made a lot of progress there. e.g., @pzwang it would be great to have your input on that as well, even though the "credit crunch" is no longer as dire right now.
IMO, we should announce that Google has provided us these credits when we also announce the new donation / support-the-project infrastructure we've been discussing in jupyterhub/team-compass#508
I'm going to close this one, and we can take conversation about next steps on cloud/donations/etc to: |
Context
As a part of jupyterhub/team-compass#463 we got a $10,000 batch of credits to keep mybinder.org running while our application for a full year was processed. I just realized two things:
So we are now paying for gke.mybinder.org with cash, not with credits.
Can we significantly reduce the load on gke.mybinder.org?
As a stopgap, I think we need to significantly reduce the load on gke.mybinder.org, or we are going to need to find some way to pay about $4,000 in cloud bills at the end of March.