Instance g4dn.xlarge not starting even after the CF template successfully created the stack #1

Open
vinodvarma24 opened this issue Jan 21, 2021 · 13 comments

Comments

@vinodvarma24

Screen Shot 2021-01-21 at 2 10 42 PM

Screen Shot 2021-01-21 at 2 11 43 PM

Screen Shot 2021-01-21 at 2 12 11 PM

You can see the output and dashboard URL, but the worker did not start.

I have recreated the CloudFormation stack from the template multiple times, but no luck. Could you shed some light on this?

Thanks in advance,

@mludvig
Owner

mludvig commented Jan 22, 2021

What do you get in the Auto Scaling group "Events" tab? There may be some hints there on why it fails to spin up the instances.

@vinodvarma24
Author

Screen Shot 2021-01-23 at 12 17 02 PM

I checked the Auto Scaling group Events tab. There seems to be an issue with the low number of Spot instances available in the Ohio region, so the instances are not starting up. What is the best way to avoid this?

Should I run the CloudFormation template in Virginia or Oregon, or reduce the auto-scaling capacity from 0-5 to something else?

@vinodvarma24
Author

I have figured it out. My AWS account did not have the required Spot limit.

@getSwiftly

I am facing a similar issue. In the Events tab of the Auto Scaling group I see:

"Launching a new EC2 instance. Status Reason: We currently do not have sufficient g4dn.xlarge capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get g4dn.xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1b, us-east-1c, us-east-1d, us-east-1f. Launching EC2 instance failed."

Should I wait for the service request to be processed?

@mludvig
Owner

mludvig commented Nov 3, 2021

The g4dn.xlarge instances have availability issues across all the cheapest regions. However, you can currently run them in the Los Angeles (LAX) local zone (us-west-2-lax-1) for the same spot price as in Oregon. All you have to do is:

  1. Opt-in to the LAX zone with:
    aws --region us-west-2 ec2 modify-availability-zone-group --group-name us-west-2-lax-1 --opt-in-status opted-in
    
  2. Create new default subnets in the VPC in the 2 LAX AZs:
    aws --region us-west-2 ec2 create-default-subnet --availability-zone us-west-2-lax-1a
    aws --region us-west-2 ec2 create-default-subnet --availability-zone us-west-2-lax-1b
    
  3. Delete and re-deploy the CFN stack (because it picks up the AZ list when it's getting created).
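
Steps 1 and 2 above could be wrapped in a small script along these lines. This is only a sketch: the `run` helper, the `lax_setup` function and the `DRY_RUN` variable are hypothetical conveniences, not part of the project. By default it only prints the commands so they can be reviewed first; step 3 (deleting and re-deploying the stack) is left out.

```shell
# Sketch: opt in to the LAX local zone and create default subnets there.
# DRY_RUN, run() and lax_setup() are illustrative names, not part of the project.
REGION="us-west-2"
ZONE_GROUP="us-west-2-lax-1"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"             # show the command instead of executing it
  else
    "$@"
  fi
}

lax_setup() {
  # Step 1: opt in to the local zone group.
  run aws --region "$REGION" ec2 modify-availability-zone-group \
    --group-name "$ZONE_GROUP" --opt-in-status opted-in
  # Step 2: create a default subnet in each of the two LAX zones.
  for suffix in a b; do
    run aws --region "$REGION" ec2 create-default-subnet \
      --availability-zone "${ZONE_GROUP}${suffix}"
  done
}

lax_setup
```

Running it with `DRY_RUN=0` would execute the same three `aws` commands listed in the steps, assuming the AWS CLI is installed and configured for the account.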

@mludvig reopened this Nov 3, 2021
@getSwiftly

Thanks, it got created 20-30 minutes after I recreated the stack from the template. How can I check what spot price I am getting on that AWS instance?

@wilfi
Contributor

wilfi commented Nov 4, 2021

@mludvig - I tried this, but when the ASG is created I get the same error, as shown in the screenshot below. In my case only 2 out of 10 instances get created, in the us-west-2-lax-1b and us-west-2-lax-1a zones.

Also, the ASG details show the Availability Zones us-west-2a, us-west-2b, us-west-2-lax-1b, us-west-2-lax-1a, us-west-2c and us-west-2d, which means us-west-2-lax-1b and us-west-2-lax-1a were added.

I wonder if this is due to the Spot limit. I've tried this as suggested.

Screenshot 2021-11-04 at 9 40 33 PM

Screenshot 2021-11-04 at 9 30 54 PM

@mludvig
Owner

mludvig commented Nov 4, 2021

Hi @wilfi

The log message says:

Max spot instance count exceeded.

You'll have to raise a support request to increase the spot instance quota for your account. The quota is in vCPU units and each g4dn.xlarge has 4 vCPUs, so increasing it to 40 will give you enough capacity for 10 instances in the region.

See: Increasing resource quotas in the README file.
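
As a quick check of the arithmetic above, the quota to request is the number of instances multiplied by the vCPUs per instance. The variable names below are illustrative only.

```shell
# Spot quotas are counted in vCPUs, not in instances.
INSTANCES=10            # desired number of g4dn.xlarge workers
VCPUS_PER_INSTANCE=4    # each g4dn.xlarge has 4 vCPUs
REQUIRED_VCPUS=$((INSTANCES * VCPUS_PER_INSTANCE))
echo "Request a spot quota of at least ${REQUIRED_VCPUS} vCPUs"
```

Changing `INSTANCES` to match the Auto Scaling group's maximum size gives the figure to put into the support request.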

@0xtruth

0xtruth commented Apr 13, 2022

How long does it take for AWS to increase quotas? I still haven't heard back after a few days.

@d4op

d4op commented May 26, 2022

How do I launch the CF template via the AWS CLI? o.o

@mludvig
Owner

mludvig commented May 27, 2022

@d4op don't add unrelated comments to existing issues. Open a new ticket and I'll tell you how to do it with the AWS CLI ;)

@mludvig closed this as completed May 27, 2022
@mludvig reopened this May 27, 2022
@mludvig
Owner

mludvig commented May 27, 2022

@0xtruth GPU quota requests often take AWS a few days to process, or they come back asking for more info. Unfortunately, they quite often reject the request when no good justification was provided.

@MoAdelAbdelrahman

"I have figured it out. My AWS account did not have the required Spot limit."

What was the solution for this?
I am facing the same issue.
