Increase timeout 5 --> 10 minutes for standalone Azure tests #1301

Conversation

Andrei-Aksionov
Collaborator

Hi there 👋

The CI workflow for GPU-related tests occasionally fails due to a timeout.
The cause is the tests added for the Thunder integration.

As a workaround, this PR increases the timeout from 5 to 10 minutes.

@Andrei-Aksionov Andrei-Aksionov added the CI Continuous integration label Apr 16, 2024
@williamFalcon
Contributor

ummm... shouldn't the solution be to fix the reason for the slow tests? if we keep bumping timeouts, CI will become unbearably slow over time.

cc @lantiga

@Andrei-Aksionov
Collaborator Author

That's what makes this PR a workaround.
I recommend merging this for now, until a solution is found.

@lantiga
Contributor

lantiga commented Apr 16, 2024

The offending test is always thunder_test_save_load_sharded_checkpoint (https://github.com/Lightning-AI/litgpt/blob/main/tests/test_thunder_fsdp.py#L265). Let's unblock by skipping that particular test until @carmocca can look into it.
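
For illustration, skipping a single test with pytest's `skip` marker could look roughly like the sketch below. The function name is taken from the comment above. The empty body and the reason string are placeholders, and the real test's signature and existing decorators are not shown in this thread, so this is an assumption about the shape of the change, not the actual patch.

```python
import pytest


# Placeholder standing in for the real test in tests/test_thunder_fsdp.py.
# The reason string is an assumption written for this example.
@pytest.mark.skip(reason="Times out in the standalone Azure GPU tests, see #1301")
def thunder_test_save_load_sharded_checkpoint():
    # The real test body is not reproduced here.
    pass
```

A skipped test is reported as skipped in the run summary instead of being silently dropped, so it stays visible until someone revisits it.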

@lantiga
Contributor

lantiga commented Apr 16, 2024

Opened #1304

@lantiga lantiga closed this Apr 16, 2024
3 participants