Primary machine meets her demise by performing large work #201

Open
vorj opened this issue Feb 28, 2019 · 4 comments
Labels: investigation (Measure performance, investigate others' materials, papers)

Comments

@vorj

vorj commented Feb 28, 2019

We, the Fixstars developers, develop ClPy on several machines.
For now, we mainly use the Primary machine (with an AMD Radeon Vega) and the Secondary machine (with an NVIDIA TITAN V).

However, the GPU driver on the Primary machine frequently crashes when running large workloads.
Each time the machine goes down, we have to press the reset button to recover it.
We should fix this problem.

Related issue: #108

@LWisteria
Member

@vorj As you reported on #180 (comment) and #108 (comment), the problem seems to be triggered by performing large work, rather than occurring frequently.

If you know of other situations where it occurs without a large problem, please report them. Otherwise, do not overstate the problem.

@vorj
Author

vorj commented Feb 28, 2019

@LWisteria We are currently working on passing some test cases that contain large workloads.
As a result, the developers have run into this problem repeatedly.
So, to me, it feels like it occurs frequently.
However, the problem is reproducible to some extent, and we can basically avoid it by being careful about how we run the test cases.
Additionally, the current CI tasks don't contain the problematic large workloads, so we don't always hit the problem.

Anyway, the description is not good (because it is based on my feelings), so I'll fix it.

@vorj changed the title from "Primary machine meets her demise frequently" to "Primary machine meets her demise by performing large work" on Feb 28, 2019
@LWisteria
Member

@vorj We must discuss technology and engineering, never your subjective feelings.

@LWisteria added the investigation label (Measure performance, investigate others' materials, papers) on Mar 5, 2019
@LWisteria
Member

@yuk-to I hear you investigated this problem. Is it caused by ClPy, or is it machine-specific? The old primary machine (furyx) won't die even if the workload is heavy. We can close this issue if it is not caused by ClPy itself. I need you to report.

3 participants