This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

paperspace-python behind proxy #7

Open
dalmia opened this issue Apr 11, 2018 · 8 comments

Comments

@dalmia

dalmia commented Apr 11, 2018

Does paperspace-python need any additional settings to work behind a proxy?

@dte
Member

dte commented Apr 11, 2018

It shouldn't be required. Are you seeing any issues in particular? If you can post some relevant code snippets we can help debug further :)

@dalmia
Author

dalmia commented Apr 11, 2018

Thanks, @dte, for getting back to me. I actually noticed the proxy issue with paperspace-node, so I asked the same question about the Python module. With paperspace-node (on Ubuntu), if I run:

paperspace machines availability --region "Europe (AMS1)" --machineType "P5000"

I get the following error:

{
  "error": "getaddrinfo EAI_AGAIN api.paperspace.io:443"
}

From a bit of googling, this seems to indicate a proxy problem. Once I switch to mobile data, it works fine.
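For context, `EAI_AGAIN` is the `getaddrinfo` code for "temporary failure in name resolution": the DNS lookup of api.paperspace.io failed before any HTTPS connection was attempted, which is consistent with a network that only allows traffic through a proxy. A quick way to check this from Python on the same machine is to call `socket.getaddrinfo` directly. This is a minimal diagnostic sketch; the helper name `can_resolve` is hypothetical and not part of paperspace-python or paperspace-node.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the system resolver can look up `host` via getaddrinfo."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # EAI_AGAIN is the code reported by paperspace-node above; other codes
        # (e.g. EAI_NONAME) point to different resolver problems.
        temporary = exc.errno == getattr(socket, "EAI_AGAIN", None)
        print(f"{host}: {exc} (temporary failure: {temporary})")
        return False
    return True
```

Running `can_resolve("api.paperspace.io")` on the proxied network and again on mobile data should show whether name resolution is what differs between the two.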

Coming back to the main problem: I am exploring how to save files, etc. I wrote a simple Python script (test.py) that copies data from one text file into another file in /storage, and then run.sh (below) moves it from /storage to /artifacts:

python test.py
mv /storage/* /artifacts
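For reference, a script like the test.py described above could be as small as the following sketch. The file names are hypothetical (the original script is not shown); the only assumption carried over from the thread is that the copy is written under /storage so that run.sh can move it to /artifacts.

```python
from pathlib import Path


def copy_text(src: Path, dst: Path) -> None:
    """Copy the text of `src` into `dst`, creating dst's parent directory if needed."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(src.read_text())
```

In the job, a call such as `copy_text(Path("input.txt"), Path("/storage/output.txt"))` (hypothetical paths) would produce the file that `mv /storage/* /artifacts` then picks up.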

And when I run:

paperspace-python run --command "bash run.sh"

I get the following error:

Job Error: Error starting container: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "process_linux.go:398: container init caused \"process_linux.go:381: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=22549 var/lib/docker/overlay2/bfe1e5885d03a3df4a089c16f052cebb5746ef307ebaec7d2f5c5d5667e14920/merged]\\\\nnvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\\\n\\\"\"": unknown

That's why I checked the machine availability, but it reports available=True. Any idea why this might be happening?

@sanfilip
Contributor

Aman,

That machine temporarily lost access to its GPU. We were able to restart it and restore GPU access. I just ran a test and everything looks good, but we will continue to monitor it. Let us know if you see it again: just send a note to [email protected] and we will escalate it. We may need to take that particular host out of service if it happens again.

@dalmia
Author

dalmia commented Apr 11, 2018

Hi, @sanfilip. Good to know it is fixed; I ran a test just now too, and it works fine. One problem, though: because of that GPU issue, many of my test jobs in the Gradient job runner failed, which used up my limit of 10 GPU jobs. Is there any way that can be reverted?

@sanfilip
Contributor

I'm going to forward your request to support. They should be able to credit you for the failures.

@dalmia
Author

dalmia commented Apr 11, 2018

Thanks, @sanfilip. Really appreciate the support!
Email for the account: [email protected]

@dalmia dalmia closed this as completed Apr 11, 2018
@dalmia
Author

dalmia commented Apr 12, 2018

I am now trying to run the actual training using paperspace-python using the following command:

paperspace-python run --command "bash run.sh" --workspace autoencoder_train.zip  --req ../requirements.txt --project "Splice Site Prediction" --name "AE train"

But after a while, I get back:

{
  "error": true,
  "message": "HTTPSConnectionPool(host='api.paperspace.io', port=443): Max retries exceeded with url: /jobs/createJob?project=Splice+Site+Prediction&workspaceFileName=autoencoder_train.zip.zip&container=paperspace%2Ftensorflow-python&machineType=P5000&name=AE+train&command=pip2+install+-r+requirements.txt%0Abash+run.sh (Caused by ProxyError('Cannot connect to proxy.', error(\"(110, 'ETIMEDOUT')\",)))"
}
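The request URL in this message shows which job parameters the CLI sent before the connection failed. Decoding its query string with the standard library makes them readable; for example, the `--req` option appears as a `pip2 install -r requirements.txt` line placed before `bash run.sh` in the command, and `workspaceFileName` is logged as `autoencoder_train.zip.zip`. A small sketch; the URL string is copied from the error above with the line wrapping removed, and the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Request path copied from the ProxyError message above.
URL = (
    "/jobs/createJob?project=Splice+Site+Prediction"
    "&workspaceFileName=autoencoder_train.zip.zip"
    "&container=paperspace%2Ftensorflow-python&machineType=P5000"
    "&name=AE+train&command=pip2+install+-r+requirements.txt%0Abash+run.sh"
)


def decode_job_params(url: str) -> dict:
    """Decode the query string of a logged request URL, one value per key."""
    query = urlsplit(url).query
    # parse_qs turns '+' into spaces and %XX escapes (e.g. %0A) into characters.
    return {key: values[0] for key, values in parse_qs(query).items()}


for key, value in decode_job_params(URL).items():
    print(f"{key}: {value!r}")
```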

This is confusing since my test script worked, but this is showing a proxy error. Can you please help me with this?
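The `(Caused by ProxyError('Cannot connect to proxy.', ... ETIMEDOUT))` part means the client was configured to go through a proxy and timed out connecting to that proxy, so the request never reached api.paperspace.io. The `HTTPSConnectionPool`/`ProxyError` wording comes from urllib3/requests, and requests reads the standard `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY` and `NO_PROXY` environment variables by default. Assuming paperspace-python relies on that default behaviour (not confirmed in this thread), the sketch below uses the standard library's equivalent lookup to show which proxy a URL would be routed through. The helper names are hypothetical.

```python
import os
import urllib.request
from typing import Dict, Optional
from urllib.parse import urlsplit


def proxy_env_vars() -> Dict[str, str]:
    """Return every *_PROXY / *_proxy environment variable that is set."""
    return {k: v for k, v in os.environ.items() if k.lower().endswith("_proxy")}


def proxy_for(url: str) -> Optional[str]:
    """Return the proxy the environment routes `url` through, or None.

    Uses the stdlib's lookup of HTTPS_PROXY / ALL_PROXY / NO_PROXY and friends.
    """
    parts = urlsplit(url)
    if urllib.request.proxy_bypass(parts.hostname or ""):
        return None  # the host matches an entry in NO_PROXY
    proxies = urllib.request.getproxies()
    return proxies.get(parts.scheme) or proxies.get("all")
```

If `proxy_for("https://api.paperspace.io")` returns a proxy address on the affected machine, that is the host that timed out; checking `proxy_env_vars()` in the same shell where the earlier test job succeeded would show whether the proxy settings differ between the two runs.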

@dalmia dalmia reopened this Apr 12, 2018
@dalmia
Author

dalmia commented Apr 15, 2018

ping @sanfilip @dte
