This file is part of PyBroMo: a single-molecule Brownian motion diffusion
simulator for confocal smFRET experiments:
* http://tritemio.github.io/PyBroMo/
Introduction
============
This is a quick how-to for setting up an IPython cluster.
For more info refer to the IPython documentation:
http://ipython.org/ipython-doc/dev/parallel/parallel_process.html
Requirements
============
You need to install IPython. The easiest way is to get it through
a scientific Python distribution, such as Anaconda.
Parallel computing on a single machine
======================================
Method 1
--------
Launch the notebook server and, from the cluster tab, start 4 engines.
Method 2
--------
Open a terminal (cmd.exe) and type:
ipcluster start -n 4
Parallel computing on many machines (Windows 7)
===============================================
IPython docs:
http://ipython.org/ipython-doc/dev/parallel/parallel_process.html#starting-the-controller-and-engines-on-different-hosts
Here we configure 2 machines: one controller host that launches the simulation
and one "slave" host that performs the computation. This procedure can be
extended to multiple "slave" machines by repeating the same configuration on
each of them.
Windows note
------------
All the commands must be pasted in a cmd.exe terminal.
Set up the controller
---------------------
The first time only, we need to create an IPython profile:
ipython profile create --parallel --profile=parallel
This command copies a new set of configuration files into
IPYTHONDIR/profile_parallel, where IPYTHONDIR is usually a folder named
.ipython in the user home folder (C:\Users\username\). These files can be
customized to change the default behaviour, if needed.
Now, each time we want to start a parallel computation, we begin by starting
the controller:
ipcontroller --profile=parallel --ip=169.232.130.141
(replace the address above with the IP address of the controller machine)
This command creates a file ipcontroller-engine.json that contains
the connection info that the other machines need in order to connect to the
controller.
The file is located in IPYTHONDIR/profile_parallel/security.
We need to copy ipcontroller-engine.json to the computation machine.
To automate this step I like to link the IPython folder into a Dropbox folder
so that all the configuration files are automatically copied/updated on
the different machines.
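If linking the IPython folder is not an option, the copy can also be scripted.
The following is only a minimal sketch: the function name
publish_connection_file and the idea of a separate shared folder are
assumptions made for illustration, not part of IPython or PyBroMo.

```python
import os
import shutil


def publish_connection_file(security_dir, shared_dir,
                            name="ipcontroller-engine.json"):
    """Copy the controller's connection file into a shared folder.

    security_dir: IPYTHONDIR/profile_parallel/security on the controller.
    shared_dir:   hypothetical folder visible to the other machines
                  (for example a synchronized or network folder).
    Returns the path of the copy.
    """
    src = os.path.join(security_dir, name)
    if not os.path.isfile(src):
        # The file only exists after ipcontroller has been started.
        raise IOError("connection file not found: " + src)
    if not os.path.isdir(shared_dir):
        os.makedirs(shared_dir)
    dst = os.path.join(shared_dir, name)
    shutil.copy2(src, dst)
    return dst
```

The copy has to be repeated whenever the controller rewrites the file, which
is why a linked folder is more convenient in day-to-day use.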
Set up the "slave" machine
--------------------------
On the machine that runs the computation it is also useful to create
a profile (only the first time), with the same command as before:
ipython profile create --parallel --profile=parallel
A new set of configuration files is created in
IPYTHONDIR/profile_parallel.
We can start a computation engine with the ipengine command, specifying the
path of the ipcontroller-engine.json file:
ipengine --profile=parallel --file=C:\Data\user\software\Dropbox\ipython\profile_parallel\security\ipcontroller-engine.json
or we can store the file name in the configuration file so that we don't need
to type it every time. To do so, edit the file ipengine_config.py
found in the previously created profile folder (IPYTHONDIR/profile_parallel).
Find the line:
#c.IPEngineApp.url_file = u''
remove the leading # and write the ipcontroller-engine.json path. The
backslashes must be doubled, otherwise Python reads the \u in \user as the
start of a Unicode escape and rejects the string. In our example:
c.IPEngineApp.url_file = u'C:\\Data\\user\\software\\Dropbox\\ipython\\profile_parallel\\security\\ipcontroller-engine.json'
Now to launch an engine simply type:
ipengine --profile=parallel
We suggest launching as many engines as there are CPU cores. To launch
a second engine, open a new terminal and type the command again, and so on.
To add another machine for computation just repeat the previous steps.
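The one-engine-per-core suggestion above can also be scripted instead of
opening one terminal per engine. This is only a sketch: the helper names
engine_commands and start_engines are made up for illustration, and it assumes
that the ipengine executable is on the PATH and that the "parallel" profile
already points to the connection file.

```python
import multiprocessing
import subprocess


def engine_commands(profile="parallel", n_engines=None):
    """Return one ipengine command (as an argument list) per engine.

    By default one engine is requested for each CPU core.
    """
    if n_engines is None:
        n_engines = multiprocessing.cpu_count()
    return [["ipengine", "--profile=" + profile] for _ in range(n_engines)]


def start_engines(profile="parallel", n_engines=None):
    """Start the engines as background processes.

    Returns the list of subprocess.Popen objects, which can be used
    later to terminate the engines.
    """
    return [subprocess.Popen(cmd)
            for cmd in engine_commands(profile, n_engines)]
```

Calling start_engines() on the computation machine starts the engines, and
calling terminate() on each returned object stops them again.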
Launching the simulation
========================
Once the cluster is started (either on a single machine or on multiple
machines) we are ready to launch a simulation.
On the controller machine start an IPython QtConsole or an
IPython notebook using the profile "parallel":
ipython qtconsole --profile=parallel
or
ipython notebook --profile=parallel
Then do:
from IPython.parallel import Client
rc = Client()
rc.ids
The last command should print the list of engine IDs, one for each engine
that was started.
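As a quick check that the engines actually do work, a toy computation can be
mapped over all of them. The sketch below is not PyBroMo code: the helper names
connect and parallel_squares are assumptions, and squaring numbers only stands
in for a real simulation step. Note that in IPython 4 and later the Client
class is provided by the separate ipyparallel package.

```python
def connect(profile="parallel"):
    """Return a Client connected to the controller of the given profile."""
    try:
        from IPython.parallel import Client  # IPython < 4
    except ImportError:
        from ipyparallel import Client       # IPython >= 4
    return Client(profile=profile)


def parallel_squares(view, values):
    """Square each value on the engines behind `view`.

    `view` is expected to be a DirectView, for example rc[:].
    """
    return list(view.map_sync(lambda x: x * x, values))


# Typical use, with a running cluster:
#     rc = connect()
#     print(parallel_squares(rc[:], range(8)))
```

Here rc[:] selects all the engines and map_sync blocks until every result has
been returned.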
Alternatively, if you already have a qtconsole or notebook running without
the "parallel" profile, you can simply pass the path of the file that
contains the connection information for clients. This file is
ipcontroller-client.json (not -engine as before!) and is located in the
security subfolder of the profile folder.
This trick is used by the PyBroMo notebooks so you don't need
to restart the notebook server after you launch the cluster.
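A minimal sketch of this approach follows. The helper client_json_path is a
hypothetical name used only for illustration; it assumes the default layout
described earlier (IPYTHONDIR is .ipython in the user home folder and the JSON
files are in the security subfolder of the profile). It is not the actual code
used by the PyBroMo notebooks.

```python
import os


def client_json_path(profile="parallel", ipython_dir=None):
    """Return the expected path of ipcontroller-client.json for a profile."""
    if ipython_dir is None:
        # Assumed default: the .ipython folder in the user home folder.
        ipython_dir = os.path.join(os.path.expanduser("~"), ".ipython")
    return os.path.join(ipython_dir, "profile_" + profile, "security",
                        "ipcontroller-client.json")


# Typical use, with a running cluster:
#     from IPython.parallel import Client
#     rc = Client(client_json_path())
#     rc.ids
```

If the IPython folder is stored elsewhere (for example inside a Dropbox
folder), pass that location as ipython_dir.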