Extension to mpi programs save restore bkp #385

Open
wants to merge 98 commits into
base: master
Commits
497a751
Adding MPI support continuing
Aug 1, 2020
831c5de
Adding multi process GUI elements
Aug 3, 2020
20c9cf8
Process on focus
Aug 7, 2020
cf7307e
Adding MPI support
Aug 13, 2020
936bc39
Cluster
Aug 15, 2020
b3a0ffa
Adding missing file
Aug 15, 2020
79e49ba
Nox testing passing
Aug 17, 2020
7bc9406
First pytest for mpi version
Aug 20, 2020
ffc4ae8
Refactoring tests for MPI
Aug 22, 2020
c68ae89
Testing JS with Puppeteer
Aug 23, 2020
905b189
Merge pull request #1 from cs01/master
Aug 24, 2020
a867e74
Fixing package.json
Aug 24, 2020
c7e1cef
MPI extension working with Pty
Aug 27, 2020
fc29ead
Fixing tests
Aug 29, 2020
49498bb
Fixing format to pass check
Aug 29, 2020
082cb87
Removing garbage
Aug 29, 2020
5d9e4f3
Adding openmpi to the testing machines
Aug 29, 2020
0f96f84
Adding openmpi to the testing machines
Aug 29, 2020
127580a
Adding openmpi to the testing machines (again)
Aug 29, 2020
d54b6d9
Fixing compiling nodes_name
Aug 29, 2020
c9b1ef3
Making the test more general
Aug 29, 2020
965c243
Fixing python test for Azure CI servers
Aug 29, 2020
a0e3f06
Trying 127.0.0.1 for debug session connection
Aug 29, 2020
4979556
Install gdbserver in the CI machines
Aug 29, 2020
cbd0230
Troubleshooting ...
Aug 29, 2020
2d3dec0
Moving from shell to bash
Aug 29, 2020
1260136
Fixing test
Aug 29, 2020
66d1bf4
Fixing Javascript test jest calling
Aug 29, 2020
feacf2c
Fixing Javascript test jest calling
Aug 29, 2020
f4b9a4b
Moving js to ts for the testing script
Aug 29, 2020
2e7a337
Cancel workflow
Aug 29, 2020
eac3fd7
Adding timeout
Aug 29, 2020
a814d29
Checking Adding timeout
Aug 29, 2020
469b73b
Apply small review changes
Aug 29, 2020
cdd0ace
Retry with timeout
Aug 29, 2020
aef0ef2
Timeout to 8 min
Aug 29, 2020
f9ae0a6
Removing __main__ and __init__ in gdbgui-mpi
Aug 29, 2020
f528fff
try to fix CI machines Stuck (still good to review)
Aug 30, 2020
11d55e5
try to fix CI machines Stuck
Aug 30, 2020
a4be66b
CI again ...
Aug 30, 2020
ecf29e4
CI again ...
Aug 30, 2020
2c3ecde
CI again ...
Aug 30, 2020
497703b
CI again ...
Aug 30, 2020
1d7b681
Trying without shell
Aug 30, 2020
889fc82
CI again ...
Aug 30, 2020
c7ee38d
CI again ...
Aug 30, 2020
efc49b7
CI again ...
Aug 30, 2020
9627de0
CI again ...
Aug 30, 2020
6538702
Moving build before test in js_tests ...
Aug 30, 2020
9648fd3
CI run after nox fix in previous commit
Aug 30, 2020
cbb95ef
Fixing lint test
Aug 30, 2020
cb1c7b1
Moving requirement of flaskio to setup.py
Aug 30, 2020
b213761
Merge pull request #2 from cs01/master
Jan 16, 2021
79fdd65
Fixing gdbgui with OpenFPM
Jan 18, 2021
9b94f15
Fixing conflicts
Jan 25, 2021
e8e0deb
Fixing github
Jan 25, 2021
1cb1950
Running setup.py before python testing
Jan 25, 2021
3273965
Adding python 3.9 and fixing tests
Jan 25, 2021
259460d
Adding more time for print node file
Jan 25, 2021
e15ae69
Adding additional output in case of error
Jan 25, 2021
580ffab
More output
Jan 25, 2021
f42ed39
Fixing launching of print nodes
Jan 25, 2021
7f14fe5
Fixing launching of debugger
Jan 25, 2021
a9d89f4
Fixing lint
Jan 25, 2021
5088b53
Fixing lint and tests
Jan 25, 2021
b2a76be
Increase timeout limit
Jan 25, 2021
5583e09
Increase timeout limit
Jan 26, 2021
0af6893
Increase timeout limit
Jan 26, 2021
66e13ec
Eliminating potential deadlock
Jan 26, 2021
ea666ac
Reducing to two sessions instead of six
Jan 26, 2021
923a620
Forcing error to check
Jan 26, 2021
ddebde7
Checking process.terminate ... does not terminate on github
Jan 26, 2021
9e22d9c
Checking process.terminate ... does not terminate on github
Jan 26, 2021
81f9565
Checking process.terminate ... does not terminate on github
Jan 26, 2021
e277bbd
More analyze
Jan 26, 2021
25d5a5e
More analyze
Jan 26, 2021
6c6abb1
More analyze
Jan 26, 2021
b0db403
More robust detection
Jan 26, 2021
6c27678
Fixing tests
Jan 26, 2021
1b7369d
Fixing tests
Jan 26, 2021
c8048e7
Fixing tests
Jan 26, 2021
973ac13
Fixing pagination and sigint
Apr 21, 2021
e0f6c9d
Fixing syntax
Apr 21, 2021
3e71552
Fixing syntax
Apr 21, 2021
37a6353
Fixing syntax
Apr 21, 2021
0c27d40
Fixing syntax
Apr 21, 2021
39ee11f
Fixing lint
Apr 21, 2021
93831e6
Fixing lint
Apr 22, 2021
8d15c97
Fixing test
Apr 22, 2021
6af6f3f
Fixing SIGINT for MPI
Apr 22, 2021
38c1e75
Set timeout for launching gdbgui server
Apr 22, 2021
01059cd
Set timeout for launching gdbgui server
Apr 22, 2021
d261481
Set timeout for launching gdbgui server
Apr 22, 2021
3d6c4aa
Fixing lint
Apr 22, 2021
9e2a8ae
Fixing test timeout
Apr 22, 2021
9133063
Adding saving breakpoints on cookies
Apr 25, 2021
48c805a
Fixing breakpoints saved on cookies + activate tests
Apr 25, 2021
e85a796
Fixing format
Apr 25, 2021
11 changes: 9 additions & 2 deletions .github/workflows/tests.yml
@@ -7,11 +7,14 @@ on:
push:
branches:
- master
- extension_to_mpi_programs
- extension_to_mpi_programs_save_restore_bkp
release:

jobs:
run_tests:
runs-on: ${{ matrix.os }}
timeout-minutes: 8
strategy:
matrix:
os: [ubuntu-latest]
@@ -27,9 +30,13 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install nox
- name: Install gdb ubuntu
- name: Install gdb gdbserver and openmpi ubuntu
run: |
sudo apt-get install gdb
sudo apt-get update
sudo apt-get install gdb gdbserver openmpi-bin libopenmpi-dev
- name: Compiling node_names
run: |
cd gdbgui-mpi && ./compile.sh
- name: Execute Tests
run: |
nox --non-interactive --session tests-${{ matrix.python-version }}
6 changes: 6 additions & 0 deletions MANIFEST.in
@@ -7,6 +7,11 @@ graft gdbgui
# are generated
graft gdbgui/static/js

include gdbgui-mpi/compile.sh
include gdbgui-mpi/launch_gdb_server
include gdbgui-mpi/launch_mpi_debugger
include gdbgui-mpi/main.cpp

prune examples
prune downloads
prune screenshots
@@ -15,6 +20,7 @@ prune docs
prune docker
prune images
prune gdbgui/__pycache__
prune gdbgui/server/__pycache__

exclude mypy.ini
exclude .eslintrc.json
3 changes: 3 additions & 0 deletions gdbgui-mpi/compile.sh
@@ -0,0 +1,3 @@
#! /bin/bash

mpic++ -g -O0 -o print_nodes main.cpp
8 changes: 8 additions & 0 deletions gdbgui-mpi/launch_gdb_server
@@ -0,0 +1,8 @@
#!/bin/sh
# Usage: launch_gdb_server <executable> <arguments>

GDB_HOST=$(hostname)
GDB_PORT=$(( 60000 + $OMPI_COMM_WORLD_RANK ))
echo "GDB server for rank $OMPI_COMM_WORLD_RANK available on $GDB_HOST:$GDB_PORT"
exec gdbserver ":$GDB_PORT" "$@"
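
The script above encodes the MPI rank in the gdbserver port (60000 plus `OMPI_COMM_WORLD_RANK`). A minimal Python sketch of that mapping follows, assuming the same offset. The names `port_for_rank` and `target_for_rank` and the range check are illustrative and not part of this PR.

```python
BASE_PORT = 60000  # same offset as launch_gdb_server
MAX_PORT = 65535   # highest valid TCP port


def port_for_rank(rank: int) -> int:
    """Return the port launch_gdb_server would listen on for this MPI rank."""
    if rank < 0 or BASE_PORT + rank > MAX_PORT:
        raise ValueError(f"rank {rank} does not map to a valid TCP port")
    return BASE_PORT + rank


def target_for_rank(host: str, rank: int) -> str:
    """Build the host:port string a client would pass to -target-select remote."""
    return f"{host}:{port_for_rank(rank)}"
```

Under this scheme only ranks 0 through 5535 on a given host can be addressed, since the port must stay at or below 65535.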

14 changes: 14 additions & 0 deletions gdbgui-mpi/launch_mpi_debugger
@@ -0,0 +1,14 @@
#!/bin/bash

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

mpirun --oversubscribe -np $1 $DIR/print_nodes
if [ ! -d gdbgui-mpi ]; then
mkdir gdbgui-mpi
fi
mv nodes_name gdbgui-mpi/
mpirun --oversubscribe -np $1 $DIR/launch_gdb_server ${@:2} # &
rm nodes_name
#python -m gdbgui-mpi
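
The launcher above runs `mpirun` twice: once to write `nodes_name` via `print_nodes`, then once to start one gdbserver per rank, forwarding every argument after the process count. A rough Python sketch of the two command lines it builds is below; `build_launch_commands` is a hypothetical helper for illustration, and only the `mpirun` flags already present in the script are used.

```python
from typing import List, Sequence


def build_launch_commands(
    num_procs: int, script_dir: str, program_args: Sequence[str]
) -> List[List[str]]:
    """Return the two mpirun argv lists in the order the script runs them."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    mpirun = ["mpirun", "--oversubscribe", "-np", str(num_procs)]
    return [
        # Phase 1: every rank reports its hostname, rank 0 writes nodes_name.
        mpirun + [f"{script_dir}/print_nodes"],
        # Phase 2: every rank starts gdbserver on the program and its arguments.
        mpirun + [f"{script_dir}/launch_gdb_server", *program_args],
    ]
```

Passing the arguments as a list rather than a shell string keeps arguments that contain spaces intact.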


81 changes: 81 additions & 0 deletions gdbgui-mpi/main.cpp
@@ -0,0 +1,81 @@
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
#include <string>
#include <sstream>
#include <iostream>

int main(int argc, char** argv) {
// Initialize the MPI environment
MPI_Init(NULL, NULL);

// Get the number of processes
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

// Get the rank of the process
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

char name_[256];
int ret = gethostname(name_,256);
std::string name(name_);

if (ret == 0)
{
int name_max_s = name.size()+1;
int name_max_r = 0;

MPI_Allreduce(&name_max_s,&name_max_r,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);

std::stringstream ss;

ss.width(10);
ss << std::left << world_rank;
ss.width(name_max_r);
ss << name << std::endl;

std::string proc_name = ss.str();

if (world_rank == 0)
{
char * nodes;
nodes = new char [(proc_name.size()+1)*world_size];
MPI_Gather(proc_name.c_str(),proc_name.size()+1,MPI_CHAR,
nodes,proc_name.size()+1,MPI_CHAR,0,MPI_COMM_WORLD);

// concatenate the per-rank records gathered from all processes
std::stringstream sf;

for (int i = 0 ; i < world_size ; i++)
{
sf << std::string(&nodes[i*(proc_name.size()+1)]);
}

FILE * pFile;
pFile = fopen("nodes_name","w");
if (pFile!=NULL)
{
fputs (sf.str().c_str(),pFile);
fclose (pFile);
}
else
{
printf("Error cannot create nodes_name \n");
}
}
else
{
MPI_Gather(proc_name.c_str(),proc_name.size()+1,MPI_CHAR,
NULL,0,MPI_CHAR,0,MPI_COMM_WORLD);
}
}
else
{
MPI_Abort(MPI_COMM_WORLD,-1);
}

// Finalize the MPI environment.
MPI_Finalize();
}
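
The program above writes one record per rank to `nodes_name`: the rank left-justified in a 10-character field, followed by the hostname padded to a common width. A minimal sketch of reading that file back in Python follows, assuming hostnames contain no whitespace; `parse_nodes_name` is a hypothetical helper and not part of this PR.

```python
from typing import Dict


def parse_nodes_name(text: str) -> Dict[int, str]:
    """Map each MPI rank to its hostname, given the contents of nodes_name."""
    nodes: Dict[int, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        rank, host = fields
        nodes[int(rank)] = host
    return nodes
```

Splitting on whitespace makes the parser independent of the exact padding width, which varies with the longest hostname in the job.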

11 changes: 7 additions & 4 deletions gdbgui/cli.py
@@ -16,9 +16,12 @@

from gdbgui import __version__
from gdbgui.server.app import app, socketio
from gdbgui.server.constants import DEFAULT_GDB_EXECUTABLE, DEFAULT_HOST, DEFAULT_PORT
from gdbgui.server.server import run_server

from gdbgui.server.constants import (
DEFAULT_GDB_EXECUTABLE,
DEFAULT_HOST,
DEFAULT_PORT,
)
import gdbgui.server.server

logger = logging.getLogger(__name__)
logger.setLevel(logging.WARNING)
@@ -245,7 +248,7 @@ def main():
"and https://sourceware.org/gdb/onlinedocs/gdb/Starting.html"
)

run_server(
gdbgui.server.server.run_server(
app=app,
socketio=socketio,
host=args.host,
176 changes: 139 additions & 37 deletions gdbgui/server/app.py
@@ -193,6 +193,81 @@ def run_gdb_command(message: Dict[str, str]):
emit("error_running_gdb_command", {"message": "gdb is not running"})


@socketio.on("run_gdb_command_mpi", namespace="/gdb_listener")
def run_gdb_command_mpi(message: Dict[str, str]):
"""
Endpoint for a websocket route.
Runs a gdb command over multiple sessions.
Responds only if an error occurs when trying to write the command to
gdb
"""
if message["processor"] != -1:
# A "-target-select remote host:port" command needs special handling: the MPI rank is derived from the port
cmd = message["cmd"]
cmds_tcheck = cmd[0].split(" ")
if cmds_tcheck[0] == "-target-select" and cmds_tcheck[1] == "remote":
debug_session = manager.debug_session_from_mpi_processor_id(-1)
if debug_session is not None:
debug_session.set_mpi_rank(int(cmds_tcheck[2].split(":")[1]) - 60000)
debug_session = manager.debug_session_from_mpi_processor_id(
int(message["processor"])
)
if debug_session is not None:
pty_mi = debug_session.pygdbmi_controller
if pty_mi is not None:
try:
# the command (string) or commands (list) to run
cmds = message["cmd"]
for cmd in cmds:
pty_mi.write(
cmd + "\n",
timeout_sec=0,
raise_error_on_timeout=False,
read_response=False,
)

except Exception:
err = traceback.format_exc()
logger.error(err)
emit("error_running_gdb_command", {"message": err})
else:
emit("error_running_gdb_command", {"message": "gdb is not running"})
else:
"""
execute the command for all controllers
"""
for debug_session, client_ids in manager.get_controllers().items():
if debug_session is None:
continue
try:
# the command (string) or commands (list) to run
cmd = message["cmd"]
pty_mi = debug_session.pygdbmi_controller
if pty_mi is not None:
try:
# the command (string) or commands (list) to run
cmds = message["cmd"]
for cmd in cmds:
pty_mi.write(
cmd + "\n",
timeout_sec=0,
raise_error_on_timeout=False,
read_response=False,
)

except Exception:
err = traceback.format_exc()
logger.error(err)
emit("error_running_gdb_command", {"message": err})
# debug_session.write(cmd, read_response=False)
# if this is the connection command, take the port number to determine the MPI process rank

except Exception:
err = traceback.format_exc()
logger.error(err)
emit("error_running_gdb_command", {"message": err})
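
The handler above recovers the MPI rank from a `-target-select remote host:port` command by subtracting 60000 from the port, indexing the split results directly. A more defensive sketch of that parsing step is below. `rank_from_target_select` is a hypothetical helper, and unlike the PR code it splits on the last colon and returns `None` for malformed input instead of raising.

```python
from typing import Optional

BASE_PORT = 60000  # offset used by launch_gdb_server


def rank_from_target_select(cmd: str) -> Optional[int]:
    """Return the MPI rank encoded in '-target-select remote host:port'.

    Returns None for any other command or for a malformed target.
    """
    parts = cmd.split()
    if len(parts) < 3 or parts[0] != "-target-select" or parts[1] != "remote":
        return None
    _host, sep, port = parts[2].rpartition(":")
    if not sep or not port.isdigit():
        return None
    rank = int(port) - BASE_PORT
    return rank if rank >= 0 else None
```

A caller could then apply `set_mpi_rank` only when the result is not `None`, so an unexpected command never aborts the handler.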


def send_msg_to_clients(client_ids, msg, error=False):
"""Send message to all clients"""
if error:
@@ -221,50 +296,77 @@ def test_disconnect():
print("Client websocket disconnected", request.sid)


def read_and_forward_gdb_and_pty_output():
"""A task that runs on a different thread, and emits websocket messages
of gdb responses"""
@socketio.on("open_mpi_sessions", namespace="/gdb_listener")
def open_mpi_sessions(message):
"""
For MPI debugging, kill all gdb sessions except this client's, then open one new session per additional MPI process
"""

while True:
socketio.sleep(0.05)
debug_sessions_to_remove = []
for debug_session, client_ids in manager.debug_session_to_client_ids.items():
manager.exit_all_gdb_processes_except_client_id(request.sid)

for i in range(1, int(message["processors"])):
gdb_command = request.args.get("gdb_command", app.config["gdb_command"])
mi_version = request.args.get("mi_version", "mi2")
manager.add_new_debug_session(
gdb_command=gdb_command, mi_version=mi_version, client_id=request.sid
)


def process_controllers_out():
debug_sessions_to_remove = []
for debug_session, client_ids in manager.debug_session_to_client_ids.items():
try:
try:
try:
response = debug_session.pygdbmi_controller.get_gdb_response(
timeout_sec=0, raise_error_on_timeout=False
)
response = debug_session.pygdbmi_controller.get_gdb_response(
timeout_sec=0, raise_error_on_timeout=False
)

except Exception:
response = None
send_msg_to_clients(
client_ids,
"The underlying gdb process has been killed. This tab will no longer function as expected.",
error=True,
except Exception:
response = None
send_msg_to_clients(
client_ids,
"The underlying gdb process has been killed. This tab will no longer function as expected.",
error=True,
)
debug_sessions_to_remove.append(debug_session)

if response:
"""Attach processor information"""
for r in response:
r["proc"] = debug_session.mpi_rank
if r["type"] == "notify":
if r["message"] == "thread-group-started":
debug_session.inferior_pid = int(r["payload"]["pid"])

# Here we parse for thread-group-started to get the inferior_pid

for client_id in client_ids:
logger.info("emitting message to websocket client id " + client_id)
socketio.emit(
"gdb_response",
response,
namespace="/gdb_listener",
room=client_id,
)
debug_sessions_to_remove.append(debug_session)
else:
# there was no queued response from gdb, not a problem
pass

if response:
for client_id in client_ids:
logger.info(
"emiting message to websocket client id " + client_id
)
socketio.emit(
"gdb_response",
response,
namespace="/gdb_listener",
room=client_id,
)
else:
# there was no queued response from gdb, not a problem
pass
except Exception:
logger.error(traceback.format_exc())

debug_sessions_to_remove += check_and_forward_pty_output()
for debug_session in set(debug_sessions_to_remove):
manager.remove_debug_session(debug_session)

except Exception:
logger.error(traceback.format_exc())

debug_sessions_to_remove += check_and_forward_pty_output()
for debug_session in set(debug_sessions_to_remove):
manager.remove_debug_session(debug_session)
def read_and_forward_gdb_and_pty_output():
"""A task that runs on a different thread, and emits websocket messages
of gdb responses"""

while True:
socketio.sleep(0.05)
process_controllers_out()
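
The refactored polling code tags every gdb response with the rank of the session that produced it and records the inferior pid when a `thread-group-started` notification arrives. That step can be sketched on plain dictionaries shaped like the responses used in the diff (`type`, `message`, `payload`). `tag_responses` is a hypothetical helper; the PR performs this inline on the debug session object.

```python
from typing import Any, Dict, List, Optional


def tag_responses(
    responses: List[Dict[str, Any]], mpi_rank: int
) -> Optional[int]:
    """Attach the MPI rank to each response in place.

    Returns the inferior pid if a thread-group-started notification
    is present, otherwise None.
    """
    inferior_pid: Optional[int] = None
    for response in responses:
        response["proc"] = mpi_rank
        if (
            response.get("type") == "notify"
            and response.get("message") == "thread-group-started"
        ):
            inferior_pid = int(response["payload"]["pid"])
    return inferior_pid
```

With the rank attached as `proc`, the browser client can route each message to the view of the process it belongs to.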


def check_and_forward_pty_output() -> List[DebugSession]: