Pytest tests for fixing broken links and parallelism of build script #75
Conversation
This looks great. There are lots of improvements big and small which I think will be very useful both for maintainability and usability.
I've done a short "test drive" running this locally, and most things seem to work:
- Using Ctrl-C to cancel a run now works as expected (much appreciated by @ksbeattie and me!)
- The parallelism works out of the box with the default options
- I also like that now the logging shows the PID of the main process as well as the individual workers' indices
- I've encountered a few failures in the notebook builds, but most of them look more likely to be due to the notebooks themselves (e.g. some incompatibility in the IDAES/Pyomo version in my hastily-created Conda env) than the build script
- The only thing that looks suspicious is this error, since ZMQ is part of the Jupyter/IPython infrastructure:
```
Traceback (most recent call last):
  File "/opt/conda/envs/examples-pse/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/examples-pse/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/traitlets/config/application.py", line 845, in launch_instance
    app.initialize(argv)
  File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/traitlets/config/application.py", line 88, in inner
    return method(app, *args, **kwargs)
  File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 632, in initialize
    self.init_sockets()
  File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 287, in init_sockets
    self.stdin_port = self._bind_socket(self.stdin_socket, self.stdin_port)
  File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 229, in _bind_socket
    return self._try_bind_socket(s, port)
  File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 205, in _try_bind_socket
    s.bind("tcp://%s:%i" % (self.ip, port))
  File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/zmq/sugar/socket.py", line 214, in bind
    super().bind(addr)
  File "zmq/backend/cython/socket.pyx", line 540, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use
```
I've marked "Request changes" because I had some questions about the implementation, but I don't think I have any blocking comments. Feel free to go through my comments and dismiss/resolve them, and we can get this in quickly.
```python
def get_build_config():
    if os.environ.get("GITHUB_ACTIONS", False):
        return "build-ci.yml"
    return "build.yml"
```
Nice 👍
```diff
-args: "-b html -T docs_test docs_test/_build/html"
+args: "-b html -T {output} {output}/{html}"
```
Nice 👍
```python
timeout = self._timeout
if timeout < 60:
    # force at least 10 second timeout
    timeout = max(timeout, 10)
```
`timeout` is set here to the adjusted value, but then the original value `self._timeout` is passed later to the `ParallelNotebookWorker`. Is this expected?
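If the adjustment is meant to apply everywhere, one option would be to compute the floored value in a single place so the log message and the worker can never disagree. A minimal, hypothetical sketch (the class and attribute names are illustrative, not the project's actual code):

```python
class NotebookRunner:
    """Hypothetical sketch: apply the timeout floor once, in one place."""

    MIN_TIMEOUT = 10  # seconds; floor applied to user-supplied timeouts

    def __init__(self, timeout):
        self._timeout = timeout

    @property
    def effective_timeout(self):
        # Both the notify() message and the worker would read this
        # property, so they always see the same adjusted value.
        return max(self._timeout, self.MIN_TIMEOUT)
```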
```python
    wait_time = f"{timeout} seconds"
else:
    if timeout // 60 * 60 == timeout:
        wait_time = f"{timeout // 60} minute{'' if timeout == 60 else 's'}"
    else:
        sec = timeout - (timeout // 60 * 60)
        wait_time = (
            f"{timeout // 60} minute{'' if timeout == 60 else 's'}, "
            f"{sec} second{'' if sec == 1 else 's'}"
        )
notify(
    f"Convert notebooks with {num_workers} "
    f"worker{'' if num_workers == 1 else 's'}. Timeout after {wait_time}."
)
```
I found this a bit hard to follow, but I guess it is not crucial, since as far as I can tell it's only used to produce a human-readable value for `timeout` to pass to `notify()`. Maybe it could be "swept under" a utility function like `_get_human_readable_duration_str(duration_s)` (see the sketch below), but it's a clean-code nit.
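For illustration, such a helper might look like this (the name comes from the comment above; the body is a sketch that mirrors the snippet's logic, not the project's code):

```python
def _get_human_readable_duration_str(duration_s: int) -> str:
    """Format a duration in seconds, e.g. 45 -> '45 seconds', 90 -> '1 minute, 30 seconds'."""
    if duration_s < 60:
        return f"{duration_s} seconds"
    minutes, seconds = divmod(duration_s, 60)
    result = f"{minutes} minute{'' if minutes == 1 else 's'}"
    if seconds:
        result += f", {seconds} second{'' if seconds == 1 else 's'}"
    return result
```

The `notify()` call would then shrink to a single f-string.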
```diff
-b = bossy.Bossy(
-    jobs,
-    num_workers=num_workers,
-    worker_function=worker.convert,
-    output_log=_log,
+pool = mproc.Pool(num_workers)
+num_jobs = len(jobs)
+log_level = _log.getEffectiveLevel()
+ar = pool.map_async(
+    worker.convert,
+    ((i + 1, jobs[i], log_level) for i in range(num_jobs)),
+    callback=self._convert_success,
+    error_callback=self._convert_failure,
```
Nice 👍 `bossy.py`, don't let the door hit you on your way out, you won't be missed!

```python
"# Execute Jupyter notebooks. This is slow.\n"
"# Only those notebooks that have not changed since the last time\n"
"# this was run will be re-executed.\n"
"{command} --exec\n"
"{command} -e  # <-- short option\n"
"\n"
"# Convert Jupyter notebooks, as in previous command,\n"
"# then build Sphinx documentation.\n"
"# This can be combined with -r/--remove to convert all notebooks.\n"
"{command} -cd\n"
"# Copy Jupyter notebooks into docs. This is quick.\n"
"{command} --copy\n"
"{command} -y  # <-- short option\n"
```
I like the very useful "this is quick/slow" added here.
```diff
-help="Run notebooks but do not convert them.",
+help="Execute notebooks (do not copy them into docs)",
```
I like using `exec` instead of `test`; I think it's clearer.
```python
_root = os.path.join(os.path.dirname(__file__), "..")
sys.path.insert(0, _root)
```
This might be irrational, but these make me feel slightly concerned, especially when mixed with the `os.chdir(_root)` used later. I'm not sure I have better alternatives off the top of my head, but maybe using a context manager (`with change_working_dir(_root): ...`, something like the sketch below) would be enough to assuage my fears.
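For reference, such a context manager is only a few lines (a sketch; `change_working_dir` is the name suggested in the comment, not an existing helper in this repo):

```python
import contextlib
import os

@contextlib.contextmanager
def change_working_dir(path):
    """Temporarily change the working directory, restoring it on exit."""
    saved = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore even if the body raises, so a failing test
        # cannot leave the process in the wrong directory.
        os.chdir(saved)
```

With that, the directory change is scoped to the block instead of leaking into the rest of the test run.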
```python
proc = subprocess.Popen(cmd)
proc.wait()
assert proc.returncode == 0
find_broken_links()
```
As I understand it, the link checking is run when `find_broken_links()` is called. Is there a reason why we need to perform this in `test_parse_notebook()` and again, separately, in `test_broken_links()`? (A sketch of one way to share the result follows.)
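If both tests really do need the link-check output, one way to avoid paying the cost twice would be a session-scoped fixture that runs the check once and shares the parsed results. A sketch (the output.json location is a hypothetical placeholder; in a real fixture the subprocess calls from `find_broken_links()` would go where noted):

```python
import json
from pathlib import Path

import pytest

@pytest.fixture(scope="session")
def linkcheck_records():
    """Run the (slow) Sphinx link check once per session and share the results."""
    # The subprocess calls that copy notebooks and run the link checker
    # would go here; this sketch assumes they have already produced
    # output.json in a hypothetical location.
    link_file = Path("docs_test") / "output.json"
    return [json.loads(line) for line in link_file.open(encoding="utf-8")]

def test_broken_links(linkcheck_records):
    broken = [r for r in linkcheck_records if r["status"] == "broken"]
    assert not broken, f"{len(broken)} broken links"
```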
```python
def find_broken_links(rebuild=True):
    """Run the Sphinx link checker.

    This was created in response to a number of broken links in Jupyter notebook
    cells, but would also find broken links in any documentation pages.
    """
    os.chdir(_root)
    config = get_build_config()
    config_dict = load_build_config(config)
    # Copy notebooks to docs. -S suppresses Sphinx output.
    args = ["python", "build.py", "--config", config, "-Sy"]
    proc = subprocess.Popen(args)
    rc = proc.wait()
    assert rc == 0, "Copying notebooks to docs failed"
    # Run linkchecker (-l). -S suppresses Sphinx output.
    # Output will be in the dir configured in sphinx.linkcheck_dir (see below).
    proc = subprocess.Popen(["python", "build.py", "--config", config, "-Sl"])
    rc = proc.wait()
    assert rc == 0, "Linkchecker process failed"
    # Find links marked [broken] and report them.
    link_file = Path(".") / config_dict["sphinx"]["linkcheck_dir"] / "output.json"
    assert link_file.exists()
    links = []
    for line in link_file.open(mode="r", encoding="utf-8"):
        obj = json.loads(line)
        if obj["status"] == "broken":
            num = len(links) + 1
            links.append(f"{num}) {obj['filename']}:{obj['lineno']} -> {obj['uri']}")
    # Fail if there were any broken links.
    assert len(links) == 0, f"{len(links)} broken links:\n{newline.join(links)}"
```
I'm fully aware that I'm saying this as a `pytest.fixture` evangelist, but I think that extracting the test setup (i.e. running the command) from the test itself (i.e. the assertion that the command was successful) would be tidier, not to mention that it would open up the possibility to parametrize some of the tests, e.g. having a `command` fixture that's parametrized with the various CLI flags (see the sketch below).

On the other hand, I'm also aware that this is firmly in "testing the tests" territory, so this might be overthinking it; let's just consider it a note for possible future work if/when we need to change these tests.
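As a concrete illustration of the fixture idea (a sketch: the flag values are examples lifted from the snippets above, `get_build_config()` is the function added in this PR, and its import path here is hypothetical):

```python
import subprocess

import pytest

from build_util import get_build_config  # hypothetical import path

# Each param is one set of build.py CLI flags; covering a new flag
# combination becomes a one-line change to the params list.
@pytest.fixture(params=["-Sy", "-Sl"])
def command(request):
    return ["python", "build.py", "--config", get_build_config(), request.param]

def test_build_command_succeeds(command):
    proc = subprocess.Popen(command)
    assert proc.wait() == 0, f"build.py failed for: {' '.join(command)}"
```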
This PR has "decayed" a bit but the
Seems like this PR needs to be re-created?
This will be replaced by an entire re-org of the examples-pse repo (in the works)
Related to two things:
- Issue #72
- Pull request #74 (needs this code to run)

Proposed changes:
- build.yml file depending on CI or non-CI environment

Legal Acknowledgement
By contributing to this software project, I agree to the following terms and conditions for my contribution: