This repository has been archived by the owner on Apr 6, 2023. It is now read-only.

Pytest tests for fixing broken links and parallelism of build script #75

Closed
wants to merge 23 commits

Conversation

dangunter (Member)

Related to two things:

Issue #72
Pull request #74 (needs this code to run)

Proposed changes:

  • Add broken-link tests to both the integration tests and the component tests
    • for the component test, skip if there are no notebooks (they may not have been converted)
    • for the integration test, fail if there are no notebooks, since conversion should already have been run
    • either way, fail if any broken links are found
  • Use the appropriate build.yml file depending on whether this is a CI or non-CI environment

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the license terms described in the LICENSE.txt file at the top level of this directory.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

@dangunter dangunter marked this pull request as draft November 11, 2021 18:45
@dangunter dangunter marked this pull request as ready for review November 11, 2021 21:49
@ksbeattie ksbeattie added the Priority:High High Priority Issue or PR label Nov 18, 2021
@lbianchi-lbl (Contributor) left a comment

This looks great. There are lots of improvements, big and small, that I think will be very useful for both maintainability and usability.

I've done a short "test drive" running this locally, and most things seem to work:

  • Using Ctrl-C to cancel a run now works as expected (much appreciated by @ksbeattie and me!)

  • The parallelism works out of the box with the default options

    • I also like that now the logging shows the PID of the main process as well as the individual workers' indices
    • I've encountered a few failures in the notebook builds, but most of them look more likely to be due to the notebooks themselves (e.g. some incompatibility with the IDAES/Pyomo versions in my hastily created Conda env) than to the build script
    • The only thing that looks suspicious is this error, since ZMQ is part of the Jupyter/IPython infrastructure:
    Traceback (most recent call last):
      File "/opt/conda/envs/examples-pse/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/opt/conda/envs/examples-pse/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel_launcher.py", line 16, in <module>
        app.launch_new_instance()
      File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/traitlets/config/application.py", line 845, in launch_instance
        app.initialize(argv)
      File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/traitlets/config/application.py", line 88, in inner
        return method(app, *args, **kwargs)
      File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 632, in initialize
        self.init_sockets()
      File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 287, in init_sockets
        self.stdin_port = self._bind_socket(self.stdin_socket, self.stdin_port)
      File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 229, in _bind_socket
        return self._try_bind_socket(s, port)
      File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 205, in _try_bind_socket
        s.bind("tcp://%s:%i" % (self.ip, port))
      File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/zmq/sugar/socket.py", line 214, in bind
        super().bind(addr)
      File "zmq/backend/cython/socket.pyx", line 540, in zmq.backend.cython.socket.Socket.bind
      File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
    zmq.error.ZMQError: Address already in use

I've marked "Request changes" because I had some questions about the implementation, but I don't think any of my comments are blocking. Feel free to go through them and dismiss/resolve as appropriate, and we can get this in quickly.
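As an aside on the ZMQError above: "Address already in use" usually means two processes raced to bind the same TCP port, which is plausible when several kernels are launched in parallel. For what it's worth, pyzmq can sidestep a fixed-port race by asking the OS for a free port via bind_to_random_port (a minimal sketch of the pattern, not a proposed patch to ipykernel):

    import zmq

    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REP)
    # bind_to_random_port retries over an ephemeral port range until it
    # finds a free port, avoiding races on a fixed port number
    port = sock.bind_to_random_port("tcp://127.0.0.1")
    print(f"bound to free port {port}")
    sock.close()
    ctx.term()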

Comment on lines +105 to +108
    def get_build_config():
        if os.environ.get("GITHUB_ACTIONS", False):
            return "build-ci.yml"
        return "build.yml"
@lbianchi-lbl (Contributor):

Nice 👍
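One non-blocking note: os.environ.get() returns a string when the variable is set, so any non-empty value (even the literal string "false") is truthy. That's fine here because GitHub Actions always sets GITHUB_ACTIONS=true on its runners, but if we ever want to pin the behavior down, a test could look something like this (a sketch; the import path for get_build_config is assumed):

    from build import get_build_config  # hypothetical import path

    def test_build_config_ci(monkeypatch):
        # GitHub Actions sets GITHUB_ACTIONS=true on its runners
        monkeypatch.setenv("GITHUB_ACTIONS", "true")
        assert get_build_config() == "build-ci.yml"

    def test_build_config_local(monkeypatch):
        monkeypatch.delenv("GITHUB_ACTIONS", raising=False)
        assert get_build_config() == "build.yml"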

Comment on lines -45 to +44
    - args: "-b html -T docs_test docs_test/_build/html"
    + args: "-b html -T {output} {output}/{html}"
@lbianchi-lbl (Contributor):

Nice 👍

    timeout = self._timeout
    if timeout < 60:
        # force at least 10 second timeout
        timeout = max(timeout, 10)
@lbianchi-lbl (Contributor):

timeout is set here to the adjusted value, but then the original value self._timeout is passed later to the ParallelNotebookWorker. Is this expected?
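If it isn't intentional, the fix could be as small as reusing the clamped local everywhere (a sketch only; I'm assuming here that the worker takes the timeout as a constructor argument, so adjust to the actual signature):

    # clamp once...
    timeout = max(self._timeout, 10)  # force at least a 10-second timeout
    # ...then pass the clamped value to the worker, not self._timeout
    worker = ParallelNotebookWorker(timeout=timeout)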

Comment on lines +578 to +591
        wait_time = f"{timeout} seconds"
    else:
        if timeout // 60 * 60 == timeout:
            wait_time = f"{timeout // 60} minute{'' if timeout == 60 else 's'}"
        else:
            sec = timeout - (timeout // 60 * 60)
            wait_time = (
                f"{timeout // 60} minute{'' if timeout == 60 else 's'}, "
                f"{sec} second{'' if sec == 1 else 's'}"
            )
    notify(
        f"Convert notebooks with {num_workers} "
        f"worker{'' if num_workers == 1 else 's'}. Timeout after {wait_time}."
    )
@lbianchi-lbl (Contributor):

I found this a bit hard to follow, but I guess it's not crucial, since as far as I can tell it's only used to produce a human-readable timeout value for notify(). Maybe it could be swept into a utility function like _get_human_readable_duration_str(duration_s), but that's a clean-code nit.
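Something like this, for instance (an untested sketch that should match the inline version's output format):

    def _get_human_readable_duration_str(duration_s: int) -> str:
        """Format seconds as e.g. '45 seconds', '1 minute', '2 minutes, 30 seconds'."""
        minutes, seconds = divmod(duration_s, 60)
        parts = []
        if minutes:
            parts.append(f"{minutes} minute{'' if minutes == 1 else 's'}")
        if seconds or not minutes:
            parts.append(f"{seconds} second{'' if seconds == 1 else 's'}")
        return ", ".join(parts)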

Comment on lines -570 to +609
    - b = bossy.Bossy(
    -     jobs,
    -     num_workers=num_workers,
    -     worker_function=worker.convert,
    -     output_log=_log,
    + pool = mproc.Pool(num_workers)
    + num_jobs = len(jobs)
    + log_level = _log.getEffectiveLevel()
    + ar = pool.map_async(
    +     worker.convert,
    +     ((i + 1, jobs[i], log_level) for i in range(num_jobs)),
    +     callback=self._convert_success,
    +     error_callback=self._convert_failure,
@lbianchi-lbl (Contributor):

Nice 👍 bossy.py, don't let the door hit you on your way out, you won't be missed!
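For future readers: the part that makes Ctrl-C responsive again is that map_async returns immediately, and the main process then waits on the AsyncResult, where a KeyboardInterrupt can actually land. A rough, self-contained sketch of the pattern (illustrative names only, not the build script's actual code):

    import multiprocessing as mproc

    def work(args):
        index, job = args
        return index, job * job  # stand-in for worker.convert

    if __name__ == "__main__":
        with mproc.Pool(4) as pool:
            async_result = pool.map_async(work, enumerate(range(10)))
            async_result.wait(timeout=60)  # interruptible wait in the main process
            results = async_result.get(timeout=1)  # re-raises any worker exception
        print(results)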

Comment on lines +1605 to +1613
"# Execute Jupyter notebooks. This is slow.\n"
"# Only those notebooks that have not changed since the last time\n"
"# this was run will be re-executed.\n"
"{command} --exec\n"
"{command} -e # <-- short option\n"
"\n"
"# Convert Jupyter notebooks, as in previous command,\n"
"# then build Sphinx documentation.\n"
"# This can be combined with -r/--remove to convert all notebooks.\n"
"{command} -cd\n"
"# Copy Jupyter notebooks into docs. This is quick.\n"
"{command} --copy\n"
"{command} -y # <-- short option\n"
@lbianchi-lbl (Contributor):

I like the very useful "this is quick/slow" notes added here.

Comment on lines -1583 to +1666
help="Run notebooks but do not convert them.",
help="Execute notebooks (do not copy them into docs)",
@lbianchi-lbl (Contributor):

I like using exec instead of test; I think it's clearer.

Comment on lines 19 to 20
    _root = os.path.join(os.path.dirname(__file__), "..")
    sys.path.insert(0, _root)
@lbianchi-lbl (Contributor):

This might be irrational, but these lines make me slightly concerned, especially when mixed with the os.chdir(_root) used later. I'm not sure I have a better alternative off the top of my head, but maybe a context manager (with change_working_dir(_root): ...) would be enough to assuage my fears.
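For reference, a minimal version of that context manager (Python 3.11+ ships the equivalent as contextlib.chdir):

    import os
    from contextlib import contextmanager

    @contextmanager
    def change_working_dir(path):
        """Temporarily change the working directory, restoring it on exit."""
        saved = os.getcwd()
        os.chdir(path)
        try:
            yield
        finally:
            os.chdir(saved)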

    proc = subprocess.Popen(cmd)
    proc.wait()
    assert proc.returncode == 0
    find_broken_links()
@lbianchi-lbl (Contributor):

As I understand it, the link checking runs when find_broken_links() is called. Is there a reason we need to perform it in test_parse_notebook() and then again, separately, in test_broken_links()?

Comment on lines +61 to +90
    def find_broken_links(rebuild=True):
        """Run the Sphinx link checker.

        This was created in response to a number of broken links in Jupyter notebook
        cells, but would also find broken links in any documentation pages.
        """
        os.chdir(_root)
        config = get_build_config()
        config_dict = load_build_config(config)
        # Copy notebooks to docs. -S suppresses Sphinx output.
        args = ["python", "build.py", "--config", config, "-Sy"]
        proc = subprocess.Popen(args)
        rc = proc.wait()
        assert rc == 0, "Copying notebooks to docs failed"
        # Run linkchecker (-l). -S suppresses Sphinx output.
        # output will be in dir configured in sphinx.linkcheck_dir (see below)
        proc = subprocess.Popen(["python", "build.py", "--config", config, "-Sl"])
        rc = proc.wait()
        assert rc == 0, "Linkchecker process failed"
        # find links marked [broken], report them
        link_file = Path(".") / config_dict["sphinx"]["linkcheck_dir"] / "output.json"
        assert link_file.exists()
        links = []
        for line in link_file.open(mode="r", encoding="utf-8"):
            obj = json.loads(line)
            if obj["status"] == "broken":
                num = len(links) + 1
                links.append(f"{num}) {obj['filename']}:{obj['lineno']} -> {obj['uri']}")
        # fail if there were any broken links
        assert len(links) == 0, f"{len(links)} broken links:\n" f"{newline.join(links)}"
@lbianchi-lbl (Contributor):

I'm fully aware that I'm saying this as a pytest.fixture evangelist, but I think extracting the test setup (i.e. running the command) from the test itself (i.e. the assertion that the command succeeded) would be tidier, not to mention that it would open up the possibility of parametrizing some of these tests, e.g. with a command fixture that's parametrized over the various CLI flags.

On the other hand, I'm also aware that this is firmly in "testing the tests" territory, so this might be overthinking it; let's just consider it a note for possible future work if/when we need to change these tests.
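To make the idea concrete anyway, here is a rough sketch of what that parametrized command fixture could look like (names and flags are illustrative, and get_build_config is assumed to be importable):

    import subprocess
    import pytest

    @pytest.fixture(params=["-Sy", "-Sl"], ids=["copy", "linkcheck"])
    def build_command(request):
        # one test case per CLI flag combination
        return ["python", "build.py", "--config", get_build_config(), request.param]

    def test_build_command_succeeds(build_command):
        proc = subprocess.run(build_command)
        assert proc.returncode == 0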

@ksbeattie ksbeattie added Priority:Normal Normal Priority Issue or PR and removed Priority:High High Priority Issue or PR labels Mar 24, 2022
@ksbeattie ksbeattie changed the title Pytest tests for fixing broken links Pytest tests for fixing broken links and parallelism of build script Apr 14, 2022
@lbianchi-lbl (Contributor)

This PR has "decayed" a bit, but the build.py infrastructure changes are still relevant and useful, so I'd call it somewhere between "fixer-upper" and "scrap for parts". Since either option would require a non-trivial amount of work, I propose revisiting it immediately after the August release.

@ksbeattie (Member)

Seems like this PR needs to be re-created?

@ksbeattie (Member)

This will be replaced by an entire re-org of the examples-pse repo (in the works).

@ksbeattie ksbeattie closed this Nov 3, 2022