Pytest tests for fixing broken links and parallelism of build script #75

dangunter · 2021-11-11T18:45:32Z

Related to two things

Issue #72
Pull request #74 (needs this code to run)

Proposed changes:

Add to both the integration tests and component tests
- for component test, skip if there are no notebooks (maybe not converted)
- for integration test, fail if no notebooks, since conversion should have been run
- either way, fail if there are any broken links found
Use appropriate build.yml file depending on CI or non-CI environment

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

I agree my contributions are submitted under the license terms described in the LICENSE.txt file at the top level of this directory.
I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

…ng the notebooks

lbianchi-lbl

This looks great. There are lots of improvements big and small which I think will be very useful both for maintainability and usability.

I've done a short "test drive" running this locally, and most things seem to work:

Using Ctrl-C to cancel a run now works as expected (much appreciated by @ksbeattie and I!)

The parallelism works out of the box with the default options

I also like that now the logging shows the PID of the main process as well as the individual workers' indices
I've encountered a few failures in the notebook builds, but most of them look more likely to be due to the notebooks themselves (e.g. some incompatibility in the IDAES/Pyomo version in my hastily-created Conda env) than the build script
The only thing that looks suspicious is this error, since ZMQ is part of the Jupyter/IPython infrastructure:

 Traceback (most recent call last):
File "/opt/conda/envs/examples-pse/lib/python3.8/runpy.py", line 194, in _run_module_as_main
  return _run_code(code, main_globals, None,
File "/opt/conda/envs/examples-pse/lib/python3.8/runpy.py", line 87, in _run_code
  exec(code, run_globals)
File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel_launcher.py", line 16, in <module>
  app.launch_new_instance()
File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/traitlets/config/application.py", line 845, in launch_instance
  app.initialize(argv)
File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/traitlets/config/application.py", line 88, in inner
  return method(app, *args, **kwargs)
File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 632, in initialize
  self.init_sockets()
File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 287, in init_sockets
  self.stdin_port = self._bind_socket(self.stdin_socket, self.stdin_port)
File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 229, in _bind_socket
  return self._try_bind_socket(s, port)
File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 205, in _try_bind_socket
  s.bind("tcp://%s:%i" % (self.ip, port))
File "/opt/conda/envs/examples-pse/lib/python3.8/site-packages/zmq/sugar/socket.py", line 214, in bind
  super().bind(addr)
File "zmq/backend/cython/socket.pyx", line 540, in zmq.backend.cython.socket.Socket.bind
File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use

I've marked "Request changes" because I had some questions about the implementation, but I don't think I have any blocking comment. Feel free to go through my comments and dismiss/resolve them and we can get this in quickly.

lbianchi-lbl · 2021-11-19T01:23:28Z

tests/test_notebooks.py

+def get_build_config():
+    if os.environ.get("GITHUB_ACTIONS", False):
+        return "build-ci.yml"
+    return "build.yml"


lbianchi-lbl · 2021-11-19T01:24:11Z

build-ci.yml

-    args: "-b html -T docs_test docs_test/_build/html"
+    args: "-b html -T {output} {output}/{html}"


lbianchi-lbl · 2021-11-19T01:28:20Z

build.py

+        timeout = self._timeout
+        if timeout < 60:
+            # force at least 10 second timeout
+            timeout = max(timeout, 10)


timeout is set here to the adjusted value, but then the original value self._timeout is passed later to the ParallelNotebookWorker. Is this expected?

lbianchi-lbl · 2021-11-19T01:34:51Z

build.py

+            wait_time = f"{timeout} seconds"
+        else:
+            if timeout // 60 * 60 == timeout:
+                wait_time = f"{timeout // 60} minute{'' if timeout == 60 else 's'}"
+            else:
+                sec = timeout - (timeout // 60 * 60)
+                wait_time = (
+                    f"{timeout // 60} minute{'' if timeout == 60 else 's'}, "
+                    f"{sec} second{'' if sec == 1 else 's'}"
+                )
+        notify(
+            f"Convert notebooks with {num_workers} "
+            f"worker{'' if num_workers == 1 else 's'}. Timeout after {wait_time}."
+        )


I found this a bit hard to follow, but I guess it is not crucial since as far as I can tell it's only used to produce a human-readable value for timeout to be used for notify(). Maybe it could be "swept under" a utility function _get_human_readable_duration_str(duration_s) but it's a clean-code nit.

lbianchi-lbl · 2021-11-19T01:36:19Z

build.py

-        b = bossy.Bossy(
-            jobs,
-            num_workers=num_workers,
-            worker_function=worker.convert,
-            output_log=_log,
+        pool = mproc.Pool(num_workers)
+        num_jobs = len(jobs)
+        log_level = _log.getEffectiveLevel()
+        ar = pool.map_async(
+            worker.convert,
+            ((i + 1, jobs[i], log_level) for i in range(num_jobs)),
+            callback=self._convert_success,
+            error_callback=self._convert_failure,


Nice 👍 bossy.py, don't let the door hit you on your way out, you won't be missed!

lbianchi-lbl · 2021-11-19T01:49:41Z

build.py

+        "# Execute Jupyter notebooks. This is slow.\n"
+        "# Only those notebooks that have not changed since the last time\n"
+        "# this was run will be re-executed.\n"
+        "{command} --exec\n"
+        "{command} -e  # <-- short option\n"
        "\n"
-        "# Convert Jupyter notebooks, as in previous command,\n"
-        "# then build Sphinx documentation.\n"
-        "# This can be combined with -r/--remove to convert all notebooks.\n"
-        "{command} -cd\n"
+        "# Copy Jupyter notebooks into docs. This is quick.\n"
+        "{command} --copy\n"
+        "{command} -y  # <-- short option\n"


I like the very useful "this is quick/slow" added here.

lbianchi-lbl · 2021-11-19T01:50:31Z

build.py

-        help="Run notebooks but do not convert them.",
+        help="Execute notebooks (do not copy them into docs)",


I like using exec instead of test, I think it's more clear.

lbianchi-lbl · 2021-11-19T01:55:25Z

tests/test_notebooks.py

 _root = os.path.join(os.path.dirname(__file__), "..")
 sys.path.insert(0, _root)


This might be irrational but these make me feel slightly concerned, especially when mixed with os.chdir(_root) used later. I'm not sure I have better alternatives off the top of my head but maybe using a context manager (with change_working_dir(_root): ...) would be enough to assuage my fears.

lbianchi-lbl · 2021-11-19T01:58:42Z

tests/test_notebooks.py

    proc = subprocess.Popen(cmd)
    proc.wait()
    assert proc.returncode == 0
+    find_broken_links()


As I understand, the linkchecking is run when find_broken_links() is called. Is there a reason why we need to perform this in test_parse_notebook() and again, separately, in test_broken_links()?

lbianchi-lbl · 2021-11-19T02:04:13Z

tests/test_notebooks.py

+def find_broken_links(rebuild=True):
+    """Run the Sphinx link checker.
+
+    This was created in response to a number of broken links in Jupyter notebook
+    cells, but would also find broken links in any documentation pages.
+    """
+    os.chdir(_root)
+    config = get_build_config()
+    config_dict = load_build_config(config)
+    # Copy notebooks to docs. -S suppresses Sphinx output.
+    args = ["python", "build.py", "--config", config, "-Sy"]
+    proc = subprocess.Popen(args)
+    rc = proc.wait()
+    assert rc == 0, "Copying notebooks to docs failed"
+    # Run linkchecker (-l). -S suppresses Sphinx output.
+    # output will be in dir configured in sphinx.linkcheck_dir (see below)
+    proc = subprocess.Popen(["python", "build.py", "--config", config, "-Sl"])
+    rc = proc.wait()
+    assert rc == 0, "Linkchecker process failed"
+    # find links marked [broken], report them
+    link_file = Path(".") / config_dict["sphinx"]["linkcheck_dir"] / "output.json"
+    assert link_file.exists()
+    links = []
+    for line in link_file.open(mode="r", encoding="utf-8"):
+        obj = json.loads(line)
+        if obj["status"] == "broken":
+            num = len(links) + 1
+            links.append(f"{num}) {obj['filename']}:{obj['lineno']} -> {obj['uri']}")
+    # fail if there were any broken links
+    assert len(links) == 0, f"{len(links)} broken links:\n" f"{newline.join(links)}"


I'm fully aware that I'm saying this as a pytest.fixture evangelist, but I think that extracting the test setup (i.e. running the command) and the test (i.e. the assertion that the command was successful) would be tidier, not to mention that it would open up the possibility to parametrize some of the tests, e.g. having a command fixture that's parametrized with the various CLI flags.

On the other hand, I'm also aware that this is firmly in "testing the tests" territory and so this might be overthinking it, so let's just consider this a note for possible future work if/when we need to make changes to these tests in the future.

lbianchi-lbl · 2022-07-28T19:02:33Z

This PR has "decayed" a bit but the build.py infrastructure changes are still relevant and useful, so I'd call it somewhere in between "fixer-upper" and "scrap for parts". Since either of these options would require a non-trivial amount of work, I propose to revisit it immediately after the Aug release.

ksbeattie · 2022-10-13T18:50:17Z

Seem like this PR needs to be re-created?

ksbeattie · 2022-11-03T18:53:42Z

This will be replaced by an entire re-org of the examples-pse repo (in the works)

dangunter added 8 commits November 10, 2021 15:52

added an option to run the linkchecker from build.py

0016bed

Fix broken link

164f783

Fix broken links

be8666f

add linkcheck dir

65fe84e

fix broken link

25aca87

Merge branch 'python_buildpy_linkcheck' into fix_broken_links

cd7efcc

broken link test

52f45ba

useful message

b1070da

dangunter marked this pull request as draft November 11, 2021 18:45

dangunter added 2 commits November 11, 2021 10:57

skip if no notebooks, for component test

9914a1d

formatting

fb984dc

dangunter marked this pull request as ready for review November 11, 2021 21:49

dangunter requested review from lbianchi-lbl, blnicho and ksbeattie November 11, 2021 21:49

dangunter added 10 commits November 11, 2021 15:40

build docs before running linkcheck

8d4ceec

Build docs earlier

5d67692

Use regular docs dir instead of docs_test (??)

673eb74

Simplify

777c81c

Separate notebook execution and copying into docs

c7db22f

Go back to docs_test but remove the test_mode

888187e

Use new 'copy' command to find broken links without necessarily runni…

427f286

…ng the notebooks

Switched parallel run over to use multiprocessing.map_async

6b5ed52

index page tests and some cleanup

0cd4a34

fixed index

af55e9b

ksbeattie added the Priority:High High Priority Issue or PR label Nov 18, 2021

lbianchi-lbl suggested changes Nov 19, 2021

View reviewed changes

Merge branch 'main' into fix_broken_links

fb1571b

lbianchi-lbl added 2 commits November 30, 2021 16:43

Merge branch 'main' into fix_broken_links

45a8cac

Install pandoc using Conda

8a5f622

ksbeattie added Priority:Normal Normal Priority Issue or PR and removed Priority:High High Priority Issue or PR labels Mar 24, 2022

ksbeattie changed the title ~~Pytest tests for fixing broken links~~ Pytest tests for fixing broken links and parallelism of build script Apr 14, 2022

ksbeattie mentioned this pull request Apr 14, 2022

Perform link checking in (generated) documentation IDAES/idaes-pse#389

Closed

ksbeattie closed this Nov 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pytest tests for fixing broken links and parallelism of build script #75

Pytest tests for fixing broken links and parallelism of build script #75

dangunter commented Nov 11, 2021

lbianchi-lbl left a comment

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl Nov 19, 2021

lbianchi-lbl commented Jul 28, 2022

ksbeattie commented Oct 13, 2022

ksbeattie commented Nov 3, 2022

		args: "-b html -T docs_test docs_test/_build/html"
		args: "-b html -T {output} {output}/{html}"

		help="Run notebooks but do not convert them.",
		help="Execute notebooks (do not copy them into docs)",

		_root = os.path.join(os.path.dirname(__file__), "..")
		sys.path.insert(0, _root)

Pytest tests for fixing broken links and parallelism of build script #75

Pytest tests for fixing broken links and parallelism of build script #75

Conversation

dangunter commented Nov 11, 2021

Related to two things

Proposed changes:

Legal Acknowledgement

lbianchi-lbl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lbianchi-lbl commented Jul 28, 2022

ksbeattie commented Oct 13, 2022

ksbeattie commented Nov 3, 2022