[rustdoc] Add `--extract-doctests` command-line flag #134531

GuillaumeGomez · 2024-12-19T19:10:20Z

It was discussed with the Rust-for-Linux project recently that they needed a way to extract doctests so they can modify them and then run them more easily (look for "a way to extract doctests" here).

For now, I output most of the `ScrapedDoctest` fields in JSON format with `serde_json`. It outputs the following information:

* filename
* line
* langstr
* text

cc @ojeda
r? @notriddle

ojeda · 2024-12-19T19:14:00Z

That was quick, thanks a lot!

@rustbot label A-rust-for-linux

rustbot · 2024-12-20T14:49:12Z

This PR modifies tests/run-make/. If this PR is trying to port a Makefile
run-make test to use rmake.rs, please update the
run-make port tracking issue
so we can track our progress. You can either modify the tracking issue
directly, or you can comment on the tracking issue and link this PR.

cc @jieyouxu

notriddle · 2024-12-20T17:30:04Z

src/doc/rustdoc/src/unstable-features.md

+Using this flag looks like this:
+
+```bash
+$ rustdoc -Zunstable-options --extract-doctests src/lib.rs


Why is this a CLI option and not an output format?

Because I didn't think about it, it's SO MUCH better!

notriddle · 2024-12-20T17:32:50Z

src/doc/rustdoc/src/unstable-features.md

+ * File where they are located.
+ * Line where they are located.
+ * Codeblock attributes (more information about this [here](./write-documentation/documentation-tests.html#attributes)).
+
+The output format is JSON.


That's not enough documentation, even for an unstable feature. You need to say what the actual keys are, and what they do (particularly the subkeys of langstr, which are not self-explanatory). Please include a prettified example JSON document with all of the format features used.

I'm not too sure we want everything to be documented, as the list of codeblock attributes might get longer. Well, I'll list the current ones and we can think about it later.

aDotInTheVoid · 2024-12-20T22:50:14Z

I don't think the approach taken here (or manually implementing serde::Serialize for internal structs) is a good one.

It means some of the internal structs are actually part of the public API, but the exact relation is unclear in source code.

I think the right way to structure this is to do what --output-format json does and keep the type definitions for (de)serialization very separate from all the rustc_*/rustdoc internal types, and have them in their own module/crate. This achieves:

* It's very clear when a PR adjusts the schema (whereas at the moment, that's not super apparent)
* We can just use #[derive(Serialize, Deserialize)] on the module (rather than having to implement manually)
* We can republish the module onto crates.io, so it's easy for (non-kernel) Rust users (e.g. nextest)
* We can use the module's API docs to describe the schema.

Some other things:

* There should be an option (probably the default) to output to a file.
* There should be a FORMAT_VERSION field at the root of the output. This was super critical for rustdoc-json to be able to evolve without breaking users in a really terrible way.
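
To illustrate why a root-level version field helps consumers, here is a minimal sketch of the check a tool could run before reading anything else from the output. The constant value, the function name, and the error text are all hypothetical; the PR does not define them.

```rust
/// Hypothetical version number this consumer was written against.
const EXPECTED_FORMAT_VERSION: u32 = 1;

/// Refuse to process output whose root-level version is not the one
/// this consumer understands, instead of misreading changed fields.
fn check_format_version(found: u32) -> Result<(), String> {
    if found == EXPECTED_FORMAT_VERSION {
        Ok(())
    } else {
        Err(format!(
            "unsupported doctest format version {}, expected {}",
            found, EXPECTED_FORMAT_VERSION
        ))
    }
}
```

With such a check, a schema change shows up as one clear error rather than as silently wrong data further down the pipeline.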

aDotInTheVoid · 2024-12-20T22:56:30Z

tests/rustdoc-ui/extract-doctests.rs

+//@ compile-flags:-Z unstable-options --extract-doctests
+//@ normalize-stdout-test: "tests/rustdoc-ui" -> "$$DIR"
+//@ check-pass
+


We probably want a test for how this interacts with:

* Implicit/explicit fn main()
* Hiding lines with /// # use some::path;

True although currently it's the code as defined in the documentation and no changes on rustdoc side (which I think matches better what rust for linux wants).

Maybe we should provide another field with "rustdoc computed code" where hidden lines and main function wrapping are added by rustdoc. Actually that sounds like a very good idea.

True although currently it's the code as defined in the documentation and no changes on rustdoc side (which I think matches better what rust for linux wants).

Currently we use both hidden lines and the ? support, i.e. the # Ok::<..., ...>(...) syntax (which implies the fn main(), from what I understand).

Some of that post-processing may be easy to do by users, like the hidden lines I assume, but if you walk the source or IR or similar to figure out details (like the crate attributes that you move out of main()), then it may be harder for end users to replicate that properly without a hack. Perhaps it may be possible to export what rustdoc figured about the test, so that users can replicate the post-processing on their side, but customized to their environment.

Having the rustdoc computed code sounds fine since it may be enough for some use cases, but e.g. we currently need to convert on the fly the .unwrap() in the ?-using tests into a custom assert!.

So, for instance, if rustdoc tells us "this test uses ?", "these are the crate attributes I would have moved", etc., then users may be able to easily and reliably construct their own wrappers and e.g. do a check instead of an .unwrap().

Of course, for things like crate attributes, it may be best to remove them so that the end user can re-add them where needed, so it wouldn't be the completely "unaltered" source code. But it could be an "adapted" version. Hidden lines should probably be normal lines in that adapted version.

Perhaps it makes sense to provide all those versions, i.e. the completely unaltered one for those that may want to do something complex or to render the text somewhere, the "adapted" version for customized test environments and the rustdoc computed code for those that can use directly that. In the kernel we would use the "adapted" one, only, I think.
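
As a rough sketch of the "hidden lines become normal lines" step of such an adapted version, the function below follows the hidden-line rules as I understand them from the rustdoc documentation: a line whose trimmed form starts with `# ` (or is exactly `#`) is hidden, and a leading `##` is an escape for a literal `#`. The function names are hypothetical, and this is not the code rustdoc itself uses.

```rust
/// Turn one doctest line into its "unhidden" form.
fn unhide_line(line: &str) -> String {
    let trimmed = line.trim();
    if trimmed.starts_with("##") {
        // `##` is an escape: keep the line, dropping one `#`.
        line.replacen("##", "#", 1)
    } else if let Some(rest) = trimmed.strip_prefix("# ") {
        // `# code` is a hidden line: keep only the code.
        rest.to_string()
    } else if trimmed == "#" {
        // A lone `#` is a hidden empty line.
        String::new()
    } else {
        // Everything else, including `#[attr]`, is left untouched.
        line.to_string()
    }
}

/// Apply `unhide_line` to every line of a doctest body.
fn unhide_doctest(text: &str) -> String {
    text.lines().map(unhide_line).collect::<Vec<_>>().join("\n")
}
```

Crate attributes and `?` detection are deliberately out of scope here; as noted above, those are the parts that are hard to replicate outside rustdoc.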

GuillaumeGomez · 2024-12-20T23:19:29Z

I don't think the approach taken here (or manually implementing serde::Serialize for internal structs) is a good one.

It means some of the internal structs are actually part of the public API, but the exact relation is unclear in source code.

I think the right way to structure this is to do what --output-format json does and keep the type definitions for (de)serialization very separate from all the rustc_*/rustdoc internal types, and have them in their own module/crate. This achieves:
* It's very clear when a PR adjusts the schema (whereas at the moment, that's not super apparent)

* We can just use `#[derive(Serialize, Deserialize)]` on the module (rather than having to implement manually)

* We can republish the module onto crates.io, so it's easy for (non-kernel) Rust users (e.g. nextest)

* We can use the module's API docs to describe the schema.

I think you're right. I took the easiest way, which has the advantage of always keeping the output up-to-date since it uses the same types for the code generation. I'll split it out.

Some other things:

* There should be an option (probably the default) to output to a file.

I'm not convinced that adding an option to output to a file is needed. I have these two cases in mind:

* From a shell, you redirect stdout.
* From a program, you capture stdout and then do what you need with it.

Is there another case I didn't think of?
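
For the "from a program" case, a sketch with `std::process::Command` could look like the following. The flags are the ones shown in the documentation snippet earlier in this thread and are unstable, so they may change; the function names are hypothetical.

```rust
use std::io;
use std::process::Command;

/// Build the rustdoc invocation (flags as proposed in this PR, unstable).
fn extract_doctests_command(source: &str) -> Command {
    let mut cmd = Command::new("rustdoc");
    cmd.arg("-Zunstable-options")
        .arg("--extract-doctests")
        .arg(source);
    cmd
}

/// Run the command and return whatever it printed on stdout.
fn capture_stdout(mut cmd: Command) -> io::Result<String> {
    let output = cmd.output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("command failed with {}", output.status),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

A caller would pass the result of `extract_doctests_command("src/lib.rs")` to `capture_stdout` and then hand the returned string to a JSON parser of its choice.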

* There should be a `FORMAT_VERSION` field at the root of the output. This was super critical for rustdoc-json to be able to evolve without breaking users in a really terrible way.

Good catch, gonna add it, thanks!

aDotInTheVoid · 2024-12-21T01:10:08Z

I took the easiest way, which has the advantage of always keeping the output up-to-date since it uses the same types for the code generation.

I think the nicest way to get this is to unpack the “internal” types exhaustively in the internal->public conversion. This way you’re forced to think about the public repr when internals change, but they’re still separate and easy to track.
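
A minimal sketch of that exhaustive unpacking, with stand-in types: `InternalDoctest`, `ExtractedDoctest`, and all their fields are hypothetical and do not mirror rustdoc's actual structs. The point is the destructuring pattern without `..`, which stops compiling as soon as a field is added to the internal type.

```rust
/// Stand-in for an internal rustdoc type (hypothetical fields).
struct InternalDoctest {
    filename: String,
    line: usize,
    text: String,
    already_processed: bool,
}

/// Stand-in for the public, serialization-facing type kept in its own module.
#[derive(Debug, PartialEq)]
struct ExtractedDoctest {
    file: String,
    line: usize,
    doctest_code: String,
}

impl From<InternalDoctest> for ExtractedDoctest {
    fn from(internal: InternalDoctest) -> Self {
        // No `..` here: every internal field must be named, so a new
        // internal field forces a decision about the public schema.
        let InternalDoctest {
            filename,
            line,
            text,
            already_processed: _, // deliberately not exposed
        } = internal;
        ExtractedDoctest {
            file: filename,
            line,
            doctest_code: text,
        }
    }
}
```

The public type can then carry the serde derives, while the internal one stays free to change.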

Is there another case I didn't think of?

I think the big one is build-systems, which are much more used to chaining commands together via files than stdout.

bors · 2024-12-21T04:55:08Z

☔ The latest upstream changes (presumably #134590) made this pull request unmergeable. Please resolve the merge conflicts.

Add --extract-doctests command-line flag

5afc1c5

rustbot assigned notriddle Dec 19, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Dec 19, 2024

GuillaumeGomez mentioned this pull request Dec 19, 2024

Tracking issue for rustdoc --extract-doctests command-line flag #134529

Open

2 tasks

ojeda mentioned this pull request Dec 19, 2024

Rust unstable features needed for the kernel Rust-for-Linux/linux#2

Open

89 tasks

rustbot added the A-rust-for-linux Relevant for the Rust-for-Linux project label Dec 19, 2024

GuillaumeGomez force-pushed the extract-doctests branch from ff979fd to 0b006d4 Compare December 19, 2024 19:22

GuillaumeGomez force-pushed the extract-doctests branch from 0b006d4 to 486acda Compare December 20, 2024 11:25

GuillaumeGomez added 2 commits December 20, 2024 14:47

Add UI test for --extra-doctests command-line flag

5adedd7

Add documentation for `--extract-doctests

67a0fef

GuillaumeGomez force-pushed the extract-doctests branch from 486acda to 67a0fef Compare December 20, 2024 13:47

Update run-make/rustdoc-default-output test

651988d

rustbot added the A-run-make Area: port run-make Makefiles to rmake.rs label Dec 20, 2024

notriddle requested changes Dec 20, 2024

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 20, 2024

aDotInTheVoid self-assigned this Dec 20, 2024

aDotInTheVoid self-requested a review December 20, 2024 22:51

aDotInTheVoid reviewed Dec 20, 2024

tgross35 mentioned this pull request Dec 28, 2024

Add support for doctests nextest-rs/nextest#16

Open
