[rustdoc] Add --extract-doctests command-line flag #134531

Open: wants to merge 4 commits into base: master

GuillaumeGomez
Member

Part of #134529.

It was discussed with the Rust-for-Linux project recently that they needed a way to extract doctests so they can modify them and then run them more easily (look for "a way to extract doctests" here).

For now, I output most of the ScrapedDoctest fields in JSON format with serde_json. The output contains the following information:

  • filename
  • line
  • langstr
  • text

cc @ojeda
r? @notriddle

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-rustdoc (Relevant to the rustdoc team, which will review and decide on the PR/issue.) labels on Dec 19, 2024
@ojeda
Contributor

ojeda commented Dec 19, 2024

That was quick, thanks a lot!

@rustbot label A-rust-for-linux

rustbot added the A-rust-for-linux (Relevant for the Rust-for-Linux project) label on Dec 19, 2024
rustbot added the A-run-make (Area: port run-make Makefiles to rmake.rs) label on Dec 20, 2024
@rustbot
Collaborator

rustbot commented Dec 20, 2024

This PR modifies tests/run-make/. If this PR is trying to port a Makefile
run-make test to use rmake.rs, please update the
run-make port tracking issue
so we can track our progress. You can either modify the tracking issue
directly, or you can comment on the tracking issue and link this PR.

cc @jieyouxu

Using this flag looks like this:

```bash
$ rustdoc -Zunstable-options --extract-doctests src/lib.rs
```
Contributor

Why is this a CLI option and not an output format?

Member Author

Because I didn't think about it, it's SO MUCH better!

Comment on lines +675 to +679
* File where they are located.
* Line where they are located.
* Codeblock attributes (more information about this [here](./write-documentation/documentation-tests.html#attributes)).

The output format is JSON.
Contributor

That's not enough documentation, even for an unstable feature. You need to say what the actual keys are, and what they do (particularly the subkeys of langstr, which are not self-explanatory). Please include a prettified example JSON document with all of the format features used.

Member Author

I'm not sure we want everything documented, since the list of docblock attributes might get longer. Well, I'll list the current ones and we can think about it later.

rustbot added the S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.) label and removed the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) label on Dec 20, 2024
@aDotInTheVoid
Member

I don't think the approach taken here (or manually implementing serde::Serialize for internal structs) is a good one.

It means some of the internal structs are actually part of the public API, but the exact relation is unclear in source code.

I think the right way to structure this is to do what --output-format json does and keep the type definitions for (de)serialization very separate from all the rustc_*/rustdoc internal types, and have them in their own module/crate. This achieves:

  • It's very clear when a PR adjusts the schema (whereas at the moment, that's not super apparent)
  • We can just use #[derive(Serialize, Deserialize)] on the module (rather than having to implement manually)
  • We can republish the module onto crates.io, so it's easy for (non-kernel) rust users (eg nextest, etc)
  • We can use the module's API docs to describe the schema.

Some other things:

  • There should be an option (probably the default) to output to a file.
  • There should be a FORMAT_VERSION field at the root of the output. This was super critical for rustdoc-json to be able to evolve without breaking users in a really terrible way.
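
The structural suggestions above (schema types kept apart from rustdoc internals, plus a root-level format version) could look roughly like the sketch below. Only the four field names and the idea of a `FORMAT_VERSION` come from this thread; the module name `schema`, the type names, the field types, the starting version value, and `check_version` are assumptions made for illustration. A real schema module would presumably carry `#[derive(Serialize, Deserialize)]`; that is left out so the sketch builds without external crates.

```rust
/// Hypothetical public schema module, kept separate from rustdoc internals.
pub mod schema {
    /// Bumped whenever the output shape changes (assumed starting value).
    pub const FORMAT_VERSION: u32 = 1;

    /// Root of the output document (hypothetical shape).
    #[derive(Debug, Clone, PartialEq)]
    pub struct ExtractedDoctests {
        pub format_version: u32,
        pub doctests: Vec<Doctest>,
    }

    /// One extracted doctest; field names follow the PR description,
    /// field types are assumptions.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Doctest {
        pub filename: String,
        pub line: usize,
        /// Placeholder: the real `langstr` has subkeys not listed in this thread.
        pub langstr: String,
        pub text: String,
    }

    /// Consumer-side guard: refuse documents written in another format version.
    pub fn check_version(doc: &ExtractedDoctests) -> Result<(), String> {
        if doc.format_version == FORMAT_VERSION {
            Ok(())
        } else {
            Err(format!(
                "unsupported format version {} (expected {})",
                doc.format_version, FORMAT_VERSION
            ))
        }
    }
}
```

A consumer would call `schema::check_version` right after parsing and bail out early on a mismatch, rather than failing later on a missing or renamed key.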

aDotInTheVoid self-assigned this on Dec 20, 2024
aDotInTheVoid self-requested a review on December 20, 2024 22:51
//@ compile-flags:-Z unstable-options --extract-doctests
//@ normalize-stdout-test: "tests/rustdoc-ui" -> "$$DIR"
//@ check-pass

Member

We probably want a test for how this interacts with:

  • Implicit/explicit fn main()
  • Hiding lines with /// # use some::path;

Member Author

True, although currently it's the code as written in the documentation, with no changes on the rustdoc side (which I think better matches what Rust for Linux wants).

Member Author

Maybe we should provide another field with "rustdoc computed code" where hidden lines and main function wrapping are added by rustdoc. Actually that sounds like a very good idea.

Contributor

> True, although currently it's the code as written in the documentation, with no changes on the rustdoc side (which I think better matches what Rust for Linux wants).

Currently we use both hidden lines and the ? support, i.e. the # Ok::<..., ...>(...) syntax (which implies the fn main(), from what I understand).

Some of that post-processing may be easy for users to do, like the hidden lines I assume, but if you walk the source or IR or similar to figure out details (like the crate attributes that you move out of main()), then it may be harder for end users to replicate that properly without a hack. Perhaps it may be possible to export what rustdoc figured out about the test, so that users can replicate the post-processing on their side, but customized to their environment.

Having the rustdoc computed code sounds fine since it may be enough for some use cases, but e.g. we currently need to convert, on the fly, the .unwrap() in the tests that use ? into a custom assert!.

Contributor

So, for instance, if rustdoc tells us "this test uses ?", "these are the crate attributes I would have moved", etc., then users may be able to easily and reliably construct their own wrappers and e.g. do a check instead of an .unwrap().

Of course, for things like crate attributes, it may be best to remove them so that the end user can re-add them where needed, so it wouldn't be the completely "unaltered" source code. But it could be an "adapted" version. Hidden lines should probably be normal lines in that adapted version.

Perhaps it makes sense to provide all those versions, i.e. the completely unaltered one for those that may want to do something complex or to render the text somewhere, the "adapted" version for customized test environments, and the rustdoc computed code for those that can use it directly. In the kernel we would use only the "adapted" one, I think.
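
To make the "adapted" idea concrete, here is a minimal sketch of how a user could build a custom wrapper from such exported facts. Everything in it is hypothetical: `AdaptedDoctest`, its fields, and `wrap_for_custom_harness` do not exist in rustdoc or in this PR, and the generated wrapper is just one possible shape that uses a check instead of `.unwrap()`.

```rust
/// Hypothetical per-test facts that rustdoc could export alongside the code.
/// None of these fields exist in the PR; they only mirror the ideas above.
pub struct AdaptedDoctest {
    /// Crate-level attributes rustdoc would have moved out of `main()`.
    pub crate_attrs: Vec<String>,
    /// True if the test body relies on `?`.
    pub uses_question_mark: bool,
    /// Test body with hidden lines already turned into normal lines.
    pub body: String,
}

/// Appends `body` to `out`, indenting every line by `indent`.
fn push_indented(out: &mut String, body: &str, indent: &str) {
    for line in body.lines() {
        out.push_str(indent);
        out.push_str(line);
        out.push('\n');
    }
}

/// Generates the source of a test function in which a failing `?`
/// trips an assertion instead of going through `.unwrap()`.
pub fn wrap_for_custom_harness(name: &str, test: &AdaptedDoctest) -> String {
    let mut out = String::new();
    // Re-add the crate attributes where this environment wants them (here: on top).
    for attr in &test.crate_attrs {
        out.push_str(attr);
        out.push('\n');
    }
    out.push_str(&format!("fn {name}() {{\n"));
    if test.uses_question_mark {
        // Run the body in a closure so `?` has somewhere to return to.
        out.push_str("    let result = (|| {\n");
        push_indented(&mut out, &test.body, "        ");
        out.push_str("    })();\n");
        out.push_str("    assert!(result.is_ok());\n");
    } else {
        push_indented(&mut out, &test.body, "    ");
    }
    out.push_str("}\n");
    out
}
```

A real environment would swap `assert!(result.is_ok())` for whatever assertion macro it needs; the point is only that the `uses_question_mark` flag lets the user pick the wrapper.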

@GuillaumeGomez
Member Author

> I don't think the approach taken here (or manually implementing serde::Serialize for internal structs) is a good one.
>
> It means some of the internal structs are actually part of the public API, but the exact relation is unclear in source code.
>
> I think the right way to structure this is to do what --output-format json does and keep the type definitions for (de)serialization very separate from all the rustc_*/rustdoc internal types, and have them in their own module/crate. This achieves:
>
> * It's very clear when a PR adjusts the schema (whereas at the moment, that's not super apparent)
> * We can just use `#[derive(Serialize, Deserialize)]` on the module (rather than having to implement manually)
> * We can republish the module onto crates.io, so it's easy for (non-kernel) rust users (eg nextest, etc)
> * We can use the module's API docs to describe the schema.

I think you're right. I took the easiest way, which has the advantage of always keeping the output up-to-date since it uses the same types for the code generation. I'll split it out.

> Some other things:
>
> * There should be an option (probably the default) to output to a file.

I'm not convinced that adding an option to output to a file is needed. I have these two cases in mind:

  1. From a shell you redirect stdout.
  2. From a program you can just get the stdout once again and then do what you need with it.

Is there another case I didn't think about?

> * There should be a `FORMAT_VERSION` field at the root of the output. This was super critical for rustdoc-json to be able to evolve without breaking users in a really terrible way.

Good catch, gonna add it, thanks!

@aDotInTheVoid
Member

> I took the easiest way, which has the advantage of always keeping the output up-to-date since it uses the same types for the code generation.

I think the nicest way to get this is to unpack the “internal” types exhaustively in the internal->public conversion. This way you’re forced to think about the public repr when internals change, but they’re still separate and easy to track.
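
A minimal sketch of that exhaustive unpacking, with stand-in types: `InternalDoctest` and `PublicDoctest` are hypothetical and are not the real `ScrapedDoctest` or any type from this PR. The key detail is that the destructuring pattern has no `..`, so adding a field to the internal struct stops the conversion from compiling until someone decides how the public type should handle it.

```rust
/// Stand-in for an internal rustdoc type (hypothetical; not the real `ScrapedDoctest`).
pub struct InternalDoctest {
    pub filename: String,
    pub line: usize,
    pub text: String,
    /// Example of a field that should never leak into the public output.
    pub internal_only_cache: Vec<u8>,
}

/// Stand-in for the public, serializable representation.
#[derive(Debug, PartialEq)]
pub struct PublicDoctest {
    pub filename: String,
    pub line: usize,
    pub text: String,
}

impl From<InternalDoctest> for PublicDoctest {
    fn from(internal: InternalDoctest) -> Self {
        // No `..` in the pattern: a new field on `InternalDoctest` is a compile
        // error here, which forces a decision about the public representation.
        let InternalDoctest {
            filename,
            line,
            text,
            internal_only_cache: _, // deliberately dropped
        } = internal;
        PublicDoctest { filename, line, text }
    }
}
```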

> Is there another case I didn't think about?

I think the big one is build-systems, which are much more used to chaining commands together via files than stdout.
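
For illustration, supporting both styles (stdout for shell pipelines, a file for build systems) can be a small amount of code. The sketch below is generic Rust, not rustdoc code; `emit` and its `path` parameter stand in for a hypothetical output option that this PR does not define.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Writes the extracted-doctests payload to `path` if given, otherwise to stdout.
/// `path` models a hypothetical output option; no such flag is defined in this PR.
pub fn emit(payload: &str, path: Option<&Path>) -> io::Result<()> {
    // Pick the sink once, then write through the common `Write` interface.
    let mut sink: Box<dyn Write> = match path {
        Some(p) => Box::new(File::create(p)?),
        None => Box::new(io::stdout().lock()),
    };
    sink.write_all(payload.as_bytes())?;
    sink.flush()
}
```

With a file target, a build system can declare the output as a rule's product and track it for rebuilds, which is harder to express when the data only exists on stdout.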

@bors
Contributor

bors commented Dec 21, 2024

☔ The latest upstream changes (presumably #134590) made this pull request unmergeable. Please resolve the merge conflicts.

Labels

  • A-run-make (Area: port run-make Makefiles to rmake.rs)
  • A-rust-for-linux (Relevant for the Rust-for-Linux project)
  • S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.)
  • T-rustdoc (Relevant to the rustdoc team, which will review and decide on the PR/issue.)

7 participants