Smarter enhancement of JSDoc comments with a JSDoc parser #4310

Draft
wants to merge 10 commits into
base: main

RunDevelopment
Contributor

Resolves #4223.
Resolves #1847.

This PR improves how type information is added to JSDoc comments. Instead of just adding @param and @returns tags to existing JSDoc comments, WBG will now parse the existing comment and add type information to existing tags (if any). This makes skip_jsdoc virtually unnecessary, since WBG won't generate tags that conflict with existing tags anymore. (Note that skip_jsdoc still has niche use cases.)

Example:

/// Some comment
/// @param a This is the first parameter.
#[wasm_bindgen]
pub fn foo(a: u32, b: &str) -> u32 { ... }
/**
 * Some comment
 * @param {number} a This is the first parameter.
 * @param {string} b
 * @returns {number}
 */
export function foo(a, b) { ... }

This behavior can be turned off with skip_jsdoc:

/// Some comment
/// @param a This is the first parameter.
#[wasm_bindgen]
pub fn foo(a: u32, b: &str) -> u32 { ... }
/**
 * Some comment
 * @param a This is the first parameter.
 */
export function foo(a, b) { ... }

With the new smarter behavior, users documenting their code with standard JSDoc/TSDoc can now use @param/@returns tags without having WBG mess it up or having to use skip_jsdoc and duplicate the type info. WBG's JSDoc comment enhancement now works together with our users.

Moving forward, I believe that we should teach people that WBG will automatically add type information for them, and that they should not manually add type info to @param and @returns tags at all. This has several advantages, one being that users don't get any ideas about trying to declare custom types as @param {1 | 2 | 3} arg, which does not affect the TS types. Once we have a way for users to declare custom TS type annotations, they will just work with the new JSDoc enhancements.


The main complexity of this PR is:

  1. The new JSDoc parser. I made a custom JSDoc parser in around 600 LoC (including comments).
  2. Functionality for modifying the parsed JSDoc AST. This is the logic for actually adding type info to existing JSDoc and is around 150 LoC (including comments).
  3. Testing the parser. The parser needs to handle a lot of edge cases, so I added a new snapshot test for it.
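
To make item 2 concrete, here is a minimal sketch of what "adding type info to existing JSDoc" can look like on a heavily simplified AST. The `JsDoc`, `ParamTag`, `enhance`, and `render` names are hypothetical and chosen for illustration only; they are not the types or functions in this PR, and the real AST tracks far more detail.

```rust
/// A simplified `@param` tag (hypothetical; not the PR's actual AST).
#[derive(Debug, Clone, PartialEq)]
struct ParamTag {
    ty: Option<String>,
    name: String,
    description: String,
}

/// A simplified parsed comment (hypothetical).
#[derive(Debug, Default)]
struct JsDoc {
    description: String,
    params: Vec<ParamTag>,
    returns_ty: Option<String>,
}

/// Fill in types on existing `@param` tags, append tags for parameters
/// the user did not document, and set the return type.
fn enhance(doc: &mut JsDoc, params: &[(&str, &str)], ret: Option<&str>) {
    for (name, ty) in params {
        match doc.params.iter_mut().find(|p| p.name == *name) {
            // The user already documented this parameter: only add the type.
            Some(tag) => tag.ty = Some(ty.to_string()),
            // Undocumented parameter: append a bare tag with the type.
            None => doc.params.push(ParamTag {
                ty: Some(ty.to_string()),
                name: name.to_string(),
                description: String::new(),
            }),
        }
    }
    if let Some(ret) = ret {
        doc.returns_ty = Some(ret.to_string());
    }
}

/// Print the simplified AST back out as a JSDoc block comment.
fn render(doc: &JsDoc) -> String {
    let mut out = String::from("/**\n");
    for line in doc.description.lines() {
        out.push_str(&format!(" * {line}\n"));
    }
    for p in &doc.params {
        out.push_str(" * @param");
        if let Some(ty) = &p.ty {
            out.push_str(&format!(" {{{ty}}}"));
        }
        out.push_str(&format!(" {}", p.name));
        if !p.description.is_empty() {
            out.push_str(&format!(" {}", p.description));
        }
        out.push('\n');
    }
    if let Some(ty) = &doc.returns_ty {
        out.push_str(&format!(" * @returns {{{ty}}}\n"));
    }
    out.push_str(" */");
    out
}

fn main() {
    let mut doc = JsDoc {
        description: "Some comment".to_string(),
        params: vec![ParamTag {
            ty: None,
            name: "a".to_string(),
            description: "This is the first parameter.".to_string(),
        }],
        returns_ty: None,
    };
    enhance(&mut doc, &[("a", "number"), ("b", "string")], Some("number"));
    println!("{}", render(&doc));
}
```

Run on the `foo` example from the description, this sketch reproduces the enhanced comment shown there. It appends undocumented parameters at the end rather than restoring declaration order, which is a simplification of this sketch and not a claim about the PR.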

The parser is written in a way that makes any string valid JSDoc. This can also be seen in the signature of the parsing function: fn parse(comment: &str) -> JsDoc. The doc comment on the function explains how this is done. In general, I documented the parser extensively. If you have any questions regarding how the parser works, the answer is probably in a comment.
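
As an illustration of the "any string is valid JSDoc" idea, here is a toy total parser with the same shape of signature. It is only a sketch: the `JsDoc` and `Tag` structs are hypothetical stand-ins, and the rules (a line starting with `@` opens a tag, everything else is description or a continuation) are far cruder than what the PR implements.

```rust
/// A block tag such as `@param` or `@returns` (hypothetical, simplified).
#[derive(Debug, PartialEq)]
struct Tag {
    name: String,
    body: String,
}

/// The parse result (hypothetical, simplified).
#[derive(Debug, PartialEq)]
struct JsDoc {
    description: String,
    tags: Vec<Tag>,
}

/// Parse a comment without ever failing: there is no `Result` in the
/// signature because every input maps to some `JsDoc`.
fn parse(comment: &str) -> JsDoc {
    let mut description: Vec<&str> = Vec::new();
    let mut tags: Vec<Tag> = Vec::new();

    for line in comment.lines() {
        let trimmed = line.trim();
        if let Some(rest) = trimmed.strip_prefix('@') {
            // A line starting with `@` opens a new tag, even a malformed one.
            let (name, body) = match rest.split_once(char::is_whitespace) {
                Some((name, body)) => (name, body.trim()),
                None => (rest, ""),
            };
            tags.push(Tag {
                name: name.to_string(),
                body: body.to_string(),
            });
        } else if let Some(last) = tags.last_mut() {
            // Non-empty lines after a tag continue that tag's body.
            if !trimmed.is_empty() {
                if !last.body.is_empty() {
                    last.body.push('\n');
                }
                last.body.push_str(trimmed);
            }
        } else {
            // Everything before the first tag is the description.
            description.push(line);
        }
    }

    JsDoc {
        description: description.join("\n").trim().to_string(),
        tags,
    }
}

fn main() {
    let doc = parse(" Some comment\n @param a This is the first parameter.");
    println!("{doc:?}");
    // Degenerate inputs still produce a value instead of an error.
    println!("{:?}", parse("@"));
}
```

The design point is that malformed input degrades into plain text or an oddly named tag instead of an error, so the caller never has to decide what to do with a comment that "failed to parse".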

For the tags that are supported, the implementation is fairly complete. The parser supports all sorts of mean constructs like multi-line TS types, comments in TS types, default values, and argument paths (e.g. @param {number} arg0.age). See the snapshot test for many more examples.
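
To give a feel for the constructs listed above, the following sketch splits the body of a `@param` tag into type, name, default value, and description. All names (`ParamBody`, `split_type`, `parse_param`) are hypothetical. It assumes balanced braces are enough to find the end of a type, so unlike the PR's parser it does not handle comments or string literals inside TS types.

```rust
/// The pieces of a `@param` tag body (hypothetical, simplified).
#[derive(Debug, PartialEq)]
struct ParamBody<'a> {
    ty: Option<&'a str>,
    /// May be a path such as `arg0.age`.
    name: &'a str,
    default: Option<&'a str>,
    optional: bool,
    description: &'a str,
}

/// Split off a leading `{type}`, counting nested braces so that object
/// types and multi-line types stay in one piece.
fn split_type(s: &str) -> (Option<&str>, &str) {
    let s = s.trim_start();
    if !s.starts_with('{') {
        return (None, s);
    }
    let mut depth = 0usize;
    for (i, c) in s.char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return (Some(&s[1..i]), &s[i + 1..]);
                }
            }
            _ => {}
        }
    }
    // Unbalanced braces: treat the whole body as untyped text.
    (None, s)
}

fn parse_param(body: &str) -> ParamBody<'_> {
    let (ty, rest) = split_type(body);
    let rest = rest.trim_start();

    // The name is either `[name=default]` or a run of non-whitespace.
    let (name_part, description) = if rest.starts_with('[') {
        match rest.find(']') {
            Some(end) => (&rest[..=end], &rest[end + 1..]),
            None => (rest, ""),
        }
    } else {
        match rest.find(char::is_whitespace) {
            Some(end) => (&rest[..end], &rest[end..]),
            None => (rest, ""),
        }
    };

    let bracketed = name_part
        .strip_prefix('[')
        .and_then(|n| n.strip_suffix(']'));
    let (name, default, optional) = match bracketed {
        Some(inner) => match inner.split_once('=') {
            Some((name, default)) => (name.trim(), Some(default.trim()), true),
            None => (inner.trim(), None, true),
        },
        None => (name_part, None, false),
    };

    ParamBody {
        ty,
        name,
        default,
        optional,
        description: description.trim(),
    }
}

fn main() {
    println!("{:?}", parse_param("{number} arg0.age The age"));
    println!("{:?}", parse_param("{{ a: number }} [opts={}] Options"));
}
```

Even this small sketch needs brace counting and a separate branch for bracketed names, which hints at why the edge cases are covered by a snapshot test.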

Collaborator

@daxpedda daxpedda left a comment


I can safely say I can't maintain this code. Just reviewing it so far has taken me a not-insignificant amount of time.

I've been thinking about it quite a lot by now and I can safely say I don't feel comfortable at all landing this.

Would love to land this if:

  • The parser is in a maintained and trusted external crate. Just remembered oxc, which gives me an orders of magnitude better impression so far than SWC. If you can do the research I'm happy to look into and make a decision on if a specific dependency is acceptable.
  • It is opt-in, preferably via an attribute (alternatively I proposed just a line saying wbg::jsdoc-start or something like that). Again: I'm really not a fan of parsing doc comments by default. But I'm happy to discuss this point again and be convinced otherwise.

If this doesn't work out, the easy alternative is still to support third-party crates via a bunch of WBG attributes. Definitely the less appealing way.

@daxpedda daxpedda marked this pull request as draft December 6, 2024 21:06
@RunDevelopment
Contributor Author

That's a shame.

Just remembered oxc, which gives me an orders of magnitude better impression so far than SWC. If you can do the research I'm happy to look into and make a decision on if a specific dependency is acceptable.

To repeat my previous findings: There are essentially 3 packages on crates.io that need to be considered (all others aren't JSDoc parsers):

  • jsdoc. This is the swc dependency I was talking about before. The problem with it is that it pulls in 9 MB of transitive deps according to lib.rs.
  • tree-sitter-jsdoc. This is a plugin for tree-sitter, so I don't understand the AST format and the crate requires a C compiler.
  • doctor. The AST format is too simplistic for our use case.

While oxc itself is a JS toolbox(?), I did miss that it also has a JSDoc parser. The reason I missed it is that the JSDoc parser isn't part of oxc's AST/parser crates, but of oxc_semantic, which deals with the semantic analysis of JS code. And befitting that task, it pulls in an appropriate amount of transitive dependencies, since analyzing JS is a much harder task than just parsing a bit of JSDoc.

Anyway, the JSDoc parser itself seems usable. I did some tests, and it's not 100% correct when parsing type expressions, but it should be good enough. The only issue is that their JSDoc AST is immutable, so we'll have to implement the modifying and printing ourselves.

So yeah, that's the state of JSDoc parsing in the Rust ecosystem. The choice is either jsdoc from swc or oxc_semantic from oxc. Nothing else meets our requirements. So please make a decision.

It is opt-in, preferably via an attribute [...]

As I said before, the whole point is to change the default of WBG just adding @param/@returns tags blindly, causing it to potentially create nonsensical JSDoc. Put bluntly, the current default kinda sucks. So making it opt-in would be completely beside the point.

If this doesn't work out, the easy alternative is still to support third-party crates via a bunch of WBG attributes. Definitely the less appealing way.

As I explained before, this is not possible. Only the CLI knows the final return/parameter types. No matter what attributes we may add, only the CLI has the necessary information to perform this task. There is no alternative.

@daxpedda
Collaborator

daxpedda commented Dec 7, 2024

So yeah, that's the state of JSDoc parsing in the Rust ecosystem. The choice is either jsdoc from swc or oxc_semantic from oxc. Nothing else meets our requirements. So please make a decision.

I've already made a decision on swc and it's a no.
oxc_semantic is too heavy for us. Do you think the JSDoc parser can be separated from oxc_semantic into its own crate so it's more lightweight? Could you tell me more about what needs to be done on wasm-bindgen's side to make this work for us? If it defeats the point, can we contribute to oxc to make it work for us?

It is opt-in, preferably via an attribute [...]

As I said before, the whole point is to change the default of WBG just adding @param/@returns tags blindly, causing it to potentially create nonsensical JSDoc. Put bluntly, the current default kinda sucks. So making it opt-in would be completely beside the point.

We have already discussed this, and I deemed this a non-goal for WBG. As I said many times before: our target audience is developers who want to build tools on top of wasm-bindgen for other end users, not end users directly. Convenience will always be an important goal, but a secondary one.

In this case, I have already laid out my reasons why I think this isn't ideal to do by default. I find those reasons more important than convenience in this case.

If the point is not convenience, then I apparently missed the point.

If this doesn't work out, the easy alternative is still to support third-party crates via a bunch of WBG attributes. Definitely the less appealing way.

As I explained before, this is not possible. Only the CLI knows the final return/parameter types. No matter what attributes we may add, only the CLI has the necessary information to perform this task. There is no alternative.

The last time we discussed this, we established that this is possible via attributes. From your comment here I think there is a misunderstanding.

I'm talking about a WBG-CLI implementation. The JSDoc is just passed via attributes instead of parsing Rustdoc comments. A third-party crate can parse the JSDoc in the Rustdoc and fill in these attributes instead of WBG doing it. The difference is that attributes can be split up into attributes on arguments, a js_doc_return attribute on the function, and so on. This way WBG doesn't have to parse JSDoc because the attributes act as a form of AST already.

If I'm missing some specific interaction that cannot be supported this way, then please elaborate.
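
A rough model of the attribute-based idea, for readers following along: the documentation arrives already split per argument, so the CLI only has to pair it with the types it knows. Everything here is hypothetical. `FnDocs`, `ArgDoc`, and `render_jsdoc` are illustrative names, and no such attributes or data structures exist in wasm-bindgen today.

```rust
/// Docs for one argument, as a hypothetical per-argument attribute might
/// carry them. Filled in by a third-party crate, not by WBG.
struct ArgDoc {
    name: String,
    description: Option<String>,
}

/// Pre-split docs for one exported function (hypothetical).
struct FnDocs {
    description: String,
    args: Vec<ArgDoc>,
    /// Text of a hypothetical return-documentation attribute.
    returns: Option<String>,
}

/// What the CLI would do in this model: combine the pre-split docs with
/// the final JS types, which only the CLI knows. No JSDoc parsing needed.
fn render_jsdoc(docs: &FnDocs, arg_types: &[&str], return_type: Option<&str>) -> String {
    let mut out = String::from("/**\n");
    for line in docs.description.lines() {
        out.push_str(&format!(" * {line}\n"));
    }
    // Arguments and types are paired by position.
    for (arg, ty) in docs.args.iter().zip(arg_types) {
        let mut line = format!(" * @param {{{ty}}} {}", arg.name);
        if let Some(description) = &arg.description {
            line.push(' ');
            line.push_str(description);
        }
        out.push_str(&line);
        out.push('\n');
    }
    if let Some(ty) = return_type {
        let mut line = format!(" * @returns {{{ty}}}");
        if let Some(description) = &docs.returns {
            line.push(' ');
            line.push_str(description);
        }
        out.push_str(&line);
        out.push('\n');
    }
    out.push_str(" */");
    out
}

fn main() {
    let docs = FnDocs {
        description: "Some comment".to_string(),
        args: vec![
            ArgDoc {
                name: "a".to_string(),
                description: Some("This is the first parameter.".to_string()),
            },
            ArgDoc {
                name: "b".to_string(),
                description: None,
            },
        ],
        returns: None,
    };
    println!("{}", render_jsdoc(&docs, &["number", "string"], Some("number")));
}
```

In this model the structured attributes play the role of the AST, so the CLI never parses comment text. The sketch says nothing about whether every interaction discussed in this thread can be expressed that way.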
