Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse options #1763

Merged
merged 10 commits into from
Nov 3, 2023
Merged

Parse options #1763

merged 10 commits into from
Nov 3, 2023

Conversation

kddnewton
Copy link
Collaborator

@kddnewton kddnewton commented Nov 2, 2023

This PR introduces the concept of parse options which are common to all of our APIs. Consequentially, this changes a ton of APIs. I'm sorry about the churn. I'm hopefully that this is the last public API change before Ruby 3.3.0. The options that it introduces can be fed through all of the Ruby APIs, the C APIs, and the FFI APIs. I'll show how below.

First off, here are the APIs that are impacted:

  • Prism::parse(source, **options)
  • Prism::parse_file(filepath, **options)
  • Prism::lex(source, **options)
  • Prism::lex_compat(source, **options)
  • Prism::lex_file(filepath, **options)
  • Prism::parse_lex(source, **options)
  • Prism::parse_lex_file(filepath, **options)
  • Prism::dump(source, **options)
  • Prism::dump_file(filepath, **options)
  • Prism::parse_comments(source, **options)
  • Prism::parse_file_comments(filepath, **options)

The options that are supported:

  • filepath - the filepath of the source being parsed. This should be a string or nil.
  • encoding - the encoding of the source being parsed. This should be an encoding or nil.
  • line - the line number that the parse starts on. This should be an integer or nil. Note that this is 1-indexed.
  • frozen_string_literal - whether or not the frozen string literal pragma has been set. This should be a boolean or nil.
  • suppress_warnings - whether or not warnings should be suppressed. This should be a boolean or nil.
  • scopes - the locals that are in scope surrounding the code that is being parsed. This should be an array of arrays of symbols or nil.

In C, this means a couple of APIs have changed:

-PRISM_EXPORTED_FUNCTION void pm_parser_init(pm_parser_t *parser, const uint8_t *source, size_t size, const char *filepath);
+PRISM_EXPORTED_FUNCTION void pm_parser_init(pm_parser_t *parser, const uint8_t *source, size_t size, const pm_options_t *options);

-PRISM_EXPORTED_FUNCTION void pm_parse_serialize(const uint8_t *source, size_t size, pm_buffer_t *buffer, const char *metadata);
+PRISM_EXPORTED_FUNCTION void pm_serialize_parse(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data);

-PRISM_EXPORTED_FUNCTION void pm_lex_serialize(const uint8_t *source, size_t size, const char *filepath, pm_buffer_t *buffer);
+PRISM_EXPORTED_FUNCTION void pm_serialize_lex(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data);

-PRISM_EXPORTED_FUNCTION void pm_parse_lex_serialize(const uint8_t *source, size_t size, pm_buffer_t *buffer, const char *metadata);
+PRISM_EXPORTED_FUNCTION void pm_serialize_parse_lex(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data);

-PRISM_EXPORTED_FUNCTION void pm_parse_serialize_comments(const uint8_t *source, size_t size, pm_buffer_t *buffer, const char *metadata);
+PRISM_EXPORTED_FUNCTION void pm_serialize_parse_comments(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data);

Written another way:

  • pm_parser_init now accepts a const pm_options_t *. This is a pointer to a (usually stack-allocated) options struct that contains all of the values that we accept. You can see the docs/diff to see how to set one up. The pointer can be NULL, so if you were never passing anything to this function you don't need to change anything. If you were only passing a filepath, then you can get there with:
pm_options_t options = { 0 };
pm_options_filepath_set(&options, filepath);
  • pm_serialize_parse, pm_serialize_lex, pm_serialize_parse_lex, and pm_serialize_parse_comments - these functions all had different signatures/naming conventions. Some of them accepted data, some didn't. It was very confusing. Now all of their signatures line up, such that they accept buffer, source, size, and data. data is an optional pointer to a serialized options struct. For how to serialize you can check the docs or see how ffi.rb does it. I'll copy it in here for clarity as well:
# bytes field
4 the length of the filepath
... the filepath bytes
4 the line number
4 the length the encoding
... the encoding bytes
1 frozen string literal
1 suppress warnings
4 the number of scopes
... the scopes

Each scope is layed out as follows:

# bytes field
4 the number of locals
... the locals

Each local is layed out as follows:

# bytes field
4 the length of the local
... the local bytes

Again, sorry for this churn. But hopefully this is the last interface change for a while. And if we need to add more options, we now have the capability to do that without impacting other APIs since they are now consolidated.

I wasn't entirely sure how Java wanted to handle the start line option, so I've left it on the Nodes.Source object so that callers will have visibility. If we want to do more with it we can do it in follow-up PRs.

Fixes #1639
Fixes #821

cc @enebo @eregon @seven1m

Comment on lines +23 to +24
pm_options_line_set(pm_options_t *options, uint32_t line) {
options->line = line;
Copy link
Member

@eregon eregon Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the start line can be negative and this is used in ERB, other template engines, etc so they can inject extra lines at the start without affecting the line number of the user source:
https://github.com/ruby/erb/blob/8db8b8b5086713175523f7cab4bc8b779f069709/lib/erb.rb#L468
So this should be int32_t line.
-> #1783

this.lineOffsets = lineOffsets;
}

public void setStartLine(int startLine) {
assert startLine >= 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert should be removed, see the other comment

@eregon
Copy link
Member

eregon commented Nov 7, 2023

I wasn't entirely sure how Java wanted to handle the start line option, so I've left it on the Nodes.Source object so that callers will have visibility. If we want to do more with it we can do it in follow-up PRs.

That's perfect. In TruffleRuby we just keep a line offset per source as Truffle APIs do not support negative line numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants