Parse options #1763

kddnewton · 2023-11-02T22:11:24Z

This PR introduces the concept of parse options, which are common to all of our APIs. Consequently, this changes a ton of APIs. I'm sorry about the churn. I'm hopeful that this is the last public API change before Ruby 3.3.0. The options that it introduces can be fed through all of the Ruby APIs, the C APIs, and the FFI APIs. I'll show how below.

First off, here are the APIs that are impacted:

Prism::parse(source, **options)
Prism::parse_file(filepath, **options)
Prism::lex(source, **options)
Prism::lex_compat(source, **options)
Prism::lex_file(filepath, **options)
Prism::parse_lex(source, **options)
Prism::parse_lex_file(filepath, **options)
Prism::dump(source, **options)
Prism::dump_file(filepath, **options)
Prism::parse_comments(source, **options)
Prism::parse_file_comments(filepath, **options)

The options that are supported:

filepath - the filepath of the source being parsed. This should be a string or nil.
encoding - the encoding of the source being parsed. This should be an encoding or nil.
line - the line number that the parse starts on. This should be an integer or nil. Note that this is 1-indexed.
frozen_string_literal - whether or not the frozen string literal pragma has been set. This should be a boolean or nil.
suppress_warnings - whether or not warnings should be suppressed. This should be a boolean or nil.
scopes - the locals that are in scope surrounding the code that is being parsed. This should be an array of arrays of symbols or nil.
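
As a rough illustration of how these options are passed from Ruby, here is a minimal sketch. It assumes the prism gem is available (it ships with Ruby 3.3 and later); the source string, the filepath "example.rb", and the local name foo are made up for the example. Because foo is declared in scopes, the bare identifier should be parsed as a local variable read rather than a method call, and the reported line should start from the line option.

```ruby
require "prism"

# Parse a snippet as if it sat on line 10 of example.rb, inside a scope
# that already defines the local variable `foo` (all illustrative values).
result = Prism.parse("foo", filepath: "example.rb", line: 10, scopes: [[:foo]])

# The first statement of the program node is the expression we parsed.
node = result.value.statements.body.first

puts node.class
puts node.location.start_line
```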

In C, this means a couple of APIs have changed:

-PRISM_EXPORTED_FUNCTION void pm_parser_init(pm_parser_t *parser, const uint8_t *source, size_t size, const char *filepath);
+PRISM_EXPORTED_FUNCTION void pm_parser_init(pm_parser_t *parser, const uint8_t *source, size_t size, const pm_options_t *options);

-PRISM_EXPORTED_FUNCTION void pm_parse_serialize(const uint8_t *source, size_t size, pm_buffer_t *buffer, const char *metadata);
+PRISM_EXPORTED_FUNCTION void pm_serialize_parse(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data);

-PRISM_EXPORTED_FUNCTION void pm_lex_serialize(const uint8_t *source, size_t size, const char *filepath, pm_buffer_t *buffer);
+PRISM_EXPORTED_FUNCTION void pm_serialize_lex(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data);

-PRISM_EXPORTED_FUNCTION void pm_parse_lex_serialize(const uint8_t *source, size_t size, pm_buffer_t *buffer, const char *metadata);
+PRISM_EXPORTED_FUNCTION void pm_serialize_parse_lex(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data);

-PRISM_EXPORTED_FUNCTION void pm_parse_serialize_comments(const uint8_t *source, size_t size, pm_buffer_t *buffer, const char *metadata);
+PRISM_EXPORTED_FUNCTION void pm_serialize_parse_comments(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data);

Written another way:

pm_parser_init now accepts a const pm_options_t *. This is a pointer to a (usually stack-allocated) options struct that contains all of the values that we accept. See the docs or the diff for how to set one up. The pointer can be NULL, so if you were never passing anything to this function you don't need to change anything. If you were only passing a filepath, then you can get there with:

pm_options_t options = { 0 };
pm_options_filepath_set(&options, filepath);

pm_serialize_parse, pm_serialize_lex, pm_serialize_parse_lex, and pm_serialize_parse_comments - these functions all had different signatures/naming conventions. Some of them accepted data, some didn't. It was very confusing. Now all of their signatures line up, such that they accept buffer, source, size, and data. data is an optional pointer to a serialized options struct. For how to serialize you can check the docs or see how ffi.rb does it. I'll copy it in here for clarity as well:

# bytes	field
`4`	the length of the filepath
...	the filepath bytes
`4`	the line number
`4`	the length of the encoding
...	the encoding bytes
`1`	frozen string literal
`1`	suppress warnings
`4`	the number of scopes
...	the scopes

Each scope is laid out as follows:

# bytes	field
`4`	the number of locals
...	the locals

Each local is laid out as follows:

# bytes	field
`4`	the length of the local
...	the local bytes
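
The layout above can be sketched in plain Ruby as follows. This is not the code from ffi.rb: serialize_options is a hypothetical helper, the default values are assumptions, and so is the use of native byte order ("L") for the 4-byte fields. It only shows the field order described in the tables.

```ruby
# Hypothetical helper that packs options in the field order described above.
# Assumes 4-byte fields are unsigned 32-bit integers in native byte order.
def serialize_options(filepath: nil, line: 1, encoding: nil,
                      frozen_string_literal: false, suppress_warnings: false,
                      scopes: [])
  out = String.new(encoding: Encoding::BINARY)

  # 4-byte length, then the filepath bytes (length 0 when there is no filepath)
  path = filepath.to_s.b
  out << [path.bytesize].pack("L") << path

  # 4-byte line number
  out << [line].pack("L")

  # 4-byte length, then the encoding name bytes (length 0 when unset)
  enc = encoding ? encoding.name.b : "".b
  out << [enc.bytesize].pack("L") << enc

  # one byte each for the two boolean flags
  out << [frozen_string_literal ? 1 : 0, suppress_warnings ? 1 : 0].pack("CC")

  # 4-byte scope count, then each scope as a count followed by its locals
  out << [scopes.length].pack("L")
  scopes.each do |locals|
    out << [locals.length].pack("L")
    locals.each do |local|
      name = local.to_s.b
      out << [name.bytesize].pack("L") << name
    end
  end

  out
end

data = serialize_options(filepath: "a.rb", scopes: [[:foo]])
puts data.bytesize
```

The resulting string is what would be handed to the serialize functions as data; passing no data at all remains valid since the pointer is optional.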

Again, sorry for this churn. But hopefully this is the last interface change for a while. And if we need to add more options, we now have the capability to do that without impacting other APIs since they are now consolidated.

I wasn't entirely sure how Java wanted to handle the start line option, so I've left it on the Nodes.Source object so that callers will have visibility. If we want to do more with it we can do it in follow-up PRs.

Fixes #1639
Fixes #821

cc @enebo @eregon @seven1m

eregon · 2023-11-07T14:14:11Z

src/options.c

+pm_options_line_set(pm_options_t *options, uint32_t line) {
+    options->line = line;


Actually the start line can be negative and this is used in ERB, other template engines, etc so they can inject extra lines at the start without affecting the line number of the user source:
https://github.com/ruby/erb/blob/8db8b8b5086713175523f7cab4bc8b779f069709/lib/erb.rb#L468
So this should be int32_t line.
-> #1783
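
To illustrate why the signedness matters, here is a small Ruby sketch (not prism code) that stores a negative start line in 32 bits and reads it back both ways. The value -2 is an arbitrary example.

```ruby
start_line = -2

# Store the value as a signed 32-bit integer in native byte order.
bytes = [start_line].pack("l")

# Reading the same bytes as unsigned loses the sign and yields a huge number.
as_unsigned = bytes.unpack1("L")
# Reading them as signed round-trips the original value.
as_signed = bytes.unpack1("l")

puts as_unsigned
puts as_signed
```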

eregon · 2023-11-07T14:14:25Z

templates/java/org/prism/Nodes.java.erb

            this.lineOffsets = lineOffsets;
        }

+        public void setStartLine(int startLine) {
+            assert startLine >= 1;


This assert should be removed, see the other comment

eregon · 2023-11-07T14:17:02Z

I wasn't entirely sure how Java wanted to handle the start line option, so I've left it on the Nodes.Source object so that callers will have visibility. If we want to do more with it we can do it in follow-up PRs.

That's perfect. In TruffleRuby we just keep a line offset per source as Truffle APIs do not support negative line numbers.

kddnewton force-pushed the options branch from 061c8b2 to 8a8d7eb Compare November 2, 2023 22:21

kddnewton requested a review from jemmaissroff November 2, 2023 22:21

seven1m mentioned this pull request Nov 3, 2023

Eval locals natalie-lang/natalie#1426

Merged

kddnewton added 9 commits November 3, 2023 08:15

Create an options struct for passing all of the possible options

99e8161

Wire up options through the Ruby API

8582d37

Wire up options through the FFI API

f0aa8ad

Rename serialization APIs for consistency

5a2252e

Properly support the encoding option

4b538af

Properly support the suppress_warnings option

8422952

Properly support the start line option

33cc75a

Wire up options through the Java parser

13fa262

Wire up the options through JavaScript

81a9b28

kddnewton force-pushed the options branch from 8a8d7eb to 4f3a3e3 Compare November 3, 2023 12:24

Fix up lint

ed481b9

kddnewton force-pushed the options branch from 4f3a3e3 to ed481b9 Compare November 3, 2023 14:23

kddnewton merged commit ed8d1c0 into main Nov 3, 2023

kddnewton deleted the options branch November 3, 2023 14:25

eregon reviewed Nov 7, 2023

eregon mentioned this pull request Nov 7, 2023

Accept negative start line number #1783

Closed

kddnewton mentioned this pull request Nov 8, 2023

Start line number can be negative #1789

Closed

schneems mentioned this pull request Dec 1, 2023

Docs for lex_compat say result is an Array when actual result is ParseResult #1971

Closed

riseshia mentioned this pull request Dec 14, 2023

Update args of Prism.dump on document #2088

Merged
