Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve command input recognition #805

Closed
wants to merge 4 commits into from
Closed

Improve command input recognition #805

wants to merge 4 commits into from

Conversation

st0012
Copy link
Member

@st0012 st0012 commented Dec 7, 2023

Currently, we simply split the input on whitespace and use the first word as the command name to check if it's a command. But this means that assigning a local variable which's name is the same as a command will also be recognized as a command.

For example, in the following case, info is recognized as a command:

irb(main):001> info = 123
`debug` command is only available when IRB is started with binding.irb
=> nil

This commit improves the command input recognition by using more sophisticated regular expressions.

Closes #803

@st0012 st0012 added the bug Something isn't working label Dec 7, 2023
@st0012 st0012 self-assigned this Dec 7, 2023
lib/irb.rb Outdated

if command_match
command_or_alias = command_match[:cmd_name]
arg = [command_match[:cmd_arg], command_match[:cmd_flag]].compact.join(' ')
Copy link
Member Author

@st0012 st0012 Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open a follow up PR to see if extracting cmd_flag can simplify how commands implement options.

Update: #809

@st0012 st0012 requested a review from a team December 7, 2023 15:44
@tompng
Copy link
Member

tompng commented Dec 7, 2023

show_source =~ -s is a valid option for show_source command, and also a valid ruby code.
info = -1 will be still interpreted as command because it matches COMMAND_WITH_ARGS_AND_FLAGS_REGEXP.

I think info = 1 interpreted as command is consistent, not a bug but one of the possible design. I know it's surprising, so we need some workaround, but should be carefully done.

One idea is to exclude only these assignments

info = a # but accept =~ == ===
info += a # and other operators
info &&= a
info ||= a

Maybe we can check local variable existence just like ExtendCommandBundle.install_extend_commands checks for method existence

show_source == -s # command
show_source =~ -s # command
show_source = /regexp/ # ruby
show_source =~ -s # ruby (because show_source is local variable)
show_source == -s # ruby
info + 1 # command
info ||= 1 # ruby
info + 1 # ruby (because info is local variable)
info 1 # (ruby? 

@st0012
Copy link
Member Author

st0012 commented Dec 7, 2023

but should be carefully done.

Yeah I agree. But IMO the solution doesn't need to perfectly cover all cases. It only has to be slightly better than the current behaviour.

Also, sooner or later we need to do some kind of regexp match on input in order to make command options easier to support, instead of writing them in transform_args separately. With that in mind, if we can have a pure regexp solution on the problem, it'll greatly reduce the complexity.

To be clear, I'm not saying this PR is THE solution. I'd like to see how many cases it can support as well, so more cases are welcome 🙏

Maybe we can check local variable existence

I worry that this may make the behaviour harder to understand as the same input, like info, could behave differently depending on the context, which may not always obvious to users (like info's assignment happened 20 lines above the breakpoint).

@st0012
Copy link
Member Author

st0012 commented Dec 9, 2023

Capturing flags will regexp will enable refactoring like this: #809

@tompng
Copy link
Member

tompng commented Dec 9, 2023

I like #809 idea 👍

Although I have some concern about regexp including flag and args. It restricts not only command's implementation but also user's input. Input like show_source class method set_some_number 12 34 will be rejected.
show_source Klass method --s does not show Couldn't locate a definition for ... but raises undefined local variable or method `s'. The message wrongly indicates that command does not exist.
I think command should be recognized in another way, and using optparser/regexp or not can be controlled in each command implementation.

@tompng
Copy link
Member

tompng commented Dec 9, 2023

I agree local_variable defined or not will make implementation complicated and harder to understand the behavior. I think we can use a simple regexp to handle it.

I made a list of command-like ruby expressions. (maybe there are some missing pattern, but it should be a rare case)

Possible command-like style local variable assignment:

info = expr
info += expr # `*=` `&&=` `||=` and other
info ,other = expr

Possible command-like style local variable usage:

info
info # comment
info => pattern
info in pattern
info + expr # (+ - * ** / % ^ & | < <= <=> >= > << >> == =~ != !~ === && || and or)
info . method
info &. method
info ... range_end
info :: method
info :: Const
info [key]
info [key] = value
info [key] += value
info if expr
info ? expr : expr
info ; expr
info \ # (continue to next line)

Most of them are weird code, not realistic. I think rejecting only these patterns as command is enough.

info = expr
info += expr # *= &&= ||= etc.
info + expr # - * ** / % etc.

show_source =~ -s will be rejected with this rule but supporting show_source self.=~ might be enough.

@st0012
Copy link
Member Author

st0012 commented Dec 10, 2023

I think command should be recognized in another way

I personally expect us to inevitably come back to regexp for this, unless we're writing a custom parser for IRB commands, which is an overkill IMO.

and using optparser/regexp or not can be controlled in each command implementation.

I think to achieve good abstraction, command flags and arg(s) should be supplied to the command separately. And for consistency & ease of maintenance, I'd like to make command implementation only use optparser for flags in the beginning.

I made a list of command-like ruby expressions.

Thanks for listing them. I've updated the tests are it looks like these are newly supported by this PR:

info = expr
info += expr # `*=` `&&=` `||=` and other
info ,other = expr
info = expr
info += expr # *= &&= ||= etc.
info + expr # - * ** / % etc.
info if expr
info ? expr : expr
info ; expr
info # comment

@tompng
Copy link
Member

tompng commented Dec 10, 2023

How about something like this? It can be implemented with regexp.

if input.match?(/\A#{command_name_regexp}( |\z)/)
  # Maybe command
  if input.match?(/\A#{command_name_regexp} #{not_a_command_operators_regexp} /)
    # Not a command. `info = `, `info + `, `info *= ` will match. ruby code
  elsif input.match?(COMMAND_REGEXP)
    # Valid command
  else
    # Command with invalid option. not a ruby code.
    # I think warning message for this is important.
  end
end

@st0012
Copy link
Member Author

st0012 commented Dec 10, 2023

If you don't mind, I want to take a pause here and first make sure we're trying to find a solution to the same problem.

The currently implementation goes through these examination:

  1. Check the input matches either of these forms:
    • Simple command
    • Command with args
    • Command with flags
    • Command with args and flags
    • If doesn't match any of them, it's treated as Ruby code
    • If matches one of them, we check if the captured command name (or alias) actually is a registered command
    • If the command doesn't exist, then it's treated as Ruby code
    • If the command exists, then it's treated as command

If I understand your concern correctly, it's about we only treat input as either command or Ruby code, which will miss the opportunity to warn users about incorrect forms of commands?
Or it's about one of the steps I listed above?

@tompng
Copy link
Member

tompng commented Dec 11, 2023

it's about we only treat input as either command or Ruby code, which will miss the opportunity to warn users about incorrect forms of commands?

Yes, that's what I'm concerned about.

For example, when Foo#bar method exists,

irb(main):001> show_source Foo bar
Error: Couldn't locate a definition for Foo bar

This case, user can know that command exists and the usage is wrong.

irb(main):001> show_source Foo bar
(irb):1:in `<main>': undefined local variable or method `bar' for main:Object (NameError)

But in this case, user does not know what is wrong. maybe usage, maybe command name, or something else. I think this is a kind of degradation.
We already (maybe unintentionally) have a sort of warning for wrong command form, but this pull request drops it.

@st0012
Copy link
Member Author

st0012 commented Dec 12, 2023

I've made a change so that when show_source Foo bar is evaluated and caused an error, a warning would be printed:

# errors
        ... 2 levels...
The input `show_source Foo bar` was recognised as a Ruby expression, but it matched the name of the `show_source` command.
If you intended to run it as a command, please check if the syntax is correct.
irb(main):002> 

The same warning is not printed when there's no exception raised to avoid noise.

lib/irb.rb Outdated Show resolved Hide resolved
lib/irb.rb Outdated Show resolved Hide resolved
Currently, we simply split the input on whitespace and use the first
word as the command name to check if it's a command. But this means that
assigning a local variable which's name is the same as a command will
also be recognized as a command.

For example, in the following case, `info` is recognized as a command:

```
irb(main):001> info = 123
`debug` command is only available when IRB is started with binding.irb
=> nil
```

This commit improves the command input recognition by using more sophis-
ticated regular expressions.
…command

As we now proceed to stricter syntax checking for commands, incorrect
command input would be processed as Ruby instead and likely causes
errors. In that case, we should raise a warning to the user.

For example, if an input is `show_source Foo bar`, the user would see
a warning about the possible syntax error for using `show_source` command.

But with this approach, we also need add a condition for the `measure`
command as it's actually used as a method call with block.
@@ -66,6 +67,9 @@ def should_be_handled_by_debugger?
end

def evaluable_code
# Because measure command is used as a method, we need to treat it differently
return @code if @command == "measure"
Copy link
Member Author

@st0012 st0012 Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to avoid this once #624 is merged.

And we still need it even with #813 so we can print the warning on measure blocks.

@st0012 st0012 requested a review from tompng December 13, 2023 14:33
SIMPLE_COMMAND_REGEXP = /^#{COMMAND_NAME_REGEXP}\z/
COMMAND_WITH_ARGS_REGEXP = /^#{COMMAND_NAME_REGEXP} +#{COMMAND_ARG_REGEXP}\z/
COMMAND_WITH_FLAGS_REGEXP = /^#{COMMAND_NAME_REGEXP} +#{COMMAND_FLAG_REGEXP}\z/
COMMAND_WITH_ARGS_AND_FLAGS_REGEXP = /^#{COMMAND_NAME_REGEXP} +#{COMMAND_ARG_REGEXP} +#{COMMAND_FLAG_REGEXP}\z/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still matches to the second line. It should be /^/\A

Command (Last line matches to COMMAND_REGEXP)

show_source = 1..
2# Set#include?

Not a command (Both line does not match to COMMAND_REGEXP)

show_source = 1..
2 # Set#include?

@st0012
Copy link
Member Author

st0012 commented Dec 11, 2024

This is not needed anymore.

@st0012 st0012 closed this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Don't merge
Development

Successfully merging this pull request may close these issues.

Local variable assignment being incorrectly recognised as debugging commands
2 participants