-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC - format
implementation
#32
base: main
Are you sure you want to change the base?
RFC - format
implementation
#32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting benchmarks. The existing format-string does seem a bit excessive given that it hasn't really needed any extension since it was written. So I'd be open to a simplification.
I think there are some cut corners with your replacement functions. I would be interested to know what the benchmark numbers would look like if those issues were to be addressed.
if format.Type != LString { | ||
return env.Errorf("first argument is not a string") | ||
} | ||
pattern := formatTokenRegexp.ReplaceAllString(strings.ReplaceAll(strings.ReplaceAll(format.Str, "%", "%%"), "{}", "%s"), "%[$1]s") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My brain hurts a bit trying to parse this. This also doesn't catch syntax errors at all (e.g. unclosed curly brace).
func quickformatBuiltin(env *LEnv, args *LVal) *LVal { | ||
format := args.Cells[0] | ||
//if formatBadTokenPattern.MatchString(format.Str) { | ||
//return env.Errorf("unmatched token") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a test/error for unmatched tokens. The existing format-string does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Maybe this was trying to catch unclosed curly braces? What I meant was that format-string will report an error if you don't supply enough values.
elps> (format-string "hello {}, it's {} to meet you" "world")
stdin:1: lisp:format-string: too many formatting direcives for supplied values
Stack Trace [1 frames -- entrypoint last]:
height 0: stdin:1: lisp:format-string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should do that - it's just fmt.Sprintf
inside, so it'll handle it the same way that Go does with a warning in the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. fmt just outputs junk when there's a mismatch between argument and substitutions. But I don't think that is very good behavior for a language like lisp. I think a runtime error is preferable.
The annoying thing about the fmt package is that format argument mismatches can literally go unnoticed for onths or years. I think they chose to do that instead of forcing programmers to check errors for every formatting call. I've never really seen another argument for why formatting should work that way (though I guess I haven't looked hard if you had one).
In dynamic languages like python and ruby format argument mismatches cause runtime exceptions. Even if it's not completely preferable, I think it's more expected of a language like elps.
The implementation of
format
is cool but probably overcomplex for what it actually does. There's also aTODO
to add positional parameters.I've written a (much) simpler implementation that supports positional parameters. You'll find it in
format_test.go
asquickformatBuiltin
at the moment. I've also written a series of tests and benchmarks for both implementations.As it stands, there is code there to support unmatched brackets, but the more I think about it, the less desirable I think that actually is. Mistakes are obvious because the number of brackets won't match the number of parameters and this gives a way to use a literal curly bracket. Considering the performance impact of it as well, this seems reasonable.
With unmatched checking turned off, performance for this version is around 20% better for simple strings:
versus
With a lot of tokens, the gap is a lot wider:
versus
Performance is marginally slower with positional tokens, but not a lot:
Turning on the simple bad token check, performance drops to:
and if I use the "comprehensive" check: