RFC - `format` implementation #32

reubent · 2021-01-30T10:58:32Z

The implementation of format is cool but probably overcomplex for what it actually does. There's also a TODO to add positional parameters.

I've written a (much) simpler implementation that supports positional parameters. You'll find it in format_test.go as quickformatBuiltin at the moment. I've also written a series of tests and benchmarks for both implementations.

As it stands, there is code there to support unmatched brackets, but the more I think about it, the less desirable I think that actually is. Mistakes are obvious because the number of brackets won't match the number of parameters and this gives a way to use a literal curly bracket. Considering the performance impact of it as well, this seems reasonable.

With unmatched checking turned off, performance for this version is around 20% better for simple strings:

BenchmarkQuickFormatString
BenchmarkQuickFormatString-8                      	  727240	      1533 ns/op

versus

BenchmarkFormatString
BenchmarkFormatString-8                           	  631561	      1887 ns/op

With a lot of tokens, the gap is a lot wider:

BenchmarkQuickFormatStringWithATonOfTokens
BenchmarkQuickFormatStringWithATonOfTokens-8      	  116503	     10249 ns/op

versus

BenchmarkFormatStringWithATonOfTokens
BenchmarkFormatStringWithATonOfTokens-8           	   69565	     16725 ns/op

Performance is marginally slower with positional tokens, but not a lot:

BenchmarkQuickFormatStringWithPositionalToken
BenchmarkQuickFormatStringWithPositionalToken-8   	  727245	      1666 ns/op

Turning on the simple bad token check, performance drops to:

BenchmarkQuickFormatString
BenchmarkQuickFormatString-8                      	  599984	      1723 ns/op
BenchmarkQuickFormatStringWithATonOfTokens
BenchmarkQuickFormatStringWithATonOfTokens-8      	   88561	     13319 ns/op

and if I use the "comprehensive" check:

BenchmarkQuickFormatString
BenchmarkQuickFormatString-8                      	  380971	      2916 ns/op
BenchmarkQuickFormatStringWithATonOfTokens
BenchmarkQuickFormatStringWithATonOfTokens-8      	   18986	     63204 ns/op

bmatsuo-at-luthersystems

Interesting benchmarks. The existing format-string does seem a bit excessive given that it hasn't really needed any extension since it was written. So I'd be open to a simplification.

I think there are some cut corners with your replacement functions. I would be interested to know what the benchmark numbers would look like if those issues were to be addressed.

bmatsuo-at-luthersystems · 2021-02-25T04:57:29Z

lisp/format_test.go

+	if format.Type != LString {
+		return env.Errorf("first argument is not a string")
+	}
+	pattern := formatTokenRegexp.ReplaceAllString(strings.ReplaceAll(strings.ReplaceAll(format.Str, "%", "%%"), "{}", "%s"), "%[$1]s")


My brain hurts a bit trying to parse this. This also doesn't catch syntax errors at all (e.g. unclosed curly brace).

bmatsuo-at-luthersystems · 2021-02-25T05:00:16Z

lisp/format_test.go

+func quickformatBuiltin(env *LEnv, args *LVal) *LVal {
+	format := args.Cells[0]
+	//if formatBadTokenPattern.MatchString(format.Str) {
+	//return env.Errorf("unmatched token")


I think there should be a test/error for unmatched tokens. The existing format-string does this.

Ah. Maybe this was trying to catch unclosed curly braces? What I meant was that format-string will report an error if you don't supply enough values.

elps> (format-string "hello {}, it's {} to meet you" "world") stdin:1: lisp:format-string: too many formatting direcives for supplied values Stack Trace [1 frames -- entrypoint last]: height 0: stdin:1: lisp:format-string

This should do that - it's just fmt.Sprintf inside, so it'll handle it the same way that Go does with a warning in the output.

Right. fmt just outputs junk when there's a mismatch between argument and substitutions. But I don't think that is very good behavior for a language like lisp. I think a runtime error is preferable.

The annoying thing about the fmt package is that format argument mismatches can literally go unnoticed for onths or years. I think they chose to do that instead of forcing programmers to check errors for every formatting call. I've never really seen another argument for why formatting should work that way (though I guess I haven't looked hard if you had one).

In dynamic languages like python and ruby format argument mismatches cause runtime exceptions. Even if it's not completely preferable, I think it's more expected of a language like elps.

Don't merge- just an RFC on format string

b602f54

bmatsuo-at-luthersystems suggested changes Feb 25, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC - `format` implementation #32

RFC - `format` implementation #32

reubent commented Jan 30, 2021

bmatsuo-at-luthersystems left a comment

bmatsuo-at-luthersystems Feb 25, 2021

bmatsuo-at-luthersystems Feb 25, 2021

bmatsuo-at-luthersystems Feb 25, 2021

reubent Mar 9, 2021

bmatsuo-at-luthersystems Mar 12, 2021 •

edited

Loading

RFC - format implementation #32

Are you sure you want to change the base?

RFC - format implementation #32

Conversation

reubent commented Jan 30, 2021

bmatsuo-at-luthersystems left a comment

Choose a reason for hiding this comment

bmatsuo-at-luthersystems Feb 25, 2021

Choose a reason for hiding this comment

bmatsuo-at-luthersystems Feb 25, 2021

Choose a reason for hiding this comment

bmatsuo-at-luthersystems Feb 25, 2021

Choose a reason for hiding this comment

reubent Mar 9, 2021

Choose a reason for hiding this comment

bmatsuo-at-luthersystems Mar 12, 2021 • edited Loading

Choose a reason for hiding this comment

RFC - `format` implementation #32

RFC - `format` implementation #32

bmatsuo-at-luthersystems Mar 12, 2021 •

edited

Loading