-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split resolveMessage() into format() & formatToParts() #22
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction of this change. I recall being a bit confused by the resolveMessage()
API. (In retrospect, I should have filed an issue about it.) This PR addresses most of the reasons for my confusion and makes the API similar to what I imagined in my implementation experiment for MessageFormat 2 — but more complete with errors and resolution fallback. Some more early feedback below.
it resolves instead to the result of calling the `string` function | ||
with the value as input and no options. | ||
|
||
Otherwise, un-annotated values resolve to the following shape: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to suggest that format
and formatToParts
take a dictionary whose values are either unknown
or instances of MessageValue
. This allows two interesting use-cases:
- Pre-set certain formatting options on input values.
- Pass instances of custom types which extend
MessageValue
.
Then, you'd first check if an un-annotated value is an instance of MessageValue
. If so, carry on. If not, resolve to MessageUnknownValue
. This way, you don't have to try to resolve every expression to a number and string first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to suggest that
format
andformatToParts
take a dictionary whose values are eitherunknown
or instances ofMessageValue
. This allows two interesting use-cases:
- Pre-set certain formatting options on input values.
This is possible without making MessageValue
a fully-fledged class. To see what I mean, take a look at how the proposed behaviour of the :number
formatter constructs options.
- Pass instances of custom types which extend
MessageValue
.
What would be the benefit of allowing that to happen without a custom annotation on the expression? As proposed, it's possible to return custom MessageValue
extensions from a user-defined function, but it does require an annotation on the MF2 expression.
Then, you'd first check if an un-annotated value is an instance of
MessageValue
. If so, carry on. If not, resolve toMessageUnknownValue
.
How would you propose that we "check if an un-annotated value is an instance of MessageValue
"? As currently proposed, that's just an interface description for plain objects with some expected properties.
This way, you don't have to try to resolve every expression to a number and string first.
The number/string/unknown resolution that I think you're referring to here is what's applied with an unannotated expression, like {$foo}
. It's not applied when resolving e.g. {$foo :custom}
, which instead calls the custom
function that's defined in the constructor options, which is expected to return a MessageValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the benefit of allowing that to happen without a custom annotation on the expression? As proposed, it's possible to return custom MessageValue extensions from a user-defined function, but it does require an annotation on the MF2 expression.
"Require" isn't the best word here, because there's nothing we can do on the syntax or data model validation level to prevent such uses. So we're left with runtime errors or MessageUnknownValues
. Instead, I suggest we allow passing instances of MessageValues
as input variables, so that:
- they produce known
MessageParts
when used in a placeholder without an annotation, - they can be used as arguments and options to custom functions.
For example, it may be useful to be able to pass as input variables data that's not supposed to be formatted at all and instead be positioned in the correct spot inside the translation: see the WrappedValue
example from my implementation.
How would you propose that we "check if an un-annotated value is an instance of MessageValue"? As currently proposed, that's just an interface description for plain objects with some expected properties.
It should be a base class instead, and custom function authors should be able to subclass it for the purposes of their custom logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #24 to continue the discussion about whether MessageValue
should be subclassable, and if it should be allowed to be passed as input variables.
Co-authored-by: Stanisław Małolepszy <[email protected]>
#### Default Functions | ||
|
||
Two commonly used message functions `number` and `string` are provided as a starting point, | ||
and as handlers for placeholders without an annotation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and as handlers for placeholders without an annotation, | |
and as handlers for expressions without an annotation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is deliberate. Selectors without an annotation do not get a default handler.
Co-authored-by: Tim Chevalier <[email protected]>
The reviews from Tim and Stas are quite insightful and I don't think there's much more I can add here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, thanks. I know its not from your PR but the onError
method would certainly make things tricky by making users employ two different error handling mechanisms. Could this behavior be replaced using regular JS error handling instead? Allowing people to catch
a subclassed "MessageFormatError" instead?
type: 'fallback'; | ||
value: undefined; | ||
locale: 'und'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it better to return something like null
that makes it obvious that it's not a useful value? Otherwise someone might write code that extracts this and passes this elsewhere without checking for "und"
(I suppose they could still do that with null
but atleast the error would be more obvious then).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we ought to make und
into a working reference for the root locale, but that's rather from this proposal. I need to organise my thoughts on this some more and raise an issue about this in TG2.
and as MF2 considers all literals to be strings, | ||
the JSON string representation should be supported for numerical and boolean values | ||
when used as an input or as an option value. | ||
Each function will need to parse these separately from their string representations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably link to the grammar in ECMA-404 for this
This is actually mandated by the MF2 spec, which currently includes in its Error Handling section:
To be fair, I did write that part of the MF2 spec, but did so rather intentionally and with the support of the MFWG. Intl.MessageFormat is different from the existing Intl formatters in that it relies on data (a message) provided by a user, and for any given message, the data is often not directly controlled by a developer, but provided by a translator. In other words, this interface ends up relying on inputs which will fail more often than existing formatters, and will do so without being caught in automated tests. From experience, it turns out to be really important for localization not to fail catastrophically, because that will lead to broken user experiences because not all formatting calls will be wrapped in a try...catch. Hence the default behaviour of warning on error, rather than throwing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I said in my previous review, the direction of this PR is good, but I have reservations about certain design choices here still. If you prefer to file new issues about these, I'll be happy to discuss them further outside this PR. (They will definitely be easier to discuss against this PR's spec than the current main
.)
A summary of my feedback:
-
There's more work needed around the
MessageValue
type and its use, both internal and potentially external; I think it should be a subclassable type available to users and authors of custom functions. -
The wording of the spec would benefit from defining abstract operations that convert unknown values to known interfaces.
-
MessageValue
should not have alocale
property. Instead, the formatting methods should takelocale
as argument.
it resolves instead to the result of calling the `string` function | ||
with the value as input and no options. | ||
|
||
Otherwise, un-annotated values resolve to the following shape: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the benefit of allowing that to happen without a custom annotation on the expression? As proposed, it's possible to return custom MessageValue extensions from a user-defined function, but it does require an annotation on the MF2 expression.
"Require" isn't the best word here, because there's nothing we can do on the syntax or data model validation level to prevent such uses. So we're left with runtime errors or MessageUnknownValues
. Instead, I suggest we allow passing instances of MessageValues
as input variables, so that:
- they produce known
MessageParts
when used in a placeholder without an annotation, - they can be used as arguments and options to custom functions.
For example, it may be useful to be able to pass as input variables data that's not supposed to be formatted at all and instead be positioned in the correct spot inside the translation: see the WrappedValue
example from my implementation.
How would you propose that we "check if an un-annotated value is an instance of MessageValue"? As currently proposed, that's just an interface description for plain objects with some expected properties.
It should be a base class instead, and custom function authors should be able to subclass it for the purposes of their custom logic.
localeContext?: LocaleContext; | ||
source?: string; | ||
meta?: Record<string, string>; | ||
locale: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MessageValue
should not have a locale
property. That's what Parts
are for.
Consider {$count :number}
. The number stored in $count
may be decorated with formatting options like minimumFractionDigits
etc., but it's not in any particular locale just yet. It's the locale of the message (or the placeholder if we allow overriding it) that decides how $count
will be formatted.
There's nothing inherently language-specific about the mathematical value of 5
:) Same goes for many string literals, like user names or Wi-Fi network names.
I understand that we need the locale
somewhere for toString
and toParts
to work. I think we should just pass it as argument: toString(locale)
. Perhaps also call it toLocaleString()
for consistency with other JS APIs.
Alternatively, we can consider passing in a larger structure: toString(fmtctx)
. Such "formatting context" may be useful if we want to allow implementations to cache formatters.
(A formatting context should be passed into formatting methods rather than be referenced by a property of MessageValue
to help avoid memory leaks and facilitate IPC use-cases.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the locale here to support messages which embed parts that are not in the message's locale. Consider for example:
The number 1,234.5 is formatted in French as "1 234,5".
When formatting that to parts, the last two parts end up as:
[
...,
{ type: 'number', locale: 'fr', parts: [{ type: 'integer', value: '1' }, ...] },
{ type: 'literal', value: '".' }
]
To get there, the fr
locale needs to be available in the corresponding MessageValue
. It does not need to be available in $count
, because :number
defines its own expected input shape, and for that the locale
is optional.
Fundamentally, MessageValue
is a resolved value that contains everything needed to format it, or to do something else with it like feed it into some custom function. That includes its locale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #25 to continue this, because I still think locale
is special and should not be part of MessageValue
.
Co-authored-by: Ujjwal Sharma <[email protected]>
@stasm I applied some changes in response to your comments, in particular wrt. the As it's currently structured, I think the value gained by making it a real class is marginal, and not worth the cost, but I'm open to being persuaded otherwise. Would it be possible for you to open a separate issue discussing the possibility of doing so as a further step, so that we don't block this PR on it? |
Thanks for the updates. I filed the following issues to continue the discussion without blocking this PR:
It's going to be easier to discuss them against this PR rather than the current state of |
const mf = new Intl.MessageFormat(source, ['en']); | ||
const notifications = mf.resolveMessage({ count: 1 }); | ||
notifications.toString(); // 'You have one new notification' | ||
const notifications = mf.format({ count: 1 }); | ||
// 'You have 1 new notification' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relatively common use case of formatting a message to a string requires the construction of a bunch of intermediate JS objects that are then immediately dismissed, which is pretty inefficient.
Looks like this PR still needs a lot of intermediate JS objects.
I thought it would be something like this:
const mf = new Intl.MessageFormat('en');
const notifications = mf.format(source, { count: 1 });
const another_string = mf.format(source2, { interop: 1 });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That approach would leave the greatest part of the work to the format()
calls, though, and that's quite different from how existing Intl formatters work.
Consider for instance how unit formatting works:
const nf = new Intl.NumberFormat('en-GB', { style: 'unit', unit: 'kilometer', unitDisplay: 'long' });
nf.format(42); // "42 kilometres"
There, everything except for the final runtime values is included in the constructor. Now, let's say that we wanted that distance to be embedded in a longer message. With the currently proposed API that would look like this:
const src = `
let $dist = {$dist :number style=unit unit=kilometer unitDisplay=long}
{Your destination is in {$dist}.}`;
const mf = new Intl.MessageFormat(src, 'en-GB');
mf.format({ dist: 42 }); // "Your destination is in 42 kilometres."
In other words, when getting either a NumberFormat or MessageFormat instance, we should be able to tell what sort of a thing it's formatting. If we leave the source to be a format()
argument for MessageFormat, we won't be able to really say anything at all about the MessageFormat instance, or whether the message source is even syntactically valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jack-Works As moving the source argument to the format()
and formatToParts()
methods could be considered as a next step from this PR, please open a separate issue to consider that. I would be happy to continue there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I agree, but does that mean we need to create a new object for every message we have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what it means.
This is a pretty deep rewrite of the proposed API. I started by drafting this as a sequence of smaller steps, but this is the smallest first step that I could separate out; further PRs will add support for markup expressions, bidi isolation, and additional default formatters. Much of this has been informed by offline conversations with @littledan in particular, as well as recent progress and conversations in the Unicode MFWG.
Fundamentally, this set of changes reconsiders how the API deals with objects. These need to be address by Intl.MessageFormat in three different places:
Previously with the
resolveMessage()
approach, a single definition was shared for all three use cases. Technically, this works, but not without at least three significant drawbacks:The alternative approach proposed here matches existing Intl formatters much more closely, with
format()
providing a string andformatToParts()
an array of static message parts, rather than values with functions attached to them. The match isn't perfect, but it's pretty close. This change does sacrifice some of the expressibility and re-use thatresolveMessage()
enabled, but it should be easier to use and reason about.Compared to the previous, this is how object values work here:
MessageValue
is very similar to its previous incarnation, but in most use cases, these objects won't ever need to be reified.{ type: string, value: string }
used by other Intl formatters. This is unfortunately unavoidable, as not all the values formatted by Intl.MessageFormat are representable as strings, and some (such as numbers) may contain sub-parts, rather than a single string value.Rather than looking at the diff, this PR may be easier to review by looking at its resulting shape on the updates branch. I've asked for reviews from a number of people for this, as I figured you might be interested parties in this significant a change. I don't really expect everyone to reply.