IntentName view of literals #459
In the discussions around switching to "template" variants, my feeling was the consensus was that I think separating concept and literal in the grammar is problematic as there are no testable differences in behaviour as the concept list is open. I don't think we should be encouraging over-use of Specifically on letters, why restrict to just letters? Currently valid examples such as
As discussed in one of the issues, we could use something like categories Moving the description of But other parts seem good
I'm not sure about the name |
I'm still skeptical of having a special case meaning for an empty head. I'd rather see the production
That makes the template a little more awkward to generate since you have to come up with a name. You can still get the template with something like Other than that, I like the rewrite. Unlike David, I think separating out literals makes it clearer that they are different. Not part of the rewrite, but something to change: "5.1.5.1: Tables". With properties, I think we have something to say. Or maybe nothing needs to be said since there is potentially a solution to the problem discussed in that section. |
My issue is that it makes them seem different when they are the same. We haven't finalised what is in core, but your suggested list in google sheets has 21 entries. So of the infinitely many names matching Probably the final core list will be bigger than 21, but it will be a small finite number and not include |
I think there is a simple rule of thumb - "does this string denote a concept or a textual override?" In arXiv:2209.06099 the author may have wanted to speak their rabbit polynomials directly, as in (from Theorem 1.1):
<msub intent="_rabbit($arg)">
<mi>R</mi>
<mi arg="arg">d</mi>
</msub>
But in a system that had pre-defined speech for rabbit polynomials (a hypothetical MathRabbit AT), they would have leveraged the concept-based AT support and marked it as:
<msub intent="rabbit-polynomial($arg)">
<mi>R</mi>
<mi arg="arg">d</mi>
</msub>
Similarly, a primary school teacher who wants to tone down the speech for their use of rabbit emojis could reach for <mi intent="_bunny">🐇</mi> but would have no other reason to use a concept here, as simple variables tend to be self-voicing. |
I would have thought that the arXiv author would have been thinking of "rabbit" (or "rabbit-polynomial") as a concept rather than "just speech". Moreover, that MathRabbit AT may or may not exist at the time of authoring, but the intent markup should be valid (or "sensible") even for other AT. |
I see no reason to use
or
make perfect sense. The paper has rabbit in the title, hard to argue it's not a concept. The distinction is so arbitrary, subjective and time-dependent, I don't think we should separate them in the grammar. |
I understand the principle, but it still seems odd to me to distinguish between
`intent="multiplication"` and `intent="_times"`.
Another issue for me (correct me if I am wrong) is that the leading
underscore says "pronounce it this way". Without the underscore it says
"pronounce it this way unless you have additional information about
this concept". So, why would I add an underscore when all that does is
stop AT from possibly doing a better job?
|
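The two readings described above can be sketched as a lookup. This is only an illustration under stated assumptions: `KNOWN_CONCEPTS` and `speak_name` are hypothetical names, the table stands in for whatever a particular AT happens to implement, and reading dashes as spaces is assumed rather than taken from the spec text.

```python
# Hypothetical table of concepts that one particular AT knows how to speak.
KNOWN_CONCEPTS = {"multiplication": "times"}


def speak_name(name: str) -> str:
    """Return the words a hypothetical AT might speak for a bare intent name."""
    if name.startswith("_"):
        # Leading underscore: always speak the text as written.
        return name[1:].replace("-", " ")
    # No underscore: prefer the AT's own wording when the concept is known,
    # otherwise fall back to reading the name itself.
    return KNOWN_CONCEPTS.get(name, name.replace("-", " "))
```

With this table, `multiplication` and `_times` end up spoken the same way, and the underscore only matters once the AT knows something about the name.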
According to the version here, properties have no effect on unknown concepts.
To get a reading "x f` You would need a syntactic literal head,
Is that intentional? If not, more or less every reference to In the current spec version, this isn't an issue as a "literal" is an "unknown concept name" by definition. |
To answer some of the points that were raised:
If the state of AT at time of publication can do a good job vocalizing via a given Intent concept, that is certainly the right approach, I agree. There are two different situations when an underscore is helpful:
For
That is perfectly fine, then the concept use is appropriate. Since aliasing is unresolved, it's hard to know what MathRabbit will or will not do with the "rabbit" concept without trying out a concrete implementation. If it didn't do what was expected, and showed no desire to issue patches, the underscore mechanism gives the author an out. Maybe it's better to refocus on the primary school variation, where the speech override is the only aim:
When I was proof-reading, I thought my changes relegated the SHOULD to MAY for unknown concepts, but I would have to double-check. The text I was looking at was: "In the case of a [=concept=] name, the property MAY be used in choosing the alternatives supported by the AT." I am quite a novice on the subtleties of spec use for MAY and SHOULD, but it seemed to me that if MAY was good enough for the known concepts, it should be just as adequate for the unknown ones. Thanks for the mention of using Unicode |
by is a type of multiplication an alias for dimensional product, I can't see any advantage for using underscore at all here
gives the correct reading and natural functional form. |
We part ways here. By is first and foremost an English preposition or adverb (dictionary). The use in 8-by-8 is abbreviated from "8 rows followed by 8 columns" (where one could argue for a different verb than "follow", it is the verb for reading out the written notation). A good article illustrating this is here:
One can imagine "multiply x by y", "divide x by y", "represent x by y" ... where the prepositional nature of the word is also clear. The principle is even clearer if we reach for conjunction words - |
The usage mentioned in the dictionary you referenced is closest to "By
and measurements and amounts", where the meaning is more about combining
components of a vector space. As in "I'll whack you with a two by
four". I personally don't see it as being an abbreviation of anything
(or I can't quite guess what). It is also a different usage than
"multiply two by four".
Restricting concepts to only be "Formal" concepts seems a very painful
route. OTOH, using such obviously overloaded (and thus ambiguous) terms
as (probably unknown) concepts, rather than literals, does also seem
prone to eventual collisions.
|
It's more the converse - I want a vehicle that is never a concept, and is always raw text. Well-meaning practitioners may differ on what they view as a concept, so the author should always be the ultimate arbiter for their own materials. And the group can rely upon encyclopedic resources for example values that may easily reach consensus (as their presence in such resource is already proof of social consensus).
I cited a primary source for "followed by" above. Here is another use in compass arithmetic wiki page, where it is possibly closer to "bisecting by":
To me the natural way to represent such English use of by is to mark it as text: <mi mathvariant="normal" intent="_(northeast,_by,east)">NEbE</mi> and once the question is raised how to formalize, establish the concept at play (here it seems to be "quarter-wind") and mark it: <mi mathvariant="normal" intent="quarter-wind:silent(northeast,_by,east)">NEbE</mi> Maybe it is painful, but it is "the pain of formalization reserved for formalization" rather than "the pain of formalization extended into accessibility". And certainly less painful than getting whacked with a two by four :> Edit: I should mention, once we also have a <mi mathvariant="normal" intent="quarter-wind(northeast,east)">NEbE</mi> |
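As a rough sketch of how the silent `_` head plays out for the first compass example, the snippet below speaks a flat (non-nested) intent string. It is not an implementation of the spec: `speak_flat_intent` is a hypothetical helper, it ignores properties such as `:silent` and argument references such as `$x`, and speaking a non-silent head before its arguments is an assumption made only to keep the example short.

```python
def speak_flat_intent(intent: str) -> str:
    """Speak a flat intent of the form head(arg1,arg2,...) with no nesting."""
    head, _, rest = intent.partition("(")
    args = rest.rstrip(")").split(",") if rest else []
    words = []
    # A head consisting of "_" alone is treated as silent;
    # any other head is spoken first (an assumed fallback order).
    if head != "_":
        words.append(head.lstrip("_").replace("-", " "))
    for arg in args:
        arg = arg.strip()
        if arg:
            # Arguments are spoken in order, without their leading underscore.
            words.append(arg.lstrip("_").replace("-", " "))
    return " ".join(words)
```

For `_(northeast,_by,east)` this yields the words "northeast by east", which is the override reading discussed above.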
You can force a literal interpretation on the current draft using a leading underscore; you do not need this PR for that. roots on "by" of course there are multiple unrelated uses of that word, but that doesn't imply you should force it to be text with no implied semantics. root might be a radical or it might refer to roots of equations or possibly other uses, but that has not stopped us using root() as a function name. |
Yes, the PR changes the framing of that feature, it does not introduce it. I will be cleaning it up in the next couple of days to be compatible with the current state of the spec. We're likely to discuss it again next Thursday (April 20th). I doubt much will be resolved before then.
It will be the "lack of AT support in cases deemed important to remediate" that will lead to forcing text, a very practical motivator. If AT does a good job, and/or the author is satisfied with the functional notation outcome, there won't be any need for overrides.
I had brought an example to the group some time back where one could no longer use "root" directly, which was to emphasize a conceptual nuance. In order to vocalize the principal-square-root, especially in pedagogical materials, one would reach to the differently named - but very much related - concept. Leveraging simple words for Core makes sense for the sake of convenience, such as <msqrt intent="_(principal-square-root,_over,$x)">
<mi arg="x">x</mi>
</msqrt> and delivering the desired narration. |
In the event I really wanted to force that wording I can't see ever wanting to encode a one argument function as a silently named function of three spurious arguments.
is a far more reasonable way to express this. But I'd probably use
"over" seems a strange word to use (although in other examples using |
See, this is why this PR is needed. Your perspective on underscore is as if it was a symbol from a Content Dictionary. Instead, it is just a pragmatic means to an end that interoperates with the functional syntax. If the syntax is completely unpalatable we still have the option to reach for something different, such as square brackets:
Btw, |
No sorry, this PR is a move in the wrong direction. as seen with |
In reading through the discussion, I think I have a third point of view. Part of this is that although I'm the AT advocate and not the content person, I'd still like to make the two needs as compatible as possible. So I'd rather see us encouraging One thought about the difference between literals and concept names that I have not seen mentioned is internationalization. With a repository of open names, Deyan's idea of downloading them before an AT release and doing a translation is a possibility (I'd really like to see a column that uses the name in a phrase to improve the chance that auto translation does a reasonable job, but that's not this issue). My feeling is that literals never get translated because they wouldn't be listed in a repository of open names. Does anyone else think there is a difference between literals and concept names with respect to translation? |
As currently worded in the spec that's necessarily true as names known to the system are concepts, names not known are literals. As the list of names known to the system is system specific, and, as you say, possibly dynamic, there should not be a syntactic distinction and forcing the author to choose. I would say more or less any name usable on its own or as a function head or "real" argument of a function could be known by some system so should be in the same syntactic category as concept. If we want a syntactic "literal" for connective words, we could make them share a grammatical category with comma if you had something like
then you could replace commas by words
and recover a semantic expression by dropping initial and final literals and replacing intermediate ones by commas. This would also allow dropping spurious comma separated arguments from
still looks horrible but better than the version with commas, although
would be preferable, and easier to extract a semantically meaningful expression. |
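The recovery step described above (drop leading and trailing connective words, turn the intermediate ones into commas) can be sketched on a pre-tokenized argument list. The concrete surface syntax is not visible in this thread, so the token-list representation and the helper names `to_semantic` and `to_speech` are assumptions made purely for illustration.

```python
def to_semantic(head: str, tokens: list[str]) -> str:
    """Rebuild head(arg,arg,...) by discarding connective literals.

    Tokens starting with "_" are treated as connective words that sit where
    a comma would otherwise be; everything else is a real argument.
    """
    args = [t for t in tokens if not t.startswith("_")]
    return f"{head}({','.join(args)})"


def to_speech(tokens: list[str]) -> str:
    """Speak every token in order, connective words included."""
    return " ".join(t.lstrip("_") for t in tokens)
```

Filtering the literals out and joining what remains with commas gives the same result as the drop-and-replace rule, and it also copes with two connective words in a row.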
Same here. But I don't believe in forcing the issue of creating Content forms, because the symbols we receive (even from group members) are often too artificial to be useful. Notice that even in David's last example he added a symbol that is too close to language with I have taken a view closer to "progressive enhancement", which I tried to illustrate recently with my quarter-wind example here. Remediators focused on speech can solve the harder issues they have quickly with What I expect to see in practice if the That will not really improve on The underscore allows well-meaning remediators to state "this is really just a text override", and avoid that confusion.
The state of art in translation currently uses neural language models, and they can be made to work with either setup, as long as there is a prior stage that fully serializes an intent expression in a textual form from the source language. ("free R algebra on X" already works in Google Translate for the Bulgarian translation, but Bing translate makes a mistake, translating "free" as in "cost-free algebra" rather than "libre algebra") For workflows that do not want to rely on neural models, dictionary lookup on the fixed parts of speech is possible (e.g. prepositions, determiners, conjunctions, common adverbs, numeral words). Mapping I think again here there is a question of timescale - if someone wanted a translation to their language today, they would use a translation engine that exists today. If we wanted a perfect symbolic translation of |
Yes as I say I would use |
although I'm the AT advocate and not the content
person, I'd still like to make the two needs as compatible as possible.
Compatible, yes, but not (I think) from the point of view of
*translating* presentation+intent to content. That will be perceived as
a replacement for Content MathML; and would be a very poor one unless
designed to be a *complete* replacement from the start, which is way out
of scope.
OTOH, where the semantic slant of Concepts helps accessibility, that's a
good thing!
And if a MathML generator aspires to create both intent and Content
MathML, it ought not to have to work with two completely different views
of "semantics" and collections of dictionaries and such. So,
compatibility is good!
My feeling is that literals never get translated because they wouldn't
be listed in a repository of open names. Does anyone else think there is
a difference between literals and concept names with respect to translation?
Although you might implement it that way, personally, I wouldn't make that
assumption at the spec level; it seems to force a particular style of
implementation. But then the logistics of translation are definitely
something we need to come to grips with.
I see two basic strategies:
If dictionaries (core, open) have translation information, then
presumably the AT ends up with a sequence of translated phrases and
random literals. I think you're suggesting just keeping the literals
as-is, which will work sometimes and be awkward others. But the AT can
attempt to translate the literals in isolation, with some external tool,
and paste them back in; it might be not quite grammatical however.
The other approach is to generate the complete phrase in (say) English
and then pass that to an external translator. This, as @dginev
suggests, is probably the first, easiest, approach. This would likely
be more grammatical, but probably not mathematics appropriate.
|
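The two strategies can be contrasted in a small sketch. Everything here is hypothetical: `DICTIONARY` stands in for a core or open list that carries translation information, `Translator` is a placeholder for whatever external tool the AT calls, and the German entry is only an illustrative value.

```python
from typing import Callable

# Hypothetical dictionary entry: concept name -> speech per language.
DICTIONARY = {"multiplication": {"en": "times", "de": "mal"}}

# Placeholder for an external tool: (english_text, target_lang) -> text.
Translator = Callable[[str, str], str]


def piecewise(names: list[str], lang: str, translate: Translator) -> str:
    """Strategy 1: look up known concepts, translate leftovers in isolation."""
    words = []
    for name in names:
        entry = DICTIONARY.get(name)
        if entry and lang in entry:
            words.append(entry[lang])
        else:
            # Unknown names and literals are translated one at a time.
            words.append(translate(name.lstrip("_"), lang))
    return " ".join(words)


def whole_phrase(names: list[str], lang: str, translate: Translator) -> str:
    """Strategy 2: build the full English phrase, then translate it once."""
    english = " ".join(
        DICTIONARY.get(n, {}).get("en", n.lstrip("_")) for n in names
    )
    return translate(english, lang)
```

The first function gives the translator no surrounding context for the leftover pieces, which is where the "not quite grammatical" risk comes from; the second hands over a whole phrase but loses the dictionary's mathematics-specific wording.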
Branch update: d9f3981 to 6a69c6c
I'm still finding myself with mixed feelings here. I agree with @davidcarlisle that this PR goes too far in giving prominence to literals and seeming to encourage forcing specific speech. My understanding was that this was discouraged by the AT folks. OTOH, without some part of this PR, we're leaving the notion of "literal" a bit too vague with people implicitly landing on quite different interpretations. I think that at least we should be clearer about distinguishing "known" concepts (found in Core or Open) from unknown ones, and be clear that unknown concepts are treated like literals --- but are not literals. |
I don't think that there should be any syntactic difference. "open" (and even possibly "core") are not (in current proposals) machine readable, but rather simply web accessible lists where implementers can record the names for which they implement rules. "open" in particular is very time dependent. What matters at run time is neither of those lists but rather the system specific list implemented by the consuming system. This is unknown to the document author so the author should not have to distinguish known concepts from unknown ones. I can not see any cases where |
distinguish known concepts from unknown ones. I can not see any cases
where `intent="foo"` should be treated as a different grammatical
category to `intent="_foo"`. We already say that the latter won't be in
the lists so will be a literal.
IF "foo" eventually showed up in an Open Dictionary, with some
behavioral information (translations or something?) and the AT used that
open dictionary, the "foo" would be treated differently than "_foo".
I'm not suggesting as major a change as Deyan's PR. I'm just suggesting
that we be careful with the language and that: In the case where "foo"
was NOT found in a dictionary, it would be "treated as a literal", but
NOT that it would BE a literal.
|
Bookkeeping another clarifying example on the lines of I was delighted to catch in this lecture recording the same Core concept (multiplication) spoken with different words in rapid succession, transcribed:
"one-half of" vs "one-half times", both written as with I suspect the key thing to notice here is that each language pattern is sensible because we are talking about distinct but isomorphic mathematical operations. Multiplying x by a scalar 0.5 OR having a "one-half" function that halves x, are just different abstractions over the same operation. And hence - different possible speech. To tie the example with the discussion here, I very much agree that authors should not be encouraged to micromanage connector words. What I am trying to illustrate is that language overrides can have a clear boundary dividing them from intent Concepts. And that AT itself may want to use different connector words based on context. It wouldn't be an error if So if an author wanted to force a specific reading using Aside: This also reminded me that the "half"-based reading isn't common in Bulgarian. We tend to use "one second" (sic), "one third" etc ("една втора", "една трета") and then we can also naturally alternate between the "times" ("по") and "of" ("от") prepositions. Curiously, if one wanted to insist on the "half" ("половина") word in Bulgarian, it sounds very unnatural to use the "times" ("по") connector word after, where we would commonly say "half of" ("половина от"). Apologies for the long comment, mostly wanted to add the example somewhere. |
Authors should be strongly discouraged from doing this at all but if they really must it is far preferable to use |
Seen differently: When we are reaching for a "speech override" using a semantically void construct is exactly the correct design, because it clearly indicates the override was intended, and not accidental (= using some temporary implementation state of AT that may change with the next major version of the software). |
intent doesn't have a speech over-ride feature, it's just that if you (ab)use a silent function head and the fact that spurious arguments are spoken literally then you can in fact force any speech. even then |
The more it gets promoted as a thing to do, the more it feels like a bug rather than a feature. |
Resolved in light of #466 , as discussed in the meeting on May 4th 2023. |
This PR is a follow-up on the discussion in #457 , and also attempts at consolidating a range of perspectives that I have held for some time, but haven't succeeded in contributing to the spec text (yet?)
The grammar is quite useful for summarizing such changesets I think, so I am adding it excerpted here:
It can also be viewed at the gh-pages of dginev/mathml, though I suspect that preview is quite temporary.
Itemized list of proposals:
- `IntentName` that is a dash-separated sequence of letters, and use it for all name-like categories. Keep `NCName` for id-like categories.
- `concept` from `literal` where only entries starting with `_` are considered grammatically "literal". `_by` should always be a literal in its English use, while `_factorial` should only ever be a literal if it is trying to override the `factorial` name to always use that string.
- The top-level `intent` is either provided, an `expression`, or instead it is left `implied` while carrying a list of properties (independently added via table and script properties #462).
- `atom` category. They happen to be the only expressions allowed properties for now (based on discussions so far).

As to the substance of #457 - I have de-emphasized the role of "known properties" in the text, and re-emphasized the grammatical structure for `application`. I added the clarification for the `_` function head to the application rule, which seems to be a good place to encounter it.

I still find it hard to write these PRs, and my workflow is a bit clunky, so there may be markup errors here, or awkward phrasing. Language feedback is very welcome - I can improve the text as needed.
I also have a hat which likes to criticize grammars for getting more verbose - and this PR indeed does that. A lot of that comes as part of trying to wrestle with seeking clarity between the types of data we are working with (e.g. separate fully concepts vs literal) while also living with the realities imposed by basing this all on NCName. Not sure if this is optimal, but I think I like it more than what's currently in the spec.
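To make the proposed split concrete, here is a small sketch of how a token might be sorted into the grammatical categories listed above. It is one reading of the proposal, not the spec grammar: the regular expression assumes ASCII letters joined by single dashes, treating a bare `_` as a (silent) literal is an assumption, and `INTENT_NAME` and `classify` are hypothetical names.

```python
import re

# Assumed reading of the proposal: runs of letters separated by single dashes.
INTENT_NAME = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def classify(token: str) -> str:
    """Classify a token as 'literal', 'concept', or 'invalid'."""
    if token.startswith("_"):
        body = token[1:]
        # A bare "_" (the silent function head) is accepted as a literal here.
        if body == "" or INTENT_NAME.fullmatch(body):
            return "literal"
        return "invalid"
    return "concept" if INTENT_NAME.fullmatch(token) else "invalid"
```

Under this reading the category is decided by syntax alone, so `_factorial` is a literal and `factorial` is a concept regardless of which names any given AT happens to know.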