Foreign object model; round-trip preservation #16
It's not clear to me that there is any issue to be addressed here; the fact that a string encoded in the binary encoding has |
I tend to agree with David here. Let's ask @lars-hellstrom whether he still maintains his objection. |
I don't really think < is special here.
In the binary encoding a foreign object is just a stream of bytes of
specified length; whether it's "a" or "<" or byte 3, you can't reliably
convert it to XML other than by (say) base64-encoding it and putting it
in the XML that way, or by writing the byte stream out to a file and
referencing it from the XML. I think the proposed restriction would negate
any advantage of using the binary encoding: in the binary encoding you
could, for example, include a PNG image inline as a foreign object, and
you don't want to have to encode that as an XML-compatible string. If you
then want to write the OM object as XML, you will have to base64-encode
that data to get an XML-compatible string that you could put in an
OMFOREIGN.
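A minimal sketch of the base64 route for an arbitrary octet payload such as a PNG, using only Python's standard library. The encoding value "image/png" is an illustrative assumption: nothing in this thread says how an OMFOREIGN should signal that its content is base64 rather than the raw format.

```python
import base64
import xml.etree.ElementTree as ET

def bytes_to_omforeign(payload: bytes, encoding: str) -> str:
    """Wrap an arbitrary octet payload (e.g. a PNG) as base64 text in OMFOREIGN."""
    foreign = ET.Element("OMFOREIGN", {"encoding": encoding})
    # base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so it is always
    # legal XML character data, whatever octets the payload contains.
    foreign.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(foreign, encoding="unicode")

def omforeign_to_bytes(xml_text: str) -> bytes:
    """Recover the original octets from the base64 content."""
    return base64.b64decode(ET.fromstring(xml_text).text or "")
```

Feeding it the PNG signature followed by bytes 0 and 3 gives back exactly the same octets after `omforeign_to_bytes`, which is the round trip David describes.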
On 5 October 2017 at 17:32, lars-hellstrom ***@***.***> wrote:
Yes, I maintain that there is a problem! The problem is that it is not
clear from the standard how foreign objects in one encoding correspond to
those in another encoding.
This becomes painfully clear if you imagine writing a tool that converts
between the binary and XML encodings. Should a character < in the payload
of a binary-encoding foreign object be turned into < or &lt; in the
contents of an XML-encoding OMFOREIGN object? If the latter, there is no
way to put tags in those contents. If the former, you implicitly say that
the payload of a binary-encoding foreign object is always XML, which is
equally silly.
The root of the problem is probably that the standard is vague about the
importance of the encoding attribute, since different types of foreign
objects need different translations between the OpenMath encodings. I
propose adding wording that imposes the following restrictions:
- If the encoding is an XML namespace, then the payload of a binary
encoding foreign object is XML code (in UTF-8 encoding).
- If the encoding is *not* an XML namespace, then the contents of an
XML encoding foreign object may only be character data. (I.e., no tags,
processing instructions, or the like.)
"XML namespace" should also be considered to include the two historical
strings (that I don't have the zeal to look up right now).
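The two proposed restrictions can be sketched as a binary-to-XML transcoder. This is an illustration under stated assumptions, not behaviour defined by the standard: the set of "XML namespace" encodings is a hypothetical registry containing the real MathML and XHTML namespace URIs, and the two historical strings mentioned above are left out because the thread does not give them.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of encodings treated as XML namespaces. The two
# historical strings from the proposal are not listed (not given in the thread).
XML_NAMESPACES = {
    "http://www.w3.org/1998/Math/MathML",
    "http://www.w3.org/1999/xhtml",
}

def binary_to_omforeign(payload: bytes, encoding: str) -> ET.Element:
    """Transcode a binary-encoding foreign object to an XML OMFOREIGN element
    under the two proposed rules (a sketch, not the standard's behaviour)."""
    foreign = ET.Element("OMFOREIGN", {"encoding": encoding})
    # Character-based payloads are UTF-8; non-text payloads (e.g. a PNG) raise
    # UnicodeDecodeError here and need another route, such as base64.
    text = payload.decode("utf-8")
    if encoding in XML_NAMESPACES:
        # Rule 1: the payload is XML (assumed to have a single root element),
        # so '<' stays markup and the payload becomes child elements.
        foreign.append(ET.fromstring(text))
    else:
        # Rule 2: character data only; serialization escapes '<' as '&lt;'.
        foreign.text = text
    return foreign
```

For example, `ET.tostring(binary_to_omforeign(b"a < b", "text/plain"), encoding="unicode")` yields `<OMFOREIGN encoding="text/plain">a &lt; b</OMFOREIGN>`, whereas a MathML payload with the MathML namespace as its encoding ends up as a real `math` child element.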
|
I agree with David on this. |
On 6 October 2017 at 13:51, lars-hellstrom ***@***.***> wrote:
An *OpenMath object* is fundamentally the abstract thing described in Chapter 2
<https://openmath.github.io/standard/om20-2017-07-22/omstd20.html#cha_obj>,
which may contain foreign objects. These abstract things then have a number
of encodings, as described in the sections of Chapter 3. I have always
presumed that in order to qualify as an *encoding of OpenMath*, a scheme
must be able to encode *all* OpenMath objects, but it seems @davidcarlisle
<https://github.com/davidcarlisle> rather says each encoding has a
different domain, because they support different sets of foreign objects?
The point about what happens to a PNG when included in an XML format is an
interesting one: I had assumed that the octets should just become
characters &#0; through &#255; (looks horrible, but should process OK),
No, that is not well-formed, so it is a fatal syntax error. (XML 1.0 doesn't
allow numeric character references to control characters other than tab,
line feed and carriage return; XML 1.1 allows all but 0, but no one uses
that and it doesn't really help.)
You can put any sequence of bytes in an XML file if you base64-encode the
sequence and put the result in the XML, so I assumed that's what you'd do
going from binary to XML. It does mean that if you start with an
XML-encoded foreign object in an XML OpenMath object, map it to the binary
encoding and back, you may end up with a different encoding (base64) of the
original, but it's not really ambiguous.
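The XML 1.0 restriction can be checked mechanically. The sketch below encodes the Char production of XML 1.0 (Fifth Edition), lists which octet values 0 to 255 cannot be written as `&#N;` references, and asks Python's expat-based ElementTree parser whether such a document is accepted. The `OMFOREIGN` wrapper in the usage note is only illustrative.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(cp: int) -> bool:
    """The Char production of XML 1.0 (Fifth Edition)."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# Octets that cannot be mapped to a character reference &#N; in XML 1.0:
# every C0 control except tab, line feed and carriage return.
ILLEGAL_OCTETS = [b for b in range(256) if not is_xml10_char(b)]

def parses(xml_text: str) -> bool:
    """True if the (expat-based) ElementTree parser accepts the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

`ILLEGAL_OCTETS` has 29 entries (0x00 to 0x1F minus 0x09, 0x0A and 0x0D), so a naive octet-to-`&#N;` mapping fails for any payload containing one of them: `parses("<OMFOREIGN>&#0;</OMFOREIGN>")` is false, while `&#255;` is accepted.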
although some of them being forbidden maybe requires something like adding
a base64-encoding layer on top of the octets instead — but that is a
separate problem. My example shows there are problems already within the
realm of *character based formats*. About the use of these in the binary
encoding, the standard explicitly says:
Character based formats (including XML based formats) should be encoded in
UTF-8 to produce a stream of bytes to use as the payload of the foreign
object.
Thus: we *can* reliably tell which sequence of characters the payload of
an OpenMath-binary foreign object encodes. What is *unclear* is how those
characters should get encoded when transcoding to OpenMath-XML. If < is
encoded as &lt;, then the transcoded object cannot contain tags, which is
wrong for MathML. If < is encoded as <, then the character sequence had to
be well-formed XML already in the binary encoding, which is wrong for
everything that is not XML.
The two bullet points I proposed are precisely about clarifying when one
should do one thing and when one should do the other.
Perhaps it's easier if I try to phrase it in RNG? Right now we have for
OpenMath-XML
# foreign constructor
OMFOREIGN = element OMFOREIGN {
compound.attributes, attribute encoding {xsd:string}?,
(omel|notom)* }
I think that should be changed to (though bear in mind that I'm guessing
on RNG syntax)
# foreign constructor
OMFOREIGN = element OMFOREIGN {
compound.attributes, attribute encoding {xsd:anyURI}?,
(omel|notom)* }
| element OMFOREIGN {
compound.attributes, attribute encoding {xsd:anyMIME}?,
text }
You can't syntactically distinguish a MIME type from a (possibly relative)
URI: text/xml could be that MIME type or it could be a relative URL to
text/xml. You can of course special-case certain strings matching XML MIME
types, which would help round-trip those special cases, but to be honest I
don't see it as a problem if you start with OM-XML with a foreign XHTML
document inlined, map to binary (just putting the string representation
inline), then map back to OM-XML by base64-encoding the string, ending up
with the XHTML document in a base64-encoded-XHTML encoding. The underlying
object hasn't changed; you just have a different encoding of the foreign
object.
|
@davidcarlisle wrote:
For PNG I suppose that works OK, provided you change the
If you do that, then I strongly suspect that the elements in that foreign object will no longer show up as elements in the DOM tree. That's not what is expected when you do a round trip — the conversion operations should be inverses of each other (up to relevant equivalence), not just injective.
Oh, right! I was thinking "absolute URI", not "any URI"; my bad. The idea was to catch XML namespaces — those should be absolute URIs, should they not? |
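The "absolute URI" test can be sketched with Python's `urllib.parse.urlsplit`: a value with a URI scheme is taken to be an XML namespace, anything else is treated as a MIME type. This is a heuristic under that assumption, not a rule from the standard. It sidesteps David's ambiguity only by deciding that scheme-less values such as `text/xml` are MIME types, even though they are also valid relative URI references.

```python
from urllib.parse import urlsplit

def classify_encoding(value: str) -> str:
    """Heuristic: absolute URIs (those with a scheme) are XML namespaces;
    everything else, e.g. 'text/xml', is treated as a MIME type."""
    return "xml-namespace" if urlsplit(value).scheme else "mime-type"
```

With this rule, `http://www.w3.org/1998/Math/MathML` classifies as a namespace, while `text/xml` and `image/png` (which have no `scheme:` part) classify as MIME types.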
Well, there's the rub. I would consider an OM XML with a foreign object encoded as XML with encoding="foo" and another OM XML with the same foreign object encoded with encoding="base64-encoded-foo" as equivalent if they represent the same OM object; I think you'd consider them different. Practically, if you do the simplest, safest thing each time, then every time you convert and convert back you will add another layer of base64 encoding. Some systems may special-case known encodings and avoid that (or, in practice, most systems will only handle a very limited range of foreign objects anyway). |
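The stacking of base64 layers can be made concrete. The sketch below models a foreign object as an (encoding, content) pair and runs repeated XML to binary to XML cycles. The `base64-encoded-` prefix is a hypothetical naming convention borrowed from the illustrative `encoding="base64-encoded-foo"` above; the standard does not define it.

```python
import base64

# Hypothetical encoding-name convention, taken from the illustrative
# encoding="base64-encoded-foo"; not defined by the standard.
PREFIX = "base64-encoded-"

def binary_to_xml(encoding: str, payload: bytes) -> tuple[str, str]:
    """Simplest safe choice: base64 the octets and record that in the name."""
    return PREFIX + encoding, base64.b64encode(payload).decode("ascii")

def xml_to_binary_naive(encoding: str, content: str) -> tuple[str, bytes]:
    """Treat the XML content as opaque characters and store them as UTF-8."""
    return encoding, content.encode("utf-8")

def xml_to_binary_special(encoding: str, content: str) -> tuple[str, bytes]:
    """Special-case the known base64 wrapper: strip it instead of keeping it."""
    if encoding.startswith(PREFIX):
        return encoding[len(PREFIX):], base64.b64decode(content)
    return encoding, content.encode("utf-8")

def round_trips(to_binary, encoding: str, content: str, n: int) -> tuple[str, str]:
    """Apply n XML -> binary -> XML conversions using the given to_binary step."""
    for _ in range(n):
        encoding, content = binary_to_xml(*to_binary(encoding, content))
    return encoding, content
```

After three cycles the naive converter reports `base64-encoded-base64-encoded-base64-encoded-foo`, whereas the special-cased one stays at a single `base64-encoded-foo` layer whose content still decodes to the original characters. In neither case do the original elements reappear as elements, which is the round-trip concern raised above.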
see OpenMath/OM3#148