-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add JSON schema descriptions for BQ and ES #18
Conversation
zschema/__main__.py
Outdated
@@ -27,26 +28,15 @@ def main(): | |||
print json.dumps(record.to_bigquery()) | |||
elif command == "elasticsearch": | |||
print json.dumps(record.to_es(recname)) | |||
elif command == "es-annotated": | |||
print json.dumps(record.to_es(recname, annotated=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's es-annotated
mean? Why would we output annotated data to elasticsearch? What is annotated data?
Test failure is fixed in #22. |
552ea55
to
b294950
Compare
zschema/compounds.py
Outdated
@@ -10,9 +10,10 @@ def _is_valid_object(name, object_): | |||
|
|||
class ListOf(Keyable): | |||
|
|||
def __init__(self, object_, max_items=10): | |||
def __init__(self, object_, max_items=10, category=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like ListOf
doesn't recognize doc
; should that be pushed up to Keyable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I've been looking at this wrong; for the primitive types at least, everything works because e.g. String()
returns a new instance each time -- "this field is an instance of the String
type with these options".
So maybe the problem is using SubRecord()
, which returns an instance of the type instead of returning a constructor that can be used like String()
; i.e. we need something like Certificate = SubRecordType({ ... definition ...})
, then "IssuerCert": Certificate(doc="The server certificate's issuer")
. That might fit better with the current model than trying to introduce separate docs for fields and types.
To circle back, ListOf()
already works like this (unless you wanted to create an "alias" of a list type -- in which case you would need e.g. CertPool = ListOfType(Certificate(doc="A certificate pool entry."))
This actually dovetails with a broader issue I've come across, where we are attaching things like
doc
to the types, where they should really be attached to fields.An easy example is for certificates; you would probably want different docs on the issuer certificate and the subject certificate (though the docs for the fields inside would obviously be the same between them).
We could hack around this by making different "types" for each field, but that doesn't seem right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another example -- the SSH logs include both the client EndpointId
and the server EndpointId
, which are the same struct at the go level, but in the schema, one contains AnalyzedString()
types, while the other has String()
types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like
ListOf
doesn't recognizedoc
; should that be pushed up toKeyable
?
My initial thinking here was that a list just lists things of a given type, so the documentation from the type being listed ought to be sufficient. Thinking more, that is wrong: a list of strings has a distinct meaning from a string. It makes sense for a list itself to have documentation describing what the list actually does — the context for the objects of this type — and that should be easy to add (though even if some part of this moves up to Keyable
, we'll still have to modify the initializer for ListOf
).
So maybe the problem is using
SubRecord()
, which returns an instance of the type instead of returning a constructor that can be used likeString()
; i.e. we need something likeCertificate = SubRecordType({ ... definition ...})
, then"IssuerCert": Certificate(doc="The server certificate's issuer")
. That might fit better with the current model than trying to introduce separate docs for fields and types.…
This actually dovetails with a broader issue I've come across, where we are attaching things like
doc
to the types, where they should really be attached to fields.
If I can rephrase, to be sure we're on the same page: a record definition that gets reused, like Certificate
, currently cannot have different documentation depending on the context where it is reused.
This is a definite limitation that it's worth calling out. My thinking here is basically "hopefully it's clear from context what the (certificate, etc.) is," eg. the subject and issuer certs can be differentiated by the fact that one is named "subject" and the other is named "issuer."
Removing this limitation would be a major change that would, I think, require some nontrivial changes to zschema and major changes to our schema. I am not sure we want to undertake that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this limitation would be a major change that would, I think, require some nontrivial changes to zschema and major changes to our schema. I am not sure we want to undertake that right now.
I agree with that, @cdzombak; nice to have, and should happen, but out of scope of this PR.
It makes sense for a list itself to have documentation describing what the list actually does — the context for the objects of this type — and that should be easy to add (though even if some part of this moves up to
Keyable
, we'll still have to modify the initializer forListOf
).
@justinbastress & @cdzombak: do we want to implement that as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to implement that as part of this PR?
I'll work on it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5be9e26 introduces a doc
field for ListOf
.
zschema/compounds.py
Outdated
@@ -33,9 +34,22 @@ def to_bigquery(self, name): | |||
retv["mode"] = "REPEATED" | |||
return retv | |||
|
|||
def docs_bq(self, parent_category=None): | |||
retv = self.object_.docs_bq() | |||
category = self.category if self.category else parent_category |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be pulled out into a method (e.g. get_category()
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be pulled out into a method (e.g.
get_category()
)?
Good note, @justinbastress. While it might be an intuitive restructuring, it wouldn't gain us much since we still have to fallback to parent_category
passed into this method. Let's just leave the code as is (it could be made terser with category = self.category or parent_category
, but meh).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ what Andrew said. I do like the more concise suggestion, implemented in 5136ebf
zschema/compounds.py
Outdated
retv = { | ||
"category": category, | ||
"doc": self.doc, | ||
"type": self.__class__.__name__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it would be nice if this could be overridden...
__class__.__name__
doesn't keep package information (?)- Constrains our type names to match python class name rules (and constrains our class names to match our type names rules?)
Beyond the scope of this PR, but I just noticed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it would be nice if this could be overridden
…
Beyond the scope of this PR, but I just noticed it.
Good point, @justinbastress. It looks like coupling to __class__.__name__
has been the idiom for a while (e.g., an initial commit for the CLI), but an area for future improvement.
zschema/leaves.py
Outdated
"doc": self.doc, | ||
"required": self.required, | ||
} | ||
if hasattr(self, "values_s") and len(self.values_s): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are retv["values"
] and retv["examples"]
mutually exclusive?
It would be nice if the values could be individually documented (something like "Algorithms": Enum(documented_values={"value1": "docs for value1"})
, except more natural)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It se ms reasonable to let different types define their own type of '.doc' property. The default could be to just hand back._doc but could also do something else like compilea list of enumerated values into a doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are
retv["values"
] andretv["examples"]
mutually exclusive?
If there is an exhaustive list of possible values, I see no reason to support a list of example values as well.
It would be nice if the values could be individually documented (something like
"Algorithms": Enum(documented_values={"value1": "docs for value1"})
, except more natural)
Maybe, though I'm unconvinced this is worth the work to implement. The two examples of enums I recall offhand in our schema are elliptic curves and certificate types (leaf/intermediate/root), both of which seem plenty self-explanatory given the existence of a docstring for the field and a list of possible values.
It se ms reasonable to let different types define their own type of '.doc' property. The default could be to just hand back._doc but could also do something else like compilea list of enumerated values into a doc string.
@zakird I'm not sure I follow what you're suggesting here. As of this PR, types can have a docstring and a list of example values (or a list of possible values, for enums). What are you suggesting should change?
1ffd929
to
1b0cfb7
Compare
This is so we can assert on the format when testing our new "docs" output. This also required updating the BigQuery inline fixture. Co-authored-by: Chris Dzombak <[email protected]>
Co-authored-by: Chris Dzombak <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 👍 to merge once the tests go green. Things 🆗 on your end, @justinbastress?
I say we ❕
Minor point of orderI don't think this PR closes #21, though they are related. #21 is about adding support for JSON Schema proper as an output format, whereas this PR outputs a zschema-specific media type. The issue remains an open backlog item. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran through with my schema updates -- Looks good.
This PR adds two new output formats,
docs-es
anddocs-bq
. These formats are a JSON description of the schema, with a similar format between the two schemas. See https://gist.github.com/cdzombak/1d47457fd2b82f6e7bbe95508786c4a7 for example output based on the current ZTag schema.This includes…
Enum
leavesdoc
field for records and leavesexamples
field, available on records and leavescategory
property on records and leavesThe
category
field is intended to be a human-usable categorization of a record and its subrecords. This value cascades, so if you set it on a record it'll apply that category to all subrecords & leaves. A subrecord or leaf can also set a different category for itself and its children.I have also removed some unused code/ivars/arguments, unimplemented methods, and soon-to-be-unused output formats (we'll use these annotated outputs to build better documentation tools).
closes #21