Questions and remarks about supported types, missing types and named variables #213
Comments
I'll answer all these questions in a follow up comment, but I'll start by saying that I'm almost done with a proper manual for FQL which may clarify a lot of these questions. |
It's modeled after Go's implementation of the tuple layer. Thanks for pointing this out. I will need to think about what I want to do with this. Go's tuple layer makes all the integers look like int64 or uint64.
I have not added support yet. This will be supported, most likely as the tokens like
I haven't encountered any problems with parsing UUIDs myself. The context of the FQL query provided me with enough information to parse them without additional bounding characters.
Ah, this gives me a clue as to why you had problems parsing UUIDs. I allow my parser to look ahead, which allows me to see that the
I currently use Go's standard library to parse the hex string, which allows for either. In the syntax definition I only allow for uppercase, but I'm planning on changing this. I prefer lower case myself.
Yes, you are correct. Also, FQL doesn't support reading/writing key-value outside of the directory layer. Therefore, it doesn't support reading/writing to the system subspace. I may change this in the future.
Unicode is not currently supported, only ASCII. I do plan to add unicode support in the future.
Yes.
Not currently supported, though I may add support for this in the future.
Yes, version stamps will be supported sometime in the future. They are not supported yet.
Yes, I've considered this. I may add this in the future.
I have not added support for partitions yet. I still need to look into this. When you set up an FQL instance, you must provide a root directory, which confines all queries to within that directory. I expect partitions to work in a similar way.
I don't plan to support reading/writing all possible key-values right now. For the near future, I'm focused on supporting the 99% of use cases, which only include key-values encoded using a directory (made of strings) and a tuple. After I have this working, documented, and well tested, then I may implement support for other cases like this one. This is similar to the question about system keys. Most users don't need to access these, so I will focus on the most common use cases first.
Yes, this is one of the goals of the project.
Yes, this is a feature which I will explain in the manual. The manual should be available within the next week or so. |
I've been able to write a very naive first version of a fql parser and query runner for range reads, and this is already VERY handy! In one of my tests, I used a schema for a very simple document+index layer I had, and simply by using fql queries, I was able to implement a query system on top of that, without much effort. This could be a game changer when writing layers, where adding long running queries has always been the most difficult task! If a fql Layer could implement proper long range reads, it could be reused by a lot of other layers. By the way, should it be spelled "fql" or "FQL" for the "official" name?
I had the same issue when porting the original Tuple Layer from Python to .NET. The layer was mostly designed in Python, so it inherits Python's type system. I've been told that sorting Python tuples or sorting the bytes from the packed tuples should yield the same order. I had to replicate this behavior in .NET, and it's probable that Go would have the same issue. In practice, I treat all numbers as equivalent, whatever their types, and only treat signed/unsigned and 32/64 bits as a limitation in the domains. Meaning that For decimal numbers, this is a bit complicated, because for me "1.0" is the same as "1", but in Python floating point numbers were encoded with a different prefix in the tuple layer. In my mind, the criterion for "is the same as" would be "do they produce the same bytes when encoded with the tuple layer"?
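The "same bytes" criterion can be made concrete with a minimal Python sketch of the tuple layer's integer and double encodings. This is an illustration written for this thread, not code from FQL or from any binding; the function names are hypothetical, and big integers (more than 8 bytes) are deliberately left out.

```python
import struct

def encode_int(value: int) -> bytes:
    """Tuple-layer style encoding for integers that fit in 8 bytes."""
    if value == 0:
        return b"\x14"
    size = (abs(value).bit_length() + 7) // 8
    if size > 8:
        raise ValueError("big integers use other type codes; not covered here")
    if value > 0:
        # Type code 0x14 + size, then the magnitude in big-endian order.
        return bytes([0x14 + size]) + value.to_bytes(size, "big")
    # Negative: type code 0x14 - size, then the one's complement of the magnitude.
    return bytes([0x14 - size]) + (value + (1 << (8 * size)) - 1).to_bytes(size, "big")

def encode_double(value: float) -> bytes:
    """Tuple-layer style encoding for 64-bit floats (type code 0x21)."""
    raw = bytearray(struct.pack(">d", value))
    if raw[0] & 0x80:
        raw = bytearray(b ^ 0xFF for b in raw)  # negative: flip every bit
    else:
        raw[0] ^= 0x80                          # positive: flip the sign bit only
    return b"\x21" + bytes(raw)

def same_in_tuple_layer(a, b) -> bool:
    """Hypothetical helper: two numbers are 'the same' if they pack to the same bytes."""
    def pack(x):
        return encode_double(x) if isinstance(x, float) else encode_int(x)
    return pack(a) == pack(b)

print(encode_int(42).hex())          # 152a
print(encode_double(1.0).hex())      # 21bff0000000000000
print(same_in_tuple_layer(1, 1.0))   # False
```

The integer encoding depends only on the value, not on whether the caller thought of it as signed, unsigned, 32-bit or 64-bit, while a float always carries its own type code.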
My parser is also "look ahead", in fact it can see the whole statement. The remark was more about having a very noticeable visual cue when a human looks at the statement. This may vary from person to person, but for me, between these two strings, I can immediately spot the uuid in the second one:
If you want to add support for system keys (0xff) and the directory layer (0xfe), then you can use custom user types that exist in the current tuple layer spec. I don't know if they are implemented by the Go binding. They simply define an item with a byte header, followed by any number of bytes (so like any other pre-defined types). In the .NET binding, I have the following "well known" custom types:
Whatever you do, I hope you will standardize on UTF-8 being the only encoding. It supports ASCII out of the box, and you could simply use the same syntax as in JSON (https://datatracker.ietf.org/doc/html/rfc8259#section-7) |
So, if you use '/foo' as the root directory, does this mean that the query Since I had to deal with this in my tests (they are automatically running in a dedicated test partition with a deep path, including the worker host name, process id, etc..) I temporarily used "./bar/bar" to differentiate with "/bar/baz". I also need to be able to distinguish between "the root of the application" (which is fixed) and "the current directory", since I plan to add this to my command line shell, where you can
One very frequent thing is using integer constants to sub-divide a directory subspace, without involving the Directory Layer. For ex, in pseudo code:
It would be nice to be able to define constants to give names for 0, 1, 2, and 3, such as These could be encoded in some standardized JSON object that would be stored with a standard key at the root of a layer. Layers are supposed to have a non-empty layer_id with the name when created with the Directory Layer (which may not be a frequently used feature?), which makes it easy to detect: when you enter a directory that has a non-empty "layer id", look for a "_schema" key (or something else), decode the JSON, and use the content to obtain a set of one or more "predefined" fql queries that would include named constants and variables. Pseudo code:
{
/* ... */
"queries": {
"metadata": "(<key:string>) = <value:any>",
"doc": "(0, <rid:int>) = <json:bytes>",
"pk": "(1, <doc_id:string>) = <rid:int>",
"idx_artist": "(2, <artist:string>, <rid:int>) = nil",
"idx_genre": "(3, <genre:string>, <rid:int>) = nil",
"idx_year": "(4, <year:uint>, <rid:int>) = nil"
}
}
In a shell, the query shortcut would be converted into an actual fql command:
Ideally, the value could also describe its type, so that it could be decoded into a human readable format. Most values are usually counters, numbers, strings, json bytes, protobuf bytes, etc... |
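As a rough illustration of the shortcut idea, here is a small Python sketch that expands a named query from such a schema document. The `expand_shortcut` helper, the directory path and the way the directory is simply prepended to the template are all assumptions made for this example; the thread does not define the actual expansion rule or where the "_schema" value is read from.

```python
import json

# A trimmed copy of the pseudo-code schema above. In a real layer it would be
# read from a "_schema" key (or similar); here it is just a string literal.
SCHEMA_JSON = """
{
  "queries": {
    "doc": "(0, <rid:int>) = <json:bytes>",
    "idx_artist": "(2, <artist:string>, <rid:int>) = nil"
  }
}
"""

def expand_shortcut(schema: dict, directory: str, name: str) -> str:
    """Hypothetical helper: turn a query name into a full query string."""
    template = schema["queries"][name]  # raises KeyError for unknown shortcuts
    return directory + template

schema = json.loads(SCHEMA_JSON)
print(expand_shortcut(schema, "/path/to/layer", "idx_artist"))
# /path/to/layer(2, <artist:string>, <rid:int>) = nil
```

A shell could list `schema["queries"]` for auto-completion and fall back to raw queries when a directory has no schema.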
Amazing. I'm glad these ideas are useful to you.
I don't have a strong opinion. I usually write it as FQL.
I don't think I've ever needed to replicate the ordering of tuples client side. I just allow FDB to order the tuples for me.
According to FQL's specification, 1.0 is not the same as 1. All floating point numbers require a decimal point.
Yes, this is the plan.
FQL doesn't currently have any concept of a "current directory". What I linked was the Go API for using FQL programmatically in Go without including query strings. I do plan to add a meta language which allows you to alias snippets of a query for use in the queries. For example, the syntax may look something like this:
The
Yup. This could be handled by the meta language I describe above.
I don't understand how this is better than using separate directories. Can you explain to me?
Yes, the meta language would include macros which could do stuff like this. For instance:
I'm still working on this meta language syntax, so don't take any of these examples as law quite yet. |
The directory layer has a non-trivial overhead: for each directory that you need to traverse, you need to perform sequential reads to the cluster. If you have a path of depth N, you need at least N round-trips to the database before being able to read actual data! This can really add up! (see below for a log of a transaction that opens the path Directory subspaces are not safe to cache either, since they could be renamed/moved/deleted/re-created and get a new prefix. I added a lot of infrastructure in the .NET binding just to try to solve this issue (deferred value-checks), but this is not magic, and I don't think this exists in the Go binding. If you don't have a massive amount of data, the easiest solution is to have the minimum number of directories, and sub-divide them using a simple prefix like 0 (1 byte), or a small integer (2 bytes). In fact, this was how it was done before the Directory Layer was introduced: the notion of "subspaces" is adding a common prefix to a parent "subspace" to subdivide it. The Directory Layer only acts as a global map that translates long strings (directory names) into small integers and adds a hierarchy. It could be argued that for example in The last small advantage is that splitting a subspace "by hand" will ensure that the data stays colocated, while if you were using sub-directories, they could have very different prefixes, and be separated by other data from other layers/tenants in between them. Example log that shows the impact of the Directory Layer:
|
I guess this is where our workloads differ. I have never used the directory rename/move feature, so on application start up we open all the directories we plan to use and cache them for the entire life of the application. If you are not caching directories then I understand the concern. |
I used to do the same thing a long time ago and got bitten by this, so I encourage you to look into this. Maybe now you are not using tools that do this, but maybe in the future you will have to use a third party tool, or change how you backup/restore in such a way that the directories could be recreated in a different order (with different prefixes, ...). You will also have to address the issue of in-place schema migration at some point. You only need one rogue process somewhere in the cluster that was NOT properly killed or updated, or someone that resumes a paused VM, a network split that isolated part of the nodes, etc... The bad news is that the data layout used by the Directory Layer is not very friendly for caching, so there is no easy solution. And this hasn't changed in ~10 years now, and likely never will. It was designed before VersionStamps and atomic operations existed, so today it would be very different I guess. One solution I used was to cache all the prefixes of all the parent paths needed to traverse You could use watches to maybe flush the cache when a directory key is changed, but watches are async, it is possible in some cases that they won't fire, and also there is a limited number of them. Recently, I ended up with deeper and deeper directories, which made this too slow, so I bit the bullet and made a slight change to the DL, by adding a metadata key to each partition that is atomically incremented every time a directory is deleted or renamed (not created). This way, I only need to check one key per partition (a single key if you only use the root partition), instead of N keys. |
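The per-partition counter described above can be sketched with a toy in-memory model. Nothing here is the real Directory Layer or a real binding API: `FakeCluster`, `CachedDirectories` and the read counter are hypothetical stand-ins that only illustrate the trade of N reads per open for one read per open.

```python
class FakeCluster:
    """In-memory stand-in for the database; counts reads to mimic round-trips."""

    def __init__(self):
        self.dirs = {}    # path tuple -> prefix bytes
        self.version = 0  # per-partition counter, bumped on delete (or rename)
        self.reads = 0

    def create(self, path, prefix):
        self.dirs[tuple(path)] = prefix  # creation does not bump the counter

    def remove(self, path):
        del self.dirs[tuple(path)]
        self.version += 1

    def read_version(self):
        self.reads += 1
        return self.version

    def walk(self, path):
        """Resolve a path the slow way: one read per level."""
        prefix = None
        for depth in range(1, len(path) + 1):
            self.reads += 1
            prefix = self.dirs[path[:depth]]
        return prefix


class CachedDirectories:
    """Caches resolved prefixes and revalidates them with a single read."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.seen_version = None
        self.prefixes = {}

    def open(self, path):
        path = tuple(path)
        current = self.cluster.read_version()
        if current != self.seen_version:
            self.prefixes.clear()  # something was deleted or renamed
            self.seen_version = current
        if path not in self.prefixes:
            self.prefixes[path] = self.cluster.walk(path)
        return self.prefixes[path]


cluster = FakeCluster()
cluster.create(("a",), b"\x15\x01")
cluster.create(("a", "b"), b"\x15\x02")
cluster.create(("a", "b", "c"), b"\x15\x03")
cache = CachedDirectories(cluster)

first = cache.open(("a", "b", "c"))
reads_after_first = cluster.reads    # 1 version read + 3 levels
second = cache.open(("a", "b", "c"))
reads_after_second = cluster.reads   # only 1 more read

cluster.remove(("a", "b", "c"))
cluster.create(("a", "b", "c"), b"\x15\x09")
third = cache.open(("a", "b", "c"))  # counter changed, so the path is re-resolved
```

In a real transaction the counter read would have to happen in the same transaction as the data reads, so that a concurrent delete or rename causes a conflict rather than a stale prefix.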
Yea, we've accidentally recreated/moved directories, but because all our processes are stateless we just restarted them so that they cleared their cache and obtained the correct directory. So I agree it is a problem, just not a very bad problem in our case.
The way we address this at my job is by either using a new directory path or by supporting multiple KV schemas within a directory. In other words, we try to keep things backwards compatible as much as possible. |
Anyhow, using a manual prefix for subspaces is supported by FQL as long as it's within a directory. It also sounds like your use case could be supported by simply allowing keys to only contain a tuple, without a directory. Do you often use keys without the tuple layer? |
No, I've always used the Tuple layer because it's so much easier to use when designing a layer, and is nicely printable in logs. Though I've heard of people using keys encoded with raw uuids, with protobuf or some other binary protocol. There are APIs in the DL to register such a subspace by telling the DL to NOT allocate a prefix that would collide with it by random chance. The only keys outside the DL I read are mostly system keys such as \xff/metadataVersion, or some other special keys that are there. But 99.9+% of my keys are encoded with tuples, and probably in a sub-partition (one per tenant) with paths with a very shallow depth to reduce the burden on the DL cache check. Note that in 7.3 there is a new Tenant API that basically does the same thing as partitions in the DL, but at the native client level. I have not yet had the chance to play with it. I've added basic support for it in the .NET Binding, but the cluster needs to be specifically configured for it, and it seems tailored for big users. I don't think this would impact FQL very much, except that the tool would need to select a tenant, and then it would be as if it was the only one in the cluster's keyspace (with its own Directory Layer instance). This makes path like |
By the way, your feedback has been very helpful. Online, it's difficult to see if the other person is enjoying the conversation, so I wanted to make it clear. You are helping me consider other workloads and use cases, which is good. |
Btw, I encourage you to go play with versionstamps: they are the one feature that most changed the way I design layers, and so I (ab)use them everywhere :) They are usually used as sequential keys in change logs, or as out-of-the-box sequential uuid in data stores, or as a cache busting mechanism. This makes them usually present in the very first part of a tuple, which means that FQL queries would frequently have to "say their name". They can also be present in the value as a pointer to another key in another subspace. And since I use them a lot, I'd vote for a shorter type name like For their text representation, I'm not sure if there is a standard. I use |
Yes, we have used them at my job. At one point we implemented a message queue on top of FoundationDB and used the versionstamps to order the messages. They are very useful and will absolutely be included in FQL.
Yes, you may have noticed I preferred shorter type names.
Yea, I think they should have their own unique textual form as you are saying. Simply using a hex string wouldn't be enough.
At one point we were using versionstamps as a clock. Eventually we decided to use our own clock key. We have a clock client which basically updates the clock key once per second, and we synchronize all our services against this clock value. |
After implementing support for Big Integer in the tuple layer, I am even more confused by the whole distinction between Looking at the spec, it only has a single concept of "integer values", where both This means that if you find the bytes The only cases where this is evident, just by looking at the bytes, is for negative values, or very large integers (> 64 bits).
I've looked at the implementation in the Java binding, and it is also mixing Looking at the C++ implementation (which maybe could be considered the real reference implementation, since this is the one used by the client and server code at runtime?), there is also evidence that all encoded values from This makes me believe that the distinction between For my case, in the .NET space, I had to adapt the actual runtime types that exist to represent numbers, such as I would propose adding a Then, add another type If one needs to filter only positive values, then this would be the job of a constraint on the type, equivalent to being able to specify a valid domain for values or constants ( |
What if FQL only supported a single integer type, the same way there is only a single float type? When using the query language itself, not much would change. When using the Go API, I could provide this integer as either an
In the case of values, FQL currently supports "raw" values which are not encoded with the tuple layer. If you want to read an integer encoded using the tuple layer in the value you would write a query like this:
This is different from querying a "raw" value which would store the raw 64-bit integer in the value position. That kind of query looks like this:
I don't see a point for this "number" type. You can specify a variable which supports both types already by doing |
The issue could be interop with dynamic languages that have only a single type for numbers, like JavaScript where all numbers are 64-bit floats. This was a known issue at the time, but the Node.js binding did not exist at this point in time, so it was probably left unresolved in the spec. I'm not exactly sure how the maintainers of the Node.js binding have solved the issue of representing out-of-range numbers? I'm not familiar with how Go handles the conversion from one type of number to the other (implicit? requires explicit casting? needs a method call?), but in .NET there is a lot of auto-casting, and in the .NET API I've handled this in two ways:
That's unfortunately a lot of cases to handle (the type conversion matrix can become big very fast!) but thankfully that's code you need to write once, and then forget about it. Side note 1: This is a good example of Python 2.x idiosyncrasies where a lot of layer code would encode (ASCII) strings using the 0x01 byte header (intended for bytes), instead of the 0x02 byte header (intended for strings), because in Python 2, bytes and ASCII strings would be the same type. I had to basically encode this behavior in .NET where, like Java, 0x01 is mapped to byte[] and 0x02 is mapped to strings, without overlap. This was required if I wanted to be able to decode keys generated by Python code, and vice versa. Side note 2: to be fair, I had the reverse issue in .NET where
I realize I did not consider the case of raw values, where it makes a lot more sense to specify signed vs unsigned and size (for counters, etc...). Without a schema, you would not even be able to decode it anyway (and heuristics will always fail for some of the values). For keys, the scenario where you could use a generic "number" type (as in ℝ, the set of all real numbers) would be for indexes on a number that could be anything but is most usually an integer, and infrequently has a decimal part (and for this it could be ".5", ".25"). The index could use the much more compact integer representation for actual integers and only pay the cost of 9 bytes when required. You would lose ordering, but not all number-based indexes have the concept of ranges. Then, if you want to go down a level and work with ℕ (set of positive integers) or ℤ (set of positive and negative integers), that's where you could make use of Side note 3: maybe that's why it looks weird for me, because having integers vs floats sort of implies that integers (ℕ) are NOT real numbers (ℝ), which in maths is nonsense, but for us developers is somewhat "normal"? :D. It seems more natural for me that ℝ would match all numbers, then ℤ would match only the part that are integers, and then ℕ only those that are positive (or non-negative, depending on if you include 0 or not). We are lucky to not have to deal with ℂ (complex numbers!) :) The type for "all kinds of numbers" (as in how it would exist in JavaScript) could indeed be constructed with <int|num|uint|....> but maybe an alias for "a number I don't care what kind, as opposed to a string" could be easier to use than having to list all of them, especially if additional types are added down the line. |
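To make the "compact when integral, 9 bytes otherwise" trade-off and the loss of ordering concrete, here is a minimal Python sketch. `encode_number` is a hypothetical helper for a generic "number" index and is not part of FQL or of any binding; the two encoders follow the tuple layer's integer and double formats and skip big integers.

```python
import struct

def encode_int(value: int) -> bytes:
    """Tuple-layer style integer encoding (values that fit in 8 bytes only)."""
    if value == 0:
        return b"\x14"
    size = (abs(value).bit_length() + 7) // 8
    if size > 8:
        raise ValueError("big integers are not covered by this sketch")
    if value > 0:
        return bytes([0x14 + size]) + value.to_bytes(size, "big")
    return bytes([0x14 - size]) + (value + (1 << (8 * size)) - 1).to_bytes(size, "big")

def encode_double(value: float) -> bytes:
    """Tuple-layer style double encoding: type code 0x21 plus 8 bytes."""
    raw = bytearray(struct.pack(">d", value))
    if raw[0] & 0x80:
        raw = bytearray(b ^ 0xFF for b in raw)
    else:
        raw[0] ^= 0x80
    return b"\x21" + bytes(raw)

def encode_number(value) -> bytes:
    """Hypothetical 'number' encoder: integers stay compact, the rest are doubles."""
    if isinstance(value, float) and not value.is_integer():
        return encode_double(value)
    return encode_int(int(value))

values = [1, 1.5, 2]
by_key_bytes = sorted(values, key=encode_number)
print([encode_number(v).hex() for v in values])
print(by_key_bytes)  # [1, 2, 1.5]: byte order no longer matches numeric order
```

Every double sorts after every small integer because its type code is larger, which is why such an index suits equality lookups better than range scans.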
Looking at the node.js binding, I see this entry: https://github.com/josephg/fdb-tuple?tab=readme-ov-file#js-to-tuple-value-mapping
Seems like fun times, especially for folks that use "64-bit integers as uuids" 😨 The encoder seems to check if a number is an actual integer or not: https://github.com/josephg/fdb-tuple/blob/master/lib/index.ts#L253-L281 It would be very easy for an application using the node.js binding to let a few floating point numbers "escape" in the keys, even if they only intended them to be only integers (and thus use |
Fair point. I have a math minor with my degree, so this makes sense to me. I think most programmers would prefer the traditional integer/float split though.
I think this can be solved by including custom types in the API. For instance, the Go API for FQL doesn't use the raw 'int' which Go provides. I define a custom type (AKA a "box") which contains an integer. This provides me with extra runtime information. I'd probably do the same for JS. |
BTW, based on this conversation I'm pretty sure I'm going to remove the |
They could still be useful for a raw value? I'm not sure if you have plans to be able to represent counters, which I use a lot in layers to maintain count of documents, statistics on indexes, or even as a simple change token that is watched or used for cache busting. I think if you had the ability to represent constraints on types (number of bytes, signed/unsigned, high-/low-endian, ...), PLUS the ability to create type aliases, then something like a "counter64" could be an alias on an integer + constraints. The one thing I've always been missing is the ability to encode that the value is a JSON document, a counter updated with atomic_add, or some simple 64-bit integer or versionstamp. Bonus side-quest: being able to specify which .proto schema is used when encoding protobufs! If you had just a set of "core" types: numbers (split in integer vs non integer maybe), strings (unicode), uuid, bytes, stamps, tuples and bools, + a constraint representation + creating type aliases, you'd be able to construct all the rest from that. You could ship with a set of predefined aliases (to get back 'int', 'uint', etc...) to make it easier.
I've seen other bindings use a similar concept, if one does not already exist in the language. I've used the same "Number" box concept in the context of JSON Numbers (JSON has the same issues with representing numbers, since it comes from the JavaScript side of the world...) |
I have a vague idea for a syntax component called "options". They can change the way a transaction, query, & variable behaves. I may have them apply to values as well. Transaction options control:
Query options control:
Variable options control:
Value options control:
So instead of handling these integer sizes at the type level, I may make them a value option. I don't know how the option syntax would look, but I'm thinking I could use a list of options after the object it's modifying. Something like this...
The option list |
OK. I gave this some more thought last night. The following is a first draft of my ideas: Options (maybe called Modifiers) are a collection of arguments which are included during object construction. You can think of them as parameters to a constructor, or as settings of the object. Options can modify transactions, queries, types, & elements. The options are specified as a list wrapped in braces ( Here is an example of an element being constructed with options:
The expression Types can also have options:
This specifies that when this type is used in a variable, the binary format is expected to be an unsigned 16-bit integer. I will also include pseudo types which are syntactical sugar for this kind of option. The type You can use this type in a variable like so:
Options can also be used to constrain a type:
Here, this instance of the Options can also be used to modify properties of queries. The main difference is that query options are specified on the line before the query itself:
The Options can also be used to modify transactions. I don't have the syntax for transaction defined yet, so I won't show examples, but the options for transactions would include things like read or write, byte limits, and max number of retries. Let me know what you think. |
The
The literal I'm unclear about what should be the default endianness. Parts of the FoundationDB API use high-endian (tuple encoding), while others use little-endian (atomic_add). I'm not sure if the default should depend on where the variable is used, but I think that whatever the default will be, there will always be a common issue of forgetting to specify the endianness of the other. Typically a key that has a counter in the key (high-endian) but is also used to atomically increment a counter (which is little-endian). This could be fixed by defining a set of type aliases for :
# id expected to be used in keys, little-endian by default
type id64 = int[u64]
# raw value expected to be updated with atomic_add
type counter64 = int[u64,he]
/path/to/...("counters", <id64:docId>) = <counter64:count>
The modifiers could also encode things like In my experience, there's always some of these tags that will require a value, as in
Combined with the ability to use
let USERS = 1
type userId = int[u64, label="UserId"]
type user = bytes[format=json, schema=user]
let ROLES = 2
type roleId = int[u64, label="RoleId"]
type role = bytes[format=json, schema=role]
let MESSAGES = 3
type msgId = versionstamp[label="MessageId"]
type msgPayload = bytes[format=protobuf, schema=message_v2.proto, param=something]
/path/to(USERS, <userId>) = <user>
/path/to(ROLES, <roleId>) = <role>
/path/to(MESSAGES, <msgId>) = <msgPayload> |
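The endianness pitfall discussed in this comment can be shown with a small Python sketch. The helpers below are hypothetical and use raw fixed-width 64-bit encodings purely for illustration (the tuple layer itself uses a variable-length big-endian form with a type code); `simulated_atomic_add` only imitates the little-endian addition performed by the database's atomic add and is not a client API.

```python
import struct

def key_u64_be(value: int) -> bytes:
    """Big-endian: byte-wise comparison matches numeric order, good for keys."""
    return struct.pack(">Q", value)

def counter_u64_le(value: int) -> bytes:
    """Little-endian: the layout expected by an atomic add on a raw value."""
    return struct.pack("<Q", value)

def simulated_atomic_add(current: bytes, param: bytes) -> bytes:
    """Imitates an atomic add: both operands are read as little-endian integers."""
    total = int.from_bytes(current, "little") + int.from_bytes(param, "little")
    return (total % (1 << (8 * len(param)))).to_bytes(len(param), "little")

# Keys: only the big-endian form sorts 1 before 256.
print(key_u64_be(1) < key_u64_be(256))          # True
print(counter_u64_le(1) < counter_u64_le(256))  # False

# Values: adding 1 works when the stored counter is little-endian...
ok = simulated_atomic_add(counter_u64_le(41), counter_u64_le(1))
print(int.from_bytes(ok, "little"))             # 42

# ...and silently corrupts a counter that was written big-endian by mistake.
wrong = simulated_atomic_add(key_u64_be(1), counter_u64_le(1))
print(int.from_bytes(wrong, "big"))             # not 2
```

Type aliases that pin the byte order per usage, as proposed above, would let a tool catch the second case before it reaches the database.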
Yes, look at the query option example. I specify arguments after a colon:
Exactly. That's the goal.
I'm hesitant to include JSON or Protobuf into the language. I'd rather have the programmer connect FQL to other tools like |
This could be done by convention, but for example the modifier From FQL point of view, this is simply another key/value in a list of modifiers, and could simply ignore it. But a custom UI could use this modifier to select a better renderer for the value.
The common issue I have with JSON is that some documents can be very large, but in a condensed view, you frequently only want to see the most important fields, like the id / name / title, and use the |...| or expand button to show the full JSON. You can only go so far with heuristics that target common names like Same cases here, by convention, a UI could use a This is especially useful for binary formats that do not include the attribute names in the document (like JSON/XML do), because here a generic renderer would have no way of rendering something useful without access to the schema (.proto file, etc...). |
Ah OK. Yea I think we can leave room for custom options used by custom formatters or UIs. |
Have you already worked on writing a regular expression that would be able to match a FQL query? I'm looking for a The shell has several commands, a few of which will accept FQL queries as arguments, with a set of optional parameters. While the user is typing the FQL query itself, I want to validate and show in orange/red whenever the syntax is incorrect (immediate feedback), as well as start using the partially typed query to run the auto-completion tool in the background. note: The first part is outside the scope of FQL, I want to use the
From this point, I know that the next token is of type "FQL", so I can use the regexp to validate the rest, AND detect when this is complete (to continue parsing other types of tokens, like In the middle of writing a path:
Whenever a character updates the query, and it is validated, I can already see that, if you have typed "/path/to/" already, this is still incomplete, but would be a valid query on its own, so I could fetch the subdirectories of /path/to and propose them for auto-completion. The next is
Same here, whenever I see a We have finished the query, additional characters are not part of the FQL query itself:
As soon as I see the last |
FQL is a context-free language. Every regular language is context-free, but not the other way around, so there is a chance that FQL will not be parsable by RegEx. I have not tried. For this kind of validation, I'm planning on running my parser. If you combine many RegEx expressions with a state machine then you can definitely do what you're asking. This is how I implemented syntax highlighting for the language docs. |
I'm planning on implementing auto-complete via my parser as well. In normal mode, the parser returns an error if the expression stops before a valid end-state. In auto-complete mode, if the expression stops before a valid end-state then the parser will return possible options for continuing the expression. |
I guess I could re-execute the parser every time the expression is changed, and keep around the position of the last token that is not truncated. When in the process of typing So when the expression is still in progress, but we have only parsed up to I guess there are a few different cases that should be handled, like if you are in the middle of a variable ( Or when typing a string ( It is possible to hack the tuple encoding, to make it act like a "starts with" for strings or bytes. If you have the tuple The only tricky thing right now would be the API of the Directory Layer that probably only has a "List" method, but you could just filter the results of listing all the sub-directories. Though technically, implementing a "starts with" version of listing sub-directories is possible, since the children names are stored as tuples, so the same trick described above would work. |
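The "starts with" trick on tuple-encoded strings can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names are hypothetical, only the string element type is handled, and the sample keys are made up for the example.

```python
def encode_string(text: str) -> bytes:
    """Tuple-layer style string: 0x02, UTF-8 bytes with 0x00 escaped, 0x00 terminator."""
    return b"\x02" + text.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"

def strinc(key: bytes) -> bytes:
    """First key that is greater than every key starting with `key`."""
    stripped = key.rstrip(b"\xff")
    if not stripped:
        raise ValueError("key must contain at least one byte that is not 0xFF")
    return stripped[:-1] + bytes([stripped[-1] + 1])

def starts_with_range(complete: list, partial: str) -> tuple:
    """Key range for tuples whose next string element starts with `partial`."""
    prefix = b"".join(encode_string(item) for item in complete)
    prefix += encode_string(partial)[:-1]  # drop the terminator to leave the string "open"
    return prefix, strinc(prefix)

def pack(*items: str) -> bytes:
    return b"".join(encode_string(item) for item in items)

begin, end = starts_with_range(["users"], "al")
keys = [pack("users", "alice"), pack("users", "alfred"), pack("users", "bob")]
matches = [k for k in keys if begin <= k < end]
print(len(matches))  # 2
```

A shell could feed the partially typed string into such a range read and offer the distinct results as completions.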
Here is the rough draft of the FQL documentation. I still need to proofread it, rephrase certain things, etc. This document maps out what I plan to implement from now till summer 2025. Note that the grammar file has not yet been updated. Let me know what you think. |
Here is a list of notes while reading the draft:
|
Thanks. I'm gonna try to wrap up the manual this week. I will address these critiques. |
In American slang, "lil" means "little". But I'll use LE & BE instead so it's clearer to everyone. |
This is a good idea. I'll add it to my TODO list as a future optimization. |
Great idea! I've used tuples for a long time, and always wished there was a standard for representing them and querying them. Hopefully this could be the one :)
I apologize for the flood, they are notes I took when implementing a parser for fql in C#/.NET
I have a few remarks, coming from a long time user of tuples, mostly in the context of writing complex layers in .NET/C#, and not 100% familiar with go conventions.
int and uint? Is this a "go" thing, or is there an actual reason?
-123 it is clear that it is not "uint", but what about 123? From my point of view it is both "int" and "uint"
float between 32-bit and 64-bit IEEE numbers?
NaN or infinities?
{...} or any other character.
{xxxx-xxx...} or "xxxx-xxxx-...". (or maybe it is a Windows thing?)
{...} for uuids would make it non-ambiguous for the parser
0x1234, you first see the 0 which could be the start of a number, a uuid, or a byte sequence.
[ 1234 ] or '\x12\x34' (single quote, like it was in the python 2 tuple encoder) for bytes?
0xFF and 0xFE is ambiguous, because they are the byte prefix for the System subspace and Directory Layer, which are a single byte, where here I think they would be encoded as 01 FF 00 and 01 FE 00 instead?
"こんにちは世界", high/low surrogates?
\uXXXX to encode any codepoint?
(...) means "any tuple, empty or not"? For ex, in (1, <int>, (...), <int>), does the middle part mean "any tuple"?
("hello", 123, ..., <int>) supported? This would help with "variable sized" tuples in some layers, where you still need to parse the last one or two parts of a tuple.
There are a few types that I use frequently in tuples, and that are missing:
'' (two single quotes) to define them. What would we use here? nil seems weird because it is different (for me) than the concept of "empty". Maybe add <empty> or <none> for values?
stamp or versionstamp or vs type in variables? ex: (1, <stamp>, ...)
0x32 and 0x33 in the tuple encoding, followed by 10 or 12 bytes.
Directory (0xFE) and System (0xFF) prefix, which are useful in tuples that have to query the system space, or inside the Directory Layer (each nested partition adds another 0xFE to the bytes).
(0xFE, "hello") which for me reads as "the key 'hello' in the top-level Directory Layer" and encoded as FE 02 h e l l o 00, where I guess here it would be encoded as 01 FE 00 02 h e l l o 00 which is not the same.
\xFF/metadataVersion or other system keys? They usually don't use the tuple encoding for the keys.
uint vs int distinction could be emulated with a "must be positive" or "must be exactly 64-bits" constraint
Regarding directories:
15 2A (== (42, )), all keys will have this prefix, even sub-directories of this partition.
/foo/bar, then ALL keys would start with /foo/bar/... so in practice we represent them without the prefix, so something like .../my/dir (similar to a webapp that could be hosted under any path)
... means "zero or more" in the tuples, maybe use ./my/dir or ~/my/dir to represent "from the root defined for this application"?
/foo/<string>/bar or /foo/<uuid>/bar?
On top of querying, I see this as very useful to encode the "schema" of a layer somehow, so that a UI could automatically decode data in an arbitrary subspace (using the optional Layer Id in directories).
For example, I used the following format to define the schema of a custom layer, like a typical "table with index + change log" mini layer:
(..., <metadata_key>) = <metadata_value>
(..., 0, <doc_id>) = <json_bytes>
(..., 1, <index_id>, <value>, <doc_id>) = ''
(..., 2, <index_id>, <value>) = <counter>
(..., 3, <version_stamp>, <doc_id>) = <change_event>
Legend:
... means the prefix of the Directory where this layer is stored.
<name> was a placeholder for a type of data, but the type was not specified
I think this could be adapted to use fql as the syntax, but this would require adding support for named variables:
<foo:int> or <int:foo> would define foo to be a variable of type int
<foo:any> / <any:foo> or <foo:> / <:foo>
<uint32> / <uint64> ?
<uint:32> / <uint:64> ?
<uint,32> / <uint,64> ?
The above could become:
~/(<string:metadata_key>) = <any:metadata_value>
~/(0, <uint:index_id>, <uint:doc_id>) = <bytes:json>
~/(1, <uint:index_id>, <int|string|bytes:value>, <uint:doc_id>) = <empty>
~/(2, <uint:index_id>, <int|string|bytes:value>) = <uint64>
~/(3, <stamp:timestamp>, <uint:doc_id>) = <bytes:delta>