Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuration File Schema Validation #281

Open
NuSkooler opened this issue May 17, 2020 · 31 comments
Open

Configuration File Schema Validation #281

NuSkooler opened this issue May 17, 2020 · 31 comments
Labels
enhancement feedback requested Requesting feedback on design, etc.

Comments

@NuSkooler
Copy link
Owner

NuSkooler commented May 17, 2020

It may be nice to have very basic schema validation for configuration files (config.hjson, menu.hjson, etc.).

  • Help users identify issues with their configurations
  • Ideally something very light
  • Must allow for additional entries/sections that are not known to the schema

Configuration files are in HJSON, but this directly converts to JSON internally. It may be enough to work with the JSON.

JSON schema validation resources:

See also: #280

@NuSkooler NuSkooler added the feedback requested Requesting feedback on design, etc. label May 17, 2020
@louisnorthmore
Copy link
Contributor

This would be useful - would have saved me some time yesterday!

@louisnorthmore
Copy link
Contributor

louisnorthmore commented Sep 3, 2020

I might take a look at this later. @NuSkooler How do you see it working? A separate tool to begin with?

@NuSkooler
Copy link
Owner Author

@louisnorthmore I haven't put a ton of detailed thought into this, but something like:

  • When the configuration is (re)loaded do quick validation
  • Possibly an explicit ./oputil.js config validate if that can provide additional benefits.

I like the idea of veirfy-json since it's light weight and allows custom validators.

@NuSkooler
Copy link
Owner Author

At this point if it hasn't been picked up yet, I change my stance and suggest JSON schema as it's been the clear winner around JSON... schemas.

Upon validation error(s), writing to stderr and to the log (if possible) should be enough when starting up or hot reloading the config.

@cognitivegears
Copy link
Collaborator

Agreed with the overall approach, and this would be fantastic. I know when I was getting started having a broken config was my biggest struggle. The only one I'm not sure about is the last bullet point: "Must allow for additional entries/sections that are not known to the schema" - this would somewhat limit some of the usefulness I think? If someone is trying to debug why a configuration item is not taking effect, etc, this would be easier if there was at least the option to restrict to known elements.

@NuSkooler
Copy link
Owner Author

@cognitivegears I think we can validate things that are in the wrong place, and specifics of all known configuration elements, but consider e.g. a custom mod that is not shipped with enig, but wants some configuration from config.hjson. If the user adds a new block in the right area of the config, we won't have a way to really validate the inners. Unless of course we expose a way for mods to supply their schemas -- which feels a little gatekeeping as most people won't know what to do there.

In JSON schema at least, this probably mostly around allowing additional properties in certain blocks.

@cognitivegears
Copy link
Collaborator

Well, we could validate something that is missing, but not something that is optional but misnamed. And since there are a lot of optional properties in the enigma conf I feel this could be a problem. Not the end of the world for sure, but how about a slightly different approach:

Add an optional "mods" property at each level containing an array of objects of "any" type. That way, mods are free to add additional properties, they just have to put it under "mods". So like:

matrix: {
    desc: Login Matrix
    art: matrix
    mods: [
        mod1: [
             extendedDesc: This is a longer description for this mod. Or something.
             anotherProperty: true
        ]
        mod2SingleValue: 125
        // ...
    ]
    // ....
}

Since it is validated with the "any" type, it doesn't care or validate what is under the mod (if the mod wanted to validate the values itself with it's own schema it could, but doesn't have to.)

Still "wild west", just a little more "wild west, but stay in your sandboxes." :) Then we could validate everything else.

@NuSkooler
Copy link
Owner Author

In menu.hjson, like the "matrix" example above, we can validate the full allowed syntax. I don't think that should be much of a issue. There are some cases such as config blocks within that scope that allow arbitrary per-mod keys, but they are nested enough I think we can take care of them.

Here is what I'm thinking (this example with config.hjson) just to make sure we're on the same page:

// config.hjson
{
  general: {
    // we can validate all this regular type stuff easily
  }

  // example for 'my_mod' wanting config.hjson configuration:
  my_mod: { // we can validate that any "additional" non-official keys here are Objects
    // anything JSON/HJSON is fine here & not validated by enigma itself
  }
}

Another thing we can do, is expose some API(s) that mods can use to easily do their own validation if they want to opt-in.

// menu.hjson
my_menu: {
  module: fancy_mod
  config: {
    cls: true // validatable as bool
    // ...
    customKeyOne: "stuff", // fancy_mod could check this perhaps, otherwise we just let it through as an "additional property"
  }
}

@cognitivegears
Copy link
Collaborator

That looks great! Sounds like a good approach to me.

@NuSkooler
Copy link
Owner Author

I suppose another approach for config.hjson at least, would be to just ban anything other than enigma core configuration. Mods could get their own easy to use API for .hjson, or they can use whatever else configuration format/etc. they want.

@cognitivegears
Copy link
Collaborator

Yeah I really like that approach for the main config. This sounds super interesting hope we can do this soon

@crhultay
Copy link
Contributor

crhultay commented Nov 24, 2024

I suppose another approach for config.hjson at least, would be to just ban anything other than enigma core configuration. Mods could get their own easy to use API for .hjson, or they can use whatever else configuration format/etc. they want.

I think it makes sense to separate enigma config from mods, etc. It allows us to be more strict about what is in the configuration, and will guide the SysOp to "do the right thing". If they break it, the system will pat them on the back and tell them it's okay and make it better rather than going through frustration or feeling like they can't do it.

I can imagine we would want to be able to:

./oputil.js config verify

${path_to_config.hjson} ${n} validation errors

Missing Configuration Fields
> servers.ssh.port
 > servers.ssh.pem_password

Would you like to fix it now? [Y/n]

This is an extension of what @NuSkooler was saying early on in this issue.

When ./main.js is run and a configuration validation error is found, we can tell the user to run this new oputil function.

It will also solidify configuration and allow us to execute "Pre-Flight Checks" in which the system does integrity checks on things like configuration, whether configured ports are in use or not. See Issue #547 which asks for used port detection, as it will currently silently fail to use the specified port if already in use. I think that wants to be a priority.

In order to detect if ports are open, we need a very clear and robust way of knowing which ports to check, and whether that port is actually going to be used prior to considering it as important and doing a startup blocking port check.

@NuSkooler To move forward, we need a succinct list of all possible configuration values are; can I count on the default config.hjson containing every use case? If yes, I will proceed on writing this as my next opportunity to get familiar.

@crhultay
Copy link
Contributor

@NuSkooler (bump) This will be my next task, please comment on the above.

@NuSkooler
Copy link
Owner Author

I would love this to be worked on for sure. The official JSON Schema makes sense to use for a config schema. The less impactful approach is to probably like in #281 (comment) and we expose some API(s) for custom mods to do self-validation if they want to use a config block within; if they are using their own fully custom configuration files/etc., then they just do whatever they like.

@crhultay
Copy link
Contributor

I suggest that the custom mods have their own configuration outside of the scope of config.hjson.

Without me having looked deeply at the configuration logic, do we presently expect / allow mods to extend the config.hjson?

In practice, additions to config.hjson would be ignored; however I also think that modification to the original configuration be dissuaded with some sort of warning output during startup such as:

WARNING: Spooky action seen in config.hjson in section 'babyBackRibsMod', mods and extensions to Enigma1/2 should be stored within their own configuration schema.

@NuSkooler
Copy link
Owner Author

Any of the "built" in mods are/can be using the systems config values. I don't believe any custom mods to be adding anything to it, but certainly reach in for config (e.g. I want the board name).

Perhaps an API could easily allow enig/HJSON style custom mod config that is in config/mods/mymod.hjson or whatever with the mod specifying it's schema/validation. This could certainly be left for a latter effort though, since separating them would allow focus on the core config.

@crhultay
Copy link
Contributor

Just to clarify that I'm understanding correctly:

  1. Mods don't currently interact with config.hjson except to read values
  2. I can depend on the current structure of config.hjson to be "the truth" as a basis of making the schema rigid, and add validations?

@NuSkooler
Copy link
Owner Author

Just to clarify that I'm understanding correctly:

1. Mods don't currently interact with config.hjson except to read values

2. I can depend on the current structure of config.hjson to be "the truth" as a basis of making the schema rigid, and add validations?

(1) is correct.

(2) is a bit tricky: config_default.js, the JS representation is probably the closest, but the output config.hjson has some additional values that aren't present as defaults. Making things a bit worse, various bits of code read into sections that may not be present in either (e.g. check for a override or supplemental value)

Rounding up the stuff is probably the hardest bit currently. Searching for Config() can help find a lot of the extra code, but it's going to be a PITA I fear.

@crhultay
Copy link
Contributor

Then let's do this:

  1. Add all missing element from a first-time generated config.hjson to config_default.js
  2. Write a validator to use that as the jump-off point
  3. Raise a secondary PR which will be to painstakingly find all Config() elements and push them
  4. Add an internal validator that calls you out for trying to write a Config() value not found in config_default.js, and that you have to add it there first or face dire consequences like the validator sending an SMS to your significant other to tell them they're fat.

@NuSkooler
Copy link
Owner Author

That all sounds very reasonable. There are some tools out there that can help a bit in taking an existing document and inferring a schema -- can take care of a good chunk of the busy work.

like the validator sending an SMS to your significant other to tell them they're fat.

Compliance via fear and loathing, I love it!

@crhultay
Copy link
Contributor

Using https://www.npmjs.com/package/to-json-schema, I was unable to convert to a schema I'M GUESSING because of how loginServers.ssh.algorithms is interpreted.

This is thrown by the to-json-schema library:

/home/egonis/enigma-bbs/node_modules/to-json-schema/lib/helpers.js:40
      if (schema1.type === 'array' && schema2.type === 'array') {

@NuSkooler
Copy link
Owner Author

here is a schema generated by converting my config to json and using a online tool... it may be a OK start. Some things I noticed skimming are some enumerations from entries in things like my file areas, or _snips is a reusable section. Honestly my config is a mess since it's been built on over such a long time 😅

JSON generated with:
./oputil.js config cat | sed -r "s/\x1B\[([0-9]{1,3}(;[0-9]{1,2};?)?)?[mGK]//g"| hjson -json because --no-color wasn't working.

@crhultay
Copy link
Contributor

crhultay commented Nov 29, 2024

When I run ./oputil.js config cat --no-colors --no-comments, it works.

There is a typo between '--no-color' and '--no-colors' in the cat config.

Raised #583 to fix this.

@crhultay
Copy link
Contributor

crhultay commented Nov 29, 2024

@NuSkooler I am writing a new function ./oputil.js config harden which will read the current configuration, and will output a schema.

Writing this as a function will allow us to call this function on-demand, making it trivial to update the schema.

I ran into a problem, however that you likely know the answer to.
HJSON stringified is missing commas, therefore it cannot be loaded as a JSON object to be converted into a schema using the npm package to-json-schema.

Do you know what I could do to solve this?

function hardenCurrentConfig() {
    // Export Configuration without any flourishes
    const config = hjson.rt.parse(fs.readFileSync(getConfigPath(), 'utf8'));
    const hjsonOpts = Object.assign({}, HJSONStringifyCommonOpts, {
        colors: false,
        keepWsc: false,
        quotes: "always",
    });
    const stringConfiguration = hjson.stringify(config, hjsonOpts);

    // Load JSON Object
    const jsonConfiguration = JSON.parse(stringConfiguration);

    // NOTE: This fails, because the stringConfiguration has omitted commas from each line, e.g.
    // {
    //     general: {
    //         boardName: "My Awesome BBS" <-- missing comma
    //         prettyBoardName: "|08XXXXX"
    //         telnetHostname: "xibalba.l33t.codes:44510"
    //         sshHostname: "xibalba.l33t.codes:44511"
    //         ...

    // Convert to Schema
    // const schema = toJsonSchema(stringConfiguration);

    // Write to Schema File
    // ...
}

@NuSkooler
Copy link
Owner Author

@crhultay I don't think you're going to be able to infer the schema. We don't have the tagging/etc. required to do so in the config. The example I put above could be used as a start, though.

Instead of stringifying and re-parsing, just use the object: The HJSON parse() returns a JSON object, you can feed that to whatever you like.

@crhultay
Copy link
Contributor

crhultay commented Nov 30, 2024

@crhultay I don't think you're going to be able to infer the schema. We don't have the tagging/etc. required to do so in the config. The example I put above could be used as a start, though.

Instead of stringifying and re-parsing, just use the object: The HJSON parse() returns a JSON object, you can feed that to whatever you like.

Here's my line of thinking:
config_default.in.hjson would be the file that hardenConfiguration() would be able to on-the-fly conversion to a schema.

As a developer, you will be modifying the config template to add new features / etc (config_default.in.hjson) and not worry about updating another source file for validation startup routes. It prevents us from making an omission in future features.

The user now runs ./oputil.js config new and hardenConfguration() is executed as a part of the new config steps (which I hope to flourish with more options soon).

To your point about it missing the tagging, could I ask you for an example of what this should look like?

If we put in the effort of tagging in config_default.in.hjson, then it makes good habits when defining our config parameters in the template, and prevents us from forgetting to update stuff elsewhere (as much).

As a side bonus, the hardened schema is user-generated by running config new, so it remains backwards compatible for people running git pull on this repo, and we don't break their system since it won't validate newly introduced fields by default, since the schema reference was user-generated.

This brings us to the opportunity:
A SysOp runs git pull, runs ./oputil.js config upgrade and it is able to infer the schema from /misc/config_default.in.hjson from the new repo and compare it with config/schema.json and then ask questions for each of the missing elements, and then updates.

Do you think this could be of value, or should I put the crack pipe down?

@NuSkooler
Copy link
Owner Author

@crhultay I guess I'm missing part of what you're suggesting. The config_default.in.hjson template file really only provides some guidance to new users by showing some things they can override, or otherwise provide to the config. The closest to the source of truth is the config_default.js. Neither really have a nice way to mark fields for validation.

A JSON schema would provide a schema to the entire config, and has all the bells and whistles to state and validate. That would easily give us e.g. oputil.js config check, slurping up the entire config and ensuring it's sane.

Sometimes more than a config needs to happen for that, so we need a from-to migrate type system. I think that's a related but separate concern though.

@crhultay
Copy link
Contributor

@NuSkooler Then maybe I'm mistaken, maybe I was thinking of config_default.js.

So am I right in thinking that we need to create the schema from the ground up then?

I was hoping we could use config_default.js / config_default.in.hjson as the baseline and could double it as a validation source.

@crhultay
Copy link
Contributor

crhultay commented Dec 3, 2024

So I've given this some thought, and a bit of research as a guy that's never done JSON Schemas, and here is my new line of thinking:

Create a new dir misc/config/schema, and within it write a schema file for each major property found within the configuration.

Sample Directory Structure
image

Write a property file, e.g. loginServers/ssh.schema.json based on Introduction to JSON Schema which would look like:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://github.com/NuSkooler/enigma-bbs/tree/master/config/schema/loginServers/ssh.schema.json",
  "title": "ssh",
  "description": "SSH Login Server Configuration",
  "type": "object",
  "properties": {
    "port": {
      "description": "TCP Port",
      "type": "integer",
      "exclusiveMinimum": 0,
    },
    "enabled": {
      "description": "Enable SSH",
      "type": "boolean",
    },
    ...
  },
  "required": [ "port", "enabled", ... ]
}

While this is going to be painstaking, it will give us the validation features that we need.

@NuSkooler How do you feel about this?

@NuSkooler
Copy link
Owner Author

@crhultay That's basically what I'm thinking. I'm not sure if it should be broken down that much, however. IIRC a lot of tooling doesn't support inclusions, so having the entire config in a single schema file is probably desirable.

If you look at the inferred schema I posted, you'll see some similarities to the fragment you posted. IMO it would be a good start (I could take a stab if you like), but will need a lot of touch ups.

It's going to be a bit painstaking no matter what, I think.

@crhultay
Copy link
Contributor

crhultay commented Dec 3, 2024

@NuSkooler Okay, so day one I will scaffold out a config.schema.json with the elements that I care about.

After that, I will write config_validate.js which will give user feedback if they are out of spec.
Then, I will write a pre_flight.js which will perform checks as to whether ports are in use already, and will give user feedback; this will address a concern from #547 and will allow us to close this issue.

Finally, once we have fleshed out full schema validation we can enhance oputil.js config new so that all questions are asked, and even SSH Key auto-generation occurs and enables SSH on-the-fly, which will make it easy for new SysOps.

As an aside, I sent you a reply on Xibalba a few days ago and deleted our old messages from my inbox. When I go to 'e' and then 'i' to check my messages from the main menu, it displays an empty template and then jumps to the previous screen. I feel like this wants to have a new screen that displays 'Send a letter to get a letter' sort of message, otherwise it feels like it's broken to the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement feedback requested Requesting feedback on design, etc.
Projects
None yet
Development

No branches or pull requests

4 participants