
Are you planning to add TypeCompiler.Compile just like typebox, to achieve high performance? #934

Open
cit-gif opened this issue Nov 19, 2024 · 6 comments
Labels: question (Further information is requested)


cit-gif commented Nov 19, 2024

Example: https://github.com/sinclairzx81/typebox?tab=readme-ov-file#typecompiler

fabian-hiller (Owner) commented:

I have similar ideas in mind, with a build-time compiler, but at the moment it is not feasible in terms of time. I can see that TypeBox and others are much faster, but Valibot is still not slow and is probably not the bottleneck if your application has performance problems. It is also important to mention that the speed of most of these high-performance schema libraries is not "free". There are drawbacks. For example, many of them only check the properties of an object you define and return the original object. This could be a security problem if you put this object into your database, as it could contain malicious data. When I have more time, I will probably write a blog post about the performance of schema libraries.
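To illustrate the distinction described above with a minimal, library-free sketch (plain TypeScript with hypothetical `check`/`parse` helpers, not the API of any particular library): a check-only validator hands back the caller's original object, unknown keys included, whereas a parsing validator returns a pruned copy.

```typescript
// Hypothetical helpers for illustration only.
type Shape = Record<string, (value: unknown) => boolean>;

// Check-only: verifies the declared keys and returns nothing new.
// The caller keeps the ORIGINAL object, extra keys and all.
function check(shape: Shape, input: Record<string, unknown>): boolean {
  return Object.entries(shape).every(([key, guard]) => guard(input[key]));
}

// Parse: verifies the declared keys and returns a NEW object that
// contains only those keys, dropping anything unexpected.
function parse(shape: Shape, input: Record<string, unknown>): Record<string, unknown> {
  const output: Record<string, unknown> = {};
  for (const [key, guard] of Object.entries(shape)) {
    if (!guard(input[key])) throw new Error(`invalid value for "${key}"`);
    output[key] = input[key];
  }
  return output;
}

const userShape: Shape = { name: (value) => typeof value === 'string' };
const payload = { name: 'Jane', isAdmin: true }; // extra, possibly malicious key

check(userShape, payload);                // true – but payload.isAdmin is still there
const stored = parse(userShape, payload); // { name: 'Jane' } – extra key stripped
```

If `payload` is written to a database after a check-only pass, the `isAdmin` key goes with it; after a parse, it does not.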

@fabian-hiller fabian-hiller self-assigned this Nov 21, 2024
@fabian-hiller fabian-hiller added the question Further information is requested label Nov 21, 2024
sinclairzx81 commented:

> There are drawbacks. For example, many of them only check the properties of an object you define and return the original object. This could be a security problem if you put this object in your database, as it could contain malicious data. When I have more time, I will probably write a blog post about the performance of schema libraries.

FYI, there isn't a need to modify an object if the schematic outright prohibits additional data (`additionalProperties: false`). This is actually better for servers/databases, as they can flat-out reject malformed data (meaning a sender is mandated by the schematic to send correct data). In this regard, no provisions are made to transform a sender's payload to match a target schematic. This is the principle applied in high-performance validators.
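As a rough sketch of this idea (a hypothetical helper, not TypeBox's implementation): with a rule in the spirit of JSON Schema's `additionalProperties: false`, the validator simply reports extra keys as a failure, so nothing ever needs to be stripped or copied.

```typescript
// A check in the spirit of JSON Schema's `additionalProperties: false`:
// the object must have exactly the declared keys, no more, no fewer.
function checkStrict(
  keys: readonly string[],
  input: Record<string, unknown>,
): boolean {
  const inputKeys = Object.keys(input);
  return inputKeys.length === keys.length && keys.every((key) => key in input);
}

checkStrict(['id', 'name'], { id: 1, name: 'a' });              // true
checkStrict(['id', 'name'], { id: 1, name: 'a', admin: true }); // false – rejected outright
```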

Extra value processing (on top of Check), be it culling additional properties, value coercion, transforms, or other kinds of processing, is where bottlenecks are going to occur. TypeBox supports uncompiled versions of these, but the jury is still out on the most efficient implementation for each kind of operation. That said, Ajv still stands as one of the best integrated implementations for extra value processing. If you're looking to publish blog material on the topic of high-performance validation plus additional operations, Ajv would be a good thing to research.

Anyway, found this issue while browsing for TypeBox issues, and thought I'd leave a comment. Food for thought.
Cheers
S

fabian-hiller (Owner) commented:

Hey Haydn, I appreciate you taking the time to leave a comment.

> FYI, there isn't a need to modify an object if the schematic outright prohibits additional data (additionalProperties: false)

I agree. Valibot currently follows a different philosophy and parses complex data structures, which can include transformations (creating a copy), so as not to unintentionally manipulate the input data. However, changing this behaviour and only "parsing" objects when necessary is not an option for Valibot in terms of bundle size, as it would require additional code. In the long run, if we offer a compiler, we can change this depending on the schema provided and the priority on performance.
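The copy-on-parse behaviour described here could be sketched as follows (plain TypeScript for illustration, not Valibot's actual code): parsing builds a fresh object, so transformations never mutate the caller's input.

```typescript
// Illustrative sketch, not Valibot's actual implementation.
// Parsing with a transform produces a fresh object, so the
// caller's input is never mutated.
function parseWithTransform(
  input: { name: string },
  transform: (name: string) => string,
): { name: string } {
  // Build a new object instead of writing into `input`.
  return { name: transform(input.name) };
}

const original = { name: ' Jane ' };
const parsed = parseWithTransform(original, (name) => name.trim());

original.name; // ' Jane ' – input left untouched
parsed.name;   // 'Jane'   – transformed copy
```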

> This is actually better for servers / databases as they can flat out reject malformed data (meaning a sender is mandated by the schematic to send correct data).

I didn't fully understand the advantage. Can you explain this in more detail?

> Extra value processing (on top of Check), be it culling additional properties, value coercion, transforms or other kinds of processing is where bottlenecks are going to occur.

That's true. I see a lot of benefit in giving our users powerful tools like transformations. That's why I prioritized functionality over maximum performance here.

> If you're looking to publish blog material on the topic of high performance validation + additional operations, Ajv would be a good thing to research.

Thank you for the tip! I will keep that in mind.

sinclairzx81 commented Nov 23, 2024

Hi @fabian-hiller

Thanks for the reply (I had been searching for reported TypeBox issues following a recent update, happened across this issue, and thought I'd leave a comment :)). I can try to answer the above question, but I usually prefer to go into detail on such things, as it helps to explain why an approach might be preferred within a broader context. I'll try my best below.

Validate, Don't Parse

> This is actually better for servers / databases as they can flat out reject malformed data (meaning a sender is mandated by the schematic to send correct data).
>
> I didn't fully understand the advantage. Can you explain this in more detail?

So, what I mean here is that rather than having the validator try to accommodate data that is mismatched with the target schematic (i.e. Transform, Truncate, Coerce, Initialize (defaults), etc.), the validator's role is reduced to simply checking whether the data is mismatched. This is going to be faster (as noted above), but it also pushes responsibility back to a Sender to adhere to the contract (Type) expected by a Receiver.

Here is a more concrete example.

```typescript
import { Type, type TSchema, type Static } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

class Publisher<Type extends TSchema> {
   constructor(private readonly type: Type) { /* ... */ }

   public Publish(value: Static<Type>) {
      // There is no potential for a value to be transformed before
      // Send. The caller of Publish must pass a correct value. The
      // call to Check (non-mutable) is an assurance to the Publisher
      // that the value is correct with no additional processing.

      if (Value.Check(this.type, value)) return Send(value)

      // Fail
      throw Error('invalid value')
   }
}
```

The responsibility of transforming the value now rests with the Sender (exterior to the Receiver)

```typescript
const publisher = new Publisher(Type.String())

publisher.Publish('hello')                // ok

publisher.Publish(12345)                  // error

publisher.Publish((12345).toString())     // explicit coercion | transformation
```

The principle here is that by having a validator only validate (not parse), it forces a clear distinction between the roles of the Sender, Receiver and Validator. The implementation effectively removes all guesswork as to whether the value will be transformed in-flight by the validator (a good thing if the Receiver happens to be a database), and the Receiver is always going to be fast (another good thing).

Of course, a possible downside one might consider with this approach is that Senders are now required to perform explicit transformations "before" sending values (i.e. (12345).toString() as above). However, a Sender is still free to implement transforms in other ways unknown to the Receiver (where we move the performance cost of transformation to the Sender side).

```typescript
// publisher.Publish( (12345).toString() )

publisher.Publish(Value.Convert(Type.String(), 12345)) // make the Sender pay the cost
```

This is the principle taken by TypeBox, where any operation that may impact performance is made opt-in and explicit. By breaking operations down into their constituents (rather than embedding operations like transform within a validator), it becomes very clear which component of a system is performing which operation, and this generally results in less guesswork overall.

It should always be clear where transformations are taking place (imo)


So in saying all this, TypeBox does actually have a Parse function ...

```typescript
const T = Value.Parse(Type.String(), 'hello')
```

.. And bi-directional codecs.

```typescript
const T = Type.Transform(Type.String({ format: 'date-time' }))
   .Decode(value => new Date(value))      // safe because validation checks ISO format
   .Encode(value => value.toISOString())  // safe because TypeScript asserts as Date
```

It's just that in most implementations I work on, I've tended to avoid using these functions due to the overhead and complexity they can introduce (although Transform/Decode does see some usage in server work). TypeBox got these functions (albeit somewhat reluctantly) because "Parse, don't Validate" created an expectation that library authors would adopt these principles, but I've rarely needed to use them in practice.

Hope this brings some insight!
Cheers
S


cit-gif commented Dec 5, 2024

Thank you, @fabian-hiller and @sinclairzx81! It's such a pleasure that the authors of the two libraries I'm interested in have responded to my question. After exploring the approaches of several libraries:

- On the backend, I've decided to use TypeBox for validations requiring strict data-type correctness, to achieve maximum performance. For more complex validation logic, I'll implement custom checks. As for Valibot, I'll use it for certain asynchronous validations and some straightforward business-logic integrations.
- On the frontend, I plan to use Valibot exclusively, to take advantage of its quick customization options and its small footprint.

Once again thank you both 👍

fabian-hiller (Owner) commented:

Thank you @sinclairzx81 for sharing your philosophy. It totally makes sense now. I really enjoy such conversations! I will keep your words in mind for future research and decisions.
