Contemplating on String #14

Open
happy-barney opened this issue Jun 4, 2023 · 6 comments
Comments

@happy-barney
Collaborator

Motivation / Goal

Nowadays Perl programs are commonly used as some kind of backend. A backend usually needs three types of checks (a better word here would be contract):

  • description of API I/O (mostly checks corresponding to JSON Schema (OpenAPI) or XML Schema (XML over HTTP, XML-RPC, SOAP))
  • description of internal representation
  • description of storage representation (usually SQL)

It would be nice to have Perl checks / contracts specified in such a way that external descriptions can be generated directly from the Perl definition.

Example: (syntax symbolic)

# declare Bar => String [ min_length => 3, max_length => 16 ];
sub operation_handler :returns (Bar) { ... }

say Bar->to_openapi;
# - <...>
#   - type: string
#   - max-length: 16
#   - min-length: 3

say Bar->to_xsd;
# <xs:simpleType>
#   <xs:restriction base="xs:string">
#    <xs:maxLength value="16"/>
#    <xs:minLength value="3"/>
#   </xs:restriction>
# </xs:simpleType>

String variants

restrictions

Typical String restrictions (I prefer XML Schema's word, facet) are:

  • min-length
  • max-length
  • pattern

These restrictions are supported by both JSON Schema and XML Schema (though neither supports Perl regex syntax).

It would be nice to support named restrictions, e.g.:

Str [ min_length => 10 ];
Str [ min_length (10) ];
Str :min_length (10);
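
As a rough illustration of what these three facets amount to at runtime, here is a minimal plain-Perl sketch. The helper make_string_check and its option names are hypothetical and only mirror the symbolic syntax above; this is not Data::Checks syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: build a validator closure from named facets.
# None of these names come from Data::Checks; they only illustrate the idea.
sub make_string_check {
    my (%facet) = @_;
    return sub {
        my ($value) = @_;
        return 0 if !defined $value || ref $value;
        return 0 if defined $facet{min_length} && length($value) < $facet{min_length};
        return 0 if defined $facet{max_length} && length($value) > $facet{max_length};
        return 0 if defined $facet{pattern}    && $value !~ $facet{pattern};
        return 1;
    };
}

my $is_bar = make_string_check( min_length => 3, max_length => 16 );
print $is_bar->('ab')     ? "ok\n" : "rejected\n";    # shorter than min_length
print $is_bar->('abcdef') ? "ok\n" : "rejected\n";    # within both bounds
```

Note that length() counts characters only if the value has already been decoded; on undecoded octets it counts bytes, which ties into the binary vs text question below.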

binary vs text

It would be nice to be able to declare whether a value is a generic binary string or a text string, e.g.:

  • Str - string treated as utf-8
  • Binary - generic binary string

XML schema

  • supported by dedicated type base64Binary

JSON schema

  • supported by string type property contentEncoding: base64
  • also supports declaring a content type (contentMediaType)

It would be nice to be able to specify the content encoding and the related implicit coercions to/from the internal encoding:

Binary :encoding (base64);
Binary :encoding (uuencode);
Binary :encoding (deflate);
Str :encoding (Latin-2);
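
To make the idea of implicit coercions concrete, here is a small sketch using the core modules Encode and MIME::Base64. The %coercion table and its layout are assumptions made for illustration; only base64 and Latin-2 (ISO-8859-2) from the list above are covered.

```perl
use strict;
use warnings;
use Encode       qw(decode encode);
use MIME::Base64 qw(decode_base64 encode_base64);

# Hypothetical coercion table: external encoding name => [ to_internal, to_external ].
my %coercion = (
    base64 => [
        sub { decode_base64( $_[0] ) },         # wire text  -> raw octets
        sub { encode_base64( $_[0], '' ) },     # raw octets -> wire text, no line breaks
    ],
    'Latin-2' => [
        sub { decode( 'iso-8859-2', $_[0] ) },  # octets     -> Perl characters
        sub { encode( 'iso-8859-2', $_[0] ) },  # characters -> octets
    ],
);

my ( $in, $out ) = @{ $coercion{base64} };
my $octets = $in->('aGVsbG8=');
print $octets, "\n";            # hello
print $out->($octets), "\n";    # aGVsbG8=
```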

documentation

It would be nice to be able to specify some description of a check, e.g.:

Str :abstract (This is abstract);
Str :abstract_uri (https://...)

common derived checks (subtypes)

URI

XML schema

  • built-in type anyURI

JSON schema

  • string with format, one of
    • uri
    • uri-reference
    • iri
    • iri-reference

Although it is easy to write a subcheck using a pattern restriction, IMHO it would be handy to provide built-in checks:

  • URI
    • URL
    • URN

Date / time

XML schema

  • date
  • dateTime
  • duration
  • gDay
  • gMonth
  • gMonthDay
  • gYear
  • gYearMonth
  • time

JSON schema

  • date-time
  • date
  • time
  • duration

It would also be nice to provide date/time related checks with possible encodings:

  • strict ISO 8601
  • relaxed variant allowing space as date-time separator (default?)
  • miscellaneous national formats

Values represented by these checks may be dual-valued once there is a good enough implementation of a datetime object.
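
The difference between the strict and the relaxed variant can be sketched with two patterns. This assumes one common ISO 8601 profile (extended format, optional fractional seconds, optional zone); the variable names are made up for illustration, and the patterns check shape only, not calendar validity.

```perl
use strict;
use warnings;

# Shape check only: ranges (e.g. month 13) and calendar rules are not validated.
my $date = qr/\d{4}-\d{2}-\d{2}/a;
my $time = qr/\d{2}:\d{2}:\d{2}(?:\.\d+)?/a;
my $zone = qr/(?:Z|[+-]\d{2}:\d{2})/a;

# Strict: only "T" separates date and time.
my $strict_datetime = qr/\A ${date} T ${time} ${zone}? \z/x;

# Relaxed: "T" or a single space (whitespace inside [...] is literal under /x).
my $relaxed_datetime = qr/\A ${date} [T ] ${time} ${zone}? \z/x;

for my $s ( '2023-06-04T12:30:00Z', '2023-06-04 12:30:00' ) {
    printf "%-22s strict=%d relaxed=%d\n", $s,
        ( $s =~ $strict_datetime  ? 1 : 0 ),
        ( $s =~ $relaxed_datetime ? 1 : 0 );
}
```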

other useful checks

  • UUID (JSON schema: uuid)
  • Identifier (XML schema: token / ID / Name)
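
For instance, a UUID check could be derived from a pattern facet alone. The sketch below assumes the usual 8-4-4-4-12 hexadecimal textual form and does not inspect version or variant bits; is_uuid is a hypothetical name.

```perl
use strict;
use warnings;

# Textual UUID form: 8-4-4-4-12 hexadecimal digits, case-insensitive.
my $uuid_pattern = qr/\A[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\z/i;

# Hypothetical check function built on the pattern.
sub is_uuid {
    my ($value) = @_;
    return defined $value && !ref $value && $value =~ $uuid_pattern ? 1 : 0;
}

print is_uuid('123e4567-e89b-12d3-a456-426614174000') ? "ok\n" : "rejected\n";
print is_uuid('not-a-uuid')                           ? "ok\n" : "rejected\n";
```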
@Ovid
Collaborator

Ovid commented Jun 4, 2023

I think introspection would be awesome. We'll definitely want to think of something like that post-MVP, but it has MVP impacts we should be aware of now. You wrote:

# declare Bar => String [ min_length => 3, max_length => 16 ];
sub operation_handler :returns (Bar) { ... }

say Bar->to_openapi;

Bar is not a sub in your namespace (we have tons of checks, so exporting them as subroutines would be disastrous).

So we would need something that could introspect a check to get the data you want. However, it would not produce XSD, OpenAPI definitions, or anything like that. Instead, it would just return a data structure, or an AST, and the consumer can write the custom transformation code they want.
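
A minimal sketch of that split, assuming introspection returns a plain hash: the key names in $bar and the functions to_json_schema and to_xsd are hypothetical consumer-side code, not part of any existing API.

```perl
use strict;
use warnings;

# Hypothetical result of introspecting a check; the key names are assumptions.
my $bar = { base => 'String', min_length => 3, max_length => 16 };

# Consumer-side transformation to a JSON Schema fragment (as a Perl hash).
sub to_json_schema {
    my ($check) = @_;
    my %schema = ( type => 'string' );
    $schema{minLength} = $check->{min_length} if defined $check->{min_length};
    $schema{maxLength} = $check->{max_length} if defined $check->{max_length};
    return \%schema;
}

# Consumer-side transformation to an XSD simpleType.
sub to_xsd {
    my ($check) = @_;
    my @facets;
    push @facets, qq{    <xs:minLength value="$check->{min_length}"/>}
        if defined $check->{min_length};
    push @facets, qq{    <xs:maxLength value="$check->{max_length}"/>}
        if defined $check->{max_length};
    return join "\n",
        '<xs:simpleType>',
        '  <xs:restriction base="xs:string">',
        @facets,
        '  </xs:restriction>',
        '</xs:simpleType>';
}

print to_xsd($bar), "\n";
```

Keeping the transformations outside the check itself means that support for another I/O protocol only needs a new consumer function, not a change to the check.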

@tobyink has considered similar ideas in this discussion, but that was when I was considering releasing Data::Checks as a module.

@happy-barney
Collaborator Author

Please note the "syntax symbolic" remark on that example: the code is meant only to describe the behaviour, not to be actual code.
I'm aware that this should also be pluggable somehow; there are tons of other I/O protocols (known or not yet known).

@tobyink
Member

tobyink commented Jul 11, 2023

For what it's worth, Types::XSD supports all of the above, and I definitely plan on adding some kind of methods to "export" Type::Tiny types as Data::Checks checks. Somehow.

@happy-barney
Collaborator Author

@tobyink I know, I tried to use it :-)

IMHO it would be a good exercise to write these specialized checks using Data::Checks syntax.

@duncand

duncand commented Jul 18, 2023

I feel it is important for Perl to internally track whether a string is intended to be octets/raw or text/characters, much as, since 5.36.0, it tracks whether a scalar is intended to be a boolean. I realize that for legacy compatibility reasons we'd likely need at least 3 options in the general case: definitely text, definitely raw, and don't-know. It would be nice to eliminate occurrences of the last one where possible.

Any time IO is done with an explicit encoding, for example, we should know for sure: if it is raw, the result is known to be octets, and if it is e.g. UTF-8 or one of many others, the result is characters. Any strings derived from string literals in Perl source are also definitely text.

Combined with this internal notion, we should have routines like builtin::is_text() and builtin::is_raw(), or similar, as the reliable way for a program to assess what kind of thing it was given in a general context, similar to the existing builtin::is_bool(). One could then stop testing the utf8 flag or doing encoding tests to determine this. The encoding tests would still be relevant, but in a different context: taking a string internally considered raw and converting it to text if applicable, say if we want that conversion to be a separate step from the actual IO.

@zmughal
Contributor

zmughal commented Jul 18, 2023

Pydantic https://docs.pydantic.dev/latest/why/#json-schema also has similar ideas with respect to JSON Schema / OpenAPI.
