Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please allow types re-definition (custom mapping) #33

Open
Hubbitus opened this issue Dec 14, 2021 · 19 comments · Fixed by #34 · May be fixed by #47
Open

Please allow types re-definition (custom mapping) #33

Hubbitus opened this issue Dec 14, 2021 · 19 comments · Fixed by #34 · May be fixed by #47

Comments

@Hubbitus
Copy link
Contributor

Hubbitus commented Dec 14, 2021

Now most interesting for me to define optional (nullable fields), so, instead of just:

"type": "string",

Allow to set something like:

"type" : [ "null", "string" ],
"default": null

But there are also more interesting cases when e.g. instead of just int I would like to use logicalType date.

Please allow providing some functional interface to override fixed mechanism by providing an array of mappers, or just callable.

@nick-zh
Copy link
Member

nick-zh commented Dec 14, 2021

Thanks for the input, I'll look into it ✌️
Do you need this in the merger or generator context?

@nick-zh
Copy link
Member

nick-zh commented Dec 15, 2021

So the generator is still a bit experimental, so i think the following things are missing right now (you pointed out most of them):

  • union types
  • defaults
  • logical type
  • doc comment

IMHO it would make sense to have doc comments to be able to define them,
what do you think? I would propose:

  • @avro-type (to not calculate the type but force one)
  • @avro-default
  • @avro-logical-type
  • @avro-doc

In terms of types, i think unions are not properly supported yet, so i will enhance that.
I think i will do it with #26, since it will make some of these things easier

Additionally, i agree that it would be nice if one could use an own Parser that complies with a newly introduced ParserInterface

@Hubbitus
Copy link
Contributor Author

Hubbitus commented Dec 15, 2021

Do you need this in the merger or generator context?

I primarily use generators.

And honestly, I need generation schema by a particular object, not for all classes.

@Hubbitus
Copy link
Contributor Author

Hubbitus commented Dec 15, 2021

IMHO it would make sense to have doc comments to be able to define them,
what do you think? I would propose:

Good proposal. That may be useful. Meantime my intention was to provide some schema generation configuration, instead of changing classes.

In my use case, I need to generate AVRO schema for PHP generated classes (PIMCore). So, generally, I have no control over classes themselves but want to change generated schema. E.g. take PHP doc field comment into AVRO doc.

@nick-zh
Copy link
Member

nick-zh commented Dec 15, 2021

Thx for the insight, i will factor this in ✌️

@nick-zh
Copy link
Member

nick-zh commented Dec 18, 2021

So i've created a pre-release for this new functionality. I still need to do some more testing, but if you want you can already check it out.
Basically you are able to write your own ClassPropertyParserInterface which i think is what you want.
Additionally you need your own AppContainer and console file, so you can register your own property parser.
If you do it directly in PHP, you can check the readme or the example folder ✌️

@nick-zh nick-zh linked a pull request Dec 18, 2021 that will close this issue
@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 5, 2022

@nick-zh, thank you very much! It looks very promising!

@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 6, 2022

The concept itself looks good. Thank you!
But could you please make some enhancements:

  1. In ClassPropertyParser::parseProperty($property) provide some context. E.g. class file, or just pass the reference to ClassParser caller?
  2. Allow skipping such property in the scheme (some hidden, implementation details, transient). For example by throwing something like SkipPropertyException, or simply return null or false?

@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 7, 2022

  1. Why PhpClassPropertyInterface::NO_DEFAULT = 'there-was-no-default-set' has so strange string value, used then in PhpClassProperty constructor by default?

I would like to suggest making the default value null by default.

@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 7, 2022

  1. Why PhpClassConverter::getConvertedProperties silently skips properties which type is not string??
    What if I already return there array like ['null', 'string'] from an one of the ClassPropertyParser::parseProperty($property) descendant implementation?
    Instead of skipping I would like to just return it as is. What you think?

@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 7, 2022

  1. PhpClassConverter is one or the converter implementation and intended to be sub-classed, is not? Why most methods has private modifier and not at least protected?

@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 7, 2022

  1. PhpClassConverter::getConvertedUnionType logic looks strange:
        if (0 !== count($convertedUnionType) && [] !== $arrayType) {
            $convertedUnionType[] = $arrayType;
        } elseif (0 === count($convertedUnionType) && [] !== $arrayType) {
            return $arrayType;
        }

Why if $arrayType is empty array we return it? Looks like a mistake. And all else branch must be ommited.

@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 7, 2022

  1. You hardcode types to skip in PhpClassConverter::typesToSkip and include there also type null. But allow provide null as default value. That will lead to generating in AVRO scheme fields like:
{"name":"fullName","type":["string"],"default":null}

said by @var null|string or ?string.
But that is totally incorrect by AVRO specification, citing:

Union types are denoted as union { typeA, typeB, typeC, ... }. For example, this record contains a string field that is optional (unioned with null).

Also there in spec note what if there present default value, that must match the first type:

(Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing "null", the "null" is usually listed first, since the default value of such unions is typically null.)

So, that is logically incorrect to exclude null-type as that is legitime in AVRO.

I would like to propose:

  1. Exclude null type from skipped. Or at least allow overriding that array (e.g. parameter in constructor and/or setter)
  2. On generation phase check that on manner:
diff --git a/src/Generator/SchemaGenerator.php b/src/Generator/SchemaGenerator.php
index 9e99bc6..81fbaee 100644
--- a/src/Generator/SchemaGenerator.php
+++ b/src/Generator/SchemaGenerator.php
@@ -5,6 +5,7 @@ declare(strict_types=1);
 namespace PhpKafka\PhpAvroSchemaGenerator\Generator;
 
 use PhpKafka\PhpAvroSchemaGenerator\Avro\Avro;
+use PhpKafka\PhpAvroSchemaGenerator\Exception\SchemaGeneratorException;
 use PhpKafka\PhpAvroSchemaGenerator\PhpClass\PhpClassInterface;
 use PhpKafka\PhpAvroSchemaGenerator\PhpClass\PhpClassPropertyInterface;
 use PhpKafka\PhpAvroSchemaGenerator\Registry\ClassRegistryInterface;
@@ -106,6 +107,9 @@ final class SchemaGenerator implements SchemaGeneratorInterface
 
         if (PhpClassPropertyInterface::NO_DEFAULT !== $property->getPropertyDefault()) {
             $field['default'] = $property->getPropertyDefault();
+            if (null === $field['default'] && (!is_array($field['type']) or empty($field['type']) or 'null' != @$field['type'][0])){
+                throw new SchemaGeneratorException('Provided default value "null", but that type is not provided as first possible in union (see https://avro.apache.org/docs/current/spec.html#Unions)!');
+            }
         }
 
         if (null !== $property->getPropertyDoc() && '' !== $property->getPropertyDoc()) {

@nick-zh
Copy link
Member

nick-zh commented Jan 7, 2022

Hey @Hubbitus
Thanks for your answers and requests, i'll try to answer:

  1. This was left open on purpose and only my implementation checks for PhpParser\Node\Stmt\Property so if you want to create your own class and stick to the interface, you are not forced to use nikic/php-parser like i do in my implementation
  2. I am not sure if we need to add this in my base implementation, since you can also easily create your own ClassParser that can skip properties. Could you elaborate maybe more what kind of properties and why you would want to skip?
  3. The problem is also a bit with Improve @avro-default annotation #43 it is hard to represent an empty string in annotations and since i also need to support null itself at somepoint, that probably won't do
  4. Due to the fact that i am offering interfaces to give the user to override certain parts i also need some safeguards for my classes that for cases that i did not implement. While i myself return always string for PhpClassPropertyInterface::getPropertyType other people might want to maybe return different things. I could make the the interface more strict, to only allow string return, then i would not need to skip other types than string.
  5. I was a bit torn, normally maintainers often make classes final to prevent extending, so you don't get issues that have nothing to do with implementation. I agree, i should either make the class final or add protected, i am still a bit undecided.
  6. This logic is a bit more complex, agreed, the first if would add an array type { type: array, items: string } to other basic types (int, string, etc) if present. If the type is only an array union type { type: array, items: int,string } we would return it in the elseif. I hope that makes sense
  7. Thank you very much, total oversight on my end, null should indeed not be skipped and needs to be first type in that case if we have a union type 😄

Many thanks for having a look at the new implementation and challenging it 🙏 i appreciate this a lot

@nick-zh
Copy link
Member

nick-zh commented Jan 7, 2022

FYI: Null is now allowed (alpha3), i will add the order check separately. I am using a different lib for schema check / deployment, but i agree, it should be a part of this project as well

Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 8, 2022
@Hubbitus Hubbitus linked a pull request Jan 8, 2022 that will close this issue
@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 8, 2022

  1. Unfortunately problem there what that method declared in the public interface \PhpKafka\PhpAvroSchemaGenerator\Parser\ClassPropertyParserInterface, and although you are right, there passed property of \PhpParser\Node\Stmt\Property (that extends \PhpParser\Node\Stmt <- \PhpParser\NodeAbstract). That implements \PhpParser\Node. I see there I am able to use file start and end positions, but can't even understand in which file! Have I looked badly? I think there will be very good to have a parent node specification in the nodes hierarchy. The main purpose there - I want to see other sources of properties descriptions and typing. For example:
    • To find getter end setter and look on them in reflection (or by parsing)
    • In my case (I have already mentioned I have classes generated by PIMCore) I have also in file a prolog comment which looks like this:
      /**
      * Inheritance: no
      * Variants: no
      
      
      Fields Summary:
      - code [input]
      - name [input]
      - fullName [input]
      - masterSystem [multiselect]
      - roleSubject [select]
      - roleObject [multiselect]
      - topic [multiselect]
      - description [textarea]
      */
      
      I've placed one example.
  2. ?
  3. Honestly it is not so important for me but looks very ugly. What is the problem to provide @avro-default '', @avro-default null, @avro-default 'null'??
  4. As I see you made PhpClassConverter class final. Is it unintentional? It looks very convenient to just extend it and override said single method parseProperty($property). Why not? Because it is a single-provided implementation you can't threat that as non-public internal details. Do you call it just "demo-implementation" and provide only API on interfaces?
  5. I would like to introduce there stricter type interface like PhpClassPropertyTypeInterface. Please look at the pull request #33 Schema generation extensions #47

In that PR I also adjust most of the autotests, but not all. Some of them failed because of the step to reflection which we discussed separately (#38). But I could continue work if you like the overall approach. Please look.

P.S. And @nick-zh, really big thank you for the fast responses and spent time! I've appreciate that.

@nick-zh
Copy link
Member

nick-zh commented Jan 8, 2022

To find getter end setter and look on them in reflection

This seems very specific and i don't think it is something the default implementation should cover or be able to extend upon.
I think in this case it would be better for you to copy the exsiting ClassParser and extend it's functionality in your project

I have also in file a prolog comment which looks like this

This seems again very specific

The spirit of the generator is, you have some properties and some comments / types defined for these properties, and we generate an avro schema for that. Any logic beyond that will be hard to cover in a general way, since projects differ a lot in the real world and how people tend to write their classes. If there is anything i can do to make extension easier though, i am totally open to that as long as it is not too specific, e.g. see proposal in #49

I moved the other questions to discussions, it is getting a bit overwhelming otherwise and these are good points, but more on a general level for this project 😄

Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 8, 2022
Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 8, 2022
@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 8, 2022

Sure such logic is very project-specific!!! I did not ask to add parsing types by seeing in getters in core implementation!
I've asked to just pass ClassParser in the interface! Please look idea in 1cc81b7

One note, please, src/Parser/AvroClassPropertyParser.php is just like demonstration how that may be used for override. That is not intended to be part of real PR.

Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 8, 2022
…p (that should act as example of extending and override)
Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 8, 2022
Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 8, 2022
@Hubbitus
Copy link
Contributor Author

Hubbitus commented Jan 8, 2022

@nick-zh, please look I've fixed almost all tests, but there still 2 failed like:

PhpKafka\PhpAvroSchemaGenerator\Tests\Integration\Parser\ClassParserTest::testGetProperties
ParseError: syntax error, unexpected '|', expecting variable (T_VARIABLE)

/var/www/html/example/classes/SomeTestClass.php:83
/var/www/html/src/Parser/ClassParser.php:221
/var/www/html/src/Parser/ClassParser.php:158
/var/www/html/tests/Integration/Parser/ClassParserTest.php:56
phpvfscomposer:///var/www/html/vendor/phpunit/phpunit/phpunit:97

But that related to the \PhpKafka\PhpAvroSchemaGenerator\Example\SomeTestClass::$blaaaaaaaa declaration:

public int|string $blaaaaaaaa;

That use Union type declaration which is available since PHP 8.0.0.
So they failed expectedly on PHP container based on version 7.4.
Should I upgrade the container to version 8.0.0 or do I just need to disable 2-3 tests? FYI all tests passed on my machine locally with PHP 8.0.2.

Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 9, 2022
Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 9, 2022
Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 9, 2022
…lassPropertyTypeItem::jsonSerialize in tests
Hubbitus pushed a commit to Hubbitus/php-avro-schema-generator that referenced this issue Jan 9, 2022
…lassPropertyTypeItem::jsonSerialize in tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants