-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make FieldValueTypeInformation creators take a TypeDescriptor parameter #32081
make FieldValueTypeInformation creators take a TypeDescriptor parameter #32081
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Run Java PreCommit |
Run Java_GCP_IO_Direct PreCommit |
Run Java PreCommit |
Run Java_Hadoop_IO_Direct PreCommit |
assign set of reviewers |
Assigning reviewers. If you would like to opt out of this review, comment R: @m-trieu for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
@damondouglas could you please take a look at this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tilgalas Thank you for working on this! Would you mind to first remove the "nullness"
value in the @SuppressWarnings
annotation for the involved classes in this PR? Please consider this suggestion as non-blocking but I would like to see if this is possible with your helpful changes. As a second phase, may we also consider removing the "rawtypes"
? Again, non-blocking but it's something we would like to strive for to improve the code quality.
sure, with pleasure! |
Reminder, please take a look at this pr: @m-trieu @damondouglas |
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
|
waiting on author |
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
|
hi reviewers, I'm going to split this PR and put the nullness and rawness refactoring commit into its own PR |
since there's a #32757, I'm closing this PR |
reopening after discussing offline |
@@ -39,9 +41,9 @@ | |||
public class CachingFactory<CreatedT> implements Factory<CreatedT> { | |||
private transient @Nullable ConcurrentHashMap<TypeDescriptor<?>, CreatedT> cache = null; | |||
|
|||
private final Factory<CreatedT> innerFactory; | |||
private final @NotOnlyInitialized Factory<CreatedT> innerFactory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are these new tags for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in RowValueGettersFactory
there's this code:
RowValueGettersFactory(Factory<List<FieldValueGetter<T, Object>>> gettersFactory) {
this.gettersFactory = gettersFactory;
this.cachingGettersFactory = new CachingFactory<>(this);
}
since we're in the constructor, the this
parameter is still not fully initialized, so by using this annotation we can temporarily (for the duration of the initalization of the CachingFactory object) allow assigning the innerFactory
field a value which itself is under initialization (or indeed whose initialization status is unknown, as the signature of the CachingFactory constructor tells us)
see also https://checkerframework.org/manual/#circular-initialization
@@ -29,7 +30,7 @@ | |||
* <p>Implementations of this interface are generated at runtime to map object fields to Row fields. | |||
*/ | |||
@Internal | |||
public interface FieldValueGetter<ObjectT, ValueT> extends Serializable { | |||
public interface FieldValueGetter<ObjectT extends @NonNull Object, ValueT> extends Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you have ObjectT extends Object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so apparently there's a difference (I don't understand why it would be so, though) between a @NonNull T
and T extends @NonNull Object
- here's an example, consider:
public interface I<T extends @NonNull Object> {
T getT();
}
with a signature as above, we can only write:
public static <T extends @NonNull Object> void f(I<T> i) {
System.out.println(i.getT().getClass());
}
or alternatively:
public static <T> void f(I<@NonNull T> i) {
System.out.println(i.getT().getClass());
}
The checkerframework will make sure that we propagate the non-nullness of the parameter.
This won't work for example:
public static <@Nullable T> void f(I<T> i) {
T t = i.getT();
if (t != null) {
System.out.println(t.getClass());
}
}
nor will this:
public static <T> void f(I<@Nullable T> i) {
T t = i.getT();
if (t != null) {
System.out.println(t.getClass());
}
}
but it will work if I change the signature of I
to:
public interface I<@NonNull T> {
T getT();
}
in other words, the @NonNull T
annotation on the interface is not really enforced anywhere, it's more like a suggestion, and so it's not really expressing the concept of the interface only works with @NonNull
types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, at least not when we're talking type parameters, for example:
public interface I<T> {
T getT();
}
public static <T> void f(I<T> i) {
System.out.println(i.getT().getClass());
}
gives me
error: [dereference.of.nullable] dereference of possibly-null reference i.getT()
System.out.println(i.getT().getClass());
^
@@ -126,17 +123,26 @@ public static FieldValueTypeInformation forOneOf( | |||
} | |||
|
|||
public static FieldValueTypeInformation forField(Field field, int index) { | |||
TypeDescriptor<?> type = TypeDescriptor.of(field.getGenericType()); | |||
return forField(null, field, index); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class wasn't really meant to be public. I would add an @internal tag to the class, and not bother keeping this backwards-compatible method
} | ||
|
||
@Override | ||
public Row apply(T input) { | ||
return Row.withSchema(schema).withFieldValueGetters(getterFactory, input); | ||
return Row.withSchema(schema).withFieldValueGetters((Factory) getterFactory, input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, good question - I'm playing with that code right now and I think I could get rid of it actually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -836,10 +838,10 @@ public int nextFieldId() { | |||
} | |||
|
|||
@Internal | |||
public Row withFieldValueGetters( | |||
Factory<List<FieldValueGetter>> fieldValueGetterFactory, Object getterTarget) { | |||
public <T extends @NonNull Object> Row withFieldValueGetters( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear on this bound as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change it to <@NonNull T>
I suppose - it compiles, but then my IDE complains that "Non-null type argument is expected" on the T
parameter of the FieldValueGetter<T, ?>
type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer @nonnull T (is this simply a weakness in the IDE?), but I guess I don't have strong feelings here.
super(schema); | ||
this.getterTarget = getterTarget; | ||
this.getters = getterFactory.create(TypeDescriptor.of(getterTarget.getClass()), schema); | ||
} | ||
|
||
@Override | ||
@SuppressWarnings({"TypeParameterUnusedInFormals", "unchecked"}) | ||
public <T> @Nullable T getValue(int fieldIdx) { | ||
public @Nullable Object getValue(int fieldIdx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove type parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's probably a matter of taste - my view is that this class can't possibly know what type of value a particular getter is going to return, so it's not commiting to anything by making the method generic - "I only know it's an Object, if you need a specific type, you'll have to cast it yourself" - I should probably also change the type of the getters list to List<FieldValueGetter<T, Object>>
but then it would probably open another can of worms of covariant types, that is whether a, say, FieldValueGetter<T, String>
is a FieldValueGetter<T, Object>
- the current type of List<FieldValueGetter<T, ?>>
is slightly incorrect as the ?
wildcard means an unknown type, but which is at least shared by all members of the list (which is not true)
I can understand the approach where the caller requests a specific type via a generic type parameter and is happy to accept the method throwing a ClassCastException if they had wrong assumptions about the getter - let me know if you want me to make it back a generic method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I will change the type of the list elements to FieldValueGetter<T, Object>
anyway, as I find it more correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base class uses a parameter here, so I think for now we should just match the base class.
result = false) | ||
private static boolean isNullOrEmpty(@Nullable String str) { | ||
return str == null || str.isEmpty(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is wrong with Strings.isNullOrEmpty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for some reason the checkerframework doesn't understand the nullness assertion when using it:
/usr/local/google/home/mszwaja/src/beam/sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/schemas/utils/AvroUtils.java:538: error: [argument] incompatible argument for parameter namespace of AvroUtils.toAvroField.
org.apache.avro.Schema.Field recordField = toAvroField(field, childNamespace);
^
found : @Initialized @Nullable String
required: @Initialized @NonNull String
/usr/local/google/home/mszwaja/src/beam/sdks/java/extensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/schemas/utils/AvroUtils.java:541: error: [argument] incompatible argument for parameter name of Schema.createRecord.
return org.apache.avro.Schema.createRecord(schemaName, "", schemaNamespace, false, fields);
^
found : @Initialized @Nullable String
required: @Initialized @NonNull String
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this example has to do with isNullOrEmpty (which returns a boolean)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the error that appears when I use Strings.isNullOrEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in other words, checkerframework doesn't understand that by using this method (and getting a false returned from it) we've just asserted, that the variable is @NonNull
- my "polyfill" method uses the checkerframework's annotation:
@EnsuresNonNullIf(
expression = {"#1"},
result = false)
which tells it that if it returns false the first argument is verified to be @NonNull
- I don't know why checkerframwork doesn't understand guava method's postconditions here (maybe it's fixed in some newer version?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understood
Fixes #32081 |
A bunch of tests are failing. PTAL to see if this is due to your PR or just flaky tests. |
Run Java PreCommit |
1 similar comment
Run Java PreCommit |
Run Java_GCP_IO_Direct PreCommit |
lgtm |
@@ -451,8 +484,7 @@ public static Field toBeamField(org.apache.avro.Schema.Field field) { | |||
public static org.apache.avro.Schema.Field toAvroField(Field field, String namespace) { | |||
org.apache.avro.Schema fieldSchema = | |||
getFieldSchema(field.getType(), field.getName(), namespace); | |||
return new org.apache.avro.Schema.Field( | |||
field.getName(), fieldSchema, field.getDescription(), (Object) null); | |||
return new org.apache.avro.Schema.Field(field.getName(), fieldSchema, field.getDescription()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to break https://github.com/apache/beam/actions/workflows/beam_PostCommit_Java_Avro_Versions.yml
I think it is because in some older versions of avro, the constructor is different - https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/Schema.Field.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think #33102 should fix this (I cherry-picked it to the current release branch as well)
This likely breaks DataflowTemplate unit test: https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/2014/checks?check_run_id=33057418339 I'm trying to understand the scope of breaking change The stacktrace
MapCoder.encode -> SchemaCoder.encode ->GetterBasedSchemaProvider.ToRowWithValueGetters passes a null value to withFieldValueGetters somehow, causing IllegalStateException This PR made substantial change to the related code path. Somehow some encoding worked before now crash with the Exception above Update: I now suspect it is due to unintended consequence of nullable annotation fix. I checked that where
|
This is a 2nd PR in the series of PRs (based on the now closed #31648, with the first PR being #31785) whose ultimate goal is to add support for generic classes to schema providers.
FieldValueTypeInformation
creators will now accept aTypeDescriptor
parameter describing the field's containing class, that will let them infer more accurate type information about that field. For example - consider aMyClass<T>
class - now with aTypeDescriptor
ofMyClass<String>
and the getter of typeT
the type resolver can infer the type of field to beString
, see the added test class that shows the new functionalityThank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.