The aggregation function COUNT_DISTINCT does not seem to be supported yet.
When I try to use it, the feature appears to be auto-typed as STRING, ignoring the type I explicitly defined (INT32),
and Spark then throws an error.
My code:
from feathr import Feature, INT32, WindowAggTransformation

# account_id is a TypedKey defined elsewhere (not shown).
# Total number of distinct currencies used for transactions in the past week.
num_currency_type_in_week = Feature(
    name="num_currency_type_in_week",
    key=account_id,
    feature_type=INT32,
    transform=WindowAggTransformation(
        agg_expr="transactionCurrencyCode", agg_func="COUNT_DISTINCT", window="7d"
    ),
)
Caused by: java.lang.RuntimeException: java.lang.Integer is not a valid external type for schema of string
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.StaticInvoke_12$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_7$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:207)
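As a point of reference, the result expected from this feature can be sketched in plain Python. This is only an illustration of the intended semantics, not Feathr code: the function name, the sample rows, and the choice of a half-open window (start excluded, end included) are all assumptions made for the example, and Feathr's actual window-boundary handling may differ. The point is that a distinct count over a string column is naturally an integer, which is why INT32 was declared for the feature.

```python
from datetime import datetime, timedelta


def count_distinct_in_window(rows, key, as_of, window=timedelta(days=7)):
    """Count distinct currency codes for `key` in the window (as_of - window, as_of].

    `rows` is an iterable of (account_id, timestamp, currency_code) tuples.
    Hypothetical helper for illustration only; not part of Feathr.
    """
    start = as_of - window
    currencies = {
        currency
        for account, ts, currency in rows
        if account == key and start < ts <= as_of
    }
    return len(currencies)


# Hypothetical sample transactions.
rows = [
    ("a1", datetime(2023, 1, 1), "USD"),
    ("a1", datetime(2023, 1, 3), "EUR"),
    ("a1", datetime(2023, 1, 5), "USD"),    # repeated currency, counted once
    ("a1", datetime(2022, 12, 20), "JPY"),  # older than 7 days, ignored
    ("a2", datetime(2023, 1, 4), "GBP"),
]

result = count_distinct_in_window(rows, "a1", datetime(2023, 1, 7))
```

Here the input column holds strings (currency codes), but the aggregated value is a count, so an integer feature type is the natural fit.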
Willingness to contribute
No. I cannot contribute a bug fix at this time.
Feathr version
0.9.0
System information
NA
Describe the problem
The aggregation function COUNT_DISTINCT does not seem to be supported yet. When I try to use it, the feature appears to be auto-typed as STRING, ignoring the type I explicitly defined (INT32), and Spark then throws an error. The feature definition and the Spark error are shown above.
I see the test function for count_distinct is commented out:
https://github.com/feathr-ai/feathr/blob/main/feathr-impl/src/test/scala/com/linkedin/feathr/offline/SlidingWindowAggIntegTest.scala
Tracking information
No response
Code to reproduce bug
No response
What component(s) does this bug affect?
Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
Feature Registry Web UI: The Web UI for the feature registry. Written in React.