When inputting an Ibis table it still outputs a Pandas df #1

Open
ShawnStrasser opened this issue Dec 8, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@ShawnStrasser
Owner

I'm running traffic-anomaly on Spark and fed it an Ibis table. It worked, but it returned a Pandas DataFrame instead of an Ibis table. To make this usable in production on a Spark cluster, it should return an Ibis table rather than Pandas.

A workaround is to set to_sql=True and then execute the generated SQL directly with Spark, as shown below.

import traffic_anomaly  
import ibis  

# spark connection established previously  
con = ibis.pyspark.connect(spark)  

# Spark df established previously  
df.createOrReplaceTempView("df")  

# Convert df to ibis table  
df_ibis = con.table('df')  

decomp_sql = traffic_anomaly.median_decompose(  
    data=df_ibis,  # Pandas DataFrame or Ibis Table (for compatibility with any SQL backend)  
    datetime_column='ds',  
    value_column='y',  
    entity_grouping_columns=['unique_id'],  
    freq_minutes=15,  # Frequency of the time series in minutes  
    rolling_window_days=7,  # Rolling window size in days. Should be a multiple of 7 for traffic data  
    drop_days=7,  # Should be at least 7 for traffic data  
    min_rolling_window_samples=96,  # Minimum number of samples in the rolling window, set to 0 to disable.  
    min_time_of_day_samples=14,  # Minimum number of samples for each time of day (like 2:00pm), set to 0 to disable  
    drop_extras=False,  # lets keep seasonal/trend for visualization below  
    to_sql=True  # Return SQL queries instead of Pandas DataFrames for running on SQL backends  
)  

# Execute the SQL query on Spark  
decomp = spark.sql(decomp_sql)  
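
Until median_decompose can return an Ibis expression natively, the generated SQL could also be handed back to the same Ibis connection so the result stays a lazy Ibis table. A minimal sketch, assuming the Ibis PySpark backend's con.sql() raw-SQL wrapper accepts the query produced by to_sql=True:

# Wrap the generated SQL back into an Ibis table on the same connection (sketch, not verified against traffic_anomaly's output)
decomp_ibis = con.sql(decomp_sql)

# The expression stays lazy on Spark; materialize only when needed
decomp_ibis.head(5).execute()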
ShawnStrasser added the bug (Something isn't working) label on Dec 8, 2024