Seeing WARN messages indicating native execution is disabled #180

Closed
sagarlakshmipathy opened this issue Mar 9, 2024 · 6 comments · Fixed by #181
Labels
bug Something isn't working

Comments

@sagarlakshmipathy

Describe the bug

While running Comet with OSS Spark, I noticed warning messages on some queries indicating that Comet native execution is disabled. I'm wondering why that is.

Here's the execution log:

====================================================================================================
RUNNING: Query # 15 (round 1) (1 statements)
----------------------------------------------------------------------------------------------------
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915, [id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class 'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915, [id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class 'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN DAGScheduler: Broadcasting large task binary with size 1047.8 KiB
24/03/09 23:16:33 WARN DAGScheduler: Broadcasting large task binary with size 1096.7 KiB
24/03/09 23:16:33 WARN DAGScheduler: Broadcasting large task binary with size 1143.9 KiB
24/03/09 23:16:35 WARN DAGScheduler: Broadcasting large task binary with size 1131.6 KiB
Time taken: 8596 ms                                                             
----------------------------------------------------------------------------------------------------
FINISHED: Query # 15 (round 1)
====================================================================================================

Here's the query itself:

--TPC-DS Q15
select  ca_zip
       ,sum(cs_sales_price)
 from catalog_sales
     ,customer
     ,customer_address
     ,date_dim
 where cs_bill_customer_sk = c_customer_sk
 	and c_current_addr_sk = ca_address_sk 
 	and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
                                   '85392', '85460', '80348', '81792')
 	      or ca_state in ('CA','WA','GA')
 	      or cs_sales_price > 500)
 	and cs_sold_date_sk = d_date_sk
 	and d_qoy = 2 and d_year = 2002
 group by ca_zip
 order by ca_zip
 limit 100;

Regardless, I could see that the queries ran faster.

Steps to reproduce

  1. Run a TPC-DS query test, maybe just for query 15 (see the sketch below)

Apologies for mentioning minimal steps here. That's all that's needed, fortunately.
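
For context, here is a rough sketch of how I would run a single query with Comet enabled from spark-shell. The Parquet path, the query file path, and the exact configuration keys are assumptions based on the Comet setup docs at the time, so they may need adjusting:

// Reproduction sketch (assumes spark-shell was launched with the Comet jar and
// --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions).
spark.conf.set("spark.comet.enabled", "true")       // assumed config key
spark.conf.set("spark.comet.exec.enabled", "true")  // assumed config key

// Register the TPC-DS tables used by Q15 from a local Parquet dataset (hypothetical path).
Seq("catalog_sales", "customer", "customer_address", "date_dim").foreach { t =>
  spark.read.parquet(s"/data/tpcds/$t").createOrReplaceTempView(t)
}

// q15 holds the SQL text of TPC-DS Q15 shown above; the QueryPlanSerde warnings
// show up in the driver log while this runs.
val q15 = scala.io.Source.fromFile("queries/q15.sql").mkString  // hypothetical path
spark.sql(q15).collect()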

Expected behavior

No WARN messages

Additional context

This only happened for some queries. For example, Q46 ran without any issues.

====================================================================================================
RUNNING: Query # 46 (round 1) (1 statements)
----------------------------------------------------------------------------------------------------
Time taken: 18658 ms
----------------------------------------------------------------------------------------------------
FINISHED: Query # 46 (round 1)
====================================================================================================
--TPC-DS Q46
select  c_last_name
       ,c_first_name
       ,ca_city
       ,bought_city
       ,ss_ticket_number
       ,amt,profit 
 from
   (select ss_ticket_number
          ,ss_customer_sk
          ,ca_city bought_city
          ,sum(ss_coupon_amt) amt
          ,sum(ss_net_profit) profit
    from store_sales,date_dim,store,household_demographics,customer_address 
    where store_sales.ss_sold_date_sk = date_dim.d_date_sk
    and store_sales.ss_store_sk = store.s_store_sk  
    and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
    and store_sales.ss_addr_sk = customer_address.ca_address_sk
    and (household_demographics.hd_dep_count = 3 or
         household_demographics.hd_vehicle_count= 1)
    and date_dim.d_dow in (6,0)
    and date_dim.d_year in (1999,1999+1,1999+2) 
    and store.s_city in ('Midway','Fairview','Fairview','Midway','Fairview') 
    group by ss_ticket_number,ss_customer_sk,ss_addr_sk,ca_city) dn,customer,customer_address current_addr
    where ss_customer_sk = c_customer_sk
      and customer.c_current_addr_sk = current_addr.ca_address_sk
      and current_addr.ca_city <> bought_city
  order by c_last_name
          ,c_first_name
          ,ca_city
          ,bought_city
          ,ss_ticket_number
  limit 100;
@sagarlakshmipathy added the bug (Something isn't working) label Mar 9, 2024
@viirya
Member

viirya commented Mar 10, 2024

24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915, [id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class 'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain

I think it is because there are unsupported expressions, as shown in the warning log.

@viirya
Member

viirya commented Mar 10, 2024

For BloomFilterMightContain support, we have opened ticket #145 and there is some work on it from a community contributor.

@viirya
Member

viirya commented Mar 10, 2024

BTW, I think the warning log is misleading for now. It actually means that a certain operator/expression is not supported, not that the whole query is unsupported.

@viirya
Member

viirya commented Mar 10, 2024

In the query explain string, or the query plan in the Spark UI, you can see which operators are transformed to Comet operators and which are kept as Spark ones.
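
For example, something like the following sketch (assuming the Q15 text is loaded into a string) shows the mixed plan:

// Sketch: inspect which operators Comet took over for Q15.
val df = spark.sql(q15)   // q15 is the query text from the issue description
df.explain("formatted")
// Operators that Comet converted show up with a "Comet" prefix in the plan,
// while the part that hit the unsupported BloomFilterMightContain expression
// stays as a regular Spark operator.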

@sagarlakshmipathy
Author

Thank you @viirya

@sunchao
Member

sunchao commented Mar 11, 2024

We can probably de-duplicate the error messages too, since many of them are the same.
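
Not the actual fix (that is what #181 addresses), but one possible shape of the de-duplication as a sketch: remember which fallback reasons have already been logged and warn only once per distinct reason. The helper name and signature below are hypothetical, not Comet's real API.

import scala.collection.mutable

// Hypothetical helper: emit each fallback reason only once.
object FallbackWarnings {
  private val seen = mutable.Set.empty[String]

  def warnOnce(logWarning: String => Unit, reason: String): Unit = {
    // Set.add returns true only the first time the reason is seen.
    if (seen.add(reason)) {
      logWarning(s"Comet native execution is disabled due to: $reason")
    }
  }
}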
