Describe the Problem
OpenMetadata 0.12.1 is a new release with many bug fixes. One bug it does not fix affects data profiling: the modulo operator is not translated correctly for Trino, which leaves an unexpected % in the generated profiling query. I reported the bug last night and this PR resolved it this morning: open-metadata/OpenMetadata#7920
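To illustrate why a doubled percent sign breaks the query, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Trino. It is an analogy only, not OpenMetadata's code and not the change made in 7920. It assumes that the Trino client, like sqlite3, sends the statement text unchanged and binds values to ? placeholders, which is what the logged SQL suggests. The helper names are hypothetical.

```python
import sqlite3


def format_style_render() -> str:
    """Drivers that interpolate with Python's % operator need '%%' for a literal '%'."""
    # The doubled percent collapses to a single one during interpolation.
    return "SELECT 7 %% %s" % (4,)


def run_modulo(operator_text: str) -> int:
    """Run `SELECT 7 <operator_text> ?` with 4 bound to the ? placeholder."""
    conn = sqlite3.connect(":memory:")
    try:
        # No string interpolation happens here: the text reaches the SQL
        # parser exactly as written, so '%%' is NOT collapsed to '%'.
        row = conn.execute(f"SELECT 7 {operator_text} ?", (4,)).fetchone()
        return row[0]
    finally:
        conn.close()


def fails_with_syntax_error(operator_text: str) -> bool:
    """Return True if the statement is rejected by the SQL parser."""
    try:
        run_modulo(operator_text)
    except sqlite3.OperationalError:
        return True
    return False


if __name__ == "__main__":
    print(format_style_render())           # '%%' is an escape in this style
    print(run_modulo("%"))                 # single '%' works with ? placeholders
    print(fails_with_syntax_error("%%"))   # doubled '%' is a syntax error
```

The point of the sketch is that '%%' is only an escape sequence when the driver formats the statement with Python's % operator. With ? placeholders nothing unescapes it, so the database sees two operators in a row, which matches the "mismatched input '%'" error in the log.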
We are just at the start of our OpenMetadata journey, and our experience so far is that the community responds to bug reports very quickly (they integrated our use case for multiple pipelines in a single service into the 0.12 release). I think we serve ourselves best by tracking their latest release and being able to apply patches (such as 7920) as needed.
Steps to Reproduce
Go to the OpenMetadata services page
Select the Trino service
Select the ingestions tab
Run profiler_essd
The profiling will report success, but the log will show failure due to the problem fixed by 7920.
Expected behaviour
I expect the data profiler to properly profile data.
Screenshots
Here's the log file:
profileressdprofilertask-63df70d1309d4ea89e9337ea55b5bd9b
*** Reading local file: /opt/airflow/logs/dag_id=profiler_essd/run_id=manual__2022-10-03T22:10:45+00:00/task_id=profiler_task/attempt=1.log
[2022-10-03 22:11:03,576] {taskinstance.py:1179} INFO - Dependencies all met for <TaskInstance: profiler_essd.profiler_task manual__2022-10-03T22:10:45+00:00 [queued]>
[2022-10-03 22:11:03,586] {taskinstance.py:1179} INFO - Dependencies all met for <TaskInstance: profiler_essd.profiler_task manual__2022-10-03T22:10:45+00:00 [queued]>
[2022-10-03 22:11:03,587] {taskinstance.py:1376} INFO -
--------------------------------------------------------------------------------
[2022-10-03 22:11:03,587] {taskinstance.py:1377} INFO - Starting attempt 1 of 4
[2022-10-03 22:11:03,587] {taskinstance.py:1378} INFO -
--------------------------------------------------------------------------------
[2022-10-03 22:11:03,763] {taskinstance.py:1397} INFO - Executing <Task(PythonOperator): profiler_task> on 2022-10-03 22:10:45+00:00
[2022-10-03 22:11:03,767] {standard_task_runner.py:52} INFO - Started process 113 to run task
[2022-10-03 22:11:03,770] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'profiler_essd', 'profiler_task', 'manual__2022-10-03T22:10:45+00:00', '--job-id', '418', '--raw', '--subdir', 'DAGS_FOLDER/profiler_essd.py', '--cfg-path', '/tmp/tmpqm9373l8', '--error-file', '/tmp/tmpatytxlr1']
[2022-10-03 22:11:03,772] {standard_task_runner.py:80} INFO - Job 418: Subtask profiler_task
[2022-10-03 22:11:04,532] {task_command.py:371} INFO - Running <TaskInstance: profiler_essd.profiler_task manual__2022-10-03T22:10:45+00:00 [running]> on host profileressdprofilertask-63df70d1309d4ea89e9337ea55b5bd9b
[2022-10-03 22:11:05,627] {logging_mixin.py:115} WARNING - /home/airflow/.local/lib/python3.9/site-packages/airflow/models/renderedtifields.py:249 SAWarning: Coercing Subquery object into a select() for use in IN(); please pass a select() construct explicitly
[2022-10-03 22:11:05,639] {taskinstance.py:1589} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=profiler_essd
AIRFLOW_CTX_TASK_ID=profiler_task
AIRFLOW_CTX_EXECUTION_DATE=2022-10-03T22:10:45+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-10-03T22:10:45+00:00
[2022-10-03 22:11:05,997] {database_service.py:206} INFO - Scanned [OSC-Trino.osc_datacommons_dev.essd.ch4_gwp]
[2022-10-03 22:11:06,054] {core.py:376} INFO - Computing profile metrics for OSC-Trino.osc_datacommons_dev.essd.ch4_gwp...
[2022-10-03 22:11:06,054] {sqa_interface.py:285} INFO - Computing metrics with 5.0 threads.
[2022-10-03 22:11:09,083] {client.py:807} INFO - <Response [204]>
[2022-10-03 22:11:09,665] {base.py:1900} ERROR - Error closing cursor
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/sqlalchemy/dialect.py", line 333, in do_execute
cursor.execute(statement, parameters)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 460, in execute
added_prepare_header = self._prepare_statement(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 333, in _prepare_statement
for _ in result:
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 614, in __iter__
rows = self._query.fetch()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 791, in fetch
status = self._request.process(response)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 536, in process
raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00004_uejbx)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1897, in _safe_close_cursor
cursor.close()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 579, in close
self.cancel()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 573, in cancel
raise trino.exceptions.OperationalError(
trino.exceptions.OperationalError: Cancel query failed; no running query
[2022-10-03 22:11:09,666] {base.py:1900} ERROR - Error closing cursor
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/sqlalchemy/dialect.py", line 333, in do_execute
cursor.execute(statement, parameters)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 460, in execute
added_prepare_header = self._prepare_statement(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 333, in _prepare_statement
for _ in result:
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 614, in __iter__
rows = self._query.fetch()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 791, in fetch
status = self._request.process(response)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 536, in process
raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00002_uejbx)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1897, in _safe_close_cursor
cursor.close()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 579, in close
self.cancel()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 573, in cancel
raise trino.exceptions.OperationalError(
trino.exceptions.OperationalError: Cancel query failed; no running query
[2022-10-03 22:11:09,667] {base.py:1900} ERROR - Error closing cursor
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/sqlalchemy/dialect.py", line 333, in do_execute
cursor.execute(statement, parameters)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 460, in execute
added_prepare_header = self._prepare_statement(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 333, in _prepare_statement
for _ in result:
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 614, in __iter__
rows = self._query.fetch()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 791, in fetch
status = self._request.process(response)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 536, in process
raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00001_uejbx)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1897, in _safe_close_cursor
cursor.close()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 579, in close
self.cancel()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 573, in cancel
raise trino.exceptions.OperationalError(
trino.exceptions.OperationalError: Cancel query failed; no running query
[2022-10-03 22:11:09,667] {base.py:1900} ERROR - Error closing cursor
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/sqlalchemy/dialect.py", line 333, in do_execute
cursor.execute(statement, parameters)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 460, in execute
added_prepare_header = self._prepare_statement(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 333, in _prepare_statement
for _ in result:
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 614, in __iter__
rows = self._query.fetch()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 791, in fetch
status = self._request.process(response)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 536, in process
raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00003_uejbx)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1897, in _safe_close_cursor
cursor.close()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 579, in close
self.cancel()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 573, in cancel
raise trino.exceptions.OperationalError(
trino.exceptions.OperationalError: Cancel query failed; no running query
[2022-10-03 22:11:09,733] {sqa_interface.py:452} WARNING - Error trying to compute profile for ch4_gwp.gas: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00004_uejbx)
[SQL: /* {"app": "OpenMetadata", "version": "0.12.0.0"} */
WITH ch4_gwp_rnd AS
(SELECT essd.ch4_gwp.sector_code AS sector_code, essd.ch4_gwp.fossil_bio AS fossil_bio, essd.ch4_gwp.gas AS gas, essd.ch4_gwp.gwp_ar5_feedbacks AS gwp_ar5_feedbacks, essd.ch4_gwp.gwp_ar5 AS gwp_ar5, essd.ch4_gwp.description AS description, essd.ch4_gwp.subsector AS subsector, essd.ch4_gwp.chapter_title AS chapter_title, essd.ch4_gwp.subsector_title AS subsector_title, ABS(RANDOM()) * 100 %% ? AS random
FROM essd.ch4_gwp),
ch4_gwp_sample AS
(SELECT ch4_gwp_rnd.sector_code AS sector_code, ch4_gwp_rnd.fossil_bio AS fossil_bio, ch4_gwp_rnd.gas AS gas, ch4_gwp_rnd.gwp_ar5_feedbacks AS gwp_ar5_feedbacks, ch4_gwp_rnd.gwp_ar5 AS gwp_ar5, ch4_gwp_rnd.description AS description, ch4_gwp_rnd.subsector AS subsector, ch4_gwp_rnd.chapter_title AS chapter_title, ch4_gwp_rnd.subsector_title AS subsector_title, ch4_gwp_rnd.random AS random
FROM ch4_gwp_rnd
WHERE ch4_gwp_rnd.random <= ?)
SELECT avg(LENGTH(gas)) AS mean, count(gas) AS "valuesCount", count(DISTINCT gas) AS "distinctCount", NULL AS anon_1, min(LENGTH(gas)) AS "minLength", NULL AS anon__1, max(LENGTH(gas)) AS "maxLength", SUM(CASE WHEN (gas IS NULL) THEN ? ELSE ? END) AS "nullCount", NULL AS anon__2, NULL AS anon__3
FROM ch4_gwp_sample
LIMIT ?]
[parameters: (100, 16.0, 1, 0, 1)]
(Background on this error at: https://sqlalche.me/e/14/f405)
[2022-10-03 22:11:09,733] {sqa_interface.py:452} WARNING - Error trying to compute profile for ch4_gwp.fossil_bio: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00001_uejbx)
[SQL: /* {"app": "OpenMetadata", "version": "0.12.0.0"} */
WITH ch4_gwp_rnd AS
(SELECT essd.ch4_gwp.sector_code AS sector_code, essd.ch4_gwp.fossil_bio AS fossil_bio, essd.ch4_gwp.gas AS gas, essd.ch4_gwp.gwp_ar5_feedbacks AS gwp_ar5_feedbacks, essd.ch4_gwp.gwp_ar5 AS gwp_ar5, essd.ch4_gwp.description AS description, essd.ch4_gwp.subsector AS subsector, essd.ch4_gwp.chapter_title AS chapter_title, essd.ch4_gwp.subsector_title AS subsector_title, ABS(RANDOM()) * 100 %% ? AS random
FROM essd.ch4_gwp),
ch4_gwp_sample AS
(SELECT ch4_gwp_rnd.sector_code AS sector_code, ch4_gwp_rnd.fossil_bio AS fossil_bio, ch4_gwp_rnd.gas AS gas, ch4_gwp_rnd.gwp_ar5_feedbacks AS gwp_ar5_feedbacks, ch4_gwp_rnd.gwp_ar5 AS gwp_ar5, ch4_gwp_rnd.description AS description, ch4_gwp_rnd.subsector AS subsector, ch4_gwp_rnd.chapter_title AS chapter_title, ch4_gwp_rnd.subsector_title AS subsector_title, ch4_gwp_rnd.random AS random
FROM ch4_gwp_rnd
WHERE ch4_gwp_rnd.random <= ?)
SELECT avg(LENGTH(fossil_bio)) AS mean, count(fossil_bio) AS "valuesCount", count(DISTINCT fossil_bio) AS "distinctCount", NULL AS anon_1, min(LENGTH(fossil_bio)) AS "minLength", NULL AS anon__1, max(LENGTH(fossil_bio)) AS "maxLength", SUM(CASE WHEN (fossil_bio IS NULL) THEN ? ELSE ? END) AS "nullCount", NULL AS anon__2, NULL AS anon__3
FROM ch4_gwp_sample
LIMIT ?]
[parameters: (100, 16.0, 1, 0, 1)]
(Background on this error at: https://sqlalche.me/e/14/f405)
[2022-10-03 22:11:09,733] {sqa_interface.py:452} WARNING - Error trying to compute profile for ch4_gwp.gwp_ar5_feedbacks: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00002_uejbx)
[SQL: /* {"app": "OpenMetadata", "version": "0.12.0.0"} */
WITH ch4_gwp_rnd AS
(SELECT essd.ch4_gwp.sector_code AS sector_code, essd.ch4_gwp.fossil_bio AS fossil_bio, essd.ch4_gwp.gas AS gas, essd.ch4_gwp.gwp_ar5_feedbacks AS gwp_ar5_feedbacks, essd.ch4_gwp.gwp_ar5 AS gwp_ar5, essd.ch4_gwp.description AS description, essd.ch4_gwp.subsector AS subsector, essd.ch4_gwp.chapter_title AS chapter_title, essd.ch4_gwp.subsector_title AS subsector_title, ABS(RANDOM()) * 100 %% ? AS random
FROM essd.ch4_gwp),
ch4_gwp_sample AS
(SELECT ch4_gwp_rnd.sector_code AS sector_code, ch4_gwp_rnd.fossil_bio AS fossil_bio, ch4_gwp_rnd.gas AS gas, ch4_gwp_rnd.gwp_ar5_feedbacks AS gwp_ar5_feedbacks, ch4_gwp_rnd.gwp_ar5 AS gwp_ar5, ch4_gwp_rnd.description AS description, ch4_gwp_rnd.subsector AS subsector, ch4_gwp_rnd.chapter_title AS chapter_title, ch4_gwp_rnd.subsector_title AS subsector_title, ch4_gwp_rnd.random AS random
FROM ch4_gwp_rnd
WHERE ch4_gwp_rnd.random <= ?)
SELECT avg(gwp_ar5_feedbacks) AS mean, count(gwp_ar5_feedbacks) AS "valuesCount", count(DISTINCT gwp_ar5_feedbacks) AS "distinctCount", min(gwp_ar5_feedbacks) AS min, NULL AS anon_1, max(gwp_ar5_feedbacks) AS max, NULL AS anon__1, SUM(CASE WHEN (gwp_ar5_feedbacks IS NULL) THEN ? ELSE ? END) AS "nullCount", STDDEV_POP(gwp_ar5_feedbacks) AS stddev, SUM(gwp_ar5_feedbacks) AS sum
FROM ch4_gwp_sample
LIMIT ?]
[parameters: (100, 16.0, 1, 0, 1)]
(Background on this error at: https://sqlalche.me/e/14/f405)
[2022-10-03 22:11:09,784] {sqa_interface.py:452} WARNING - Error trying to compute profile for ch4_gwp.sector_code: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00003_uejbx)
[SQL: /* {"app": "OpenMetadata", "version": "0.12.0.0"} */
WITH ch4_gwp_rnd AS
(SELECT essd.ch4_gwp.sector_code AS sector_code, essd.ch4_gwp.fossil_bio AS fossil_bio, essd.ch4_gwp.gas AS gas, essd.ch4_gwp.gwp_ar5_feedbacks AS gwp_ar5_feedbacks, essd.ch4_gwp.gwp_ar5 AS gwp_ar5, essd.ch4_gwp.description AS description, essd.ch4_gwp.subsector AS subsector, essd.ch4_gwp.chapter_title AS chapter_title, essd.ch4_gwp.subsector_title AS subsector_title, ABS(RANDOM()) * 100 %% ? AS random
FROM essd.ch4_gwp),
ch4_gwp_sample AS
(SELECT ch4_gwp_rnd.sector_code AS sector_code, ch4_gwp_rnd.fossil_bio AS fossil_bio, ch4_gwp_rnd.gas AS gas, ch4_gwp_rnd.gwp_ar5_feedbacks AS gwp_ar5_feedbacks, ch4_gwp_rnd.gwp_ar5 AS gwp_ar5, ch4_gwp_rnd.description AS description, ch4_gwp_rnd.subsector AS subsector, ch4_gwp_rnd.chapter_title AS chapter_title, ch4_gwp_rnd.subsector_title AS subsector_title, ch4_gwp_rnd.random AS random
FROM ch4_gwp_rnd
WHERE ch4_gwp_rnd.random <= ?)
SELECT avg(LENGTH(sector_code)) AS mean, count(sector_code) AS "valuesCount", count(DISTINCT sector_code) AS "distinctCount", NULL AS anon_1, min(LENGTH(sector_code)) AS "minLength", NULL AS anon__1, max(LENGTH(sector_code)) AS "maxLength", SUM(CASE WHEN (sector_code IS NULL) THEN ? ELSE ? END) AS "nullCount", NULL AS anon__2, NULL AS anon__3
FROM ch4_gwp_sample
LIMIT ?]
[parameters: (100, 16.0, 1, 0, 1)]
(Background on this error at: https://sqlalche.me/e/14/f405)
[2022-10-03 22:11:09,870] {base.py:1900} ERROR - Error closing cursor
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/sqlalchemy/dialect.py", line 333, in do_execute
cursor.execute(statement, parameters)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 460, in execute
added_prepare_header = self._prepare_statement(
File "/home/airflow/.local/lib/python3.9/site-packages/trino/dbapi.py", line 333, in _prepare_statement
for _ in result:
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 614, in __iter__
rows = self._query.fetch()
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 791, in fetch
status = self._request.process(response)
File "/home/airflow/.local/lib/python3.9/site-packages/trino/client.py", line 536, in process
raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 3:393: mismatched input '%'. Expecting: <expression>", query_id=20221003_221109_00007_uejbx)
[truncated...too ridiculously long]
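Because the run is reported as successful even though the log contains these failures, a quick way to see which columns were not profiled is to scan the log for the profiler's warning lines. The following is a minimal sketch, assuming the warning format shown in the log above ("Error trying to compute profile for <table>.<column>:"). The function name and the sample text are hypothetical, and the sample lines are shortened.

```python
import re

# Matches the warning lines written when a column cannot be profiled.
FAILED_COLUMN = re.compile(
    r"WARNING - Error trying to compute profile for ([\w.]+):"
)


def failed_columns(log_text: str) -> list[str]:
    """Return the table.column names that failed, in first-seen order."""
    seen: list[str] = []
    for match in FAILED_COLUMN.finditer(log_text):
        name = match.group(1)
        if name not in seen:  # the same column may be reported more than once
            seen.append(name)
    return seen


# Shortened sample lines in the same shape as the log above.
SAMPLE_LOG = """\
[2022-10-03 22:11:09,733] {sqa_interface.py:452} WARNING - Error trying to compute profile for ch4_gwp.gas: (trino.exceptions.TrinoUserError)
[2022-10-03 22:11:09,733] {sqa_interface.py:452} WARNING - Error trying to compute profile for ch4_gwp.fossil_bio: (trino.exceptions.TrinoUserError)
[2022-10-03 22:11:09,870] {base.py:1900} ERROR - Error closing cursor
"""

if __name__ == "__main__":
    for column in failed_columns(SAMPLE_LOG):
        print(column)
```

Run against the full task log, this would list every column whose metrics were skipped, which is the information the success status hides.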
Additional context
Since OpenMetadata is still very new, it would be great to get the latest and greatest working well before more people start crowding onto the platform and making updates and upgrades more challenging.