Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance by 22-35% or more by caching partial parse artefact #904

Merged
merged 20 commits into from
May 1, 2024

Conversation

tatiana
Copy link
Collaborator

@tatiana tatiana commented Mar 28, 2024

Improve the performance to run the benchmark DAG with 100 tasks by 34% and the benchmark DAG with 10 tasks by 22%, by persisting the dbt partial parse artifact in Airflow nodes. This performance can be even higher in the case of dbt projects that take more time to be parsed.

With the introduction of #800, Cosmos supports using dbt partial parsing files. This feature has led to a substantial performance improvement, particularly for large dbt projects, both during Airflow DAG parsing (using LoadMode.DBT_LS) and also Airflow task execution (when using ExecutionMode.LOCAL and ExecutionMode.VIRTUALENV).

There were two limitations with the initial support to partial parsing, which the current PR aims to address:

  1. DAGs using Cosmos ProfileMapping classes could not leverage this feature. This is because the partial parsing relies on profile files not changing, and by default, Cosmos would mock the dbt profile in several parts of the code. The consequence is that users trying Cosmos 1.4.0a1 will see the following message:
13:33:16  Unable to do partial parsing because profile has changed
13:33:16  Unable to do partial parsing because env vars used in profiles.yml have changed
  1. The user had to explicitly provide a partial_parse.msgpack file in the original project folder for their Airflow deployment - and if, for any reason, this became outdated, the user would not leverage the partial parsing feature. Since Cosmos runs dbt tasks from within a temporary directory, the partial parse would be stale for some users, it would be updated in the temporary directory, but the next time the task was run, Cosmos/dbt would not leverage the recently updated partial_parse.msgpack file.

The current PR addresses these two issues respectfully by:

  1. Allowing users that want to leverage Cosmos ProfileMapping and partial parsing to use RenderConfig(enable_mock_profile=False)

  2. Introducing a Cosmos cache directory where we are persisting partial parsing files. This feature is enabled by default, but users can opt out by setting the Airflow configuration [cosmos][enable_cache] = False (exporting the environment variable AIRFLOW__COSMOS__ENABLE_CACHE=0). Users can also define the temporary directory used to store these files using the [cosmos][cache_dir] Airflow configuration. By default, Cosmos will create and use a folder cosmos inside the system's temporary directory: https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir .

This PR affects both DAG parsing and task execution. Although it does not introduce an optimisation per se, it makes the partial parse feature implemented #800 available to more users.

Closes: #722

I updated the documentation in the PR: #898

Some future steps related to optimization associated to caching to be addressed in separate PRs:
i. Change how we create mocked profiles, to create the file itself in the same way, referencing an environment variable with the same name - and only changing the value of the environment variable (#924)
ii. Extend caching to the profiles.yml created by Cosmos in the newly introduced tmp/cosmos without the need to recreate it every time (#925).
iii. Extend caching to the Airflow DAG/Task group as a pickle file - this approach is more generic and would work for every type of DAG parsing and executor. (#926)
iv. Support persisting/fetching the cache from remote storage so we don't have to replicate it for every Airflow scheduler and worker node. (#927)
v. Cache dbt deps lock file/avoid installing dbt steps every time. We can leverage package-lock.yml introduced in dbt t 1.7 (https://docs.getdbt.com/reference/commands/deps#predictable-package-installs), but ideally, we'd have a strategy to support older versions of dbt as well. (#930)
vi. Support caching partial_parse.msgpack even when vars change: https://medium.com/@sebastian.daum89/how-to-speed-up-single-dbt-invocations-when-using-changing-dbt-variables-b9d91ce3fb0d
vii. Support partial parsing in Docker and Kubernetes Cosmos executors (#929)
viii. Centralise all the Airflow-based config into Cosmos settings.py & create a dedicated docs page containing information about these (#928)

How to validate this change

Run the performance benchmark against this and the main branch, checking the value of /tmp/performance_results.txt.

Example of commands run locally:

# Setup
AIRFLOW_HOME=`pwd` AIRFLOW_CONN_AIRFLOW_DB="postgres://postgres:[email protected]:5432/postgres" PYTHONPATH=`pwd` AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 hatch run tests.py3.11-2.7:test-performance-setup

# Run test for 100 dbt models per DAG:
MODEL_COUNT=100 AIRFLOW_HOME=`pwd` AIRFLOW_CONN_AIRFLOW_DB="postgres://postgres:[email protected]:5432/postgres" PYTHONPATH=`pwd` AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 hatch run tests.py3.11-2.7:test-performance

An example of output when running 100 with the main branch:

NUM_MODELS=100
TIME=114.18614888191223
MODELS_PER_SECOND=0.8757629623135543
DBT_VERSION=1.7.13

And with the current PR:

NUM_MODELS=100
TIME=75.17766404151917
MODELS_PER_SECOND=1.33018232576064
DBT_VERSION=1.7.13

Copy link

netlify bot commented Mar 28, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 2a3f007
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/663251880d28e10008c3746b

Copy link

codecov bot commented Mar 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.69%. Comparing base (808c9b9) to head (47aa75d).
Report is 1 commits behind head on main.

❗ Current head 47aa75d differs from pull request most recent head 2a3f007. Consider uploading reports for the commit 2a3f007 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #904      +/-   ##
==========================================
+ Coverage   94.79%   95.69%   +0.90%     
==========================================
  Files          57       59       +2     
  Lines        2767     2858      +91     
==========================================
+ Hits         2623     2735     +112     
+ Misses        144      123      -21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

cosmos/cache.py Outdated Show resolved Hide resolved
@jlaneve
Copy link
Collaborator

jlaneve commented Apr 2, 2024

do we want to cache this locally or to s3/cloud store? it'd be really cool to do to blob store as well, especially for KE, etc

@kaxil
Copy link
Collaborator

kaxil commented Apr 11, 2024

do we want to cache this locally or to s3/cloud store? it'd be really cool to do to blob store as well, especially for KE, etc

Worth using ObjectStore for the newer Airflow version too, that way we can support more Cloud Stores.

And not great, but a "hacky" approach can be to see if users have a "XCom backend" configured, and if yes, you can store the cache in XCom. From 2.9, you can also use ObjectStore as XCom backend: https://airflow.apache.org/blog/airflow-2.9.0/#object-storage-as-xcom-backend

It can happen in a separate PR too, so that you can add a caching layer in this one, and extend it to remote store (or XCom backend) in the next one.

And once the PR is out of draft and ready for prime time :) & reviews, worth adding what performance issues this solves & what it won't to set the right expectations with users.

@tatiana tatiana force-pushed the cache_partial_parse branch from 7b4b027 to 5c010cc Compare April 18, 2024 22:38
@tatiana tatiana force-pushed the cache_partial_parse branch from ec7e502 to e3fdce9 Compare April 24, 2024 09:58
@tatiana tatiana added area:performance Related to performance, like memory usage, CPU usage, speed, etc parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing execution:local Related to Local execution environment execution:virtualenv Related to Virtualenv execution environment dbt:parse Primarily related to dbt parse command or functionality labels Apr 24, 2024
@tatiana tatiana marked this pull request as ready for review April 24, 2024 10:38
@tatiana tatiana requested a review from a team as a code owner April 24, 2024 10:38
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. area:config Related to configuration, like YAML files, environment variables, or executer configuration area:parsing Related to parsing DAG/DBT improvement, issues, or fixes labels Apr 24, 2024
@tatiana
Copy link
Collaborator Author

tatiana commented May 1, 2024

Thanks a lot for the feedback, @pankajkoti ! I addressed all of them except for the minor #904 (comment), which can be improved in a follow-up ticket once you reply. Since you approved the PR, I'll mark this comment as resolved and merging .

@tatiana tatiana merged commit 458be0c into main May 1, 2024
57 checks passed
@tatiana tatiana deleted the cache_partial_parse branch May 1, 2024 14:51
tatiana added a commit that referenced this pull request May 1, 2024
Improves docs to highlight the limitation of the parsing parsing
approach (introduced in #800), following up on the feedback on #722 and the changes introduced in #904
@tatiana tatiana mentioned this pull request May 2, 2024
pankajkoti added a commit that referenced this pull request May 10, 2024
With the introduction of enabling partial parse in PR #904, upon
testing the implementation, it is observed that the seeds files
were not been able to be located as the partial parse file contained
a stale `root_path` from previous command runs. This PR attempts
to correct the `root_path` in the partial parse file by replacing
it with the needed project directory where the project files are
located.

closes: #937
pankajkoti added a commit that referenced this pull request May 10, 2024
With the introduction of enabling partial parse in PR #904, upon
testing the implementation, it is observed that the seeds files
were not been able to be located as the partial parse file contained
a stale `root_path` from previous command runs. This PR attempts
to correct the `root_path` in the partial parse file by replacing
it with the needed project directory where the project files are
located.

closes: #937
tatiana added a commit that referenced this pull request May 10, 2024
With the introduction of enabling partial parse in PR #904, 
upon testing the implementation, it is observed that the seeds 
files were not been able to be located as the partial parse file 
contained a stale `root_path` from previous command runs. 
This issue is observed on specific earlier versions of dbt-core like
`1.5.4` and `1.6.5`, but not on recent versions of dbt-core `1.5.8`,
`1.6.6`
and `1.7.0`. I am suspecting that PR
dbt-labs/dbt-core#8762
is likely the fix and the fix appears to be backported to later version 
releases of `1.5.x` and `1.6.x`.

However, irrespective of the dbt-core version, this PR attempts to 
correct the `root_path` in the partial parse file by replacing it with 
the needed project directory where the project files are located. 
And thus ensures that the feature runs correctly for older and 
newer versions of dbt-core.

closes: #937

---------

Co-authored-by: Tatiana Al-Chueyr <[email protected]>
tatiana added a commit that referenced this pull request May 13, 2024
Features

* Add dbt docs natively in Airflow via plugin by @dwreeves in #737
* Add support for ``InvocationMode.DBT_RUNNER`` for local execution mode
by @jbandoro in #850
* Support partial parsing to render DAGs faster when using
``ExecutionMode.LOCAL``, ``ExecutionMode.VIRTUALENV`` and
``LoadMode.DBT_LS`` by @dwreeves in #800
* Improve performance by 22-35% or more by caching partial parse
artefact by @tatiana in #904
* Add Azure Container Instance as Execution Mode by @danielvdende in
#771
* Add dbt build operators by @dylanharper-qz in #795
* Add dbt profile config variables to mapped profile by @ykuc in #794
* Add more template fields to ``DbtBaseOperator`` by @dwreeves in #786
* Add ``pip_install_options`` argument to operators by @octiva in #808

Bug fixes

* Make ``PostgresUserPasswordProfileMapping`` schema argument optional
by @FouziaTariq in #683
* Fix ``folder_dir`` not showing on logs for ``DbtDocsS3LocalOperator``
by @PrimOox in #856
* Improve ``dbt ls`` parsing resilience to missing tags/config by
@tatiana in #859
* Fix ``operator_args`` modified in place in Airflow converter by
@jbandoro in #835
* Fix Docker and Kubernetes operators execute method resolution by
@jbandoro in #849
* Fix ``TrinoBaseProfileMapping`` required parameter for non method
authentication by @AlexandrKhabarov in #921
* Fix global flags for lists by @ms32035 in #863
* Fix ``GoogleCloudServiceAccountDictProfileMapping`` when getting
values from the Airflow connection ``extra__`` keys by @glebkrapivin in
#923
* Fix using the dag as a keyword argument as ``specific_args_keys`` in
DbtTaskGroup by @tboutaour in #916
* Fix ACI integration (``DbtAzureContainerInstanceBaseOperator``) by
@danielvdende in #872
* Fix setting dbt project dir to the tmp dir by @dwreeves in #873
* Fix dbt docs operator to not use ``graph.gpickle`` file when
``--no-write-json`` is passed by @dwreeves in #883
* Make Pydantic a required dependency by @pankajkoti in #939
* Gracefully error if users try to ``emit_datasets`` with ``Airflow
2.9.0`` or ``2.9.1`` by @tatiana in #948
* Fix parsing tests that have no parents in #933 by @jlaneve
* Correct ``root_path`` in partial parse cache by @pankajkoti in #950

Docs

* Fix docs homepage link by @jlaneve in #860
* Fix docs ``ExecutionConfig.dbt_project_path`` by @jbandoro in #847
* Fix typo in MWAA getting started guide by @jlaneve in #846
* Fix typo related to exporting docs to GCS by @tboutaour in #922
* Improve partial parsing docs by @tatiana in #898
* Improve docs for datasets for airflow >= 2.4 by @SiddiqueAhmad in #879
* Improve test behaviour docs to highlight ``warning`` feature in the
``virtualenv`` mode by @mc51 in #910
* Fix docs typo by @SiddiqueAhmad in #917
* Improve Astro docs by @RNHTTR in #951

Others

* Add performance integration tests by @jlaneve in #827
* Enable ``append_env`` in ``operator_args`` by default by @tatiana in
#899
* Change default ``append_env`` behaviour depending on Cosmos
``ExecutionMode`` by @pankajkoti and @pankajastro in #954
* Expose the ``dbt`` graph in the ``DbtToAirflowConverter`` class by
@tommyjxl in #886
* Improve dbt docs plugin rendering padding by @dwreeves in #876
* Add ``connect_retries`` to databricks profile to fix expensive
integration failures by @jbandoro in #826
* Add import sorting (isort) to Cosmos by @jbandoro in #866
* Add Python 3.11 to CI/tests by @tatiana and @jbandoro in #821, #824
and #825
* Fix failing ``test_created_pod`` for
``apache-airflow-providers-cncf-kubernetes`` after v8.0.0 update by
@jbandoro in #854
* Extend ``DatabricksTokenProfileMapping`` test to include session
properties by @tatiana in #858
* Fix broken integration test uncovered from Pytest 8.0 update by
@jbandoro in #845
* Add Apache Airflow 2.9 to the test matrix by @tatiana in #940
* Replace deprecated ``DummyOperator`` by ``EmptyOperator`` if Airflow
>=2.4.0 by @tatiana in #900
* Improve logs to troubleshoot issue in 1.4.0a2 with astro-cli by
@tatiana in #947
* Fix issue when publishing a new release to PyPI by @tatiana in #946
* Pre-commit hook updates in #820, #834, #843 and #852, #890, #896,
#901, #905, #908, #919, #931, #941
@maslick
Copy link

maslick commented May 28, 2024

Enabling caching can quickly fill up the 30GB ephemeral storage on MWAA workers if there are 500 or more DAGs.

pankajkoti pushed a commit that referenced this pull request Jun 27, 2024
Partial parsing support was introduced in #800 and improved in #904
(caching). However, as the caching layer was introduced, we removed
support to use partial parsing if the cache was disabled.

This PR solves the issue.

Fix: #1041
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
…ct (astronomer#904)

Improve the performance to run the benchmark DAG with 100 tasks by 34%
and the benchmark DAG with 10 tasks by 22%, by persisting the dbt
partial parse artifact in Airflow nodes. This performance can be even
higher in the case of dbt projects that take more time to be parsed.

With the introduction of astronomer#800, Cosmos supports using dbt partial parsing
files. This feature has led to a substantial performance improvement,
particularly for large dbt projects, both during Airflow DAG parsing
(using LoadMode.DBT_LS) and also Airflow task execution (when using
`ExecutionMode.LOCAL` and `ExecutionMode.VIRTUALENV`).

There were two limitations with the initial support to partial parsing,
which the current PR aims to address:

1. DAGs using Cosmos `ProfileMapping` classes could not leverage this
feature. This is because the partial parsing relies on profile files not
changing, and by default, Cosmos would mock the dbt profile in several
parts of the code. The consequence is that users trying Cosmos 1.4.0a1
will see the following message:
```
13:33:16  Unable to do partial parsing because profile has changed
13:33:16  Unable to do partial parsing because env vars used in profiles.yml have changed
```

2. The user had to explicitly provide a `partial_parse.msgpack` file in
the original project folder for their Airflow deployment - and if, for
any reason, this became outdated, the user would not leverage the
partial parsing feature. Since Cosmos runs dbt tasks from within a
temporary directory, the partial parse would be stale for some users, it
would be updated in the temporary directory, but the next time the task
was run, Cosmos/dbt would not leverage the recently updated
`partial_parse.msgpack` file.

The current PR addresses these two issues respectfully by:

1. Allowing users that want to leverage Cosmos `ProfileMapping` and
partial parsing to use `RenderConfig(enable_mock_profile=False)`

2. Introducing a Cosmos cache directory where we are persisting partial
parsing files. This feature is enabled by default, but users can opt out
by setting the Airflow configuration `[cosmos][enable_cache] = False`
(exporting the environment variable `AIRFLOW__COSMOS__ENABLE_CACHE=0`).
Users can also define the temporary directory used to store these files
using the `[cosmos][cache_dir]` Airflow configuration. By default,
Cosmos will create and use a folder `cosmos` inside the system's
temporary directory:
https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir .

This PR affects both DAG parsing and task execution. Although it does
not introduce an optimisation per se, it makes the partial parse feature
implemented astronomer#800 available to more users.

Closes: astronomer#722

I updated the documentation in the PR: astronomer#898

Some future steps related to optimization associated to caching to be
addressed in separate PRs:
i. Change how we create mocked profiles, to create the file itself in
the same way, referencing an environment variable with the same name -
and only changing the value of the environment variable (astronomer#924)
ii. Extend caching to the `profiles.yml` created by Cosmos in the newly
introduced `tmp/cosmos` without the need to recreate it every time
(astronomer#925).
iii. Extend caching to the Airflow DAG/Task group as a pickle file -
this approach is more generic and would work for every type of DAG
parsing and executor. (astronomer#926)
iv. Support persisting/fetching the cache from remote storage so we
don't have to replicate it for every Airflow scheduler and worker node.
(astronomer#927)
v. Cache dbt deps lock file/avoid installing dbt steps every time. We
can leverage `package-lock.yml` introduced in dbt t 1.7
(https://docs.getdbt.com/reference/commands/deps#predictable-package-installs),
but ideally, we'd have a strategy to support older versions of dbt as
well. (astronomer#930)
vi. Support caching `partial_parse.msgpack` even when vars change:
https://medium.com/@sebastian.daum89/how-to-speed-up-single-dbt-invocations-when-using-changing-dbt-variables-b9d91ce3fb0d
vii. Support partial parsing in Docker and Kubernetes Cosmos executors
(astronomer#929)
viii. Centralise all the Airflow-based config into Cosmos settings.py &
create a dedicated docs page containing information about these (astronomer#928)

**How to validate this change**

Run the performance benchmark against this and the `main` branch,
checking the value of `/tmp/performance_results.txt`.

Example of commands run locally:

```
# Setup
AIRFLOW_HOME=`pwd` AIRFLOW_CONN_AIRFLOW_DB="postgres://postgres:[email protected]:5432/postgres" PYTHONPATH=`pwd` AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 hatch run tests.py3.11-2.7:test-performance-setup

# Run test for 100 dbt models per DAG:
MODEL_COUNT=100 AIRFLOW_HOME=`pwd` AIRFLOW_CONN_AIRFLOW_DB="postgres://postgres:[email protected]:5432/postgres" PYTHONPATH=`pwd` AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 hatch run tests.py3.11-2.7:test-performance
```

An example of output when running 100 with the main branch:
```
NUM_MODELS=100
TIME=114.18614888191223
MODELS_PER_SECOND=0.8757629623135543
DBT_VERSION=1.7.13
```

And with the current PR:
```
NUM_MODELS=100
TIME=75.17766404151917
MODELS_PER_SECOND=1.33018232576064
DBT_VERSION=1.7.13
```
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
Improves docs to highlight the limitation of the parsing parsing
approach (introduced in astronomer#800), following up on the feedback on astronomer#722 and the changes introduced in astronomer#904
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
With the introduction of enabling partial parse in PR astronomer#904, 
upon testing the implementation, it is observed that the seeds 
files were not been able to be located as the partial parse file 
contained a stale `root_path` from previous command runs. 
This issue is observed on specific earlier versions of dbt-core like
`1.5.4` and `1.6.5`, but not on recent versions of dbt-core `1.5.8`,
`1.6.6`
and `1.7.0`. I am suspecting that PR
dbt-labs/dbt-core#8762
is likely the fix and the fix appears to be backported to later version 
releases of `1.5.x` and `1.6.x`.

However, irrespective of the dbt-core version, this PR attempts to 
correct the `root_path` in the partial parse file by replacing it with 
the needed project directory where the project files are located. 
And thus ensures that the feature runs correctly for older and 
newer versions of dbt-core.

closes: astronomer#937

---------

Co-authored-by: Tatiana Al-Chueyr <[email protected]>
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
Features

* Add dbt docs natively in Airflow via plugin by @dwreeves in astronomer#737
* Add support for ``InvocationMode.DBT_RUNNER`` for local execution mode
by @jbandoro in astronomer#850
* Support partial parsing to render DAGs faster when using
``ExecutionMode.LOCAL``, ``ExecutionMode.VIRTUALENV`` and
``LoadMode.DBT_LS`` by @dwreeves in astronomer#800
* Improve performance by 22-35% or more by caching partial parse
artefact by @tatiana in astronomer#904
* Add Azure Container Instance as Execution Mode by @danielvdende in
astronomer#771
* Add dbt build operators by @dylanharper-qz in astronomer#795
* Add dbt profile config variables to mapped profile by @ykuc in astronomer#794
* Add more template fields to ``DbtBaseOperator`` by @dwreeves in astronomer#786
* Add ``pip_install_options`` argument to operators by @octiva in astronomer#808

Bug fixes

* Make ``PostgresUserPasswordProfileMapping`` schema argument optional
by @FouziaTariq in astronomer#683
* Fix ``folder_dir`` not showing on logs for ``DbtDocsS3LocalOperator``
by @PrimOox in astronomer#856
* Improve ``dbt ls`` parsing resilience to missing tags/config by
@tatiana in astronomer#859
* Fix ``operator_args`` modified in place in Airflow converter by
@jbandoro in astronomer#835
* Fix Docker and Kubernetes operators execute method resolution by
@jbandoro in astronomer#849
* Fix ``TrinoBaseProfileMapping`` required parameter for non method
authentication by @AlexandrKhabarov in astronomer#921
* Fix global flags for lists by @ms32035 in astronomer#863
* Fix ``GoogleCloudServiceAccountDictProfileMapping`` when getting
values from the Airflow connection ``extra__`` keys by @glebkrapivin in
astronomer#923
* Fix using the dag as a keyword argument as ``specific_args_keys`` in
DbtTaskGroup by @tboutaour in astronomer#916
* Fix ACI integration (``DbtAzureContainerInstanceBaseOperator``) by
@danielvdende in astronomer#872
* Fix setting dbt project dir to the tmp dir by @dwreeves in astronomer#873
* Fix dbt docs operator to not use ``graph.gpickle`` file when
``--no-write-json`` is passed by @dwreeves in astronomer#883
* Make Pydantic a required dependency by @pankajkoti in astronomer#939
* Gracefully error if users try to ``emit_datasets`` with ``Airflow
2.9.0`` or ``2.9.1`` by @tatiana in astronomer#948
* Fix parsing tests that have no parents in astronomer#933 by @jlaneve
* Correct ``root_path`` in partial parse cache by @pankajkoti in astronomer#950

Docs

* Fix docs homepage link by @jlaneve in astronomer#860
* Fix docs ``ExecutionConfig.dbt_project_path`` by @jbandoro in astronomer#847
* Fix typo in MWAA getting started guide by @jlaneve in astronomer#846
* Fix typo related to exporting docs to GCS by @tboutaour in astronomer#922
* Improve partial parsing docs by @tatiana in astronomer#898
* Improve docs for datasets for airflow >= 2.4 by @SiddiqueAhmad in astronomer#879
* Improve test behaviour docs to highlight ``warning`` feature in the
``virtualenv`` mode by @mc51 in astronomer#910
* Fix docs typo by @SiddiqueAhmad in astronomer#917
* Improve Astro docs by @RNHTTR in astronomer#951

Others

* Add performance integration tests by @jlaneve in astronomer#827
* Enable ``append_env`` in ``operator_args`` by default by @tatiana in
astronomer#899
* Change default ``append_env`` behaviour depending on Cosmos
``ExecutionMode`` by @pankajkoti and @pankajastro in astronomer#954
* Expose the ``dbt`` graph in the ``DbtToAirflowConverter`` class by
@tommyjxl in astronomer#886
* Improve dbt docs plugin rendering padding by @dwreeves in astronomer#876
* Add ``connect_retries`` to databricks profile to fix expensive
integration failures by @jbandoro in astronomer#826
* Add import sorting (isort) to Cosmos by @jbandoro in astronomer#866
* Add Python 3.11 to CI/tests by @tatiana and @jbandoro in astronomer#821, astronomer#824
and astronomer#825
* Fix failing ``test_created_pod`` for
``apache-airflow-providers-cncf-kubernetes`` after v8.0.0 update by
@jbandoro in astronomer#854
* Extend ``DatabricksTokenProfileMapping`` test to include session
properties by @tatiana in astronomer#858
* Fix broken integration test uncovered from Pytest 8.0 update by
@jbandoro in astronomer#845
* Add Apache Airflow 2.9 to the test matrix by @tatiana in astronomer#940
* Replace deprecated ``DummyOperator`` by ``EmptyOperator`` if Airflow
>=2.4.0 by @tatiana in astronomer#900
* Improve logs to troubleshoot issue in 1.4.0a2 with astro-cli by
@tatiana in astronomer#947
* Fix issue when publishing a new release to PyPI by @tatiana in astronomer#946
* Pre-commit hook updates in astronomer#820, astronomer#834, astronomer#843 and astronomer#852, astronomer#890, astronomer#896,
astronomer#901, astronomer#905, astronomer#908, astronomer#919, astronomer#931, astronomer#941
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
Partial parsing support was introduced in astronomer#800 and improved in astronomer#904
(caching). However, as the caching layer was introduced, we removed
support to use partial parsing if the cache was disabled.

This PR solves the issue.

Fix: astronomer#1041
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:config Related to configuration, like YAML files, environment variables, or executer configuration area:parsing Related to parsing DAG/DBT improvement, issues, or fixes area:performance Related to performance, like memory usage, CPU usage, speed, etc dbt:parse Primarily related to dbt parse command or functionality execution:local Related to Local execution environment execution:virtualenv Related to Virtualenv execution environment lgtm This PR has been approved by a maintainer parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tasks are running full parse each time
6 participants