Docs: Grammar files 20-40 (#1847)
* files 20-40

* Update docs/website/docs/general-usage/http/rest-client.md

Co-authored-by: Violetta Mishechkina <[email protected]>

* Update docs/website/docs/general-usage/http/rest-client.md

Co-authored-by: Violetta Mishechkina <[email protected]>

* Update docs/website/docs/general-usage/http/rest-client.md

Co-authored-by: Violetta Mishechkina <[email protected]>

* fix snippet

---------

Co-authored-by: Violetta Mishechkina <[email protected]>
sh-rp and VioletM authored Sep 20, 2024
1 parent 9a9bdf7 commit 875bf29
Showing 20 changed files with 293 additions and 310 deletions.
26 changes: 13 additions & 13 deletions docs/website/docs/dlt-ecosystem/staging.md
@@ -16,9 +16,9 @@ Such a staging dataset has the same name as the dataset passed to `dlt.pipeline`
[destination.postgres]
staging_dataset_name_layout="staging_%s"
```
The entry above switches the pattern to `staging_` prefix and for example, for a dataset with the name **github_data**, `dlt` will create **staging_github_data**.
The entry above switches the pattern to a `staging_` prefix and, for example, for a dataset with the name **github_data**, `dlt` will create **staging_github_data**.

To configure a static staging dataset name, you can do the following (we use the destination factory)
To configure a static staging dataset name, you can do the following (we use the destination factory):
```py
import dlt

@@ -41,21 +41,21 @@ truncate_staging_dataset=true
Currently, only one destination, the [filesystem](destinations/filesystem.md), can be used as staging. The following destinations can copy remote files:

1. [Azure Synapse](destinations/synapse#staging-support)
1. [Athena](destinations/athena#staging-support)
1. [Bigquery](destinations/bigquery.md#staging-support)
1. [Dremio](destinations/dremio#staging-support)
1. [Redshift](destinations/redshift.md#staging-support)
1. [Snowflake](destinations/snowflake.md#staging-support)
2. [Athena](destinations/athena#staging-support)
3. [Bigquery](destinations/bigquery.md#staging-support)
4. [Dremio](destinations/dremio#staging-support)
5. [Redshift](destinations/redshift.md#staging-support)
6. [Snowflake](destinations/snowflake.md#staging-support)

### How to use
In essence, you need to set up two destinations and then pass them to `dlt.pipeline`. Below we'll use `filesystem` staging with `parquet` files to load into the `Redshift` destination.
In essence, you need to set up two destinations and then pass them to `dlt.pipeline`. Below, we'll use `filesystem` staging with `parquet` files to load into the `Redshift` destination.

1. **Set up the S3 bucket and filesystem staging.**

Please follow our guide in the [filesystem destination documentation](destinations/filesystem.md). Test the staging as a standalone destination to make sure that files go where you want them. In your `secrets.toml`, you should now have a working `filesystem` configuration:
```toml
[destination.filesystem]
bucket_url = "s3://[your_bucket_name]" # replace with your bucket name,
bucket_url = "s3://[your_bucket_name]" # replace with your bucket name

[destination.filesystem.credentials]
aws_access_key_id = "please set me up!" # copy the access key here
@@ -88,7 +88,7 @@ In essence, you need to set up two destinations and then pass them to `dlt.pipel
dataset_name='player_data'
)
```
`dlt` will automatically select an appropriate loader file format for the staging files. Below we explicitly specify the `parquet` file format (just to demonstrate how to do it):
`dlt` will automatically select an appropriate loader file format for the staging files. Below, we explicitly specify the `parquet` file format (just to demonstrate how to do it):
```py
info = pipeline.run(chess(), loader_file_format="parquet")
```
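Putting the steps together, here is a minimal, hedged sketch of the setup described above. The `chess` import is a placeholder for however the source is defined in your project, and it assumes the `filesystem` and `redshift` credentials are already in `secrets.toml`:
```py
import dlt
from chess import chess  # placeholder import; use your own source definition

# staging='filesystem' routes load packages through the configured S3 bucket,
# from which Redshift copies the files
pipeline = dlt.pipeline(
    pipeline_name='chess_pipeline',
    destination='redshift',
    staging='filesystem',
    dataset_name='player_data'
)

# write the staging files as parquet and load them into Redshift
info = pipeline.run(chess(), loader_file_format="parquet")
print(info)
```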
@@ -103,15 +103,15 @@ Please note that `dlt` does not delete loaded files from the staging storage aft

### How to prevent staging files truncation

Before `dlt` loads data to the staging storage, it truncates previously loaded files. To prevent it and keep the whole history
of loaded files, you can use the following parameter:
Before `dlt` loads data to the staging storage, it truncates previously loaded files. To prevent this and keep the whole history of loaded files, you can use the following parameter:

```toml
[destination.redshift]
truncate_table_before_load_on_staging_destination=false
```

:::caution
The [Athena](destinations/athena#staging-support) destination only truncates not iceberg tables with `replace` merge_disposition.
The [Athena](destinations/athena#staging-support) destination only truncates non-iceberg tables with `replace` merge_disposition.
Therefore, the parameter `truncate_table_before_load_on_staging_destination` only controls the truncation of corresponding files for these tables.
:::

5 changes: 3 additions & 2 deletions docs/website/docs/dlt-ecosystem/table-formats/delta.md
@@ -6,8 +6,9 @@ keywords: [delta, table formats]

# Delta table format

[Delta](https://delta.io/) is an open source table format. `dlt` can store data as Delta tables.
[Delta](https://delta.io/) is an open-source table format. `dlt` can store data as Delta tables.

## Supported Destinations
## Supported destinations

Supported by: **Databricks**, **filesystem**
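As a quick, hedged illustration of storing data as a Delta table on the filesystem destination (the `table_format` argument on the resource and the bucket configuration are assumptions taken from the filesystem destination docs, not something shown on this page):
```py
import dlt

@dlt.resource(table_format="delta")  # materialize this resource as a Delta table
def events():
    yield [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

# the filesystem destination writes the Delta table to the bucket_url
# configured in secrets.toml (or a local path); writing Delta tables may
# require extra dependencies, e.g. the deltalake package
pipeline = dlt.pipeline("delta_demo", destination="filesystem")
print(pipeline.run(events()))
```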

5 changes: 3 additions & 2 deletions docs/website/docs/dlt-ecosystem/table-formats/iceberg.md
@@ -6,8 +6,9 @@ keywords: [iceberg, table formats]

# Iceberg table format

[Iceberg](https://iceberg.apache.org/) is an open source table format. `dlt` can store data as Iceberg tables.
[Iceberg](https://iceberg.apache.org/) is an open-source table format. `dlt` can store data as Iceberg tables.

## Supported Destinations
## Supported destinations

Supported by: **Athena**

45 changes: 21 additions & 24 deletions docs/website/docs/dlt-ecosystem/transformations/dbt/dbt.md
@@ -6,8 +6,7 @@ keywords: [transform, dbt, runner]

# Transform the data with dbt

[dbt](https://github.com/dbt-labs/dbt-core) is a framework that allows for the simple structuring of your transformations into DAGs. The benefits of
using dbt include:
[dbt](https://github.com/dbt-labs/dbt-core) is a framework that allows for the simple structuring of your transformations into DAGs. The benefits of using dbt include:

- End-to-end cross-db compatibility for dlt→dbt pipelines.
- Ease of use by SQL analysts, with a low learning curve.
@@ -20,21 +19,19 @@ You can run dbt with `dlt` by using the dbt runner.

The dbt runner:

- Can create a virtual env for dbt on the fly;
- Can create a virtual environment for dbt on the fly;
- Can run a dbt package from online sources (e.g., GitHub) or from local files;
- Passes configuration and credentials to dbt, so you do not need to handle them separately from
`dlt`, enabling dbt to configure on the fly.
- Passes configuration and credentials to dbt, so you do not need to handle them separately from `dlt`, enabling dbt to configure on the fly.

## How to use the dbt runner

For an example of how to use the dbt runner, see the
[jaffle shop example](https://github.com/dlt-hub/dlt/blob/devel/docs/examples/archive/dbt_run_jaffle.py).
For an example of how to use the dbt runner, see the [jaffle shop example](https://github.com/dlt-hub/dlt/blob/devel/docs/examples/archive/dbt_run_jaffle.py).
Included below is another example where we run a `dlt` pipeline and then a dbt package via `dlt`:

> 💡 Docstrings are available to read in your IDE.
```py
# load all pipedrive endpoints to pipedrive_raw dataset
# Load all Pipedrive endpoints to the pipedrive_raw dataset
pipeline = dlt.pipeline(
pipeline_name='pipedrive',
destination='bigquery',
@@ -45,38 +42,38 @@ load_info = pipeline.run(pipedrive_source())
print(load_info)

# Create a transformation on a new dataset called 'pipedrive_dbt'
# we created a local dbt package
# We created a local dbt package
# and added pipedrive_raw to its sources.yml
# the destination for the transformation is passed in the pipeline
# The destination for the transformation is passed in the pipeline
pipeline = dlt.pipeline(
pipeline_name='pipedrive',
destination='bigquery',
dataset_name='pipedrive_dbt'
)

# make or restore venv for dbt, using latest dbt version
# NOTE: if you have dbt installed in your current environment, just skip this line
# Make or restore venv for dbt, using the latest dbt version
# NOTE: If you have dbt installed in your current environment, just skip this line
# and the `venv` argument to dlt.dbt.package()
venv = dlt.dbt.get_venv(pipeline)

# get runner, optionally pass the venv
# Get runner, optionally pass the venv
dbt = dlt.dbt.package(
pipeline,
"pipedrive/dbt_pipedrive/pipedrive",
venv=venv
)

# run the models and collect any info
# If running fails, the error will be raised with full stack trace
# Run the models and collect any info
# If running fails, the error will be raised with a full stack trace
models = dbt.run_all()

# on success print outcome
# On success, print the outcome
for m in models:
print(
f"Model {m.model_name} materialized" +
f"in {m.time}" +
f"with status {m.status}" +
f"and message {m.message}"
f" in {m.time}" +
f" with status {m.status}" +
f" and message {m.message}"
)
```

@@ -86,18 +83,18 @@ It assumes that dbt is installed in the current Python environment and the `prof
<!--@@@DLT_SNIPPET ./dbt-snippets.py::run_dbt_standalone-->


Here's an example **duckdb** profile
Here's an example **duckdb** profile:
```yaml
config:
# do not track usage, do not create .user.yml
# Do not track usage, do not create .user.yml
send_anonymous_usage_stats: False

duckdb_dlt_dbt_test:
target: analytics
outputs:
analytics:
type: duckdb
# schema: "{{ var('destination_dataset_name', var('source_dataset_name')) }}"
# Schema: "{{ var('destination_dataset_name', var('source_dataset_name')) }}"
path: "duckdb_dlt_dbt_test.duckdb"
extensions:
- httpfs
@@ -108,8 +105,8 @@ You can run the example with dbt debug log: `RUNTIME__LOG_LEVEL=DEBUG python dbt

## Other transforming tools

If you want to transform the data before loading, you can use Python. If you want to transform the
data after loading, you can use dbt or one of the following:
If you want to transform the data before loading, you can use Python. If you want to transform the data after loading, you can use dbt or one of the following:

1. [`dlt` SQL client.](../sql.md)
2. [Pandas.](../pandas.md)

17 changes: 9 additions & 8 deletions docs/website/docs/dlt-ecosystem/transformations/dbt/dbt_cloud.md
@@ -4,11 +4,11 @@ description: Transforming the data loaded by a dlt pipeline with dbt Cloud
keywords: [transform, sql]
---

# DBT Cloud Client and Helper Functions
# dbt Cloud client and helper functions

## API Client
## API client

The DBT Cloud Client is a Python class designed to interact with the dbt Cloud API (version 2).
The dbt Cloud Client is a Python class designed to interact with the dbt Cloud API (version 2).
It provides methods to perform various operations on dbt Cloud, such as triggering job runs and retrieving job run statuses.

```py
@@ -26,7 +26,7 @@ run_status = client.get_run_status(run_id=job_run_id)
print(f"Job run status: {run_status['status_humanized']}")
```

## Helper Functions
## Helper functions

These Python functions provide an interface to interact with the dbt Cloud API.
They simplify the process of triggering and monitoring job runs in dbt Cloud.
@@ -65,11 +65,11 @@ from dlt.helpers.dbt_cloud import get_dbt_cloud_run_status
status = get_dbt_cloud_run_status(run_id=1234, wait_for_outcome=True)
```
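The credentials section below also mentions a `run_dbt_cloud_job` helper for triggering runs. A hedged sketch of how a call might look; the exact keyword arguments and return value are assumptions:
```py
from dlt.helpers.dbt_cloud import run_dbt_cloud_job

# Trigger the job configured in .dlt/secrets.toml (or pass job_id explicitly)
# and wait for the run to finish; argument names are illustrative
status = run_dbt_cloud_job(job_id=1234, wait_for_outcome=True)
print(status)
```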

## Set Credentials
## Set credentials

### secrets.toml

When using a dlt locally, we recommend using the `.dlt/secrets.toml` method to set credentials.
When using dlt locally, we recommend using the `.dlt/secrets.toml` method to set credentials.

If you used the `dlt init` command, then the `.dlt` folder has already been created.
Otherwise, create a `.dlt` folder in your working directory and a `secrets.toml` file inside it.
@@ -86,9 +86,9 @@ job_id = "set me up!" # optional only for the run_dbt_cloud_job function (you ca
run_id = "set me up!" # optional for the get_dbt_cloud_run_status function (you can pass this explicitly as an argument to the function)
```

### Environment Variables
### Environment variables

`dlt` supports reading credentials from the environment.
dlt supports reading credentials from the environment.

If dlt tries to read this from environment variables, it will use a different naming convention.

@@ -103,3 +103,4 @@ DBT_CLOUD__JOB_ID
```
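For example, the `job_id` entry from `secrets.toml` becomes `DBT_CLOUD__JOB_ID` when set in the environment. A small sketch with placeholder values (the `[dbt_cloud]` section name is inferred from the variable prefix):
```py
import os

# equivalent to the [dbt_cloud] entries in secrets.toml, using dlt's
# double-underscore naming convention for environment variables
os.environ["DBT_CLOUD__JOB_ID"] = "set me up!"
os.environ["DBT_CLOUD__RUN_ID"] = "set me up!"
```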

For more information, read the [Credentials](../../../general-usage/credentials) documentation.

7 changes: 4 additions & 3 deletions docs/website/docs/dlt-ecosystem/transformations/pandas.md
@@ -4,7 +4,7 @@ description: Transform the data loaded by a dlt pipeline with Pandas
keywords: [transform, pandas]
---

# Transform the Data with Pandas
# Transform the data with Pandas

You can fetch the results of any SQL query as a dataframe. If the destination supports that
natively (i.e., BigQuery and DuckDB), `dlt` uses the native method. Thanks to this, reading
@@ -22,7 +22,7 @@ with pipeline.sql_client() as client:
with client.execute_query(
'SELECT "reactions__+1", "reactions__-1", reactions__laugh, reactions__hooray, reactions__rocket FROM issues'
) as table:
# calling `df` on a cursor, returns the data as a data frame
# calling `df` on a cursor returns the data as a data frame
reactions = table.df()
counts = reactions.sum(0).sort_values(0, ascending=False)
```
@@ -32,10 +32,11 @@ chunks by passing the `chunk_size` argument to the `df` method.

Once your data is in a Pandas dataframe, you can transform it as needed.
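For larger result sets, here is a hedged sketch of reading in chunks. It reuses the `pipeline` from the example above and assumes that repeated `df(chunk_size=...)` calls return `None` once the result set is exhausted; check the behavior for your `dlt` version:
```py
import pandas as pd

frames = []
with pipeline.sql_client() as client:
    with client.execute_query("SELECT * FROM issues") as cursor:
        # pull the result set in chunks of 10,000 rows
        while (chunk := cursor.df(chunk_size=10_000)) is not None:
            frames.append(chunk)

issues = pd.concat(frames) if frames else pd.DataFrame()
```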

## Other Transforming Tools
## Other transforming tools

If you want to transform the data before loading, you can use Python. If you want to transform the
data after loading, you can use Pandas or one of the following:

1. [dbt.](dbt/dbt.md) (recommended)
2. [`dlt` SQL client.](sql.md)

5 changes: 3 additions & 2 deletions docs/website/docs/dlt-ecosystem/transformations/sql.md
@@ -36,7 +36,7 @@ try:
"SELECT id, name, email FROM customers WHERE id = %s",
10
)
# prints column values of the first row
# Prints column values of the first row
print(res[0])
except Exception:
...
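
# Beyond SELECTs, execute_sql can also run DML or DDL statements.
# A hedged sketch, reusing the pipeline and customers table from the example above:
with pipeline.sql_client() as client:
    client.execute_sql(
        "UPDATE customers SET email = lower(email) WHERE email IS NOT NULL"
    )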
@@ -48,4 +48,5 @@ If you want to transform the data before loading, you can use Python. If you wan
data after loading, you can use SQL or one of the following:

1. [dbt](dbt/dbt.md) (recommended).
2. [Pandas.](pandas.md)
2. [Pandas](pandas.md).

21 changes: 11 additions & 10 deletions docs/website/docs/general-usage/credentials/advanced.md
@@ -26,7 +26,7 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen
```
`dlt` allows the user to specify the argument `pipedrive_api_key` explicitly if, for some reason, they do not want to use [out-of-the-box options](setup) for credentials management.

1. Required arguments (without default values) **are never injected** and must be specified when calling. For example, for the source:
2. Required arguments (without default values) **are never injected** and must be specified when calling. For example, for the source:

```py
@dlt.source
Expand All @@ -35,7 +35,7 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen
```
The argument `channels_list` would not be injected and will output an error if it is not specified explicitly.

1. Arguments with default values are injected if present in config providers. Otherwise, defaults from the function signature are used. For example, for the source:
3. Arguments with default values are injected if present in config providers. Otherwise, defaults from the function signature are used. For example, for the source:

```py
@dlt.source
Expand All @@ -48,7 +48,7 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen
```
`dlt` firstly searches for all three arguments: `page_size`, `access_token`, and `start_date` in config providers in a [specific order](setup). If it cannot find them, it will use the default values.

1. Arguments with the special default value `dlt.secrets.value` and `dlt.config.value` **must be injected**
4. Arguments with the special default value `dlt.secrets.value` and `dlt.config.value` **must be injected**
(or explicitly passed). If they are not found by the config providers, the code raises an
exception. The code in the functions always receives those arguments.
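
   To make point 4 concrete, here is a minimal sketch; the source and argument names mirror the pipedrive example at the top of this page:

   ```py
   import dlt

   @dlt.source
   def pipedrive_source(pipedrive_api_key: str = dlt.secrets.value):
       ...
   ```

   If no config provider supplies `pipedrive_api_key` and it is not passed explicitly, `dlt` raises a configuration exception rather than running the source with a missing value.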

@@ -58,12 +58,12 @@ keywords: [credentials, secrets.toml, secrets, config, configuration, environmen

We highly recommend adding types to your function signatures.
The effort is very low, and it gives `dlt` much more
information on what source/resource expects.
information on what the source or resource expects.

Doing so provides several benefits:

1. You'll never receive the invalid data types in your code.
1. `dlt` will automatically parse and coerce types for you, so you don't need to parse it yourself.
1. You'll never receive invalid data types in your code.
1. `dlt` will automatically parse and coerce types for you, so you don't need to parse them yourself.
1. `dlt` can generate sample config and secret files for your source automatically.
1. You can request [built-in and custom credentials](complex_types) (i.e., connection strings, AWS / GCP / Azure credentials).
1. You can specify a set of possible types via `Union`, i.e., OAuth or API Key authorization.
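
Here is a hedged sketch of a typed signature that uses these benefits, a typed variant of the `google_sheets` example shown later on this page (the credential classes are dlt built-ins; the `dlt.sources.credentials` import path is an assumption):
```py
from typing import Union

import dlt
from dlt.sources.credentials import GcpOAuthCredentials, GcpServiceAccountCredentials

@dlt.source
def google_sheets(
    spreadsheet_id: str = dlt.config.value,
    credentials: Union[GcpServiceAccountCredentials, GcpOAuthCredentials] = dlt.secrets.value,
    only_strings: bool = False,
):
    # dlt parses and coerces the injected values to the annotated types,
    # so credentials arrive as a ready-to-use credentials object
    ...
```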
@@ -94,7 +94,7 @@ Now,
## Toml files structure

`dlt` arranges the sections of [toml files](setup/#secretstoml-and-configtoml) into a **default layout** that is expected by the [injection mechanism](#injection-mechanism).
This layout makes it easy to configure simple cases but also provides a room for more explicit sections and complex cases, i.e., having several sources with different credentials
This layout makes it easy to configure simple cases but also provides room for more explicit sections and complex cases, i.e., having several sources with different credentials
or even hosting several pipelines in the same project sharing the same config and credentials.

```text
@@ -158,7 +158,7 @@ dlt.config["sheet_id"] = "23029402349032049"
dlt.secrets["destination.postgres.credentials"] = BaseHook.get_connection('postgres_dsn').extra
```

Will mock the `toml` provider to desired values.
This will mock the `toml` provider to desired values.

## Example

@@ -173,7 +173,7 @@ def google_sheets(
credentials=dlt.secrets.value,
only_strings=False
):
# Allow both a dictionary and a string passed as credentials
# Allow both a dictionary and a string to be passed as credentials
if isinstance(credentials, str):
credentials = json.loads(credentials)
# Allow both a list and a comma-delimited string to be passed as tabs
@@ -200,4 +200,5 @@ In the example above:
:::tip
`dlt.resource` behaves in the same way, so if you have a [standalone resource](../resource.md#declare-a-standalone-resource) (one that is not an inner function
of a **source**)
:::
:::
