From a8f3a279e7b42ed4c3c9ccc0a63605c9b2b69b5d Mon Sep 17 00:00:00 2001 From: Honza Javorek Date: Mon, 17 Jun 2024 10:42:18 +0200 Subject: [PATCH 1/3] improve vale rules for 'actors' and 'actor' --- .github/styles/Apify/Capitalization.yml | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/.github/styles/Apify/Capitalization.yml b/.github/styles/Apify/Capitalization.yml index e16029ef5..5ff2470d2 100644 --- a/.github/styles/Apify/Capitalization.yml +++ b/.github/styles/Apify/Capitalization.yml @@ -3,6 +3,18 @@ message: "The word '%s' should always be capitalized." ignorecase: false level: error tokens: - - '(? Date: Mon, 17 Jun 2024 10:42:45 +0200 Subject: [PATCH 2/3] capitalize 'actors' and 'actor' --- sources/academy/glossary/tools/apify_cli.md | 2 +- sources/academy/platform/apify_platform.md | 4 +- .../platform/deploying_your_code/deploying.md | 24 ++++----- .../deploying_your_code/docker_file.md | 24 ++++----- .../platform/deploying_your_code/index.md | 20 +++---- .../deploying_your_code/input_schema.md | 26 ++++----- .../deploying_your_code/inputs_outputs.md | 20 +++---- .../deploying_your_code/output_schema.md | 12 ++--- .../actors_webhooks.md | 32 +++++------ .../apify_api_and_client.md | 10 ++-- .../bypassing_anti_scraping.md | 8 +-- .../expert_scraping_with_apify/index.md | 12 ++--- .../managing_source_code.md | 14 ++--- .../migrations_maintaining_state.md | 24 ++++----- .../saving_useful_stats.md | 12 ++--- .../solutions/handling_migrations.md | 24 ++++----- .../solutions/integrating_webhooks.md | 48 ++++++++--------- .../solutions/managing_source.md | 2 +- .../solutions/rotating_proxies.md | 4 +- .../solutions/saving_stats.md | 10 ++-- .../solutions/using_api_and_client.md | 6 +-- .../solutions/using_storage_creating_tasks.md | 14 ++--- .../tasks_and_storage.md | 20 +++---- .../platform/get_most_of_actors/index.md | 12 ++--- .../get_most_of_actors/naming_your_actor.md | 30 +++++------ .../get_most_of_actors/seo_and_promotion.md | 2 +- .../platform/getting_started/actors.md | 28 +++++----- .../platform/getting_started/apify_api.md | 20 +++---- .../platform/getting_started/apify_client.md | 18 +++---- .../getting_started/creating_actors.md | 4 +- .../academy/platform/getting_started/index.md | 6 +-- .../getting_started/inputs_outputs.md | 28 +++++----- .../academy/platform/running_a_web_server.md | 26 ++++----- .../run_actor_and_retrieve_data_via_api.md | 30 +++++------ .../apify_scrapers/cheerio_scraper.md | 48 ++++++++--------- .../apify_scrapers/getting_started.md | 54 +++++++++---------- .../academy/tutorials/apify_scrapers/index.md | 4 +- .../apify_scrapers/puppeteer_scraper.md | 34 ++++++------ .../tutorials/apify_scrapers/web_scraper.md | 34 ++++++------ .../avoid_eacces_error_in_actor_builds.md | 6 +-- .../filter_blocked_requests_using_sessions.md | 12 ++--- .../how_to_save_screenshots_puppeteer.md | 6 +-- .../node_js/request_labels_in_apify_actors.md | 4 +- .../node_js/scraping_from_sitemaps.md | 2 +- .../scraping_urls_list_from_google_sheets.md | 10 ++-- .../submitting_form_with_file_attachment.md | 4 +- .../node_js/when_to_use_puppeteer_scraper.md | 2 +- .../tutorials/php/using_apify_from_php.md | 34 ++++++------ .../python/process_data_using_python.md | 26 ++++----- .../tutorials/python/scrape_data_python.md | 50 ++++++++--------- .../anti_scraping/techniques/captchas.md | 2 +- .../puppeteer_playwright/proxies.md | 2 +- .../challenge/index.md | 2 +- 53 files changed, 456 insertions(+), 456 deletions(-) diff --git 
a/sources/academy/glossary/tools/apify_cli.md b/sources/academy/glossary/tools/apify_cli.md index e55346d7b..a9de96ffa 100644 --- a/sources/academy/glossary/tools/apify_cli.md +++ b/sources/academy/glossary/tools/apify_cli.md @@ -11,7 +11,7 @@ slug: /tools/apify-cli --- -The [Apify CLI](/cli) helps you create, develop, build and run Apify actors, and manage the Apify cloud platform from any computer. It can be used to automatically generate the boilerplate for different types of projects, initialize projects, remotely call actors on the platform, and run your own projects. +The [Apify CLI](/cli) helps you create, develop, build and run Apify Actors, and manage the Apify cloud platform from any computer. It can be used to automatically generate the boilerplate for different types of projects, initialize projects, remotely call Actors on the platform, and run your own projects. ## Installing {#installing} diff --git a/sources/academy/platform/apify_platform.md b/sources/academy/platform/apify_platform.md index 9f57dc0d6..93aa7c774 100644 --- a/sources/academy/platform/apify_platform.md +++ b/sources/academy/platform/apify_platform.md @@ -12,11 +12,11 @@ slug: /apify-platform --- -The [Apify platform](https://apify.com) was built to serve large-scale and high-performance web scraping and automation needs. It provides easy access to compute instances ([actors](./getting_started/actors.md)), convenient request and result storages, proxies, scheduling, webhooks and more - all accessible through the **Console** web interface, [Apify's API](/api/v2), or our [JavaScript](/api/client/js) and [Python](/api/client/python) API clients. +The [Apify platform](https://apify.com) was built to serve large-scale and high-performance web scraping and automation needs. It provides easy access to compute instances ([Actors](./getting_started/actors.md)), convenient request and result storages, proxies, scheduling, webhooks and more - all accessible through the **Console** web interface, [Apify's API](/api/v2), or our [JavaScript](/api/client/js) and [Python](/api/client/python) API clients. ## Category outline {#this-category} -In this category, you'll learn how to become an Apify platform developer from the ground up. From creating your first account, to developing actors, this is your one-stop-shop for understanding how the platform works, and how to work with it. +In this category, you'll learn how to become an Apify platform developer from the ground up. From creating your first account, to developing Actors, this is your one-stop-shop for understanding how the platform works, and how to work with it. ## First up {#first} diff --git a/sources/academy/platform/deploying_your_code/deploying.md b/sources/academy/platform/deploying_your_code/deploying.md index 579de78e2..8e6b2c89c 100644 --- a/sources/academy/platform/deploying_your_code/deploying.md +++ b/sources/academy/platform/deploying_your_code/deploying.md @@ -1,27 +1,27 @@ --- title: Deploying -description: Push local code to the platform, or create a new actor on the console and integrate it with a Git repo to optionally automatically rebuild any new changes. +description: Push local code to the platform, or create a new Actor on the console and integrate it with a Git repo to optionally automatically rebuild any new changes. 
sidebar_position: 5 slug: /deploying-your-code/deploying --- # Deploying {#deploying} -**Push local code to the platform, or create a new actor on the console and integrate it with a Git repo to optionally automatically rebuild any new changes.** +**Push local code to the platform, or create a new Actor on the console and integrate it with a Git repo to optionally automatically rebuild any new changes.** --- -Once you've **actorified** your code, there are two ways to deploy it to the Apify platform. You can either push the code directly from your local machine onto the platform, or you can create a blank actor in the web interface, and then integrate its source code with a GitHub repository. +Once you've **actorified** your code, there are two ways to deploy it to the Apify platform. You can either push the code directly from your local machine onto the platform, or you can create a blank Actor in the web interface, and then integrate its source code with a GitHub repository. ## With a Git repository {#with-git-repository} Before we deploy our project onto the Apify platform, let's ensure that we've pushed the changes we made in the last 3 lessons into our remote GitHub repository. -> The benefit of using this method is that any time you push to the Git repo, the code on the platform is also updated and the actor is automatically rebuilt. Also, you don't have to use a GitHub repository - you can use GitLab or any other service you'd like. +> The benefit of using this method is that any time you push to the Git repo, the code on the platform is also updated and the Actor is automatically rebuilt. Also, you don't have to use a GitHub repository - you can use GitLab or any other service you'd like. -### Creating the actor +### Creating the Actor -Before anything can be integrated, we've gotta create a new actor. Luckily, this is super easy to do. Let's head over to our [Apify Console](https://console.apify.com?asrc=developers_portal) and click on the **New** button, then select the **Empty** template. +Before anything can be integrated, we've gotta create a new Actor. Luckily, this is super easy to do. Let's head over to our [Apify Console](https://console.apify.com?asrc=developers_portal) and click on the **New** button, then select the **Empty** template. ![Create new button](../getting_started/images/create-new-actor.png) @@ -29,7 +29,7 @@ Easy peasy! ### Changing source code location {#change-source-code} -In the **Source** tab on the new actor's page, we'll click the dropdown menu under **Source code** and select **Git repository**. By default, this is set to **Web IDE**. +In the **Source** tab on the new Actor's page, we'll click the dropdown menu under **Source code** and select **Git repository**. By default, this is set to **Web IDE**. ![Select source code location](../expert_scraping_with_apify/images/select-source-location.png) @@ -37,7 +37,7 @@ Now we'll paste the link to our GitHub repository into the **Git URL** text fiel ### Adding the webhook to the repository {#adding-repo-webhook} -The final step is to click on **API** in the top right corner of our actor's page: +The final step is to click on **API** in the top right corner of our Actor's page: ![API button](../expert_scraping_with_apify/images/api-button.jpg) @@ -55,15 +55,15 @@ If you're logged in to the Apify CLI, the `apify push` command can be used to pu One important thing to note is that you can use a `.gitignore` file to exclude files from being pushed. 
When you use `apify push` without a `.gitignore`, the full folder contents will be pushed, meaning that even the **storage** and **node_modules** will be pushed. These files are unnecessary to push, as they are both generated on the platform. -> The `apify push` command should only really be used for quickly pushing and testing actors on the platform during development. If you are ready to make your actor public, use a Git repository instead, as you will reap the benefits of using Git and others will be able to contribute to the project. +> The `apify push` command should only really be used for quickly pushing and testing Actors on the platform during development. If you are ready to make your Actor public, use a Git repository instead, as you will reap the benefits of using Git and others will be able to contribute to the project. ## Deployed! {#deployed} -Great! Once you've pushed your actor to the platform, you should see it in the list of actors under the **Actors** tab. If you used `apify push`, you'll have access to the **multifile editor** (discussed [here](../getting_started/creating_actors.md)). +Great! Once you've pushed your Actor to the platform, you should see it in the list of Actors under the **Actors** tab. If you used `apify push`, you'll have access to the **multifile editor** (discussed [here](../getting_started/creating_actors.md)). -![Deployed actor on the Apify platform](./images/actor-page.jpg) +![Deployed Actor on the Apify platform](./images/actor-page.jpg) -The next step is to test your actor and experiment with the vast amount of features the platform has to offer. +The next step is to test your Actor and experiment with the vast amount of features the platform has to offer. ## Wrap up {#next} diff --git a/sources/academy/platform/deploying_your_code/docker_file.md b/sources/academy/platform/deploying_your_code/docker_file.md index 6993b3462..81ab4704c 100644 --- a/sources/academy/platform/deploying_your_code/docker_file.md +++ b/sources/academy/platform/deploying_your_code/docker_file.md @@ -14,13 +14,13 @@ import TabItem from '@theme/TabItem'; --- -The **Dockerfile** is a file which gives the Apify platform (or Docker, more specifically) instructions on how to create an environment for your code to run in. Every actor must have a Dockerfile, as actors run in Docker containers. +The **Dockerfile** is a file which gives the Apify platform (or Docker, more specifically) instructions on how to create an environment for your code to run in. Every Actor must have a Dockerfile, as Actors run in Docker containers. -> Actors on the platform are always run in Docker containers; however, they can also be run in local Docker containers. This is not common practice though, as it requires more setup and a deeper understanding of Docker. For testing, it's best to just run the actor on the local OS (this requires you to have the underlying runtime installed, such as Node.js, Python, Rust, GO, etc). +> Actors on the platform are always run in Docker containers; however, they can also be run in local Docker containers. This is not common practice though, as it requires more setup and a deeper understanding of Docker. For testing, it's best to just run the Actor on the local OS (this requires you to have the underlying runtime installed, such as Node.js, Python, Rust, GO, etc). ## Base images {#base-images} -If your project doesn’t already contain a Dockerfile, don’t worry! 
Apify offers [many base images](/sdk/js/docs/guides/docker-images) that are optimized for building and running actors on the platform, which can be found [here](https://hub.docker.com/u/apify). When using a language for which Apify doesn't provide a base image, [Docker Hub](https://hub.docker.com/) provides a ton of free Docker images for most use-cases, upon which you can create your own images. +If your project doesn’t already contain a Dockerfile, don’t worry! Apify offers [many base images](/sdk/js/docs/guides/docker-images) that are optimized for building and running Actors on the platform, which can be found [here](https://hub.docker.com/u/apify). When using a language for which Apify doesn't provide a base image, [Docker Hub](https://hub.docker.com/) provides a ton of free Docker images for most use-cases, upon which you can create your own images. > Tip: You can see all of Apify's Docker images [on DockerHub](https://hub.docker.com/r/apify/). @@ -38,9 +38,9 @@ FROM apify/actor-node:16 The rest of the Dockerfile is about copying the source code from the local filesystem into the container's filesystem, installing libraries, and setting the `RUN` command (which falls back to the parent image). -> If you are not using a base image from Apify, then you should specify how to launch the source code of your actor with the `CMD` instruction. +> If you are not using a base image from Apify, then you should specify how to launch the source code of your Actor with the `CMD` instruction. -Here's the Dockerfile for our Node.js example project's actor: +Here's the Dockerfile for our Node.js example project's Actor: @@ -78,7 +78,7 @@ COPY . ./ # You can also use any other image from Docker Hub. FROM apify/actor-python:3.9 -# Second, copy just requirements.txt into the actor image, +# Second, copy just requirements.txt into the Actor image, # since it should be the only file that affects "pip install" in the next step, # in order to speed up the build COPY requirements.txt ./ @@ -100,7 +100,7 @@ RUN echo "Python version:" \ # for most source file changes. COPY . ./ -# Specify how to launch the source code of your actor. +# Specify how to launch the source code of your Actor. # By default, the main.py file is run CMD python3 main.py @@ -111,10 +111,10 @@ CMD python3 main.py ## Examples {#examples} -The examples we just showed were for Node.js and Python, however, to drive home the fact that actors can be written in any language, here are some examples of some Dockerfiles for actors written in different programming languages: +The examples we just showed were for Node.js and Python, however, to drive home the fact that Actors can be written in any language, here are some examples of some Dockerfiles for Actors written in different programming languages: - + ```Dockerfile FROM golang:1.17.1-alpine @@ -130,7 +130,7 @@ CMD ["/example-actor"] ``` - + ```Dockerfile # Image with prebuilt Rust. We use the newest 1.* version @@ -163,7 +163,7 @@ CMD ["./target/release/actor-example"] ``` - + ```Dockerfile FROM julia:1.7.1-alpine @@ -182,4 +182,4 @@ CMD ["julia", "main.jl"] ## Next up {#next} -In the [next lesson](./deploying.md), we'll push our code directly to the Apify platform, or create and integrate a new actor on the Apify platform with our project's GitHub repository. +In the [next lesson](./deploying.md), we'll push our code directly to the Apify platform, or create and integrate a new Actor on the Apify platform with our project's GitHub repository. 
diff --git a/sources/academy/platform/deploying_your_code/index.md b/sources/academy/platform/deploying_your_code/index.md index f58be1e56..c33ecd13f 100644 --- a/sources/academy/platform/deploying_your_code/index.md +++ b/sources/academy/platform/deploying_your_code/index.md @@ -1,6 +1,6 @@ --- title: Deploying your code -description: In this course learn how to take an existing project of yours and deploy it to the Apify platform as an actor in just a few minutes! +description: In this course learn how to take an existing project of yours and deploy it to the Apify platform as an Actor in just a few minutes! sidebar_position: 9 category: apify platform slug: /deploying-your-code @@ -11,13 +11,13 @@ import TabItem from '@theme/TabItem'; # Deploying your code to Apify {#deploying} -**In this course learn how to take an existing project of yours and deploy it to the Apify platform as an actor in just a few minutes!** +**In this course learn how to take an existing project of yours and deploy it to the Apify platform as an Actor in just a few minutes!** --- -This section will discuss how to use your newfound knowledge of the Apify platform and actors from the [**Getting started**](../getting_started/index.md) section to deploy your existing project's code to the Apify platform as an actor. +This section will discuss how to use your newfound knowledge of the Apify platform and Actors from the [**Getting started**](../getting_started/index.md) section to deploy your existing project's code to the Apify platform as an Actor. -Because actors are basically just chunks of code running in Docker containers, you're able to **_Actorify_** just about anything! +Because Actors are basically just chunks of code running in Docker containers, you're able to **_Actorify_** just about anything! ![The deployment workflow](../../images/deployment-workflow.png) @@ -25,15 +25,15 @@ Actors are language agnostic, which means that the language your project is writ ![Supported languages](../../images/supported-languages.jpg) -Though the majority of actors currently on the platform were written in Node.js, and despite the fact our current preferred languages are JavaScript and Python, there are a few examples of actors in other languages: +Though the majority of Actors currently on the platform were written in Node.js, and despite the fact our current preferred languages are JavaScript and Python, there are a few examples of Actors in other languages: - [Actor written in Rust](https://apify.com/lukaskrivka/rust-actor-example) -- [GO actor](https://apify.com/jirimoravcik/go-actor-example) +- [GO Actor](https://apify.com/jirimoravcik/go-actor-example) - [Actor written with Julia](https://apify.com/jirimoravcik/julia-actor-example) ## The "actorification" workflow {#workflow} -There are four main steps to turning a piece of code into an actor: +There are four main steps to turning a piece of code into an Actor: 1. Handle [accepting inputs and writing outputs](./inputs_outputs.md). 2. Create an [input schema](./input_schema.md) **(optional)**. @@ -42,7 +42,7 @@ There are four main steps to turning a piece of code into an actor: ## Our example project -For this section, we'll be turning this example project into an actor: +For this section, we'll be turning this example project into an Actor: @@ -76,8 +76,8 @@ print(add_all_numbers([1, 2, 3, 4])) # -> 10 > For all lessons in this section, we'll have examples for both Node.js and Python so that you can follow along in either language. 
- + ## Next up {#next} -[Next lesson](./inputs_outputs.md), we'll be learning how to accept input into our actor as well as deliver output. +[Next lesson](./inputs_outputs.md), we'll be learning how to accept input into our Actor as well as deliver output. diff --git a/sources/academy/platform/deploying_your_code/input_schema.md b/sources/academy/platform/deploying_your_code/input_schema.md index 6b37f1103..d1f78da24 100644 --- a/sources/academy/platform/deploying_your_code/input_schema.md +++ b/sources/academy/platform/deploying_your_code/input_schema.md @@ -1,19 +1,19 @@ --- title: Input schema -description: Learn how to generate a user interface on the platform for your actor's input with a single file - the INPUT_SCHEMA.json file. +description: Learn how to generate a user interface on the platform for your Actor's input with a single file - the INPUT_SCHEMA.json file. sidebar_position: 2 slug: /deploying-your-code/input-schema --- # Input schema {#input-schema} -**Learn how to generate a user interface on the platform for your actor's input with a single file - the INPUT_SCHEMA.json file.** +**Learn how to generate a user interface on the platform for your Actor's input with a single file - the INPUT_SCHEMA.json file.** --- Though writing an [input schema](/platform/actors/development/actor-definition/input-schema) for an Actor is not a required step, it is most definitely an ideal one. The Apify platform will read the **INPUT_SCHEMA.json** file within the root of your project and generate a user interface for entering input into your Actor, which makes it significantly easier for non-developers (and even developers) to configure and understand the inputs your Actor can receive. Because of this, we'll be writing an input schema for our example Actor. -> Without an input schema, the users of our actor will have to provide the input in JSON format, which can be problematic for those who are not familiar with JSON. +> Without an input schema, the users of our Actor will have to provide the input in JSON format, which can be problematic for those who are not familiar with JSON. ## Schema title & description {#title-and-description} @@ -21,22 +21,22 @@ In the root of our project, we'll create a file named **INPUT_SCHEMA.json** and ```json { - "title": "Adding actor input", + "title": "Adding Actor input", "description": "Add all values in list of numbers with an arbitrary length.", "type": "object", "schemaVersion": 1 } ``` -The **title** and **description** simply describe what the input schema is for, and a bit about what the actor itself does. +The **title** and **description** simply describe what the input schema is for, and a bit about what the Actor itself does. ## Properties {#properties} -In order to define all of the properties our actor is expecting, we must include them within an object with a key of **properties**. +In order to define all of the properties our Actor is expecting, we must include them within an object with a key of **properties**. ```json { - "title": "Adding actor input", + "title": "Adding Actor input", "description": "Add all values in list of numbers with an arbitrary length.", "type": "object", "schemaVersion": 1, @@ -57,7 +57,7 @@ Within our new **numbers** property, there are two more fields we must specify. 
```json { - "title": "Adding actor input", + "title": "Adding Actor input", "description": "Add all values in list of numbers with an arbitrary length.", "type": "object", "schemaVersion": 1, @@ -74,11 +74,11 @@ Within our new **numbers** property, there are two more fields we must specify. ## Required fields {#required-fields} -The great thing about building an input schema is that it will automatically validate your inputs based on their type, maximum value, minimum value, etc. Sometimes, you want to ensure that the user will always provide input for certain fields, as they are crucial to the actor's run. This can be done by using the **required** field and passing in the names of the fields you'd like to require. +The great thing about building an input schema is that it will automatically validate your inputs based on their type, maximum value, minimum value, etc. Sometimes, you want to ensure that the user will always provide input for certain fields, as they are crucial to the Actor's run. This can be done by using the **required** field and passing in the names of the fields you'd like to require. ```json { - "title": "Adding actor input", + "title": "Adding Actor input", "description": "Add all values in list of numbers with an arbitrary length.", "type": "object", "schemaVersion": 1, @@ -94,7 +94,7 @@ The great thing about building an input schema is that it will automatically val } ``` -For our case, we've made the **numbers** field required, as it is crucial to our actor's run. +For our case, we've made the **numbers** field required, as it is crucial to our Actor's run. ## Final thoughts {#final-thoughts} @@ -102,10 +102,10 @@ Here is what the input schema we wrote will render on the platform: ![Rendered UI from input schema](./images/rendered-ui.png) -Later on, we'll be building more complex input schemas, as well as discussing how to write quality input schemas that allow the user to easily understand the actor and not become overwhelmed. +Later on, we'll be building more complex input schemas, as well as discussing how to write quality input schemas that allow the user to easily understand the Actor and not become overwhelmed. It's not expected to memorize all of the fields that properties can take or the different editor types available, which is why it's always good to reference the [input schema documentation](/platform/actors/development/actor-definition/input-schema) when writing a schema. ## Next up {#next} -In the [next lesson](./output_schema.md), we'll learn how to generate an appealing Overview table to display our actor's results in real time, so users can get immediate feedback about the data being extracted. +In the [next lesson](./output_schema.md), we'll learn how to generate an appealing Overview table to display our Actor's results in real time, so users can get immediate feedback about the data being extracted. diff --git a/sources/academy/platform/deploying_your_code/inputs_outputs.md b/sources/academy/platform/deploying_your_code/inputs_outputs.md index 6b4ad0020..d91174956 100644 --- a/sources/academy/platform/deploying_your_code/inputs_outputs.md +++ b/sources/academy/platform/deploying_your_code/inputs_outputs.md @@ -1,33 +1,33 @@ --- title: Inputs & outputs -description: Learn to accept input into your actor, do something with it, and then return output. Actors can be written in any language, so this concept is language agnostic. +description: Learn to accept input into your Actor, do something with it, and then return output. 
Actors can be written in any language, so this concept is language agnostic. sidebar_position: 1 slug: /deploying-your-code/inputs-outputs --- # Inputs & outputs {#inputs-outputs} -**Learn to accept input into your actor, do something with it, and then return output. Actors can be written in any language, so this concept is language agnostic.** +**Learn to accept input into your Actor, do something with it, and then return output. Actors can be written in any language, so this concept is language agnostic.** --- Most of the time when you're creating a project, you are expecting some sort of input from which your software will run off. Oftentimes as well, you want to provide some sort of output once your software has completed running. With Apify, it is extremely easy to take in inputs and deliver outputs. -An important thing to understand regarding inputs and outputs is that they are read/written differently depending on where the actor is running: +An important thing to understand regarding inputs and outputs is that they are read/written differently depending on where the Actor is running: -- If your actor is running locally, the inputs/outputs are usually provided in the filesystem, and environment variables are injected either by you, the developer, or by the Apify CLI by running the project with the `apify run` command. +- If your Actor is running locally, the inputs/outputs are usually provided in the filesystem, and environment variables are injected either by you, the developer, or by the Apify CLI by running the project with the `apify run` command. - While running in a Docker container on the platform, environment variables are automatically injected, and inputs & outputs are provided and modified using Apify's REST API. ## A bit about storage {#about-storage} -You can read/write your inputs/outputs: to the [key-value store](/platform/storage/key-value-store), or to the [dataset](/platform/storage/dataset). The key-value store can be used to store any sort of unorganized/unrelated data in any format, while the data pushed to a dataset typically resembles a table with columns (fields) and rows (items). Each actor's run is allocated both a default dataset and a default key-value store. +You can read/write your inputs/outputs: to the [key-value store](/platform/storage/key-value-store), or to the [dataset](/platform/storage/dataset). The key-value store can be used to store any sort of unorganized/unrelated data in any format, while the data pushed to a dataset typically resembles a table with columns (fields) and rows (items). Each Actor's run is allocated both a default dataset and a default key-value store. When running locally, these storages are accessible through the **storage** folder within your project's root directory, while on the platform they are accessible via Apify's API. ## Accepting input {#accepting-input} -You can utilize multiple ways to accept input into your project. The option you go with depends on the language you have written your project in. If you are using Node.js for your repo's code, you can use the [`apify`](https://www.npmjs.com/package/apify) package. Otherwise, you can use the useful environment variables automatically set up for you by Apify to write utility functions which read the actor's input and return it. +You can utilize multiple ways to accept input into your project. The option you go with depends on the language you have written your project in. 
If you are using Node.js for your repo's code, you can use the [`apify`](https://www.npmjs.com/package/apify) package. Otherwise, you can use the useful environment variables automatically set up for you by Apify to write utility functions which read the Actor's input and return it. ### Accepting input with the Apify SDK @@ -43,7 +43,7 @@ Now, let's import `Actor` from `apify` and use the `Actor.getInput()` function t // index.js import { Actor } from 'apify'; -// We must initialize and exit the actor. The rest of our code +// We must initialize and exit the Actor. The rest of our code // goes in between these two. await Actor.init(); @@ -88,7 +88,7 @@ Cool! When we run `node index.js`, we see **20**. ### Accepting input without the Apify SDK -Alternatively, when writing in a language other than JavaScript, we can create our own `get_input()` function which utilizes the Apify API when the actor is running on the platform. For this example, we are using the [Apify Client](../getting_started/apify_client.md) for Python to access the API. +Alternatively, when writing in a language other than JavaScript, we can create our own `get_input()` function which utilizes the Apify API when the Actor is running on the platform. For this example, we are using the [Apify Client](../getting_started/apify_client.md) for Python to access the API. ```Python # index.py @@ -131,7 +131,7 @@ print(solution) ## Writing output {#writing-output} -Similarly to reading input, you can write the actor's output either by using the Apify SDK in Node.js or by manually writing a utility function to do so. +Similarly to reading input, you can write the Actor's output either by using the Apify SDK in Node.js or by manually writing a utility function to do so. ### Writing output with the Apify SDK @@ -221,4 +221,4 @@ After running our script, there should be a single item in the default dataset t ## Next up {#next} -That's it! We've now added all of the files and code necessary to convert our software into an actor. In the [next lesson](./input_schema.md), we'll be learning how to easily generate a user interface for our actor's input so that users don't have to provide the input in raw JSON format. +That's it! We've now added all of the files and code necessary to convert our software into an Actor. In the [next lesson](./input_schema.md), we'll be learning how to easily generate a user interface for our Actor's input so that users don't have to provide the input in raw JSON format. diff --git a/sources/academy/platform/deploying_your_code/output_schema.md b/sources/academy/platform/deploying_your_code/output_schema.md index 26ab12636..afdd1eaec 100644 --- a/sources/academy/platform/deploying_your_code/output_schema.md +++ b/sources/academy/platform/deploying_your_code/output_schema.md @@ -1,13 +1,13 @@ --- title: Output schema -description: Learn how to generate an appealing Overview table interface to preview your actor results in real time on the Apify platform. +description: Learn how to generate an appealing Overview table interface to preview your Actor results in real time on the Apify platform. 
sidebar_position: 3 slug: /deploying-your-code/output-schema --- # Output schema {#output-schema} -**Learn how to generate an appealing Overview table interface to preview your actor results in real time on the Apify platform.** +**Learn how to generate an appealing Overview table interface to preview your Actor results in real time on the Apify platform.** --- @@ -19,7 +19,7 @@ In this quick tutorial, you will learn how to set up an output tab for your own ## Implementation {#implementation} -Firstly, create a `.actor` folder in the root of your actor's source code. Then, create a `actor.json` file in this folder, after which you'll have .actor/actor.json. +Firstly, create a `.actor` folder in the root of your Actor's source code. Then, create a `actor.json` file in this folder, after which you'll have .actor/actor.json. ![.actor/actor.json](./images/actor-json-example.webp) @@ -69,9 +69,9 @@ Next, copy-paste the following template code into your `actor.json` file. } ``` -To configure the output schema, simply replace the fields in the template with the relevant fields to your actor. +To configure the output schema, simply replace the fields in the template with the relevant fields to your Actor. -For reference, you can use the [Zappos Scraper source code](https://github.com/PerVillalva/zappos-scraper-actor/blob/main/.actor/actor.json) as an example of how the final implementation of the output tab should look in a live actor. +For reference, you can use the [Zappos Scraper source code](https://github.com/PerVillalva/zappos-scraper-actor/blob/main/.actor/actor.json) as an example of how the final implementation of the output tab should look in a live Actor. ```json { @@ -160,7 +160,7 @@ const results = { ## Final result {#final-result} -Great! Now that everything is set up, it's time to run the actor and admire your Actor's brand new output tab. +Great! Now that everything is set up, it's time to run the Actor and admire your Actor's brand new output tab. > Need some extra guidance? Visit the [output schema documentation](/platform/actors/development/actor-definition/output-schema) for more detailed information about how to implement this feature. diff --git a/sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md b/sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md index fc8352d67..4beda0b22 100644 --- a/sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md +++ b/sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md @@ -1,49 +1,49 @@ --- -title: I - Webhooks & advanced actor overview -description: Learn more advanced details about actors, how they work, and the default configurations they can take. Also, learn how to integrate your actor with webhooks. +title: I - Webhooks & advanced Actor overview +description: Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks. sidebar_position: 6.1 slug: /expert-scraping-with-apify/actors-webhooks --- -# Webhooks & advanced actor overview {#webhooks-and-advanced-actors} +# Webhooks & advanced Actor overview {#webhooks-and-advanced-actors} -**Learn more advanced details about actors, how they work, and the default configurations they can take. **Also**,** learn how** to integrate your actor with webhooks.** +**Learn more advanced details about Actors, how they work, and the default configurations they can take. 
**Also**,** learn how** to integrate your Actor with webhooks.** --- -Thus far, you've run actors on the platform and written an actor of your own, which you published to the platform yourself using the Apify CLI; therefore, it's fair to say that you are becoming more familiar and comfortable with the concept of **actors**. Within this lesson, we'll take a more in-depth look at actors and what they can do. +Thus far, you've run Actors on the platform and written an Actor of your own, which you published to the platform yourself using the Apify CLI; therefore, it's fair to say that you are becoming more familiar and comfortable with the concept of **Actors**. Within this lesson, we'll take a more in-depth look at Actors and what they can do. -## Advanced actor overview {#advanced-actors} +## Advanced Actor overview {#advanced-actors} In this course, we'll be working out of the Amazon scraper project from the **Web scraping for beginners** course. If you haven't already built that project, you can do it in three short lessons [here](../../webscraping/web_scraping_for_beginners/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same. -Take another look at the files within your Amazon scraper project. You'll notice that there is a **Dockerfile**. Every single actor has a Dockerfile (the actor's **Image**) which tells Docker how to spin up a container on the Apify platform which can successfully run the actor's code. "Apify Actors" is basically just a serverless platform that runs multiple Docker containers. For a deeper understanding of actor Dockerfiles, refer to the [Apify actor Dockerfile docs](/sdk/js/docs/guides/docker-images#example-dockerfile). +Take another look at the files within your Amazon scraper project. You'll notice that there is a **Dockerfile**. Every single Actor has a Dockerfile (the Actor's **Image**) which tells Docker how to spin up a container on the Apify platform which can successfully run the Actor's code. "Apify Actors" is basically just a serverless platform that runs multiple Docker containers. For a deeper understanding of Actor Dockerfiles, refer to the [Apify Actor Dockerfile docs](/sdk/js/docs/guides/docker-images#example-dockerfile). ## Webhooks {#webhooks} -Webhooks are a powerful tool that can be used for just about anything. You can set up actions to be taken when an actor reaches a certain state (started, failed, succeeded, etc). These actions usually take the form of an API call (generally a POST request). +Webhooks are a powerful tool that can be used for just about anything. You can set up actions to be taken when an Actor reaches a certain state (started, failed, succeeded, etc). These actions usually take the form of an API call (generally a POST request). ## Learning 🧠 {#learning} Prior to moving forward, please read over these resources: -- Read about [running actors, handling actor inputs, memory and CPU](/platform/actors/running). -- Learn about [actor webhooks](/platform/integrations/webhooks), which we will implement in the next lesson. +- Read about [running Actors, handling Actor inputs, memory and CPU](/platform/actors/running). +- Learn about [Actor webhooks](/platform/integrations/webhooks), which we will implement in the next lesson. - Learn [how to run Actors](/academy/api/run-actor-and-retrieve-data-via-api) using Apify's REST API. ## Knowledge check 📝 {#quiz} -1. How do you allocate more CPU for an actor's run? -2. 
Within itself, can you get the exact time that an actor was started? -3. What are the types of default storages connected to an actor's run? -4. Can you change the allocated memory of an actor while it's running? -5. How can you run an actor with Puppeteer on the Apify platform with headless mode set to `false`? +1. How do you allocate more CPU for an Actor's run? +2. Within itself, can you get the exact time that an Actor was started? +3. What are the types of default storages connected to an Actor's run? +4. Can you change the allocated memory of an Actor while it's running? +5. How can you run an Actor with Puppeteer on the Apify platform with headless mode set to `false`? ## Our task {#our-task} In this task, we'll be building on top of what we already created in the [Web scraping for beginners](/academy/web-scraping-for-beginners/challenge) course's final challenge, so keep those files safe! -Once our Amazon actor has completed its run, we will, rather than sending an email to ourselves, call an actor through a webhook. The actor called will be a new actor that we will create together, which will take the dataset ID as input, then subsequently filter through all of the results and return only the cheapest one for each product. All of the results of the actor will be pushed to its default dataset. +Once our Amazon Actor has completed its run, we will, rather than sending an email to ourselves, call an Actor through a webhook. The Actor called will be a new Actor that we will create together, which will take the dataset ID as input, then subsequently filter through all of the results and return only the cheapest one for each product. All of the results of the Actor will be pushed to its default dataset. [**Solution**](./solutions/integrating_webhooks.md) diff --git a/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md b/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md index b1c86edfa..2cdffe51f 100644 --- a/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md +++ b/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md @@ -25,23 +25,23 @@ You can use one of the two main ways to programmatically interact with the Apify ## Knowledge check 📝 {#quiz} 1. What is the relationship between the Apify API and the Apify client? Are there any significant differences? -2. How do you pass input when running an actor or task via API? +2. How do you pass input when running an Actor or task via API? 3. Do you need to install the `apify-client` npm package when already using the `apify` package? ## Our task -In the previous lesson, we created a **task** for the Amazon actor we built in the first two lessons of this course. Now, we'll be creating another new actor, which will have two jobs: +In the previous lesson, we created a **task** for the Amazon Actor we built in the first two lessons of this course. Now, we'll be creating another new Actor, which will have two jobs: -1. Programmatically call the task for the Amazon actor. +1. Programmatically call the task for the Amazon Actor. 2. Export its results into CSV format under a new key called **OUTPUT.csv** in the default key-value store. Though it's a bit unintuitive, this is a perfect activity for learning how to use both the Apify API and the Apify JavaScript client. 
-The new actor should take the following input values, which be mapped to parameters in the API calls: +The new Actor should take the following input values, which be mapped to parameters in the API calls: ```json { - // How much memory to allocate to the Amazon actor + // How much memory to allocate to the Amazon Actor // Must be a power of 2 "memory": 4096, diff --git a/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md b/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md index a8186e19d..4e4fa6f22 100644 --- a/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md +++ b/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md @@ -13,9 +13,9 @@ slug: /expert-scraping-with-apify/bypassing-anti-scraping Effectively bypassing anti-scraping software is one of the most crucial, but also one of the most difficult skills to master. The different types of [anti-scraping protections](../../webscraping/anti_scraping/index.md) can vary a lot on the web. Some websites aren't even protected at all, some require only moderate IP rotation, and some cannot even be scraped without using advanced techniques and workarounds. Additionally, because the web is evolving, anti-scraping techniques are also evolving and becoming more advanced. -It is generally quite difficult to recognize the anti-scraping protections a page may have when first inspecting it, so it is important to thoroughly investigate a site prior to writing any lines of code, as anti-scraping measures can significantly change your approach as well as complicate the development process of an actor. As your skills expand, you will be able to spot anti-scraping measures quicker, and better evaluate the complexity of a new project. +It is generally quite difficult to recognize the anti-scraping protections a page may have when first inspecting it, so it is important to thoroughly investigate a site prior to writing any lines of code, as anti-scraping measures can significantly change your approach as well as complicate the development process of an Actor. As your skills expand, you will be able to spot anti-scraping measures quicker, and better evaluate the complexity of a new project. -You might have already noticed that we've been using the **RESIDENTIAL** proxy group in the `proxyConfiguration` within our Amazon scraping actor. But what does that mean? This is a proxy group from [Apify Proxy](https://apify.com/proxy) which has been preventing us from being blocked by Amazon this entire time. We'll be learning more about proxies and Apify Proxy in this lesson. +You might have already noticed that we've been using the **RESIDENTIAL** proxy group in the `proxyConfiguration` within our Amazon scraping Actor. But what does that mean? This is a proxy group from [Apify Proxy](https://apify.com/proxy) which has been preventing us from being blocked by Amazon this entire time. We'll be learning more about proxies and Apify Proxy in this lesson. ## Learning 🧠 {#learning} @@ -23,7 +23,7 @@ You might have already noticed that we've been using the **RESIDENTIAL** proxy g - Give the [proxy documentation](/platform/proxy#our-proxies) a solid readover (feel free to skip most of the examples). - Check out the [anti-scraping guide](../../webscraping/anti_scraping/index.md). - Gain a solid understanding of the [SessionPool](https://crawlee.dev/api/core/class/SessionPool). -- Look at a few actors on the [Apify store](https://apify.com/store). 
How are they utilizing proxies? +- Look at a few Actors on the [Apify store](https://apify.com/store). How are they utilizing proxies? ## Knowledge check 📝 {#quiz} @@ -37,7 +37,7 @@ You might have already noticed that we've been using the **RESIDENTIAL** proxy g ## Our task -This time, we're going to build a trivial proxy-session manager for our Amazon scraping actor. A session should be used a maximum of 5 times before being rotated; however, if a request fails, the IP should be rotated immediately. +This time, we're going to build a trivial proxy-session manager for our Amazon scraping Actor. A session should be used a maximum of 5 times before being rotated; however, if a request fails, the IP should be rotated immediately. Additionally, the proxies used by our scraper should now only be from the US. diff --git a/sources/academy/platform/expert_scraping_with_apify/index.md b/sources/academy/platform/expert_scraping_with_apify/index.md index 0a564cdb7..65be159f1 100644 --- a/sources/academy/platform/expert_scraping_with_apify/index.md +++ b/sources/academy/platform/expert_scraping_with_apify/index.md @@ -1,6 +1,6 @@ --- title: Expert scraping with Apify -description: After learning the basics of actors and Apify, learn to develop pro-level scrapers on the Apify platform with this advanced course. +description: After learning the basics of Actors and Apify, learn to develop pro-level scrapers on the Apify platform with this advanced course. sidebar_position: 12 category: apify platform slug: /expert-scraping-with-apify @@ -8,7 +8,7 @@ slug: /expert-scraping-with-apify # Expert scraping with Apify {#expert-scraping} -**After learning the basics of actors and Apify, learn to develop pro-level scrapers on the Apify platform with this advanced course.** +**After learning the basics of Actors and Apify, learn to develop pro-level scrapers on the Apify platform with this advanced course.** --- @@ -28,7 +28,7 @@ Before developing a pro-level Apify scraper, there are some important things you If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5-10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to [this lesson](../../webscraping/web_scraping_for_beginners/crawling/pro_scraping.md) in the **Web scraping for beginners** course (and ideally follow along). To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category. -The Apify CLI will play a core role in the running and testing of the actor you will build, so if you haven't gotten it installed already, please refer to [this short lesson](../../glossary/tools/apify_cli.md). +The Apify CLI will play a core role in the running and testing of the Actor you will build, so if you haven't gotten it installed already, please refer to [this short lesson](../../glossary/tools/apify_cli.md). ### Git {#git} @@ -38,12 +38,12 @@ In one of the later lessons, we'll be learning how to integrate our Actor on the Docker is a massive topic on its own, but don't be worried! We only expect you to know and understand the very basics of it, which can be learned about in [this short article](https://docs.docker.com/get-started/overview/) (10 minute read). -### The basics of actors {#actor-basics} +### The basics of Actors {#actor-basics} -Part of this course will be learning more in-depth about actors; however, some basic knowledge is already assumed. 
If you haven't yet gone through the [actors](../getting_started/actors.md) lesson of the **Apify platform** course, it's highly recommended to at least give it a glance before moving forward. +Part of this course will be learning more in-depth about Actors; however, some basic knowledge is already assumed. If you haven't yet gone through the [Actors](../getting_started/actors.md) lesson of the **Apify platform** course, it's highly recommended to at least give it a glance before moving forward. ## First up {#first} -[First up](./actors_webhooks.md), we'll be learning in-depth about integrating actors with each other using webhooks. +[First up](./actors_webhooks.md), we'll be learning in-depth about integrating Actors with each other using webhooks. > Each lesson will have a short _(and optional)_ quiz that you can take at home to test your skills and knowledge related to the lesson's content. Some questions have straight factual answers, but some others can have varying opinionated answers. diff --git a/sources/academy/platform/expert_scraping_with_apify/managing_source_code.md b/sources/academy/platform/expert_scraping_with_apify/managing_source_code.md index eba6d3cf5..09cb219ab 100644 --- a/sources/academy/platform/expert_scraping_with_apify/managing_source_code.md +++ b/sources/academy/platform/expert_scraping_with_apify/managing_source_code.md @@ -11,7 +11,7 @@ slug: /expert-scraping-with-apify/managing-source-code --- -In this brief lesson, we'll discuss how to better manage an actor's source code. Up 'til now, you've been developing your scripts locally, and then pushing the code directly to the actor on the Apify platform; however, there is a much more optimal (and standard) way. +In this brief lesson, we'll discuss how to better manage an Actor's source code. Up 'til now, you've been developing your scripts locally, and then pushing the code directly to the Actor on the Apify platform; however, there is a much more optimal (and standard) way. ## Learning 🧠 {#learning} @@ -19,11 +19,11 @@ Thus far, every time we've updated our code on the Apify platform, we've used th If you're not yet familiar with Git, please get familiar with it through the [Git documentation](https://git-scm.com/docs), then take a quick moment to read about [GitHub integration](/platform/integrations/github) in the Apify docs. -Also, try to explore the **Multifile editor** in one of the actors you developed in the previous lessons before moving forward. +Also, try to explore the **Multifile editor** in one of the Actors you developed in the previous lessons before moving forward. ## Knowledge check 📝 {#quiz} -1. Do you have to rebuild an actor each time the source code is changed? +1. Do you have to rebuild an Actor each time the source code is changed? 2. In Git, what is the difference between **pushing** changes and making a **pull request**? 3. Based on your knowledge and experience, is the `apify push` command worth using (in your opinion)? @@ -43,13 +43,13 @@ First, let's create a repository. This can be done [in a number of ways](https:/ Then, we'll run the commands it tells us in our terminal (while within the **demo-actor** directory) to initialize the repository locally, and then push all of the files to the remote one. -After you've created your repo, navigate on the Apify platform to the actor we called **demo-actor**. In the **Source** tab, click the dropdown menu under **Source code** and select **Git repository**. By default, this is set to **Web IDE**, which is what we've been using so far. 
+After you've created your repo, navigate on the Apify platform to the Actor we called **demo-actor**. In the **Source** tab, click the dropdown menu under **Source code** and select **Git repository**. By default, this is set to **Web IDE**, which is what we've been using so far. ![Select source code location](./images/select-source-location.png) Then, go ahead and paste the link to your repository into the **Git URL** text field and click **Save**. -The final step is to click on **API** in the top right corner of your actor's page: +The final step is to click on **API** in the top right corner of your Actor's page: ![API button](./images/api-button.jpg) @@ -61,7 +61,7 @@ And you're done! 🎉 ## Quick chat about code management {#code-management} -This was a bit of overhead, but the good news is that you don't ever have to configure this stuff again for this actor. Now, every time the content of your **main**/**master** branch changes, the actor on the Apify platform will rebuild based on the newest code. +This was a bit of overhead, but the good news is that you don't ever have to configure this stuff again for this Actor. Now, every time the content of your **main**/**master** branch changes, the Actor on the Apify platform will rebuild based on the newest code. Think of it as combining two steps into one! Normally, you'd have to do a `git push` from your terminal in order to get the newest code onto GitHub, then run `apify push` to push it to the platform. @@ -69,4 +69,4 @@ It's also important to know that GitHub/Gitlab repository integration is standar ## Next up {#next} -[Next up](./tasks_and_storage.md), you'll learn about the different ways to store scraped data, as well as how to utilize a cool feature to run pre-configured actors. +[Next up](./tasks_and_storage.md), you'll learn about the different ways to store scraped data, as well as how to utilize a cool feature to run pre-configured Actors. diff --git a/sources/academy/platform/expert_scraping_with_apify/migrations_maintaining_state.md b/sources/academy/platform/expert_scraping_with_apify/migrations_maintaining_state.md index 2790cc201..c3f9d5e15 100644 --- a/sources/academy/platform/expert_scraping_with_apify/migrations_maintaining_state.md +++ b/sources/academy/platform/expert_scraping_with_apify/migrations_maintaining_state.md @@ -1,34 +1,34 @@ --- title: V - Migrations & maintaining state -description: Learn about what actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected. +description: Learn about what Actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected. sidebar_position: 6.5 slug: /expert-scraping-with-apify/migrations-maintaining-state --- # Migrations & maintaining state {#migrations-maintaining-state} -**Learn about what actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected.** +**Learn about what Actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected.** --- -We already know that actors are basically just Docker containers that can be run on any server. This means that they can be allocated anywhere there is space available, making them very efficient. Unfortunately, there is one big caveat: actors move - a lot. When an actor moves, it is called a **migration**. +We already know that Actors are basically just Docker containers that can be run on any server. 
This means that they can be allocated anywhere there is space available, making them very efficient. Unfortunately, there is one big caveat: Actors move - a lot. When an Actor moves, it is called a **migration**. -On migration, the process inside of an actor is completely restarted and everything in its memory is lost, meaning that any values stored within variables or classes are lost. +On migration, the process inside of an Actor is completely restarted and everything in its memory is lost, meaning that any values stored within variables or classes are lost. -When a migration happens, you want to do a so-called "state transition", which means saving any data you care about so the actor can continue right where it left off before the migration. +When a migration happens, you want to do a so-called "state transition", which means saving any data you care about so the Actor can continue right where it left off before the migration. ## Learning 🧠 {#learning} Read this [article](/platform/actors/development/builds-and-runs/state-persistence) on migrations and dealing with state transitions. -Before moving forward, read about actor [events](/sdk/js/docs/upgrading/upgrading-to-v3#events) and how to listen for them. +Before moving forward, read about Actor [events](/sdk/js/docs/upgrading/upgrading-to-v3#events) and how to listen for them. ## Knowledge check 📝 {#quiz} -1. Actors have an option in the **Settings** tab to **Restart on error**. Would you use this feature for regular actors? When would you use this feature? -2. Migrations happen randomly, but by [aborting **gracefully**](/platform/actors/running/runs-and-builds#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted actor's run? -3. Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping actor? Are there any features in the Crawlee/Apify SDK that handle this under the hood? -4. How can you intercept the migration event? How much time do you have after this event happens and before the actor migrates? +1. Actors have an option in the **Settings** tab to **Restart on error**. Would you use this feature for regular Actors? When would you use this feature? +2. Migrations happen randomly, but by [aborting **gracefully**](/platform/actors/running/runs-and-builds#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run? +3. Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping Actor? Are there any features in the Crawlee/Apify SDK that handle this under the hood? +4. How can you intercept the migration event? How much time do you have after this event happens and before the Actor migrates? 5. When would you persist data to the default key-value store instead of to a named key-value store? ## Our task @@ -42,10 +42,10 @@ Once again returning to our Amazon **demo-actor**, let's say that we need to sto } ``` -Every 10 seconds, we should log the most up-to-date version of this object to the console. Additionally, the object should be able to solve actor migrations, which means that even if the actor were to migrate, its data would not be lost upon resurrection. +Every 10 seconds, we should log the most up-to-date version of this object to the console. 
Additionally, the object should be able to solve Actor migrations, which means that even if the Actor were to migrate, its data would not be lost upon resurrection. [**Solution**](./solutions/handling_migrations.md) ## Next up {#next} -You might have already noticed that we've been using the **RESIDENTIAL** proxy group in the `proxyConfiguration` within our Amazon scraping actor. But what does that mean? Learn why we've used this group, about proxies, and about avoiding anti-scraping measures in the [next lesson](./bypassing_anti_scraping.md). +You might have already noticed that we've been using the **RESIDENTIAL** proxy group in the `proxyConfiguration` within our Amazon scraping Actor. But what does that mean? Learn why we've used this group, about proxies, and about avoiding anti-scraping measures in the [next lesson](./bypassing_anti_scraping.md). diff --git a/sources/academy/platform/expert_scraping_with_apify/saving_useful_stats.md b/sources/academy/platform/expert_scraping_with_apify/saving_useful_stats.md index e03022f34..6c1432650 100644 --- a/sources/academy/platform/expert_scraping_with_apify/saving_useful_stats.md +++ b/sources/academy/platform/expert_scraping_with_apify/saving_useful_stats.md @@ -1,19 +1,19 @@ --- title: VII - Saving useful run statistics -description: Understand how to save statistics about an actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper. +description: Understand how to save statistics about an Actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper. sidebar_position: 6.7 slug: /expert-scraping-with-apify/saving-useful-stats --- # Saving useful run statistics {#savings-useful-run-statistics} -**Understand how to save statistics about an actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper.** +**Understand how to save statistics about an Actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper.** --- Using Crawlee and the Apify SDK, we are now able to collect and format data coming directly from websites and save it into a Key-Value store or Dataset. This is great, but sometimes, we want to store some extra data about the run itself, or about each request. We might want to store some extra general run information separately from our results or potentially include statistics about each request within its corresponding dataset item. -The types of values that are saved are totally up to you, but the most common are error scores, number of total saved items, number of request retries, number of captchas hit, etc. Storing these values is not always necessary, but can be valuable when debugging and maintaining an actor. As your projects scale, this will become more and more useful and important. +The types of values that are saved are totally up to you, but the most common are error scores, number of total saved items, number of request retries, number of captchas hit, etc. Storing these values is not always necessary, but can be valuable when debugging and maintaining an Actor. As your projects scale, this will become more and more useful and important. ## Learning 🧠 {#learning} @@ -26,15 +26,15 @@ Before moving on, give these valuable resources a quick lookover: ## Knowledge check 📝 {#quiz} -1. Why might you want to store statistics about an actor's run (or a specific request)? +1. 
Why might you want to store statistics about an Actor's run (or a specific request)? 2. In our Amazon scraper, we are trying to store the number of retries of a request once its data is pushed to the dataset. Where would you get this information? Where would you store it? 3. We are building a new imaginary scraper for a website that sometimes displays captchas at unexpected times, rather than displaying the content we want. How would you keep a count of the total number of captchas hit for the entire run? Where would you store this data? Why? -4. Is storing these types of values necessary for every single actor? +4. Is storing these types of values necessary for every single Actor? 5. What is the difference between the `failedRequestHandler` and `errorHandler`? ## Our task -In our Amazon actor, each dataset result must now have the following extra keys: +In our Amazon Actor, each dataset result must now have the following extra keys: ```json { diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md b/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md index f29b77868..2cfba52a4 100644 --- a/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md +++ b/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md @@ -127,7 +127,7 @@ router.addHandler(labels.OFFERS, async ({ $, request }) => { ## Persisting state {#persisting-state} -The **persistState** event is automatically fired (by default) every 60 seconds by the Apify SDK while the actor is running and is also fired when the **migrating** event occurs. +The **persistState** event is automatically fired (by default) every 60 seconds by the Apify SDK while the Actor is running and is also fired when the **migrating** event occurs. In order to persist our ASIN tracker object, let's use the `Actor.on` function to listen for the **persistState** event and store it in the key-value store each time it is emitted. @@ -164,9 +164,9 @@ module.exports = new ASINTracker(); ## Handling resurrections {#handling-resurrections} -Great! Now our state will be persisted every 60 seconds in the key-value store. However, we're not done. Let's say that the actor migrates and is resurrected. We never actually update the `state` variable of our `ASINTracker` class with the state stored in the key-value store, so as our code currently stands, we still don't support state-persistence on migrations. +Great! Now our state will be persisted every 60 seconds in the key-value store. However, we're not done. Let's say that the Actor migrates and is resurrected. We never actually update the `state` variable of our `ASINTracker` class with the state stored in the key-value store, so as our code currently stands, we still don't support state-persistence on migrations. -In order to fix this, let's create a method called `initialize` which will be called at the very beginning of the actor's run, and will check the key-value store for a previous state under the key **ASIN-TRACKER**. If a previous state does live there, then it will update the class' `state` variable with the value read from the key-value store: +In order to fix this, let's create a method called `initialize` which will be called at the very beginning of the Actor's run, and will check the key-value store for a previous state under the key **ASIN-TRACKER**. 
If a previous state does live there, then it will update the class' `state` variable with the value read from the key-value store: ```js // asinTracker.js @@ -207,7 +207,7 @@ class ASINTracker { module.exports = new ASINTracker(); ``` -We'll now call this function at the top level of the **main.js** file to ensure it is the first thing that gets called when the actor starts up: +We'll now call this function at the top level of the **main.js** file to ensure it is the first thing that gets called when the Actor starts up: ```js // main.js @@ -223,32 +223,32 @@ await tracker.initialize(); // ... ``` -That's everything! Now, even if the actor migrates (or is gracefully aborted and then resurrected), this `state` object will always be persisted. +That's everything! Now, even if the Actor migrates (or is gracefully aborted and then resurrected), this `state` object will always be persisted. ## Quiz answers 📝 {#quiz-answers} -**Q: Actors have an option in the Settings tab to Restart on error. Would you use this feature for regular actors? When would you use this feature?** +**Q: Actors have an option in the Settings tab to Restart on error. Would you use this feature for regular Actors? When would you use this feature?** -**A:** It's not best to use this option by default. If it fails, there must be a reason, which would need to be thought through first - meaning that the edge case of failing should be handled when resurrecting the actor. The state should be persisted beforehand. +**A:** It's not best to use this option by default. If it fails, there must be a reason, which would need to be thought through first - meaning that the edge case of failing should be handled when resurrecting the Actor. The state should be persisted beforehand. -**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted actor's run?** +**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?** **A:** After aborting or throwing an error mid-process, it manages to start back from where it was upon resurrection. -**Q: Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping actor? Are there any features in Crawlee or Apify SDK that handle this under the hood?** +**Q: Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping Actor? Are there any features in Crawlee or Apify SDK that handle this under the hood?** **A:** Because Apify SDK handles all of the migration handling code for us. If you want to add custom migration-handling code, you can use `Actor.events` to listen for the `migrating` or `persistState` events to save the current state in key-value store (or elsewhere). -**Q: How can you intercept the migration event? How much time do you have after this event happens and before the actor migrates?** +**Q: How can you intercept the migration event? How much time do you have after this event happens and before the Actor migrates?** **A:** By using the `Actor.on` function. You have a maximum of a few seconds before shutdown after the `migrating` event has been fired. 
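For illustration only, here is a minimal sketch of what such a listener might look like with the Apify SDK — the `STATE` key and the `itemsScraped` counter are placeholders rather than part of the course's solution code:

```js
import { Actor } from 'apify';

await Actor.init();

// Placeholder in-memory state that would otherwise be lost on migration.
const state = (await Actor.getValue('STATE')) ?? { itemsScraped: 0 };

// Persist the state as soon as the platform announces an upcoming migration.
Actor.on('migrating', async () => {
    await Actor.setValue('STATE', state);
});

// ... scraping logic that updates `state` goes here ...

await Actor.exit();
```

Because the handler only has those few seconds to run, it should do nothing more than write the state out.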
**Q: When would you persist data to the default key-value store instead of to a named key-value store?** -**A:** Persisting data to the default key-value store would help when handling an actor's run state or with storing metadata about the run (such as results, miscellaneous files, or logs). Using a named key-value store allows you to persist data at the account level to handle data across multiple actor runs. +**A:** Persisting data to the default key-value store would help when handling an Actor's run state or with storing metadata about the run (such as results, miscellaneous files, or logs). Using a named key-value store allows you to persist data at the account level to handle data across multiple Actor runs. ## Wrap up {#wrap-up} -In this activity, we learned how to persist custom values on an interval as well as after actor migrations by using the `persistState` event and the key-value store. With this knowledge, you can safely increase your actor's performance by storing data in variables and then pushing them to the dataset periodically/at the end of the actor's run as opposed to pushing data immediately after it's been collected. +In this activity, we learned how to persist custom values on an interval as well as after Actor migrations by using the `persistState` event and the key-value store. With this knowledge, you can safely increase your Actor's performance by storing data in variables and then pushing them to the dataset periodically/at the end of the Actor's run as opposed to pushing data immediately after it's been collected. One important thing to note is that this workflow can be used to replace the usage of `userData` to pass data between requests, as it allows for the creation of a "global store" which all requests have access to at any time. diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md b/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md index ca61ba0eb..f8740cfa3 100644 --- a/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md +++ b/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md @@ -1,23 +1,23 @@ --- title: I - Integrating webhooks -description: Learn how to integrate webhooks into your actors. Webhooks are a super powerful tool, and can be used to do almost anything! +description: Learn how to integrate webhooks into your Actors. Webhooks are a super powerful tool, and can be used to do almost anything! sidebar_position: 1 slug: /expert-scraping-with-apify/solutions/integrating-webhooks --- # Integrating webhooks {#integrating-webhooks} -**Learn how to integrate webhooks into your actors. Webhooks are a super powerful tool, and can be used to do almost anything!** +**Learn how to integrate webhooks into your Actors. Webhooks are a super powerful tool, and can be used to do almost anything!** --- -In this lesson we'll be writing a new actor and integrating it with our beloved Amazon scraping actor. First, we'll navigate to the same directory where our **demo-actor** folder lives, and run `apify create filter-actor` _(once again, you can name the actor whatever you want, but for this lesson, we'll be calling the new actor **filter-actor**)_. When prompted for which type of boilerplate to start out with, select **Empty**. +In this lesson we'll be writing a new Actor and integrating it with our beloved Amazon scraping Actor. 
First, we'll navigate to the same directory where our **demo-actor** folder lives, and run `apify create filter-actor` _(once again, you can name the Actor whatever you want, but for this lesson, we'll be calling the new Actor **filter-actor**)_. When prompted for which type of boilerplate to start out with, select **Empty**. ![Selecting an empty template to start with](./images/select-empty.jpg) Cool! Now, we're ready to get started. -## Building the new actor {#building-the-new-actor} +## Building the new Actor {#building-the-new-actor} First of all, we should clear out any of the boilerplate code within **main.js** to get a clean slate: @@ -32,7 +32,7 @@ await Actor.init(); await Actor.exit(); ``` -We'll be passing the ID of the Amazon actor's default dataset along to the new actor, so we can expect that as an input: +We'll be passing the ID of the Amazon Actor's default dataset along to the new Actor, so we can expect that as an input: ```js const { datasetId } = await Actor.getInput(); @@ -46,7 +46,7 @@ Next, we'll grab hold of the dataset's items with the `dataset.getData()` functi const { items } = await dataset.getData(); ``` -While several methods can achieve the goal output of this actor, using the [`Array.reduce()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce) is the most concise approach +While several methods can achieve the goal output of this Actor, using the [`Array.reduce()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce) is the most concise approach ```js const filtered = items.reduce((acc, curr) => { @@ -68,7 +68,7 @@ const filtered = items.reduce((acc, curr) => { }, {}); ``` -The results should be an array, so finally, we can take the map we just created and push an array of all of its values to the actor's default dataset: +The results should be an array, so finally, we can take the map we just created and push an array of all of its values to the Actor's default dataset: ```js await Actor.pushData(Object.values(filtered)); @@ -100,7 +100,7 @@ await Actor.pushData(Object.values(filtered)); await Actor.exit(); ``` -Cool! But **wait**, don't forget to configure the **INPUT_SCHEMA.json** file as well! It's not necessary to do this step, as we'll be calling the actor through Apify's API within a webhook, but it's still good to get into the habit of writing quality input schemas that describe the input values your actors are expecting. +Cool! But **wait**, don't forget to configure the **INPUT_SCHEMA.json** file as well! It's not necessary to do this step, as we'll be calling the Actor through Apify's API within a webhook, but it's still good to get into the habit of writing quality input schemas that describe the input values your Actors are expecting. ```json { @@ -123,17 +123,17 @@ Now we're done, and we can push it up to the Apify platform with the `apify push ## Setting up the webhook {#setting-up-the-webhook} -Since we'll be calling the Actor via the [Apify API](/academy/api/run-actor-and-retrieve-data-via-api), we'll need to grab hold of the ID of the Actor we just created and pushed to the platform. The ID is always accessible through the **Settings** page of the actor. +Since we'll be calling the Actor via the [Apify API](/academy/api/run-actor-and-retrieve-data-via-api), we'll need to grab hold of the ID of the Actor we just created and pushed to the platform. The ID is always accessible through the **Settings** page of the Actor. 
-![Actor ID in actor settings](./images/actor-settings.jpg) +![Actor ID in Actor settings](./images/actor-settings.jpg) -With this `actorId`, and our `token`, which is retrievable through **Settings > Integrations** on the Apify Console, we can construct a link which will call the actor: +With this `actorId`, and our `token`, which is retrievable through **Settings > Integrations** on the Apify Console, we can construct a link which will call the Actor: ```text https://api.apify.com/v2/acts/Yk1bieximsduYDydP/runs?token=YOUR_TOKEN_HERE ``` -We can also use our username and the name of the actor like this: +We can also use our username and the name of the Actor like this: ```text https://api.apify.com/v2/acts/USERNAME~filter-actor/runs?token=YOUR_TOKEN_HERE @@ -141,11 +141,11 @@ https://api.apify.com/v2/acts/USERNAME~filter-actor/runs?token=YOUR_TOKEN_HERE Whichever one you choose is totally up to your preference. -Next, within the actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this: +Next, within the Actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this: ![Configuring a webhook](./images/adding-webhook.jpg) -We have chosen to run the webhook once the actor has succeeded, which means that its default dataset will surely be populated. Since the filtering actor is expecting the default dataset ID of the Amazon actor, we use the `resource` variable to grab hold of the `defaultDatasetId`. +We have chosen to run the webhook once the Actor has succeeded, which means that its default dataset will surely be populated. Since the filtering Actor is expecting the default dataset ID of the Amazon Actor, we use the `resource` variable to grab hold of the `defaultDatasetId`. Click **Save**, then run the Amazon **demo-actor** again. @@ -161,26 +161,26 @@ Additionally, we should be able to see that our **filter-actor** was run, and ha ## Quiz answers 📝 {#quiz-answers} -**Q: How do you allocate more CPU for an actor's run?** +**Q: How do you allocate more CPU for an Actor's run?** -**A:** On the platform, more memory can be allocated in the actor's input configuration, and the default allocated CPU can be changed in the actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES**** environment variable to set the allocated CPU. 4GB is equal to 1 CPU core on the Apify platform. +**A:** On the platform, more memory can be allocated in the Actor's input configuration, and the default allocated CPU can be changed in the Actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES**** environment variable to set the allocated CPU. 4GB is equal to 1 CPU core on the Apify platform. -**Q: Within itself, can you get the exact time that an actor was started?** +**Q: Within itself, can you get the exact time that an Actor was started?** -**A:** Yes. The time the actor was started can be retrieved through the `startedAt` property from the `Actor.getEnv()` function, or directly from `process.env.APIFY_STARTED_AT` +**A:** Yes. The time the Actor was started can be retrieved through the `startedAt` property from the `Actor.getEnv()` function, or directly from `process.env.APIFY_STARTED_AT` -**Q: What are the types of default storages connected to an actor's run?** +**Q: What are the types of default storages connected to an Actor's run?** -Every actor's run is given a default key-value store and a default dataset. 
The default key-value store by default has the `INPUT` and `OUTPUT` keys. The actor's request queue is also stored. +Every Actor's run is given a default key-value store and a default dataset. The default key-value store by default has the `INPUT` and `OUTPUT` keys. The Actor's request queue is also stored. -**Q: Can you change the allocated memory of an actor while it's running?** +**Q: Can you change the allocated memory of an Actor while it's running?** -**A:** Not while it's running. You'd need to stop it and run a new one. However, there is an option to soft abort an actor, then resurrect then run with a different memory configuration. +**A:** Not while it's running. You'd need to stop it and run a new one. However, there is an option to soft abort an Actor, then resurrect then run with a different memory configuration. -**Q: How can you run an actor with Puppeteer on the Apify platform with headless mode set to `false`?** +**Q: How can you run an Actor with Puppeteer on the Apify platform with headless mode set to `false`?** **A:** This can be done by using the `actor-node-puppeteer-chrome` Docker image and making sure that `launchContext.launchOptions.headless` in `PuppeteerCrawlerOptions` is set to `false`. ## Wrap up {#wrap-up} -See that?! Integrating webhooks is a piece of cake on the Apify platform! You'll soon discover that the platform factors away a lot of complex things and allows you to focus on what's most important - developing and releasing actors. +See that?! Integrating webhooks is a piece of cake on the Apify platform! You'll soon discover that the platform factors away a lot of complex things and allows you to focus on what's most important - developing and releasing Actors. diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/managing_source.md b/sources/academy/platform/expert_scraping_with_apify/solutions/managing_source.md index 5c49f4b97..2273af81e 100644 --- a/sources/academy/platform/expert_scraping_with_apify/solutions/managing_source.md +++ b/sources/academy/platform/expert_scraping_with_apify/solutions/managing_source.md @@ -15,7 +15,7 @@ In the lesson corresponding to this solution, we discussed an extremely importan ## Quiz answers {#quiz-answers} -**Q: Do you have to rebuild an actor each time the source code is changed?** +**Q: Do you have to rebuild an Actor each time the source code is changed?** **A:** Yes. It needs to be built into an image, saved in a registry, and later on run in a container. diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md b/sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md index 40ab39088..a9cf0ac81 100644 --- a/sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md +++ b/sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md @@ -11,7 +11,7 @@ slug: /expert-scraping-with-apify/solutions/rotating-proxies --- -If you take a look at our current code for the Amazon scraping actor, you might notice this snippet: +If you take a look at our current code for the Amazon scraping Actor, you might notice this snippet: ```js const proxyConfiguration = await Actor.createProxyConfiguration({ @@ -94,7 +94,7 @@ const proxyConfiguration = await Actor.createProxyConfiguration({ **Q: How can you prevent an error from occurring if one of the proxy groups that a user has is removed? 
What are the best practices for these scenarios?** -**A:** By making the proxy for the scraper to use be configurable by the user through the actor's input. That way, they can easily switch proxies if the actor stops working due to proxy-related issues. It can also be done by using the **AUTO** proxy instead of specific groups. +**A:** By making the proxy for the scraper to use be configurable by the user through the Actor's input. That way, they can easily switch proxies if the Actor stops working due to proxy-related issues. It can also be done by using the **AUTO** proxy instead of specific groups. **Q: Does it make sense to rotate proxies when you are logged into a website?** diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md b/sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md index bd548d7ee..a8bcd851b 100644 --- a/sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md +++ b/sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md @@ -1,13 +1,13 @@ --- title: VII - Saving run stats -description: Implement the saving of general statistics about an actor's run, as well as adding request-specific statistics to dataset items. +description: Implement the saving of general statistics about an Actor's run, as well as adding request-specific statistics to dataset items. sidebar_position: 7 slug: /expert-scraping-with-apify/solutions/saving-stats --- # Saving run stats {#saving-stats} -**Implement the saving of general statistics about an actor's run, as well as adding request-specific statistics to dataset items.** +**Implement the saving of general statistics about an Actor's run, as well as adding request-specific statistics to dataset items.** --- @@ -146,7 +146,7 @@ router.addHandler(labels.OFFERS, async ({ $, request }) => { ## Quiz answers {#quiz-answers} -**Q: Why might you want to store statistics about an actor's run (or a specific request)?** +**Q: Why might you want to store statistics about an Actor's run (or a specific request)?** **A:** If certain types of requests are error-prone, you might want to save stats about the run to look at them later to either eliminate or better handle the errors. Things like **dateHandled** can be generally useful information. @@ -158,9 +158,9 @@ router.addHandler(labels.OFFERS, async ({ $, request }) => { **A:** First, build a function that detects if the captcha has been hit. If so, it will throw an error and add to the **numberOfCaptchas** count. This data might be stored on a persisted state object to help better assess the anti-scraping mitigation techniques the scraper should be used. -**Q: Is storing these types of values necessary for every single actor?** +**Q: Is storing these types of values necessary for every single Actor?** -**A:** For small actors, it might be a waste of time to do this. For large-scale actors, it can be extremely helpful when debugging and most definitely worth the extra 10-20 minutes of development time. Usually though, the default statistics from the Crawlee and the SDK might be enough for simple run stats. +**A:** For small Actors, it might be a waste of time to do this. For large-scale Actors, it can be extremely helpful when debugging and most definitely worth the extra 10-20 minutes of development time. Usually though, the default statistics from the Crawlee and the SDK might be enough for simple run stats. 
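As a rough sketch of the captcha-counting idea from the answers above — the `RUN_STATS` key, the `isCaptchaPage` helper, and the counter names are hypothetical and used only for illustration:

```js
import { Actor } from 'apify';

await Actor.init();

// Hypothetical run-level counters, restored from a previous run's state if present.
const stats = (await Actor.getValue('RUN_STATS')) ?? { captchasHit: 0, retries: 0 };

// Reuse the persistState event so the counters survive migrations and aborts.
Actor.on('persistState', async () => {
    await Actor.setValue('RUN_STATS', stats);
});

// Inside a request handler, a captcha check could then look something like this:
// if (isCaptchaPage($)) {
//     stats.captchasHit++;
//     throw new Error('Captcha hit - letting the crawler retry the request');
// }

await Actor.exit();
```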
**Q: What is the difference between the `failedRequestHandler` and `errorHandler`?** diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md b/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md index 827b8de27..b43eb8ce9 100644 --- a/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md +++ b/sources/academy/platform/expert_scraping_with_apify/solutions/using_api_and_client.md @@ -11,7 +11,7 @@ slug: /expert-scraping-with-apify/solutions/using-api-and-client --- -Since we need to create another actor, we'll once again use the `apify create` command and start from an empty template. +Since we need to create another Actor, we'll once again use the `apify create` command and start from an empty template. ![Selecting an empty template to start with](./images/select-empty.jpg) @@ -119,7 +119,7 @@ const withAPI = async () => { }; ``` -## Finalizing the actor {#finalizing-the-actor} +## Finalizing the Actor {#finalizing-the-actor} Now, since we've written both of these functions, all we have to do is write a conditional statement based on the boolean value from `useClient`: @@ -232,7 +232,7 @@ await Actor.exit(); The one main difference is that the Apify client automatically uses [**exponential backoff**](/api/client/js#retries-with-exponential-backoff) to deal with errors. -**Q: How do you pass input when running an actor or task via API?** +**Q: How do you pass input when running an Actor or task via API?** **A:** The input should be passed into the **body** of the request when running an actor/task via API. diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md b/sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md index d31afe297..b533b5838 100644 --- a/sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md +++ b/sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md @@ -11,7 +11,7 @@ slug: /expert-scraping-with-apify/solutions/using-storage-creating-tasks --- -Last lesson, our task was outlined for us. In this lesson, we'll be completing that task by making our Amazon actor push to a **named dataset** and use the **default key-value store** to store the cheapest item found by the scraper. Finally, we'll create a task for the actor back on the Apify platform. +Last lesson, our task was outlined for us. In this lesson, we'll be completing that task by making our Amazon Actor push to a **named dataset** and use the **default key-value store** to store the cheapest item found by the scraper. Finally, we'll create a task for the Actor back on the Apify platform. ## Using a named dataset {#using-named-dataset} @@ -61,13 +61,13 @@ router.addHandler(labels.OFFERS, async ({ $, request }) => { }); ``` -That's it! Now, our actor will push its data to a dataset named **amazon-offers-KEYWORD**! +That's it! Now, our Actor will push its data to a dataset named **amazon-offers-KEYWORD**! ## Using a key-value store {#using-key-value-store} We now want to store the cheapest item in the default key-value store under a key named **CHEAPEST-ITEM**. The most efficient and practical way of doing this is by filtering through all of the newly named dataset's items and pushing the cheapest one to the store. 
-Let's add the following code to the bottom of the actor after **Crawl** finished** is logged to the console: +Let's add the following code to the bottom of the Actor after **Crawl** finished** is logged to the console: ```js // ... @@ -234,7 +234,7 @@ Don't forget to push your changes to GitHub using `git push origin MAIN_BRANCH_N ## Creating a task (It's easy!) {#creating-task} -Back on the platform, on your actor's page, you can see a button in the top right hand corner that says **Create new task**: +Back on the platform, on your Actor's page, you can see a button in the top right hand corner that says **Create new task**: ![Create new task button](./images/create-new-task.jpg) @@ -248,9 +248,9 @@ After saving it, you'll be able to see the newly created task in the **Tasks** t ## Quiz answers 📝 {#quiz-answers} -**Q: What is the relationship between actors and tasks?** +**Q: What is the relationship between Actors and tasks?** -**A:** Tasks are pre-configured runs of actors. The configurations of an actor can be saved as a task so that it doesn't have to be manually configured every single time. +**A:** Tasks are pre-configured runs of Actors. The configurations of an Actor can be saved as a task so that it doesn't have to be manually configured every single time. **Q: What are the differences between default (unnamed) and named storage? Which one would you use for everyday usage?** @@ -264,4 +264,4 @@ After saving it, you'll be able to see the newly created task in the **Tasks** t ## Wrap up {#wrap-up} -You've learned how to use the different storage options available on Apify, the two different types of storage, as well as how to create tasks for actors. +You've learned how to use the different storage options available on Apify, the two different types of storage, as well as how to create tasks for Actors. diff --git a/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md b/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md index dcd85093f..8d42fc6d1 100644 --- a/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md +++ b/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md @@ -1,21 +1,21 @@ --- title: III - Tasks & storage -description: Understand how to save the configurations for actors with actor tasks. Also, learn about storage and the different types Apify offers. +description: Understand how to save the configurations for Actors with Actor tasks. Also, learn about storage and the different types Apify offers. sidebar_position: 6.3 slug: /expert-scraping-with-apify/tasks-and-storage --- # Tasks & storage {#tasks-and-storage} -**Understand how to save the configurations for actors with actor tasks. Also, learn about storage and the different types Apify offers.** +**Understand how to save the configurations for Actors with Actor tasks. Also, learn about storage and the different types Apify offers.** --- -Both of these are very different things; however, they are also tied together in many ways. **Tasks** run actors, actors return data, and data is stored in different types of **Storages**. +Both of these are very different things; however, they are also tied together in many ways. **Tasks** run Actors, Actors return data, and data is stored in different types of **Storages**. ## Tasks {#tasks} -Tasks are a very useful feature which allow us to save pre-configured inputs for actors. 
This means that rather than configuring the actor every time, or rather than having to save screenshots of various different actor configurations, you can store the configurations right in your Apify account instead, and run the actor at will with them. +Tasks are a very useful feature which allow us to save pre-configured inputs for Actors. This means that rather than configuring the Actor every time, or rather than having to save screenshots of various different Actor configurations, you can store the configurations right in your Apify account instead, and run the Actor at will with them. ## Storage {#storage} @@ -23,27 +23,27 @@ Storage allows us to save persistent data for further processing. As you'll lear ## Learning 🧠 {#learning} -- Check out [the docs about actor tasks](/platform/actors/running/tasks). +- Check out [the docs about Actor tasks](/platform/actors/running/tasks). - Read about the [two main storage options](/platform/storage#dataset) on the Apify platform. - Understand the [crucial differences between named and unnamed storages](/platform/storage/usage#named-and-unnamed-storages). - Learn about the [`Dataset`](/sdk/js/reference/class/Dataset) and [`KeyValueStore`](/sdk/js/reference/class/KeyValueStore) objects in the Apify SDK. ## Knowledge check 📝 {#quiz} -1. What is the relationship between actors and tasks? +1. What is the relationship between Actors and tasks? 2. What are the differences between default (unnamed) and named storage? Which one would you use for everyday usage? 3. What is data retention, and how does it work for all types of storages (default and named)? ## Our task {#our-task} -Once again, we'll be adding onto our main Amazon-scraping actor in this activity, but don't worry - this lesson will be quite light, just like the last one. +Once again, we'll be adding onto our main Amazon-scraping Actor in this activity, but don't worry - this lesson will be quite light, just like the last one. -We have decided that we want to retain the data scraped by the actor for a long period of time, so instead of pushing to the default dataset, we will be pushing to a named dataset. Additionally, we want to save the absolute cheapest item found by the scraper into the default key-value store under a key named **CHEAPEST-ITEM**. +We have decided that we want to retain the data scraped by the Actor for a long period of time, so instead of pushing to the default dataset, we will be pushing to a named dataset. Additionally, we want to save the absolute cheapest item found by the scraper into the default key-value store under a key named **CHEAPEST-ITEM**. -Finally, we'll create a task for the actor that saves the configuration with the **keyword** set to **google pixel****. +Finally, we'll create a task for the Actor that saves the configuration with the **keyword** set to **google pixel****. [**Solution**](./solutions/using_storage_creating_tasks.md) ## Next up {#next} -The [next lesson](./apify_api_and_client.md) is very exciting, as it will unlock the ability to seamlessly integrate your Apify actors into your own external projects and applications with the Apify API. +The [next lesson](./apify_api_and_client.md) is very exciting, as it will unlock the ability to seamlessly integrate your Apify Actors into your own external projects and applications with the Apify API. 
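For quick reference, a minimal sketch of the two storage calls this task revolves around, assuming the same Apify SDK used throughout the course — the dataset name and the item shape below are placeholders, not the actual solution (that lives in the linked solution lesson):

```js
import { Actor } from 'apify';

await Actor.init();

// Open (or create) a named dataset so the scraped data outlives the default retention period.
const dataset = await Actor.openDataset('amazon-offers-example-keyword');
await dataset.pushData({ title: 'Example item', price: 10.99 });

// Save a single value into the run's default key-value store.
await Actor.setValue('CHEAPEST-ITEM', { title: 'Example item', price: 10.99 });

await Actor.exit();
```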
diff --git a/sources/academy/platform/get_most_of_actors/index.md b/sources/academy/platform/get_most_of_actors/index.md index b179f7b7c..bad0714fd 100644 --- a/sources/academy/platform/get_most_of_actors/index.md +++ b/sources/academy/platform/get_most_of_actors/index.md @@ -1,6 +1,6 @@ --- -title: Getting the most of actors on Apify Store -description: Learn how to optimize your public actors on Apify Store and monetize them by renting your actor to other platform users. +title: Getting the most of Actors on Apify Store +description: Learn how to optimize your public Actors on Apify Store and monetize them by renting your Actor to other platform users. sidebar_position: 10 category: apify platform slug: /get-most-of-actors @@ -8,17 +8,17 @@ slug: /get-most-of-actors # Apify Store {#apify-store} -**Learn how to optimize your public actors on Apify Store and monetize them by renting your actor to other platform users.** +**Learn how to optimize your public Actors on Apify Store and monetize them by renting your Actor to other platform users.** --- -[Apify Store](https://apify.com/store) is home to hundreds of public actors available to the Apify community. Anyone is welcome to [publish actors](/platform/actors/publishing) in the store, and you can even [monetize your Actors](https://get.apify.com/monetize-your-code). +[Apify Store](https://apify.com/store) is home to hundreds of public Actors available to the Apify community. Anyone is welcome to [publish Actors](/platform/actors/publishing) in the store, and you can even [monetize your Actors](https://get.apify.com/monetize-your-code). -In this section, we will go over some of the practical steps you can take to ensure the high quality of your public actors. You will learn: +In this section, we will go over some of the practical steps you can take to ensure the high quality of your public Actors. You will learn: 1. Actor naming and README best practices. 2. How to monetize your Actor. -3. Tips and tricks to attract more users to your actor's page. +3. Tips and tricks to attract more users to your Actor's page. ## Next up {#next} diff --git a/sources/academy/platform/get_most_of_actors/naming_your_actor.md b/sources/academy/platform/get_most_of_actors/naming_your_actor.md index e39cc4a36..8477ab1c8 100644 --- a/sources/academy/platform/get_most_of_actors/naming_your_actor.md +++ b/sources/academy/platform/get_most_of_actors/naming_your_actor.md @@ -1,27 +1,27 @@ --- -title: Naming your actor -description: Apify's standards for actor naming. Learn how to choose the right name for scraping and non-scraping actors and how to optimize your actor for search engines. +title: Naming your Actor +description: Apify's standards for Actor naming. Learn how to choose the right name for scraping and non-scraping Actors and how to optimize your Actor for search engines. sidebar_position: 1 slug: /get-most-of-actors/naming-your-actor --- -# Naming your actor {#naming-your-actor} +# Naming your Actor {#naming-your-actor} -**Apify's standards for actor naming. Learn how to choose the right name for scraping and non-scraping actors and how to optimize your actor for search engines.** +**Apify's standards for Actor naming. Learn how to choose the right name for scraping and non-scraping Actors and how to optimize your Actor for search engines.** --- -Naming your actor can be tricky. Especially when you've spent a long time coding and are excited to show your brand-new creation to the world. To help users find your actor, we've introduced naming standards. 
These standards improve your actor's [search engine optimization (SEO)](https://en.wikipedia.org/wiki/Search_engine_optimization) and maintain consistency in the [Apify Store](https://apify.com/store). +Naming your Actor can be tricky. Especially when you've spent a long time coding and are excited to show your brand-new creation to the world. To help users find your Actor, we've introduced naming standards. These standards improve your Actor's [search engine optimization (SEO)](https://en.wikipedia.org/wiki/Search_engine_optimization) and maintain consistency in the [Apify Store](https://apify.com/store). -> Your actor's name should be 3-63 characters long. +> Your Actor's name should be 3-63 characters long. ## Scrapers {#scrapers} -For actors such as [YouTube Scraper](https://apify.com/bernardo/youtube-scraper) or [Amazon Scraper](https://apify.com/vaclavrut/amazon-crawler), which scrape web pages, we usually have one actor per domain. This helps with naming, as the domain name serves as your actor's name. +For Actors such as [YouTube Scraper](https://apify.com/bernardo/youtube-scraper) or [Amazon Scraper](https://apify.com/vaclavrut/amazon-crawler), which scrape web pages, we usually have one Actor per domain. This helps with naming, as the domain name serves as your Actor's name. GOOD: -- Technical name (actor's name in the [Apify Console](https://console.apify.com)): **${domain}-scraper**, e.g. **youtube-scraper**. +- Technical name (Actor's name in the [Apify Console](https://console.apify.com)): **${domain}-scraper**, e.g. **youtube-scraper**. - Publication title for the Apify Store: **${Domain} Scraper**, e.g. **YouTube Scraper**. - Name of the GitHub repository: **actor-${domain}-scraper**, e.g. **actor-youtube-scraper**. @@ -31,7 +31,7 @@ AVOID: - Publication title: **The Scraper of ${Domain}**, e.g. **The Scraper of YouTube**. - GitHub repository: **actor-the-scraper-of-${domain}**, e.g. **actor-the-scraper-of-youtube**. -If your actor only caters to a specific service on a domain (and you don't plan on extending it), add the service to the actor's name. +If your Actor only caters to a specific service on a domain (and you don't plan on extending it), add the service to the Actor's name. For example, @@ -39,13 +39,13 @@ For example, - Publication title: **${Domain} ${Service} Scraper**, e.g. [**Google Search Scraper**](https://apify.com/apify/google-search-scraper). - GitHub repository: **actor-${domain}-${service}-scraper**, e.g. **actor-google-search-scraper**. -## Non-scraping actors {#non-scraping-actors} +## Non-scraping Actors {#non-scraping-actors} -Naming for non-scraping actors is more liberal. Being creative and considering SEO and user experience are good places to start. Think about what your users will type into a search engine when looking for your actor. What is your actor's function? +Naming for non-scraping Actors is more liberal. Being creative and considering SEO and user experience are good places to start. Think about what your users will type into a search engine when looking for your Actor. What is your Actor's function? If you're having trouble, you can always run your ideas by the Apify team using the chat icon in the bottom-right corner. -Below are examples for the [Google Sheets](https://apify.com/lukaskrivka/google-sheets) actor. +Below are examples for the [Google Sheets](https://apify.com/lukaskrivka/google-sheets) Actor. GOOD: @@ -59,12 +59,12 @@ AVOID: - Publication title: **Actor for Importing to and Exporting from Google Sheets**. 
- GitHub repository: **actor-for-import-and-export-google-sheets**. -## Renaming your actor {#renaming-your-actor} +## Renaming your Actor {#renaming-your-actor} -**Warning!** Changing your actor's **technical name** may break current integrations for that actor's users. This is why some actors in the Apify Store don't have consistent naming. For the same reason, it is best to change the actor's name early, before you build a steady user base. +**Warning!** Changing your Actor's **technical name** may break current integrations for that Actor's users. This is why some Actors in the Apify Store don't have consistent naming. For the same reason, it is best to change the Actor's name early, before you build a steady user base. The **publication title**, however, can be changed without any problems. ## Next up {#next} -Now that your actor is properly named and you know the differences between your actor's technical name and publication title, it's time to take the [next step](./actor_readme.md)! into making your actor public in Apify Store by ensuring that it has a well-structured and comprehensive README. +Now that your Actor is properly named and you know the differences between your Actor's technical name and publication title, it's time to take the [next step](./actor_readme.md)! into making your Actor public in Apify Store by ensuring that it has a well-structured and comprehensive README. diff --git a/sources/academy/platform/get_most_of_actors/seo_and_promotion.md b/sources/academy/platform/get_most_of_actors/seo_and_promotion.md index f2da2db76..f0e806e65 100644 --- a/sources/academy/platform/get_most_of_actors/seo_and_promotion.md +++ b/sources/academy/platform/get_most_of_actors/seo_and_promotion.md @@ -125,4 +125,4 @@ Now that you’ve created a cool new Actor, let others see it! Share it on your ## Next up {#next} -Congratulations! Your actor is coming together and getting ready to be shared with the world. In the [next lesson](./monetizing_your_actor.md)! we will learn how you can monetize your actor on Apify Store. +Congratulations! Your Actor is coming together and getting ready to be shared with the world. In the [next lesson](./monetizing_your_actor.md)! we will learn how you can monetize your Actor on Apify Store. diff --git a/sources/academy/platform/getting_started/actors.md b/sources/academy/platform/getting_started/actors.md index e42f48e14..565a878a1 100644 --- a/sources/academy/platform/getting_started/actors.md +++ b/sources/academy/platform/getting_started/actors.md @@ -1,48 +1,48 @@ --- title: Actors -description: What is an actor? How do we create them? Learn the basics of what actors are, how they work, and try out an actor yourself right on the Apify platform! +description: What is an Actor? How do we create them? Learn the basics of what Actors are, how they work, and try out an Actor yourself right on the Apify platform! sidebar_position: 1 slug: /getting-started/actors --- # Actors {#actors} -**What is an actor? How do we create them? Learn the basics of what actors are, how they work, and try out an actor yourself right on the Apify platform!** +**What is an Actor? How do we create them? Learn the basics of what Actors are, how they work, and try out an Actor yourself right on the Apify platform!** --- -After you've followed the **Getting started** lesson, you're almost ready to start creating some actors! But before we get into that, let's discuss what an actor is, and a bit about how they work. 
+After you've followed the **Getting started** lesson, you're almost ready to start creating some Actors! But before we get into that, let's discuss what an Actor is, and a bit about how they work. -## What's an actor? {#what-is-an-actor} +## What's an Actor? {#what-is-an-actor} -When you deploy your script to the Apify platform, it is then called an **actor**, which is simply a [serverless microservice](https://www.datadoghq.com/knowledge-center/serverless-architecture/serverless-microservices/#:~:text=Serverless%20microservices%20are%20cloud-based,suited%20for%20microservice-based%20architectures.) that accepts an input and produces an output. Actors can run for a few seconds, hours or even infinitely. An actor can perform anything from a simple action such as filling out a web form or sending an email, to complex operations such as crawling an entire website and removing duplicates from a large dataset. +When you deploy your script to the Apify platform, it is then called an **Actor**, which is simply a [serverless microservice](https://www.datadoghq.com/knowledge-center/serverless-architecture/serverless-microservices/#:~:text=Serverless%20microservices%20are%20cloud-based,suited%20for%20microservice-based%20architectures.) that accepts an input and produces an output. Actors can run for a few seconds, hours or even infinitely. An Actor can perform anything from a simple action such as filling out a web form or sending an email, to complex operations such as crawling an entire website and removing duplicates from a large dataset. -Once an actor has been pushed to the Apify platform, they can be shared to the world through the [Apify Store](https://apify.com/store), and even monetized after going public. +Once an Actor has been pushed to the Apify platform, they can be shared to the world through the [Apify Store](https://apify.com/store), and even monetized after going public. -> Though the majority of actors that are currently on the Apify platform are scrapers, crawlers, or automation software, actors are not limited to just scraping. They are just pieces of code running in Docker containers, which means they can be used for nearly anything. +> Though the majority of Actors that are currently on the Apify platform are scrapers, crawlers, or automation software, Actors are not limited to just scraping. They are just pieces of code running in Docker containers, which means they can be used for nearly anything. ## Actors on the Apify platform {#actors-on-platform} -For a super quick and dirty understanding of what a published actor looks like, and how it works, let's run an SEO audit of **apify.com** using the [SEO audit actor](https://apify.com/drobnikj/seo-audit-tool). +For a super quick and dirty understanding of what a published Actor looks like, and how it works, let's run an SEO audit of **apify.com** using the [SEO audit Actor](https://apify.com/drobnikj/seo-audit-tool). -On the front page of the actor, click the green **Try for free** button. If you're logged into your Apify account which you created during the [**Getting started**](./index.md) lesson, you'll be taken to the Apify Console and greeted with a page that looks like this: +On the front page of the Actor, click the green **Try for free** button. 
If you're logged into your Apify account which you created during the [**Getting started**](./index.md) lesson, you'll be taken to the Apify Console and greeted with a page that looks like this: ![Actor configuration](./images/seo-actor-config.png) -This is where we can provide input to the actor. The defaults here are just fine, so we'll just leave it as is and click the green **Start** button to run it. While the actor is running, you'll see it log some information about itself. +This is where we can provide input to the Actor. The defaults here are just fine, so we'll just leave it as is and click the green **Start** button to run it. While the Actor is running, you'll see it log some information about itself. ![Actor logs](./images/actor-logs.jpg) -After the actor has completed its run (you'll know this when you see **SEO audit for apify.com finished.** in the logs), the results of the run can be viewed by clicking the **Results** tab, then subsequently the **View in another tab** option under **Export**. +After the Actor has completed its run (you'll know this when you see **SEO audit for apify.com finished.** in the logs), the results of the run can be viewed by clicking the **Results** tab, then subsequently the **View in another tab** option under **Export**. ## The "Actors" tab {#actors-tab} -While still on the platform, click on the tab with the **< >** icon which says **Actors**. This tab is your one-stop-shop for seeing which actors you've used recently, and which ones you've developed yourself. You will be frequently using this tab when developing and testing on the Apify platform. +While still on the platform, click on the tab with the **< >** icon which says **Actors**. This tab is your one-stop-shop for seeing which Actors you've used recently, and which ones you've developed yourself. You will be frequently using this tab when developing and testing on the Apify platform. ![The "Actors" tab on the Apify platform](./images/actors-tab.jpg) -Now that you know the basics of what actors are and how to use them, it's time to develop **an actor of your own**! +Now that you know the basics of what Actors are and how to use them, it's time to develop **an Actor of your own**! ## Next up {#next} -Get ready, because in the [next lesson](./creating_actors.md), you'll be writing your very own actor! +Get ready, because in the [next lesson](./creating_actors.md), you'll be writing your very own Actor! diff --git a/sources/academy/platform/getting_started/apify_api.md b/sources/academy/platform/getting_started/apify_api.md index 280a19e5f..8a7cbceb6 100644 --- a/sources/academy/platform/getting_started/apify_api.md +++ b/sources/academy/platform/getting_started/apify_api.md @@ -1,39 +1,39 @@ --- title: Apify API -description: Learn how to use the Apify API to programmatically call your actors, retrieve data stored on the platform, view actor logs, and more! +description: Learn how to use the Apify API to programmatically call your Actors, retrieve data stored on the platform, view Actor logs, and more! 
sidebar_position: 4 slug: /getting-started/apify-api --- # The Apify API {#the-apify-api} -**Learn how to use the Apify API to programmatically call your actors, retrieve data stored on the platform, view actor logs, and more!** +**Learn how to use the Apify API to programmatically call your Actors, retrieve data stored on the platform, view Actor logs, and more!** --- [Apify's API](/api/v2#/reference) is your ticket to the Apify platform without even needing to access the [Apify Console](https://console.apify.com?asrc=developers_portal) web-interface. The API is organized around RESTful HTTP endpoints. -In this lesson, we'll be learning how to use the Apify API to call an actor and view its results. We'll be using the actor we created in the previous lesson, so if you haven't already gotten that one set up, go ahead do that before moving forward if you'd like to follow along. +In this lesson, we'll be learning how to use the Apify API to call an Actor and view its results. We'll be using the Actor we created in the previous lesson, so if you haven't already gotten that one set up, go ahead do that before moving forward if you'd like to follow along. ## Finding your endpoint {#finding-your-endpoint} -Within one of your actors on the [Apify Console](https://console.apify.com?asrc=developers_portal) (we'll use the **adding-actor** from the previous lesson), click on the **API** button in the top right-hand corner: +Within one of your Actors on the [Apify Console](https://console.apify.com?asrc=developers_portal) (we'll use the **adding-actor** from the previous lesson), click on the **API** button in the top right-hand corner: -![The "API" button on an actor's page on the Apify Console](./images/api-tab.jpg) +![The "API" button on an Actor's page on the Apify Console](./images/api-tab.jpg) -You should see a long list of API endpoints that you can copy and paste elsewhere, or even test right within the **API** modal. Go ahead and copy the endpoint labeled **Run actor synchronously and get dataset items**. It should look something like this: +You should see a long list of API endpoints that you can copy and paste elsewhere, or even test right within the **API** modal. Go ahead and copy the endpoint labeled **Run Actor synchronously and get dataset items**. It should look something like this: ```text https://api.apify.com/v2/acts/YOUR_USERNAME~adding-actor/run-sync?token=YOUR_TOKEN ``` -> In this lesson, we'll only be focusing on this one endpoint, as it is the most popularly used one; however, don't let this limit your curiosity! Take a look at the other endpoints in the **API** window to learn about everything you can do to your actor programmatically. +> In this lesson, we'll only be focusing on this one endpoint, as it is the most popularly used one; however, don't let this limit your curiosity! Take a look at the other endpoints in the **API** window to learn about everything you can do to your Actor programmatically. Now, let's move over to our favorite HTTP client (in this lesson we'll use [Insomnia](../../glossary/tools/insomnia.md) in order to prepare and send the request). ## Providing input {#providing-input} -Our **adding-actor** takes in two input values (`num1` and `num2`). When using the actor on the platform, provide these fields either through the UI generated by the **INPUT_SCHEMA.json**, or directly in JSON format. When providing input when making an API call to run an actor, the input must be provided in the **body** of the POST request as a JSON object. 
+Our **adding-actor** takes in two input values (`num1` and `num2`). When using the Actor on the platform, provide these fields either through the UI generated by the **INPUT_SCHEMA.json**, or directly in JSON format. When providing input when making an API call to run an Actor, the input must be provided in the **body** of the POST request as a JSON object. ![Providing input](./images/provide-input.jpg) @@ -61,11 +61,11 @@ Here's the response we got: ![API response](./images/api-csv-response.png) -And there it is! The actor was run with our inputs of **num1** and **num2**, then the dataset results were returned back to us in CSV format. +And there it is! The Actor was run with our inputs of **num1** and **num2**, then the dataset results were returned back to us in CSV format. ## Apify API's many features {#api-many-features} -What we've done in this lesson only scratches the surface of what the Apify API can do. Right from Insomnia, or from any HTTP client, you can [manage datasets](/api/v2#/reference/datasets/dataset/get-dataset) and [key-value stores](/api/v2#/reference/key-value-stores/key-collection/get-dataset), [add to request queues](/api/v2#/reference/request-queues/queue-collection/add-request), [update actors](/api/v2#/reference/actors/actor-object/add-request), and much more! Basically, whatever you can do on the platform's web interface, you also do through the API. +What we've done in this lesson only scratches the surface of what the Apify API can do. Right from Insomnia, or from any HTTP client, you can [manage datasets](/api/v2#/reference/datasets/dataset/get-dataset) and [key-value stores](/api/v2#/reference/key-value-stores/key-collection/get-dataset), [add to request queues](/api/v2#/reference/request-queues/queue-collection/add-request), [update Actors](/api/v2#/reference/actors/actor-object/add-request), and much more! Basically, whatever you can do on the platform's web interface, you also do through the API. ## Next up {#next} diff --git a/sources/academy/platform/getting_started/apify_client.md b/sources/academy/platform/getting_started/apify_client.md index 76dcbfa07..60d1cef9d 100644 --- a/sources/academy/platform/getting_started/apify_client.md +++ b/sources/academy/platform/getting_started/apify_client.md @@ -60,7 +60,7 @@ from apify_client import ApifyClient -## Running an actor {#running-an-actor} +## Running an Actor {#running-an-actor} In the last lesson, we ran the **adding-actor** and retrieved its dataset items. That's exactly what we're going to do now; however, by using the Apify client instead. @@ -118,7 +118,7 @@ run = client.actor('YOUR_USERNAME/adding-actor').call(run_input={ ## Downloading dataset items {#downloading-dataset-items} -Once an actor's run has completed, it will return a **run info** object that looks something like this: +Once an Actor's run has completed, it will return a **run info** object that looks something like this: ![Run info object](./images/run-info.jpg) @@ -166,7 +166,7 @@ print(items) -The final code for running the actor and fetching its dataset items looks like this: +The final code for running the Actor and fetching its dataset items looks like this: @@ -216,13 +216,13 @@ print(items) -## Updating an actor {#updating-actor} +## Updating an Actor {#updating-actor} -If you check the **Settings** tab within your **adding-actor**, you'll notice that the default memory being allocated to the actor is **2048 MB**. 
This is a bit overkill considering the fact that the actor is only adding two numbers together - **256 MB** would be much more reasonable. Also, we can safely say that the run should never take more than 20 seconds (even this is a generous number) and that the default of 3600 seconds is also overkill. +If you check the **Settings** tab within your **adding-actor**, you'll notice that the default memory being allocated to the Actor is **2048 MB**. This is a bit overkill considering the fact that the Actor is only adding two numbers together - **256 MB** would be much more reasonable. Also, we can safely say that the run should never take more than 20 seconds (even this is a generous number) and that the default of 3600 seconds is also overkill. -Let's change these two actor settings via the Apify client using the [`actor.update()`](/api/client/js/reference/class/ActorClient#update) function. This function will call the **update actor** endpoint, which can take `defaultRunOptions` as an input property. You can find the shape of the `defaultRunOptions` in the [API documentation](/api/v2#/reference/actors/actor-object/update-actor). Perfect! +Let's change these two Actor settings via the Apify client using the [`actor.update()`](/api/client/js/reference/class/ActorClient#update) function. This function will call the **update Actor** endpoint, which can take `defaultRunOptions` as an input property. You can find the shape of the `defaultRunOptions` in the [API documentation](/api/v2#/reference/actors/actor-object/update-actor). Perfect! -First, we'll create a pointer to our actor, similar to before (except this time, we won't be using `.call()` at the end): +First, we'll create a pointer to our Actor, similar to before (except this time, we won't be using `.call()` at the end): @@ -274,7 +274,7 @@ After running the code, go back to the **Settings** page of **adding-actor**. If ## Overview {#overview} -You can do so much more with the Apify client than just running actors, updating actors, and downloading dataset items. The purpose of this lesson was just to get you comfortable using the client in your own projects, as it's the absolute best developer tool for integrating the Apify platform with an external system. +You can do so much more with the Apify client than just running Actors, updating Actors, and downloading dataset items. The purpose of this lesson was just to get you comfortable using the client in your own projects, as it's the absolute best developer tool for integrating the Apify platform with an external system. For a more in-depth understanding of the Apify API client, give these a quick lookover: @@ -283,4 +283,4 @@ For a more in-depth understanding of the Apify API client, give these a quick lo ## Next up {#next} -Now that you're familiar and a bit more comfortable with the Apify platform, you're ready to start deploying your code to Apify! In the [next section](../deploying_your_code/index.md), you'll learn how to take any project written in any programming language and turn it into an actor. +Now that you're familiar and a bit more comfortable with the Apify platform, you're ready to start deploying your code to Apify! In the [next section](../deploying_your_code/index.md), you'll learn how to take any project written in any programming language and turn it into an Actor. 
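+
+> For reference, the `actor.update()` call described in the **Updating an Actor** section above boils down to just a few lines with the JavaScript client. This is only a sketch - the option values are the ones suggested in this lesson, and the `build` tag is illustrative rather than required:
+
+```js
+import { ApifyClient } from 'apify-client';
+
+const client = new ApifyClient({ token: 'YOUR_TOKEN' });
+
+// Create a pointer to the Actor. We don't use .call() here,
+// because we only want to change its settings, not run it.
+const actor = client.actor('YOUR_USERNAME/adding-actor');
+
+// Apply the defaults discussed above: 256 MB of memory
+// and a 20-second timeout for each run.
+await actor.update({
+    defaultRunOptions: {
+        build: 'latest', // assumed build tag - adjust to your own
+        memoryMbytes: 256,
+        timeoutSecs: 20,
+    },
+});
+```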
diff --git a/sources/academy/platform/getting_started/creating_actors.md b/sources/academy/platform/getting_started/creating_actors.md index 0e88c7a32..b733d8d0e 100644 --- a/sources/academy/platform/getting_started/creating_actors.md +++ b/sources/academy/platform/getting_started/creating_actors.md @@ -5,7 +5,7 @@ sidebar_position: 2 slug: /getting-started/creating-actors --- -# Creating actors {#creating-actors} +# Creating Actors {#creating-actors} **This lesson offers hands-on experience in building and running Actors in Apify Console using a template. By the end of it, you will be able to build and run your first Actor using an Actor template.** @@ -141,7 +141,7 @@ The extracted data is stored in the [Dataset](/platform/storage/dataset) where y In order to run the Actor, you need to [build](/platform/actors/development/builds-and-runs/builds) it first. Click on the **Build** button at the bottom of the page or **Build now** button right under the code editor. -![Build the actor](./images/build-actor.png) +![Build the Actor](./images/build-actor.png) After you've clicked the **Build** button, it'll take around 5–10 seconds to complete the build. You'll know it's finished when you see a green **Start** button. diff --git a/sources/academy/platform/getting_started/index.md b/sources/academy/platform/getting_started/index.md index c75efd400..7a97924a2 100644 --- a/sources/academy/platform/getting_started/index.md +++ b/sources/academy/platform/getting_started/index.md @@ -1,6 +1,6 @@ --- title: Getting started -description: Get started with the Apify platform by creating an account and learning about the Apify Console, which is where all Apify actors are born! +description: Get started with the Apify platform by creating an account and learning about the Apify Console, which is where all Apify Actors are born! sidebar_position: 8 category: apify platform slug: /getting-started @@ -8,7 +8,7 @@ slug: /getting-started # Getting started {#getting-started} -**Get started with the Apify platform by creating an account and learning about the Apify Console, which is where all Apify actors are born!** +**Get started with the Apify platform by creating an account and learning about the Apify Console, which is where all Apify Actors are born!** --- @@ -24,4 +24,4 @@ Now that you have an account, you have access to the [Apify Console](https://con ## Next up {#next} -In our next lesson, we'll learn about something super exciting - **actors**. Actors are the living and breathing core of the Apify platform and are an extremely powerful concept. What are you waiting for? Let's jump [right into the next lesson](./actors.md)! +In our next lesson, we'll learn about something super exciting - **Actors**. Actors are the living and breathing core of the Apify platform and are an extremely powerful concept. What are you waiting for? Let's jump [right into the next lesson](./actors.md)! diff --git a/sources/academy/platform/getting_started/inputs_outputs.md b/sources/academy/platform/getting_started/inputs_outputs.md index d58a02204..b7a95a8e6 100644 --- a/sources/academy/platform/getting_started/inputs_outputs.md +++ b/sources/academy/platform/getting_started/inputs_outputs.md @@ -1,23 +1,23 @@ --- title: Inputs & outputs -description: Create an actor from scratch which takes an input, processes that input, and then outputs a result that can be used elsewhere. +description: Create an Actor from scratch which takes an input, processes that input, and then outputs a result that can be used elsewhere. 
sidebar_position: 3 slug: /getting-started/inputs-outputs --- # Inputs & outputs {#inputs-outputs} -**Create an actor from scratch which takes an input, processes that input, and then outputs a result that can be used elsewhere.** +**Create an Actor from scratch which takes an input, processes that input, and then outputs a result that can be used elsewhere.** --- -Most of the time, when you are writing any sort of software, it will generally expect some sort of input and generate some sort of output. It is the same exact story when it comes to actors, which is why we at Apify have made it so easy to accept input into an actor and store its results somewhere. +Most of the time, when you are writing any sort of software, it will generally expect some sort of input and generate some sort of output. It is the same exact story when it comes to Actors, which is why we at Apify have made it so easy to accept input into an Actor and store its results somewhere. -In this lesson, we'll be demonstrating inputs and outputs by building an actor which takes two numbers as input, adds them up, and then outputs the result. +In this lesson, we'll be demonstrating inputs and outputs by building an Actor which takes two numbers as input, adds them up, and then outputs the result. -## Accept input into an actor {#accept-input} +## Accept input into an Actor {#accept-input} -Let's first create another new actor using the same template as before. Feel free to refer to the [previous lesson](./creating_actors.md) for a refresher on how to do this. +Let's first create another new Actor using the same template as before. Feel free to refer to the [previous lesson](./creating_actors.md) for a refresher on how to do this. Replace all of the code in **main.js** with this code snippet: @@ -40,7 +40,7 @@ await Actor.exit(); Then, replace everything in **INPUT_SCHEMA.json** with this: -> This step isn't necessary, as the actor will still be able to take input in JSON format without it; however, we are providing the content for this actor's input schema in this lesson, as it will give the Apify platform a blueprint off of which it can generate a nice UI for your inputs, as well as validate their values. +> This step isn't necessary, as the Actor will still be able to take input in JSON format without it; however, we are providing the content for this Actor's input schema in this lesson, as it will give the Apify platform a blueprint off of which it can generate a nice UI for your inputs, as well as validate their values. ```json { @@ -67,17 +67,17 @@ Then, replace everything in **INPUT_SCHEMA.json** with this: > If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema#integer). -Finally, **Save** and **Build** the actor just as you did in the previous lesson. +Finally, **Save** and **Build** the Actor just as you did in the previous lesson. -## Configuring an actor with inputs {#configuring} +## Configuring an Actor with inputs {#configuring} -If you scroll down a bit, you'll find the **Developer console** located under the multifile editor. By default, after running a build, the **Last build** tab will be selected, where you can see all of the logs related to building the actor. 
Inputs can be configured within the **Input** tab. +If you scroll down a bit, you'll find the **Developer console** located under the multifile editor. By default, after running a build, the **Last build** tab will be selected, where you can see all of the logs related to building the Actor. Inputs can be configured within the **Input** tab. ![Configuring inputs](./images/configure-inputs.jpg) -Enter any two numbers you'd like, then press **Start**. The actor's run should be completed almost immediately. +Enter any two numbers you'd like, then press **Start**. The Actor's run should be completed almost immediately. -## View actor results {#view-results} +## View Actor results {#view-results} Since we've pushed the result into the default dataset, it, and some info about it can be viewed by clicking this box, which will take you to the results tab: @@ -89,8 +89,8 @@ On the results tab, there are a whole lot of options for which format to view/do There's our solution! Did it work for you as well? Now, we can download the data right from the results tab to be used elsewhere, or even programmatically retrieve it by using [Apify's API](/api/v2) (we'll be discussing how to do this in the next lesson). -It's important to note that the default dataset of the actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage#data-retention). +It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage#data-retention). ## Next up {#next} -In [next lesson](./apify_api.md)'s fun activity, you'll learn how to call the actor we created in this lesson programmatically using one of Apify's most powerful tools - the Apify API. +In [next lesson](./apify_api.md)'s fun activity, you'll learn how to call the Actor we created in this lesson programmatically using one of Apify's most powerful tools - the Apify API. diff --git a/sources/academy/platform/running_a_web_server.md b/sources/academy/platform/running_a_web_server.md index 292ef5867..ed3f0e483 100644 --- a/sources/academy/platform/running_a_web_server.md +++ b/sources/academy/platform/running_a_web_server.md @@ -1,6 +1,6 @@ --- title: Running a web server on the Apify platform -description: A web server running in an actor can act as a communication channel with the outside world. Learn how to easily set one up with Node.js. +description: A web server running in an Actor can act as a communication channel with the outside world. Learn how to easily set one up with Node.js. sidebar_position: 11 category: apify platform slug: /running-a-web-server @@ -8,29 +8,29 @@ slug: /running-a-web-server # Running a web server on the Apify platform -**A web server running in an actor can act as a communication channel with the outside world. Learn how to easily set one up with Node.js.** +**A web server running in an Actor can act as a communication channel with the outside world. Learn how to easily set one up with Node.js.** --- -Sometimes, an actor needs a channel for communication with other systems (or humans). 
This channel might be used to receive commands, to provide info about progress, or both. To implement this, we will run a HTTP web server inside the actor that will provide: +Sometimes, an Actor needs a channel for communication with other systems (or humans). This channel might be used to receive commands, to provide info about progress, or both. To implement this, we will run a HTTP web server inside the Actor that will provide: - An API to receive commands. - An HTML page displaying output data. -Running a web server in an actor is a piece of cake! Each actor run is available at a unique URL (container URL) which always takes the form `https://CONTAINER-KEY.runs.apify.net`. This URL is available in the [**actor run** object](/api/v2#/reference/actor-runs/run-object-and-its-storages/get-run) returned by the Apify API, as well as in the Apify console. +Running a web server in an Actor is a piece of cake! Each Actor run is available at a unique URL (container URL) which always takes the form `https://CONTAINER-KEY.runs.apify.net`. This URL is available in the [**Actor run** object](/api/v2#/reference/actor-runs/run-object-and-its-storages/get-run) returned by the Apify API, as well as in the Apify console. -If you start a web server on the port defined by the **APIFY_CONTAINER_PORT** environment variable (the default value is **4321**), the container URL becomes available and gets displayed in the **Live View** tab in the actor run console. +If you start a web server on the port defined by the **APIFY_CONTAINER_PORT** environment variable (the default value is **4321**), the container URL becomes available and gets displayed in the **Live View** tab in the Actor run console. For more details, see [the documentation](/platform/actors/development/programming-interface/container-web-server). -## Building the actor {#building-the-actor} +## Building the Actor {#building-the-actor} -Let's try to build the following actor: +Let's try to build the following Actor: -- The actor will provide an API to receive URLs to be processed. -- For each URL, the actor will create a screenshot. +- The Actor will provide an API to receive URLs to be processed. +- For each URL, the Actor will create a screenshot. - The screenshot will be stored in the key-value store. -- The actor will provide a web page displaying thumbnails linked to screenshots and a HTML form to submit new URLs. +- The Actor will provide a web page displaying thumbnails linked to screenshots and a HTML form to submit new URLs. To achieve this we will use the following technologies: @@ -61,7 +61,7 @@ Now we need to read the following environment variables: - **APIFY_CONTAINER_PORT** contains a port number where we must start the server. - **APIFY_CONTAINER_URL** contains a URL under which we can access the container. -- **APIFY_DEFAULT_KEY_VALUE_STORE_ID** is simply the ID of the default key-value store of this actor where we can store screenshots. +- **APIFY_DEFAULT_KEY_VALUE_STORE_ID** is simply the ID of the default key-value store of this Actor where we can store screenshots. ```js const { @@ -232,8 +232,8 @@ app.listen(APIFY_CONTAINER_PORT, () => { }); ``` -When we deploy and run this actor on the Apify platform, then we can open the **Live View** tab in the actor console to submit the URL to your actor through the form. After the URL is successfully submitted, it appears in the actor log. 
+When we deploy and run this Actor on the Apify platform, then we can open the **Live View** tab in the Actor console to submit the URL to your Actor through the form. After the URL is successfully submitted, it appears in the Actor log. With that, we're done! And our application works like a charm :) -The complete code of this actor is available [here](https://www.apify.com/apify/example-web-server). You can run it there or copy it to your account. +The complete code of this Actor is available [here](https://www.apify.com/apify/example-web-server). You can run it there or copy it to your account. diff --git a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md index 141721a77..109fa8653 100644 --- a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md +++ b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md @@ -19,7 +19,7 @@ Apify API offers two ways of interacting with it: - [Synchronously](#synchronous-flow) - [Asynchronously](#asynchronous-flow) -If the actor being run via API takes 5 minutes or less to complete a typical run, it should be called **synchronously**. Otherwise, (if a typical run takes longer than 5 minutes), it should be called **asynchronously**. +If the Actor being run via API takes 5 minutes or less to complete a typical run, it should be called **synchronously**. Otherwise, (if a typical run takes longer than 5 minutes), it should be called **asynchronously**. ## Run an Actor or task {#run-an-actor-or-task} @@ -37,7 +37,7 @@ To run, or **call**, an Actor/task, you will need a few things: - Some other optional settings if you'd like to change the default values (such as allocated memory or the build). -The URL of [POST request](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST) to run an actor looks like this: +The URL of [POST request](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST) to run an Actor looks like this: ```cURL https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs?token=YOUR_TOKEN @@ -49,11 +49,11 @@ For tasks, we can just switch the path from **acts** to **actor-tasks** and keep https://api.apify.com/v2/actor-tasks/TASK_NAME_OR_ID/runs?token=YOUR_TOKEN ``` -If we send a correct POST request to one of these endpoints, the actor/actor-task will start just as if we had pressed the **Start** button on the actor's page in the [Apify Console](https://console.apify.com). +If we send a correct POST request to one of these endpoints, the actor/actor-task will start just as if we had pressed the **Start** button on the Actor's page in the [Apify Console](https://console.apify.com). ### Additional settings {#additional-settings} -We can also add settings for the actor (which will override the default settings) as additional query parameters. For example, if we wanted to change how much memory the actor's run should be allocated and which build to run, we could simply add the `memory` and `build` parameters separated by `&`. +We can also add settings for the Actor (which will override the default settings) as additional query parameters. For example, if we wanted to change how much memory the Actor's run should be allocated and which build to run, we could simply add the `memory` and `build` parameters separated by `&`. 
```cURL https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs?token=YOUR_TOKEN&memory=8192&build=beta @@ -63,13 +63,13 @@ This works in almost exactly the same way for both Actors and tasks; however, fo ### Input JSON {#input-json} -Most actors would not be much use if input could not be passed into them to change their behavior. Additionally, even though tasks already have specified input configurations, it is handy to have the ability to overwrite task inputs through the **body** of the POST request. +Most Actors would not be much use if input could not be passed into them to change their behavior. Additionally, even though tasks already have specified input configurations, it is handy to have the ability to overwrite task inputs through the **body** of the POST request. -> The input can technically be any [JSON object](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON), and will vary depending on the actor being run. Ensure that you are familiar with the actor's input schema while writing the body of the request. +> The input can technically be any [JSON object](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON), and will vary depending on the Actor being run. Ensure that you are familiar with the Actor's input schema while writing the body of the request. -Good actors have reasonable defaults for most input fields, so if you want to run one of the major actors from [Apify Store](https://apify.com/store), you usually do not need to provide all possible fields. +Good Actors have reasonable defaults for most input fields, so if you want to run one of the major Actors from [Apify Store](https://apify.com/store), you usually do not need to provide all possible fields. -Via API, let's quickly try to run [Web Scraper](https://apify.com/apify/web-scraper), which is the most popular actor on the Apify Store at the moment. The full input with all possible fields is [pretty long and ugly](https://apify.com/apify/web-scraper?section=example-run), so we will not show it here. Because it has default values for most fields, we can provide a simple JSON input containing only the fields we'd like to customize. We will send a POST request to the endpoint below and add the JSON as the **body** of the request: +Via API, let's quickly try to run [Web Scraper](https://apify.com/apify/web-scraper), which is the most popular Actor on the Apify Store at the moment. The full input with all possible fields is [pretty long and ugly](https://apify.com/apify/web-scraper?section=example-run), so we will not show it here. Because it has default values for most fields, we can provide a simple JSON input containing only the fields we'd like to customize. We will send a POST request to the endpoint below and add the JSON as the **body** of the request: ```cURL https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN @@ -77,7 +77,7 @@ https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN Here is how it looks in [Postman](https://www.getpostman.com/): -![Run an actor via API in Postman](./images/run-actor-postman.png) +![Run an Actor via API in Postman](./images/run-actor-postman.png) If we press **Send**, it will immediately return some info about the run. The `status` will be either `READY` (which means that it is waiting to be allocated on a server) or `RUNNING` (99% of cases). 
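+
+> If you'd rather skip Postman, the same request can be sent from plain Node.js. Below is a minimal sketch using `got` - the input object is deliberately tiny and the field shown is only illustrative, so follow Web Scraper's input schema for anything real:
+
+```js
+import got from 'got';
+
+const myToken = 'YOUR_TOKEN';
+
+// Start apify/web-scraper asynchronously. The JSON body becomes the Actor's input.
+const response = await got({
+    url: `https://api.apify.com/v2/acts/apify~web-scraper/runs?token=${myToken}`,
+    method: 'POST',
+    json: {
+        startUrls: [{ url: 'https://apify.com' }],
+    },
+    responseType: 'json',
+});
+
+// The API answers right away with the "run info" object,
+// whose status will usually be READY or RUNNING at this point.
+const run = response.body.data;
+console.log(run.id, run.status);
+```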
@@ -147,7 +147,7 @@ If your synchronous run exceeds the 5-minute time limit, the response will be a ### Synchronous runs with dataset output {#synchronous-runs-with-dataset-output} -Most Actor runs will store their data in the default [dataset](/platform/storage/dataset). The Apify API provides **run-sync-get-dataset-items** endpoints for [actors](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously-and-get-dataset-items/run-task-synchronously-and-get-dataset-items-(post)), which allow you to run an Actor and receive the items from the default dataset once the run has finished. +Most Actor runs will store their data in the default [dataset](/platform/storage/dataset). The Apify API provides **run-sync-get-dataset-items** endpoints for [Actors](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously-and-get-dataset-items/run-task-synchronously-and-get-dataset-items-(post)), which allow you to run an Actor and receive the items from the default dataset once the run has finished. Here is a simple Node.js example of calling a task via the API and logging the dataset items to the console: @@ -159,7 +159,7 @@ import got from 'got'; // (find it at https://console.apify.com/account#/integrations) const myToken = ''; -// Start apify/google-search-scraper actor +// Start apify/google-search-scraper Actor // and pass some queries into the JSON body const response = await got({ url: `https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?token=${myToken}`, @@ -184,7 +184,7 @@ items.forEach((item) => { ### Synchronous runs with key-value store output {#synchronous-runs-with-key-value-store-output} -[Key-value stores](/platform/storage/key-value-store) are useful for storing files like images, HTML snapshots, or JSON data. The Apify API provides **run-sync** endpoints for [actors](/api/v2#/reference/actors/run-actor-synchronously/with-input) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously/run-task-synchronously), which allow you to run a specific task and receive the output. By default, they return the `OUTPUT` record from the default key-value store. +[Key-value stores](/platform/storage/key-value-store) are useful for storing files like images, HTML snapshots, or JSON data. The Apify API provides **run-sync** endpoints for [Actors](/api/v2#/reference/actors/run-actor-synchronously/with-input) and [tasks](/api/v2#/reference/actor-tasks/run-task-synchronously/run-task-synchronously), which allow you to run a specific task and receive the output. By default, they return the `OUTPUT` record from the default key-value store. > For more detailed information, check the [API reference](/api/v2#/reference/actors/run-actor-synchronously-and-get-dataset-items/run-actor-synchronously-with-input-and-get-dataset-items). @@ -192,13 +192,13 @@ items.forEach((item) => { For runs longer than 5 minutes, the process consists of three steps: -- [Run the actor or task](#run-an-actor-or-task) +- [Run the Actor or task](#run-an-actor-or-task) - [Wait for the run to finish](#wait-for-the-run-to-finish) - [Collect the data](#collect-the-data) ### Wait for the run to finish {#wait-for-the-run-to-finish} -There may be cases where we need to simply run the actor and go away. 
But in any kind of integration, we are usually interested in its output. We have three basic options for how to wait for the actor/task to finish. +There may be cases where we need to simply run the Actor and go away. But in any kind of integration, we are usually interested in its output. We have three basic options for how to wait for the actor/task to finish. - [`waitForFinish` parameter](#waitforfinish-parameter) - [Webhooks](#webhooks) @@ -234,7 +234,7 @@ Once your server receives this request from the webhook, you know that the event What if you don't have a server, and the run you'd like to do is much too long to use a synchronous call? In cases like these, periodic **polling** of the run's status is the solution. -When we run the actor with the [usual API call](#run-an-actor-or-task) shown above, we will back a response with the **run info** object. From this JSON object, we can then extract the ID of the actor run that we just started from the `id` field. Then, we can set an interval that will poll the Apify API (let's say every 5 seconds) by calling the [**Get run**](https://apify.com/docs/api/v2#/reference/actors/run-object/get-run) endpoint to retrieve the run's status. +When we run the Actor with the [usual API call](#run-an-actor-or-task) shown above, we will back a response with the **run info** object. From this JSON object, we can then extract the ID of the Actor run that we just started from the `id` field. Then, we can set an interval that will poll the Apify API (let's say every 5 seconds) by calling the [**Get run**](https://apify.com/docs/api/v2#/reference/actors/run-object/get-run) endpoint to retrieve the run's status. Simply replace the `RUN_ID` in the following URL with the ID you extracted earlier: diff --git a/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md b/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md index 731f6bd0f..73f10170c 100644 --- a/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md +++ b/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md @@ -1,7 +1,7 @@ --- title: Scraping with Cheerio Scraper menuTitle: Cheerio Scraper -description: Learn how to scrape a website using Apify's Cheerio Scraper. Build an actor's page function, extract information from a web page and download your data. +description: Learn how to scrape a website using Apify's Cheerio Scraper. Build an Actor's page function, extract information from a web page and download your data. externalSourceUrl: https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/build/cheerio-scraper-tutorial.md sidebar_position: 3 slug: /apify-scrapers/cheerio-scraper @@ -30,7 +30,7 @@ of those are, don't worry. We'll walk you through using them step by step. > [Check out the Cheerio docs](https://github.com/cheeriojs/cheerio) to learn more about it. -Now that's out of the way, let's open one of the actor detail pages in the Store, for example the +Now that's out of the way, let's open one of the Actor detail pages in the Store, for example the **Web Scraper** ([apify/web-scraper](https://apify.com/apify/web-scraper)) page, and use our DevTools-Fu to scrape some data. > If you're wondering why we're using Web Scraper as an example instead of Cheerio Scraper, @@ -40,12 +40,12 @@ it's only because we didn't want to triple the number of screenshots we needed t Before we start, let's do a quick recap of the data we chose to scrape: - 1. **URL** - The URL that goes directly to the actor's detail page. + 1. 
**URL** - The URL that goes directly to the Actor's detail page. 2. **Unique identifier** - Such as **apify/web-scraper**. - 3. **Title** - The title visible in the actor's detail page. - 4. **Description** - The actor's description. - 5. **Last modification date** - When the actor was last modified. - 6. **Number of runs** - How many times the actor was run. + 3. **Title** - The title visible in the Actor's detail page. + 4. **Description** - The Actor's description. + 5. **Last modification date** - When the Actor was last modified. + 6. **Number of runs** - How many times the Actor was run. ![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/scraping-practice.webp) @@ -81,8 +81,8 @@ async function pageFunction(context) { ### [](#description) Description -Getting the actor's description is a little more involved, but still pretty straightforward. We can't just simply search for a `
<p>` tag, because
-there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within
+Getting the Actor's description is a little more involved, but still pretty straightforward. We can't just simply search for a `<p>` tag, because
+there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the Actor description is nested within
the `<header>
` element too, same as the title. Moreover, the actual description is nested inside a `` tag with a class `actor-description`. ![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/description.webp) @@ -252,7 +252,7 @@ You nailed it! ## [](#pagination) Pagination Pagination is just a term that represents "going to the next page of results". You may have noticed that we did not -actually scrape all the actors, just the first page of results. That's because to load the rest of the actors, +actually scrape all the Actors, just the first page of results. That's because to load the rest of the Actors, one needs to click the **Show more** button at the very bottom of the list. This is pagination. > This is a typical JavaScript pagination, sometimes called infinite scroll. Other pages may use links @@ -279,11 +279,11 @@ Then we click the **Show more** button and wait for incoming requests to appear ![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/inspect-network.webp) Now, this is interesting. It seems that we've only received two images after clicking the button and no additional -data. This means that the data about actors must already be available in the page and the **Show more** button only displays it. This is good news. +data. This means that the data about Actors must already be available in the page and the **Show more** button only displays it. This is good news. -### [](#finding-the-actors) Finding the actors +### [](#finding-the-actors) Finding the Actors -Now that we know the information we seek is already in the page, we just need to find it. The first actor in the store +Now that we know the information we seek is already in the page, we just need to find it. The first Actor in the store is Web Scraper, so let's try using the search tool in the **Elements** tab to find some reference to it. The first few hits do not provide any interesting information, but in the end, we find our goldmine. A `