diff --git a/sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md b/sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md
index a7933b64c..5cda9872c 100644
--- a/sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md
+++ b/sources/academy/platform/expert_scraping_with_apify/actors_webhooks.md
@@ -7,7 +7,7 @@ slug: /expert-scraping-with-apify/actors-webhooks
 
 # Webhooks & advanced Actor overview {#webhooks-and-advanced-actors}
 
-**Learn more advanced details about Actors, how they work, and the default configurations they can take. **Also**,** learn how** to integrate your Actor with webhooks.**
+**Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.**
 
 ---
 
diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md b/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md
index f2fe3b880..f48a1b6c9 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/integrating_webhooks.md
@@ -40,6 +40,8 @@ const dataset = await Actor.openDataset(datasetId);
 // ...
 ```
 
+> Tip: You will need to use the `forceCloud` option - `Actor.openDataset(datasetId, { forceCloud: true });` - to open a dataset from platform storage while running the Actor locally.
+
 Next, we'll grab hold of the dataset's items with the `dataset.getData()` function:
 
 ```js
@@ -141,7 +143,7 @@ https://api.apify.com/v2/acts/USERNAME~filter-actor/runs?token=YOUR_TOKEN_HERE
 
 Whichever one you choose is totally up to your preference.
 
-Next, within the Actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this:
+Next, within the Amazon scraping Actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this:
 
 ![Configuring a webhook](./images/adding-webhook.jpg)
 
@@ -163,7 +165,7 @@ Additionally, we should be able to see that our **filter-actor** was run, and ha
 
 **Q: How do you allocate more CPU for an Actor's run?**
 
-**A:** On the platform, more memory can be allocated in the Actor's input configuration, and the default allocated CPU can be changed in the Actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES**** environment variable to set the allocated CPU. 4GB is equal to 1 CPU core on the Apify platform.
+**A:** On the platform, more memory can be allocated in the Actor's input configuration, and the default allocated CPU can be changed in the Actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES** environment variable to set the allocated CPU. 4GB is equal to 1 CPU core on the Apify platform.
 
 **Q: Within itself, can you get the exact time that an Actor was started?**
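For reference, here is a minimal sketch of the `forceCloud` tip above in context, assuming the Apify SDK v3 and that the webhook passes the run's `resource` object as Actor input; the wiring and names below are illustrative, not the course's exact code:

```js
import { Actor } from 'apify';

await Actor.init();

// Assumed input shape: the webhook payload's "resource" object,
// which for run events carries the run's default dataset ID.
const { resource } = await Actor.getInput();
const datasetId = resource.defaultDatasetId;

// forceCloud opens the dataset on the Apify platform even when
// this Actor is being run locally.
const dataset = await Actor.openDataset(datasetId, { forceCloud: true });
const { items } = await dataset.getData();

console.log(`Loaded ${items.length} items from dataset ${datasetId}`);

await Actor.exit();
```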
diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md b/sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md
index 73c4741d1..04fdd869d 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/rotating_proxies.md
@@ -50,7 +50,7 @@ const crawler = new CheerioCrawler({
 });
 ```
 
-Now, we'll use the **maxUsageCount** key to force each session to be thrown away after 5 uses and **maxErrorScore**** to trash a session once it receives an error.
+Now, we'll use the **maxUsageCount** key to force each session to be thrown away after 5 uses and **maxErrorScore** to trash a session once it receives an error.
 
 ```js
 const crawler = new CheerioCrawler({
diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md b/sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md
index 966062e4c..b9befd671 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/saving_stats.md
@@ -63,7 +63,7 @@ await Stats.initialize();
 
 ## Tracking errors {#tracking-errors}
 
-In order to keep track of errors, we must write a new function within the crawler's configuration called **failedRequestHandler**. Passed into this function is an object containing an **Error** object for the error which occurred and the **Request** object, as well as information about the session and proxy which were used for the request.
+In order to keep track of errors, we must write a new function within the crawler's configuration called **errorHandler**. Passed into this function is an object containing an **Error** object for the error which occurred and the **Request** object, as well as information about the session and proxy which were used for the request.
 
 ```js
 const crawler = new CheerioCrawler({
@@ -79,7 +79,7 @@ const crawler = new CheerioCrawler({
     maxConcurrency: 50,
     requestHandler: router,
     // Handle all failed requests
-    failedRequestHandler: async ({ error, request }) => {
+    errorHandler: async ({ error, request }) => {
         // Add an error for this url to our error tracker
         Stats.addError(request.url, error?.message);
     },
diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md b/sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md
index 5aa8122d6..5de6c4340 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md
@@ -67,7 +67,7 @@ That's it! Now, our Actor will push its data to a dataset named **amazon-offers-
 
 We now want to store the cheapest item in the default key-value store under a key named **CHEAPEST-ITEM**. The most efficient and practical way of doing this is by filtering through all of the newly named dataset's items and pushing the cheapest one to the store.
 
-Let's add the following code to the bottom of the Actor after **Crawl** finished** is logged to the console:
+Let's add the following code to the bottom of the Actor after **Crawl finished** is logged to the console:
 
 ```js
 // ...
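The truncated block above might flesh out roughly as follows; a minimal sketch assuming the Apify SDK v3, a non-empty dataset, items shaped like `{ title, url, price }` with a numeric `price`, and an illustrative dataset name:

```js
// Open the named dataset the Actor pushed its offers to (name is illustrative).
const dataset = await Actor.openDataset('amazon-offers-google-pixel');
const { items } = await dataset.getData();

// Keep the single lowest-priced offer (assumes at least one item was scraped).
const cheapest = items.reduce((min, offer) => (offer.price < min.price ? offer : min));

// Save it to the default key-value store under the CHEAPEST-ITEM key.
await Actor.setValue('CHEAPEST-ITEM', cheapest);
```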
diff --git a/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md b/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
index 16889c708..2ee07fc6b 100644
--- a/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
+++ b/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
@@ -40,7 +40,7 @@ Once again, we'll be adding onto our main Amazon-scraping Actor in this activity
 
 We have decided that we want to retain the data scraped by the Actor for a long period of time, so instead of pushing to the default dataset, we will be pushing to a named dataset. Additionally, we want to save the absolute cheapest item found by the scraper into the default key-value store under a key named **CHEAPEST-ITEM**.
 
-Finally, we'll create a task for the Actor that saves the configuration with the **keyword** set to **google pixel****.
+Finally, we'll create a task for the Actor that saves the configuration with the **keyword** set to **google pixel**.
 
 [**Solution**](./solutions/using_storage_creating_tasks.md)
 
diff --git a/sources/academy/webscraping/puppeteer_playwright/page/interacting_with_a_page.md b/sources/academy/webscraping/puppeteer_playwright/page/interacting_with_a_page.md
index db20f04a9..83239fe7b 100644
--- a/sources/academy/webscraping/puppeteer_playwright/page/interacting_with_a_page.md
+++ b/sources/academy/webscraping/puppeteer_playwright/page/interacting_with_a_page.md
@@ -36,7 +36,7 @@ Let's first focus on the first 3 steps listed above. By using `page.click()` and
 
 ```js
-// Click the "I agree" button
+// Click the "Accept all" button
 await page.click('button:has-text("Accept all")');
 ```
 
@@ -44,7 +44,7 @@ await page.click('button:has-text("Accept all")');
 
 ```js
-// Click the "I agree" button
+// Click the "Accept all" button
 await page.click('button + button');
 ```
 
@@ -53,15 +53,15 @@ await page.click('button + button');
 
 With `page.click()`, Puppeteer and Playwright actually drag the mouse and click, allowing the bot to act more human-like. This is different from programmatically clicking with `Element.click()` in vanilla client-side JavaScript.
 
-Notice that in the Playwright example, we are using a different selector than in the Puppeteer example. This is because Playwright supports [many custom CSS selectors](https://playwright.dev/docs/other-locators#css-elements-matching-one-of-the-conditions), such as the **has-text** pseudo class. As a rule of thumb, using text selectors is much more preferable to using regular selectors, as they are much less likely to break. If Google makes the sibling above the **I agree** button a `<div>` element instead of a `<button>` element, our `button + button` selector would no longer work.
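To make that robustness argument concrete, here is a short comparison sketch in Playwright, assuming the same consent dialog:

```js
// Structural selector: depends on the previous sibling being a <button>,
// so a markup change like that sibling becoming a <div> breaks the click.
await page.click('button + button');

// Text selector: keeps matching as long as the visible label reads
// "Accept all", regardless of the markup around the button.
await page.click('button:has-text("Accept all")');
```

The structural selector encodes an accident of the markup, while the text selector encodes the user-visible intent, which changes far less often.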