fix(academy): typos, updates and clarifications (#1218)
- fix typos, mainly excess**
- update google accept cookies
- update google search element selection
- logical correction
honzajavorek authored Oct 8, 2024
2 parents ec5b323 + e17d550 commit e54ba89
Showing 10 changed files with 29 additions and 27 deletions.
@@ -7,7 +7,7 @@ slug: /expert-scraping-with-apify/actors-webhooks

# Webhooks & advanced Actor overview {#webhooks-and-advanced-actors}

- **Learn more advanced details about Actors, how they work, and the default configurations they can take. **Also**,** learn how** to integrate your Actor with webhooks.**
+ **Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.**

---

@@ -40,6 +40,8 @@ const dataset = await Actor.openDataset(datasetId);
// ...
```

+ > Tip: You will need to use `forceCloud` option - `Actor.openDataset(<name/id>, { forceCloud: true });` - to open dataset from platform storage while running Actor locally.
Next, we'll grab hold of the dataset's items with the `dataset.getData()` function:

```js
@@ -141,7 +143,7 @@ https://api.apify.com/v2/acts/USERNAME~filter-actor/runs?token=YOUR_TOKEN_HERE
Whichever one you choose is totally up to your preference.
- Next, within the Actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this:
+ Next, within the Amazon scraping Actor, we will click the **Integrations** tab and choose **Webhook**, then fill out the details to look like this:
![Configuring a webhook](./images/adding-webhook.jpg)
@@ -163,7 +165,7 @@ Additionally, we should be able to see that our **filter-actor** was run, and ha
**Q: How do you allocate more CPU for an Actor's run?**
- **A:** On the platform, more memory can be allocated in the Actor's input configuration, and the default allocated CPU can be changed in the Actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES**** environment variable to set the allocated CPU. 4GB is equal to 1 CPU core on the Apify platform.
+ **A:** On the platform, more memory can be allocated in the Actor's input configuration, and the default allocated CPU can be changed in the Actor's **Settings** tab. When running locally, you can use the **APIFY_MEMORY_MBYTES** environment variable to set the allocated CPU. 4GB is equal to 1 CPU core on the Apify platform.
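As an aside, the 4 GB = 1 CPU core relationship stated in the answer can be sketched as a tiny helper (illustrative only, not part of the Apify SDK):

```javascript
// On the Apify platform, allocated CPU scales linearly with memory:
// 4096 MB (4 GB) corresponds to 1 full CPU core.
// Hypothetical helper for illustration, not an SDK function.
function allocatedCpuCores(memoryMbytes) {
    return memoryMbytes / 4096;
}

console.log(allocatedCpuCores(4096)); // 1 (one full core)
console.log(allocatedCpuCores(1024)); // 0.25 (a quarter of a core)
```

So setting **APIFY_MEMORY_MBYTES** to `8192` locally would emulate a run with two cores' worth of CPU.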
**Q: Within itself, can you get the exact time that an Actor was started?**
@@ -50,7 +50,7 @@ const crawler = new CheerioCrawler({
});
```

- Now, we'll use the **maxUsageCount** key to force each session to be thrown away after 5 uses and **maxErrorScore**** to trash a session once it receives an error.
+ Now, we'll use the **maxUsageCount** key to force each session to be thrown away after 5 uses and **maxErrorScore** to trash a session once it receives an error.
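As an aside, the retirement rules these two options configure can be modeled in plain JavaScript (a simplified, hypothetical model for intuition, not Crawlee's actual implementation):

```javascript
// Simplified model of when a session pool retires a session:
// either it has been used too many times, or it has accumulated
// too high an error score. Not Crawlee's real Session class.
class FakeSession {
    constructor({ maxUsageCount = 5, maxErrorScore = 3 } = {}) {
        this.maxUsageCount = maxUsageCount;
        this.maxErrorScore = maxErrorScore;
        this.usageCount = 0;
        this.errorScore = 0;
    }

    use() {
        this.usageCount += 1;
    }

    markError() {
        this.errorScore += 1;
    }

    get retired() {
        return this.usageCount >= this.maxUsageCount
            || this.errorScore >= this.maxErrorScore;
    }
}

const session = new FakeSession({ maxUsageCount: 5, maxErrorScore: 1 });
session.use();
session.markError();
console.log(session.retired); // true, a single error retires it with maxErrorScore: 1
```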

```js
const crawler = new CheerioCrawler({
@@ -63,7 +63,7 @@ await Stats.initialize();
## Tracking errors {#tracking-errors}
- In order to keep track of errors, we must write a new function within the crawler's configuration called **failedRequestHandler**. Passed into this function is an object containing an **Error** object for the error which occurred and the **Request** object, as well as information about the session and proxy which were used for the request.
+ In order to keep track of errors, we must write a new function within the crawler's configuration called **errorHandler**. Passed into this function is an object containing an **Error** object for the error which occurred and the **Request** object, as well as information about the session and proxy which were used for the request.
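As an aside, a minimal version of the `Stats` error tracker referenced here could look like this (a hypothetical sketch; the course's actual `Stats` object does more, such as persisting its state):

```javascript
// Minimal in-memory error tracker keyed by URL.
// Hypothetical sketch of what Stats.addError might do internally.
const Stats = {
    errors: {},
    addError(url, message) {
        if (!this.errors[url]) this.errors[url] = [];
        this.errors[url].push(message ?? 'Unknown error');
    },
};

Stats.addError('https://example.com/1', 'Request timed out');
Stats.addError('https://example.com/1', 'Blocked (403)');
console.log(Stats.errors['https://example.com/1'].length); // 2
```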
```js
const crawler = new CheerioCrawler({
@@ -79,7 +79,7 @@ const crawler = new CheerioCrawler({
maxConcurrency: 50,
requestHandler: router,
// Handle all failed requests
-     failedRequestHandler: async ({ error, request }) => {
+     errorHandler: async ({ error, request }) => {
// Add an error for this url to our error tracker
Stats.addError(request.url, error?.message);
},
@@ -67,7 +67,7 @@ That's it! Now, our Actor will push its data to a dataset named **amazon-offers-

We now want to store the cheapest item in the default key-value store under a key named **CHEAPEST-ITEM**. The most efficient and practical way of doing this is by filtering through all of the newly named dataset's items and pushing the cheapest one to the store.

- Let's add the following code to the bottom of the Actor after **Crawl** finished** is logged to the console:
+ Let's add the following code to the bottom of the Actor after **Crawl finished** is logged to the console:
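As an aside, the "filter for the cheapest item" logic can be sketched in plain JavaScript (the `{ offer: '$19.99' }` item shape is an assumption for illustration, not the Actor's exact schema):

```javascript
// Pick the cheapest item from a list of dataset items.
// The { offer: '$19.99' } shape is assumed for illustration.
function findCheapest(items) {
    // Strip currency symbols and parse the numeric price
    const price = (item) => Number(item.offer.replace(/[^\d.]/g, ''));
    return items.reduce((cheapest, item) => (price(item) < price(cheapest) ? item : cheapest));
}

const cheapest = findCheapest([
    { offer: '$29.99' },
    { offer: '$19.99' },
    { offer: '$24.50' },
]);
console.log(cheapest.offer); // '$19.99'
```

The resulting object is what would then be saved to the key-value store under **CHEAPEST-ITEM** with `Actor.setValue()`.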

```js
// ...
@@ -40,7 +40,7 @@ Once again, we'll be adding onto our main Amazon-scraping Actor in this activity

We have decided that we want to retain the data scraped by the Actor for a long period of time, so instead of pushing to the default dataset, we will be pushing to a named dataset. Additionally, we want to save the absolute cheapest item found by the scraper into the default key-value store under a key named **CHEAPEST-ITEM**.

- Finally, we'll create a task for the Actor that saves the configuration with the **keyword** set to **google pixel****.
+ Finally, we'll create a task for the Actor that saves the configuration with the **keyword** set to **google pixel**.
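As an aside, the saved task input would then be a JSON object along these lines (the **keyword** field name is taken from this course's Actor input; the exact schema may differ):

```json
{
    "keyword": "google pixel"
}
```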

[**Solution**](./solutions/using_storage_creating_tasks.md)

@@ -36,15 +36,15 @@ Let's first focus on the first 3 steps listed above. By using `page.click()` and
<TabItem value="Playwright" label="Playwright">

```js
- // Click the "I agree" button
+ // Click the "Accept all" button
await page.click('button:has-text("Accept all")');
```

</TabItem>
<TabItem value="Puppeteer" label="Puppeteer">

```js
- // Click the "I agree" button
+ // Click the "Accept all" button
await page.click('button + button');
```

@@ -53,15 +53,15 @@ await page.click('button + button');

With `page.click()`, Puppeteer and Playwright actually drag the mouse and click, allowing the bot to act more human-like. This is different from programmatically clicking with `Element.click()` in vanilla client-side JavaScript.

- Notice that in the Playwright example, we are using a different selector than in the Puppeteer example. This is because Playwright supports [many custom CSS selectors](https://playwright.dev/docs/other-locators#css-elements-matching-one-of-the-conditions), such as the **has-text** pseudo class. As a rule of thumb, using text selectors is much more preferable to using regular selectors, as they are much less likely to break. If Google makes the sibling above the **I agree** button a `<div>` element instead of a `<button>` element, our `button + button` selector will break. However, the button will always have the text **I agree**; therefore, `button:has-text("I agree")` is more reliable.
+ Notice that in the Playwright example, we are using a different selector than in the Puppeteer example. This is because Playwright supports [many custom CSS selectors](https://playwright.dev/docs/other-locators#css-elements-matching-one-of-the-conditions), such as the **has-text** pseudo class. As a rule of thumb, using text selectors is much more preferable to using regular selectors, as they are much less likely to break. If Google makes the sibling above the **Accept all** button a `<div>` element instead of a `<button>` element, our `button + button` selector will break. However, the button will always have the text **Accept all**; therefore, `button:has-text("Accept all")` is more reliable.

> If you're not already familiar with CSS selectors and how to find them, we recommend referring to [this lesson](../../scraping_basics_javascript/data_extraction/using_devtools.md) in the **Web scraping for beginners** course.
- Then, we can type some text into an input field with `page.type()`; passing a CSS selector as the first, and the string to input as the second parameter:
+ Then, we can type some text into an input field `<textarea>` with `page.type()`; passing a CSS selector as the first, and the string to input as the second parameter:

```js
// Type the query into the search box
- await page.type('input[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');
```

Finally, we can press a single key by accessing the `keyboard` property of `page` and calling the `press()` function on it:
@@ -85,11 +85,11 @@ const page = await browser.newPage();

await page.goto('https://www.google.com/');

- // Click the "I agree" button
+ // Click the "Accept all" button
await page.click('button:has-text("Accept all")');

// Type the query into the search box
- await page.type('textarea[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');

// Press enter
await page.keyboard.press('Enter');
@@ -110,11 +110,11 @@ const page = await browser.newPage();

await page.goto('https://www.google.com/');

- // Click the "I agree" button
+ // Click the "Accept all" button
await page.click('button + button');

// Type the query into the search box
- await page.type('textarea[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');

// Press enter
await page.keyboard.press('Enter');
@@ -146,7 +146,7 @@ await page.goto('https://www.google.com/');

await page.click('button:has-text("Accept all")');

- await page.type('textarea[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');

await page.keyboard.press('Enter');

@@ -172,7 +172,7 @@ await page.goto('https://www.google.com/');

await page.click('button + button');

- await page.type('textarea[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');

await page.keyboard.press('Enter');

@@ -63,10 +63,10 @@ const page = await browser.newPage();
await page.goto('https://google.com');

// Agree to the cookies policy
- await page.click('button:has-text("I agree")');
+ await page.click('button:has-text("Accept all")');

// Type the query and visit the results page
- await page.type('input[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Click on the first result
@@ -99,7 +99,7 @@ await page.goto('https://google.com');
await page.click('button + button');

// Type the query and visit the results page
- await page.type('input[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Wait for the first result to appear on the page,
@@ -39,7 +39,7 @@ await page.goto('https://www.google.com/');

await page.click('button + button');

- await page.type('input[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Wait for the element to be present on the page prior to clicking it
@@ -104,10 +104,10 @@ const page = await browser.newPage();
await page.goto('https://google.com');

// Agree to the cookies policy
- await page.click('button:has-text("I agree")');
+ await page.click('button:has-text("Accept all")');

// Type the query and visit the results page
- await page.type('input[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Click on the first result
Expand Down Expand Up @@ -139,7 +139,7 @@ await page.goto('https://google.com');
await page.click('button + button');

// Type the query and visit the results page
- await page.type('input[title="Search"]', 'hello world');
+ await page.type('textarea[title]', 'hello world');
await page.keyboard.press('Enter');

// Wait for the first result to appear on the page,
2 changes: 1 addition & 1 deletion sources/academy/webscraping/typescript/mini_project.md
@@ -366,7 +366,7 @@ async function scrape(input: UserInput) {
}
```

- Now, we can access `result[0].images` on the return value of `scrape` if **removeImages** was false without any compiler errors being thrown. But, if we switch **removeImages** to false, TypeScript will yell at us.
+ Now, we can access `result[0].images` on the return value of `scrape` if **removeImages** was false without any compiler errors being thrown. But, if we switch **removeImages** to true, TypeScript will yell at us.

![No more error](./images/no-more-error.png)
